All Stories

  1. MLAM: A Machine Learning-Aided Architectural Bottleneck Analysis Model for x86 Architectures
  2. AnyKey: A Key-Value SSD for All Workload Types
  3. Load and MLP-Aware Thread Orchestration for Recommendation Systems Inference on CPUs
  4. Pirate: No Compromise Low-Bandwidth VR Streaming for Edge Devices
  5. Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures
  6. SpotVerse: Optimizing Bioinformatics Workflows with Multi-Region Spot Instances in Galaxy and Beyond
  7. SmartGraph: A Framework for Graph Processing in Computational Storage
  8. FAAStloop: Optimizing Loop-Based Applications for Serverless Computing
  9. Foveated HDR: Efficient HDR Content Generation on Edge Devices Leveraging User's Visual Attention
  10. GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms
  11. Thorough Characterization and Analysis of Large Transformer Model Training At-Scale
  12. Minimizing Coherence Errors via Dynamic Decoupling
  13. Impact of Write-Allocate Elimination on Fujitsu A64FX
  14. MBFGraph: An SSD-based External Graph System for Evolving Graphs
  15. Hardware Support for Constant-Time Programming
  16. Architecture-Aware Currying
  17. Quantifying and Mitigating Cache Side Channel Leakage with Differential Set
  18. Optimizing CPU Performance for Recommendation Systems At-Scale
  19. EdgePC: Efficient Deep Learning Analytics for Point Clouds on Edge Devices
  20. Cypress
  21. Multi-resource fair allocation for consolidated flash-based caching systems
  22. Fine-Granular Computation and Data Layout Reorganization for Improving Locality
  23. An architecture interface and offload model for low-overhead, near-data, distributed accelerators
  24. Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping
  25. Pushing Point Cloud Compression to the Edge
  26. End-to-end Characterization of Game Streaming Applications on Mobile Platforms
  27. Memory Space Recycling
  28. Data Convection
  29. A Scheduling Framework for Decomposable Kernels on Energy Harvesting IoT Edge Nodes
  30. Kraken
  31. SpecSafe: detecting cache side channels in a speculative world
  32. Mix and Match: Reorganizing Tasks for Enhancing Data Locality
  33. Distance-in-time versus distance-in-space
  34. Fluid: a framework for approximate concurrency via controlled dependency relaxation
  35. Ghost Thread
  36. SplitServe
  37. Fifer
  38. Implications of Public Cloud Resource Heterogeneity for Inference Serving
  39. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks
  40. Déjà View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming
  41. Computing with Near Data
  42. Quantifying Data Locality in Dynamic Parallelism in GPUs
  43. Architecture-Aware Approximate Computing
  44. Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices
  45. Co-optimizing memory-level parallelism and cache-level parallelism
  46. NEOFog
  47. ReveNAND
  48. Enhancing computation-to-core assignment with physical location information
  49. Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory
  50. A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
  51. Hardware-Software Co-design to Mitigate DRAM Refresh Overheads
  52. Exploiting Intra-Request Slack to Improve SSD Performance
  53. VIP
  54. A case for core-assisted bottleneck acceleration in GPUs
  55. Anatomy of GPU Memory System for Multi-Application Execution
  56. Optimizing off-chip accesses in multicores
  57. EECache
  58. Memory Row Reuse Distance and its Role in Optimizing Application Performance
  59. Network footprint reduction through data access and computation placement in NoC-based manycores
  60. TaPEr
  61. Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy
  62. Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications
  63. Trading cache hit rate for memory performance
  64. Orchestrated scheduling and prefetching for GPGPUs
  65. Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores
  66. Physically addressed queueing (PAQ)
  67. A compiler framework for extracting superword level parallelism