All Stories

  1. M²XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
  2. Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation
  3. RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
  4. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
  5. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization