All Stories

  1. CMANNS: GPU-Accelerated Graph Index Construction for ANNS via Compute–Memory Disaggregation
  2. From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
  3. TrEnv-X: Transparently Share Serverless Execution Environments Across Different Functions and Nodes
  4. Accelerating Stream Processing Engines via Hardware Offloading
  5. Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
  6. KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models
  7. Scaling Up Memory Disaggregated Applications with SMART
  8. Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory
  9. Falcon: Fast OLTP Engine for Persistent Cache and Non-Volatile Memory
  10. Efficiently Answering Path Queries on Evolving Graphs