All Stories

  1. From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
  2. TrEnv-X: Transparently Share Serverless Execution Environments Across Different Functions and Nodes
  3. Accelerating Stream Processing Engines via Hardware Offloading
  4. Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
  5. KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models
  6. Scaling Up Memory Disaggregated Applications with SMART
  7. Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory
  8. Falcon: Fast OLTP Engine for Persistent Cache and Non-Volatile Memory
  9. Efficiently Answering Path Queries on Evolving Graphs