All Stories

  1. EFRC is Adapted to Access-rate Imbalance: A Class of Efficient and Fair Replacement Algorithms for Cache Sharing
  2. A Survey on Failure Analysis and Fault Injection in AI Systems
  3. L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis
  4. COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge
  5. Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis
  6. FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows
  7. CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning
  8. ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems
  9. TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State
  10. Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-modal Observability Data
  11. DiagConfig: Configuration Diagnosis of Performance Violations in Configurable Software Systems
  12. MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry
  13. DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems
  14. LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly
  15. Fighting against Incidents in Large-Scale Online Systems
  16. MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments