All Stories

  1. Load and MLP-Aware Thread Orchestration for Recommendation Systems Inference on CPUs
  2. Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures
  3. Optimizing CPU Performance for Recommendation Systems At-Scale