What is it about?

This paper introduces TinyServe, a query-aware serving system that decides which parts of a language model's key-value (KV) cache are worth loading for each incoming query. By reading only the most relevant cached entries instead of the entire cache, it speeds up large-language-model inference while reducing memory traffic and energy use.
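
At a high level, the selection step can be pictured as scoring blocks ("pages") of the KV cache against the current query and attending only over the best-scoring ones. The sketch below is a simplified illustration of that idea, not the paper's implementation: the page summary (the mean of each page's key vectors), the function names, and the top-k heuristic are assumptions chosen for clarity.

```python
# Hypothetical sketch of query-aware KV cache page selection (illustration only,
# not the authors' kernel): each cached page keeps a small summary (mean key);
# at decode time we score pages against the current query and attend over the
# top-k pages only.
import numpy as np

def select_pages(query, page_keys, k=2):
    """Score each page by the dot product of the query with the page's mean key."""
    summaries = np.stack([keys.mean(axis=0) for keys in page_keys])  # (num_pages, d)
    scores = summaries @ query                                       # (num_pages,)
    return np.argsort(scores)[::-1][:k]                              # top-k page indices

def sparse_attention(query, page_keys, page_values, k=2):
    """Attend only over the keys/values of the selected pages."""
    idx = select_pages(query, page_keys, k)
    keys = np.concatenate([page_keys[i] for i in idx])      # (m, d)
    values = np.concatenate([page_values[i] for i in idx])  # (m, d)
    logits = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values

# Toy usage: 4 cached pages of 8 tokens each, head dimension 16.
rng = np.random.default_rng(0)
pages_k = [rng.standard_normal((8, 16)) for _ in range(4)]
pages_v = [rng.standard_normal((8, 16)) for _ in range(4)]
q = rng.standard_normal(16)
out = sparse_attention(q, pages_k, pages_v, k=2)
print(out.shape)  # (16,)
```

Because only the selected pages are fetched, the memory read per decoding step scales with k rather than with the full context length, which is where the latency and energy savings come from.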

Why is it important?

Serving large language models is expensive and slow. Our method cuts cost and latency without losing accuracy, helping researchers and companies deploy AI more sustainably.

Perspectives

AI system engineers, data-center architects, and researchers developing efficient foundation-model infrastructure.

Yanxuan Yu
Columbia University

Read the Original

This page is a summary of: TinyServe: Query-Aware Cache Selection for Efficient LLM Serving, October 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3746027.3758181.
