What is it about?

LLM inference pricing treats every token as if it costs the same: input tokens, output tokens, a flat rate or a linear combination. That is too simple. The Transformer's autoregressive structure makes energy per token deeply nonlinear. A long input incurs a quadratic attention cost at prefill; a very short output never amortizes that cost, so you pay a lot for very little. We mapped this out properly. The model, SweetSpot, predicts the energy-per-output-token curve as a function of (input tokens, output tokens) from first principles (FLOP and memory-access complexity), and achieves 1.79% MAPE across 13 LLMs (1B to 9B parameters; OPT, LLaMA, Gemma, Falcon, Qwen2, Granite) on NVIDIA H100 GPUs. The sweet spot: short-to-moderate inputs with medium outputs. The nightmare: a 4096-token prompt followed by a 64-token reply, up to 33x less efficient than the optimum. We also find that GQA models consistently beat MHA models at the same scale: architecture matters, not just parameter count.
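To see where the nonlinearity comes from, here is a back-of-envelope sketch (not the paper's actual SweetSpot model, and with illustrative constants like a 7B-parameter model, `d_model=4096`, and 32 layers) of Transformer inference FLOPs: a term linear in total tokens from the weight matmuls, plus attention terms that grow with the square of the prompt length at prefill.

```python
# Illustrative FLOP counts for Transformer inference (NOT the paper's model):
# shows why cost per OUTPUT token is nonlinear in (input_tokens, output_tokens).

def inference_flops(n_in, n_out, n_params=7e9, d_model=4096, n_layers=32):
    """Approximate total FLOPs: ~2 * params per token for the weight matmuls,
    plus attention over the growing context."""
    linear = 2 * n_params * (n_in + n_out)
    # Prefill attention: every prompt token attends to the whole prompt.
    attn_prefill = n_layers * 2 * d_model * n_in * n_in
    # Decode attention: each generated token attends to everything so far.
    attn_decode = sum(n_layers * 2 * d_model * (n_in + t) for t in range(n_out))
    return linear + attn_prefill + attn_decode

def flops_per_output_token(n_in, n_out):
    return inference_flops(n_in, n_out) / n_out

# A long prompt with a tiny reply amortizes the prefill cost badly:
cost_bad = flops_per_output_token(4096, 64)    # 4096-token prompt -> 64 tokens
cost_good = flops_per_output_token(256, 256)   # moderate prompt, medium output
print(f"4096->64 uses {cost_bad / cost_good:.0f}x more FLOPs per output token")
```

Even this crude count shows the long-prompt/short-reply pattern costing an order of magnitude more per output token; the paper's model adds memory-access costs and hardware calibration on top of this kind of accounting.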

Why is it important?

Current energy estimates for LLM inference assume a simple linear relationship with sequence length, but the actual Transformer architecture makes this fundamentally wrong. SweetSpot matters because it:

- Corrects a broken assumption baked into how the entire industry estimates inference costs
- Quantifies the stakes: up to a 33x energy difference between efficient and inefficient usage patterns, which adds up at datacenter scale
- Is actionable: the model is accurate enough (1.79% MAPE) to directly inform real production decisions like prompt truncation, summarization, and adaptive generation strategies

Essentially, it gives engineers a principled tool to stop wasting energy they didn't even know they were wasting.

Read the Original

This page is a summary of: SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference, May 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3777884.3797011.
