What is it about?
Huge pages—large contiguous blocks of physical memory—help reduce costly TLB misses and boost performance for data‑intensive applications. However, when only part of a huge page is used, the unused portion becomes “memory bloat,” wasting valuable RAM. EMD addresses this challenge by recognizing that bloat is unevenly distributed across an application’s address space. At runtime, it selectively splits only those huge pages from which the most unused memory can be reclaimed at the least performance cost. Implemented in the Linux kernel, EMD outperforms the previous state‑of‑the‑art by up to 69% while ensuring fair resource sharing among co‑running workloads—even under tight memory pressure—making huge pages practical for a broader range of applications. Parth Gangar (currently at Fujitsu Research), Ashish Panwar (Microsoft Research), and K. Gopinath (Rishihood University) contributed to this work. It was carried out at the Department of Computer Science and Automation, Indian Institute of Science (IISc), as part of Parth Gangar’s Master’s thesis, advised by the latter two authors during their time at IISc.
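To make the mechanism concrete, the sketch below shows per-huge-page de-bloating in C from user space. This is not the EMD kernel code: the zero-subpage heuristic follows the HawkEye line of work that EMD builds on (see Perspectives below), and the debloat_huge_page helper, the 2 MiB page size, and the bloat threshold are illustrative assumptions. It relies on documented Linux behavior: madvise(MADV_DONTNEED) on a 4 KiB sub-range of a transparent huge page causes the kernel to split the huge page and free the affected base pages.

```c
/*
 * Illustrative user-space sketch (not the EMD kernel implementation):
 * estimate bloat inside one 2 MiB transparent huge page by counting
 * zero-filled 4 KiB subpages, then reclaim those subpages with
 * madvise(MADV_DONTNEED) if enough memory stands to be recovered.
 * The threshold and helper names are assumptions for illustration.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define BASE_PAGE 4096UL                    /* 4 KiB base page */
#define HUGE_PAGE (2UL * 1024 * 1024)       /* 2 MiB THP on x86-64 */
#define SUBPAGES  (HUGE_PAGE / BASE_PAGE)   /* 512 subpages per THP */

static bool subpage_is_zero(const uint8_t *p)
{
    static const uint8_t zeros[BASE_PAGE];  /* zero-initialized */
    return memcmp(p, zeros, BASE_PAGE) == 0;
}

/* 'hp' must be the 2 MiB-aligned start of a huge-page-backed region.
 * Returns the number of 4 KiB subpages handed back to the kernel. */
size_t debloat_huge_page(uint8_t *hp, double bloat_threshold)
{
    size_t zero_count = 0;
    for (size_t i = 0; i < SUBPAGES; i++)
        if (subpage_is_zero(hp + i * BASE_PAGE))
            zero_count++;

    /* De-bloat only where a lot of memory can be recovered, so that
     * huge pages delivering real TLB benefits are left intact. */
    if ((double)zero_count / SUBPAGES < bloat_threshold)
        return 0;

    size_t reclaimed = 0;
    for (size_t i = 0; i < SUBPAGES; i++) {
        uint8_t *sub = hp + i * BASE_PAGE;
        if (subpage_is_zero(sub) &&
            madvise(sub, BASE_PAGE, MADV_DONTNEED) == 0)
            reclaimed++;  /* kernel splits the THP and frees this page */
    }
    return reclaimed;
}
```

EMD performs the analogous scan and split inside the kernel, which is what lets it weigh reclaimable bloat against performance impact for every huge page and enforce fairness across co-running processes.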
Featured Image
Photo by Possessed Photography on Unsplash
Why is it important?
Efficient use of memory is critical for sustaining high performance in modern data‑intensive applications. While huge pages can significantly reduce address translation overhead, they also risk wasting large amounts of memory when only partially used—an issue known as memory bloat. This trade‑off between performance and memory efficiency is common in real‑world systems, especially under memory pressure, yet most existing solutions either lack fine‑grained control or fail to ensure fairness across applications. EMD is important because it bridges this gap. By dynamically identifying where memory bloat is concentrated and reclaiming it with minimal performance loss, EMD allows systems to retain the speed benefits of huge pages without the memory waste. Moreover, its fairness‑aware design ensures that no single application is disproportionately penalized, making it well‑suited for multi‑tenant and shared‑resource environments. The result is a practical, OS‑level solution that can be deployed today without requiring hardware changes or application modifications.
Perspectives
Availability of memory is often a bottleneck for high‑performance applications, whether the memory is accessed by CPUs or GPUs. In addition, the fungibility (i.e., “interchangeability”) of memory between different types—such as kernel vs. user memory in this work, or CPU vs. GPU memory in AI applications—is an important consideration that can constrain performance. Since application processing and memory access characteristics can change rapidly at fine time scales, we need low‑cost mechanisms that can predict and adapt the memory subsystem accordingly. This calls for quick and effective mechanisms at the architecture/OS levels, and “slower,” more involved mechanisms (such as ML) at higher levels, including the application level. This work demonstrates that using hardware counters and simple algorithms in the OS kernel can deliver good memory performance by leveraging huge pages effectively. It builds on earlier research titled HawkEye: Efficient Fine‑grained OS Support for Huge Pages by Ashish Panwar, Sorav Bansal, and K. Gopinath (ASPLOS 2019).
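As a concrete illustration of the “hardware counters and simple algorithms” point, the sketch below samples dTLB read misses for the current process through Linux’s perf_event_open interface. The event encoding is the standard Linux API; the one-second measurement window and the idea of comparing the count against a policy threshold are assumptions for illustration, since HawkEye and EMD consume such signals inside the kernel.

```c
/*
 * Illustrative sketch: measure dTLB read misses for the current process
 * with the Linux perf_event_open interface. This is the kind of cheap
 * hardware signal an OS-level huge-page policy can consult; the sleep()
 * window and the threshold idea below are assumptions for illustration.
 */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    /* glibc provides no wrapper, so invoke the raw system call. */
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = PERF_COUNT_HW_CACHE_DTLB |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* Count dTLB read misses of this process on any CPU. */
    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    sleep(1);                     /* stand-in for the monitored workload */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t misses = 0;
    if (read(fd, &misses, sizeof(misses)) == (ssize_t)sizeof(misses))
        printf("dTLB read misses: %llu\n", (unsigned long long)misses);
    /* A policy could compare this count against a threshold to decide
     * whether translation overhead is high enough to justify huge pages. */
    close(fd);
    return 0;
}
```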
Parth Gangar
Indian Institute of Science
Read the Original
This page is a summary of: EMD: Fair and Efficient Dynamic Memory De-bloating of Transparent Huge Pages, June 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3735950.3735952.