A Case For Intra-rack Resource Disaggregation in HPC

George Michelogiannakis; Benjamin Klenk; Brandon Cook; Min Yee Teh; Madeleine Glick; Larry Dennison; Keren Bergman; John Shalf

doi:10.1145/3514245

What is it about?

Modern HPC systems are designed for a few important and demanding applications, but most of the time the applications that actually run do not use all, or even most, of the available resources. In addition, resources are allocated in units of statically configured nodes. The result is vast underutilization of expensive resources. In this paper, we study and quantify this effect. We then use several approaches to quantify how many fewer resources an HPC system similar to Cori would need to deploy if resources within a rack could be allocated in a fine-grained manner.
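The intuition behind rack-level allocation can be illustrated with a small sketch. It is not the method used in the paper: the function names, the usage traces, and the assumption that every node is provisioned identically for the largest peak seen on any node are all hypothetical and chosen only for illustration. The sketch compares the capacity needed when each node is sized statically against the capacity needed when a rack shares one pool sized for the peak of the combined demand.

```python
from typing import List

# usage[n][t] = amount of one resource (e.g. GB of memory) that node n uses
# at time step t. All values below are made up for illustration.
Trace = List[List[int]]


def static_node_capacity(usage: Trace) -> int:
    """Capacity deployed when all nodes are identical and each one is sized
    for the largest peak observed on any single node (an assumption)."""
    largest_peak = max(max(node_trace) for node_trace in usage)
    return len(usage) * largest_peak


def pooled_rack_capacity(usage: Trace) -> int:
    """Capacity deployed when the rack shares one pool that only has to
    cover the peak of the summed demand across nodes."""
    return max(sum(step) for step in zip(*usage))


def relative_reduction(usage: Trace) -> float:
    """Fraction of the statically deployed capacity that pooling avoids."""
    return 1.0 - pooled_rack_capacity(usage) / static_node_capacity(usage)


# Hypothetical rack of four nodes whose peaks occur at different times.
rack = [
    [10, 80, 20, 10],
    [70, 10, 10, 30],
    [20, 20, 90, 10],
    [10, 30, 10, 60],
]

if __name__ == "__main__":
    print("static:", static_node_capacity(rack))
    print("pooled:", pooled_rack_capacity(rack))
    print("reduction: {:.0%}".format(relative_reduction(rack)))
```

Because the nodes in this toy trace peak at different times, the pooled capacity is well below the static one. With real measurements the gap depends entirely on how correlated the nodes' demands are.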

Why is it important?

In the future, HPC systems will become larger and more heterogeneous. This will intensify the resource underutilization problem, and unless we take action it may result in overly expensive systems that deliver only a fraction of their capacity. Allocating resources in a fine-grained manner (resource disaggregation) is a potential solution, but until now there has been no concrete, data-driven study showing what range of disaggregation is appropriate.

Perspectives

This study does not rely on models or simulation; instead, it measures the actual usage of a top production, open-science HPC system over a period of three weeks. This lets us examine real behavior rather than predictions. The study's conclusions quantify the expected benefit of intra-rack resource disaggregation, and we therefore think it makes a convincing case for an open-science system. Finally, the study also presents some new or uncommon metrics that we used to extract insight from real system data.
George Michelogiannakis
Lawrence Berkeley National Laboratory

This page is a summary of: A Case For Intra-rack Resource Disaggregation in HPC, ACM Transactions on Architecture and Code Optimization, June 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3514245.

Resources

Other
PINE: Photonic Integrated Networked Energy efficient datacenters (ENLITENED Program)
This paper provides an overview of our project and our vision for resource disaggregation.

Contributors

The following have contributed to this page

George Michelogiannakis
Lawrence Berkeley National Laboratory

How well resources are utilized in modern HPC systems and what opportunity this creates
