What is it about?
Modern HPC systems are designed for a few important and demanding applications, but most of the time the applications that actually run do not use all, or even most, of the available resources. In addition, resources are allocated in units of statically configured nodes. The result is vast underutilization of expensive resources. In this paper, we study and quantify this effect. We then use several approaches to estimate how many fewer resources an HPC system similar to Cori would need if resources within a rack could be allocated in a fine-grained manner.
Why is it important?
In the future, HPC systems will become larger and more heterogeneous. This will intensify the resource underutilization problem, and unless we take action, it may result in overly expensive systems that deliver only a fraction of their capacity. Allocating resources in a fine-grained manner (resource disaggregation) is a potential solution, but until now there has been no concrete, data-driven study showing what range of disaggregation is appropriate.
Read the Original
This page is a summary of: A Case For Intra-rack Resource Disaggregation in HPC, ACM Transactions on Architecture and Code Optimization, June 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3514245.