What is it about?
In this work we follow a model-assisted approach to the multi-GPU BLAS optimization problem: we introduce models for tiling size, performance, and energy efficiency, and integrate them into an end-to-end BLAS framework named PARALiA. This framework couples autotuning with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA delivers state-of-the-art performance and energy efficiency, and adapts to heterogeneous systems and scenarios via model-based decisions.
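To give a flavor of what "model-based decisions" means, here is a toy sketch (not PARALiA's actual models; all constants and formulas below are illustrative assumptions): a simple cost model predicts execution time for candidate tile sizes, accounting for communication-computation overlap, and the autotuner picks the tile size with the lowest predicted time.

```python
# Toy cost model (illustrative only, not PARALiA's real model): estimate the
# time of an n x n double-precision GEMM split into tile x tile blocks, where
# per-tile transfers overlap with computation.

def predicted_time(n, tile, flops_per_sec, bytes_per_sec, latency):
    """Predict runtime: compute and transfers overlap, so the slower
    of the two dominates, plus the first tile's un-hidden transfer."""
    tiles = (n // tile) ** 2
    compute = (2 * n**3) / flops_per_sec                       # total GEMM FLOPs
    per_tile_xfer = latency + (3 * tile * tile * 8) / bytes_per_sec  # A, B, C blocks
    transfer = tiles * per_tile_xfer
    return per_tile_xfer + max(compute, transfer)

def pick_tile(n, candidates, flops_per_sec, bytes_per_sec, latency):
    """Model-based autotuning: choose the candidate with lowest predicted time."""
    return min(candidates,
               key=lambda t: predicted_time(n, t, flops_per_sec, bytes_per_sec, latency))

best = pick_tile(8192, [512, 1024, 2048, 4096],
                 flops_per_sec=10e12,   # assumed 10 TFLOP/s GPU
                 bytes_per_sec=16e9,    # assumed 16 GB/s interconnect
                 latency=10e-6)         # assumed 10 us transfer latency
print(best)
```

The point of such a model is that the tuning decision costs microseconds, instead of empirically benchmarking every tile size on every routine invocation.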
Why is it important?
Dense linear algebra operations appear very frequently in high-performance computing (HPC) applications, making their performance crucial for scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, necessitating optimized libraries to ensure good performance. Unfortunately, multi-GPU systems come with multiple optimization challenges: data decomposition, data transfer, communication-computation overlap, problem splitting, scheduling across multiple workers (GPUs) and, ultimately, determining which devices should be used for each routine invocation. With PARALiA we aim to overcome these challenges.
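Two of the challenges above, data decomposition and scheduling across heterogeneous GPUs, can be sketched in a few lines. The greedy heuristic below is purely illustrative (it is not PARALiA's scheduler, and the GPU speed estimates are made-up assumptions): the matrix is split into 2D tiles, and each tile goes to the device whose estimated finish time is currently lowest, so faster GPUs receive proportionally more work.

```python
# Illustrative sketch (hypothetical, not PARALiA's scheduler): 2D tile
# decomposition plus greedy load-balanced assignment across GPUs with
# differing speed estimates.

def decompose(rows, cols, tile):
    """Return (row, col) origin coordinates of each tile of the matrix."""
    return [(r, c) for r in range(0, rows, tile) for c in range(0, cols, tile)]

def assign(tiles, gpu_speeds):
    """Greedy scheduling: each tile goes to the GPU whose estimated
    finish time ((tiles assigned + 1) / relative speed) is lowest."""
    load = [0] * len(gpu_speeds)
    plan = {g: [] for g in range(len(gpu_speeds))}
    for t in tiles:
        g = min(range(len(gpu_speeds)), key=lambda i: (load[i] + 1) / gpu_speeds[i])
        plan[g].append(t)
        load[g] += 1
    return plan

tiles = decompose(8192, 8192, 2048)                    # 4 x 4 = 16 tiles
plan = assign(tiles, gpu_speeds=[1.0, 1.0, 0.5, 0.5])  # two fast, two slow GPUs
print({g: len(ts) for g, ts in plan.items()})
```

A real runtime must additionally weigh data-transfer costs and reuse, which is exactly where the performance models from the previous section come in.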
Read the Original
This page is a summary of: PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems, ACM Transactions on Architecture and Code Optimization, September 2023, ACM (Association for Computing Machinery).