Unleashing Parallelism with Elastic-Barriers

Amit Tiwari; V. Krishna Nandivada

doi:10.1145/3727639

What is it about?

Parallel programs often suffer from load imbalance at synchronisation points such as barriers, where faster threads must wait for slower ones and computational resources sit idle. To address this inefficiency, we propose "elastic barriers", a synchronisation mechanism that lets faster threads execute work beyond the barrier when doing so is both "safe" (it preserves the program's behaviour) and "profitable" (it reduces the program's overall execution time).
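To make the cost of waiting concrete, the following is a minimal sketch of a toy timing model, not the algorithm from the paper. It compares a strict barrier with an idealised case in which each thread starts its own post-barrier work as soon as it arrives. The function names and the per-thread timings are hypothetical, and the model assumes the post-barrier work has no dependencies on other threads, which is exactly the assumption that real elastic barriers cannot make and must check.

```python
# Toy timing model (illustrative only; not the paper's algorithm).
# phase1[i] is the time thread i needs before the barrier,
# phase2[i] is the time thread i needs after the barrier.

def strict_barrier_makespan(phase1, phase2):
    """Every thread waits until the slowest thread reaches the barrier."""
    barrier_release = max(phase1)
    return barrier_release + max(phase2)

def idealised_elastic_makespan(phase1, phase2):
    """Each thread starts its own phase-2 work as soon as it finishes phase 1.

    Assumption: phase-2 work is independent of other threads' phase-1
    results, so running it early is always safe.
    """
    return max(p1 + p2 for p1, p2 in zip(phase1, phase2))

def idle_time_at_barrier(phase1):
    """Total time threads spend waiting at a strict barrier."""
    barrier_release = max(phase1)
    return sum(barrier_release - t for t in phase1)

# Hypothetical per-thread timings for four threads.
phase1 = [2, 9, 4, 3]
phase2 = [5, 1, 4, 6]

print(strict_barrier_makespan(phase1, phase2))     # 15
print(idealised_elastic_makespan(phase1, phase2))  # 10
print(idle_time_at_barrier(phase1))                # 18
```

In this toy case the strict barrier wastes 18 time units of waiting, and overlapping the phases shortens the run from 15 to 10 units. With real data dependencies only part of that gain is available, which is why the work executed beyond the barrier has to be checked for safety and profitability.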


Why is it important?

This is a crucial problem to address, as synchronisation via barriers is common across a wide range of parallel programming languages. The approach presented in this paper is generic and broadly applicable, with the potential to enhance performance in large-scale high-performance computing (HPC) applications. In particular, it is well-suited for workloads involving graph-structured data, such as power-law graphs found in machine learning tasks (e.g., PageRank, Perceptron) and social network datasets (e.g., Facebook, Twitter, Skitter).

Perspectives

We believe this work breaks new ground by enabling threads to execute useful work from within the barrier region, even in the presence of data dependencies. As discussed in the related work section of the paper, prior approaches have explored speculative execution, complete barrier elimination, or naive post-barrier work assignment. However, these often fail to address load imbalance with sufficient precision. In contrast, our approach enables safe and profitable execution by incorporating multiple runtime checks that adapt the work assigned to each thread based on its individual workload profile. These checks ensure that executing work beyond the barrier does not degrade performance, even in corner cases.
Amit Tiwari
Indian Institute of Technology Madras

This page is a summary of: Unleashing Parallelism with Elastic-Barriers, ACM Transactions on Architecture and Code Optimization, June 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3727639.

Improving Parallel Program Efficiency by Letting Threads Work Beyond the Barrier