What is it about?

GraphBLAS is a recent standard for the expression of graph algorithms in the language of linear algebra. GraphBLAS operations may be executed either in blocking or nonblocking mode, but the investigation of nonblocking execution in earlier work is limited. The main challenges of performing nonblocking execution is to fully automatically achieve good performance for different situations and combine data locality optimizations with efficient parallel execution on a shared-memory architecture . The proposed design relies on: a) lazy evaluation to perform data locality optimizations at run-time, b) dynamic data dependence analysis to identify a sequence of operations that may benefit from nonblocking execution, and c) an analytic model that selects proper values for all performance parameters, e.g., number of threads, to fully automate the nonblocking execution. The evaluation using different matrices for three algorithms shows that the users writes a GraphBLAS program for sequential execution, and the program is executed faster in a fully automatic way thanks to the nonblocking execution.

Featured Image

Why is it important?

Nonblocking execution enables optimizations that may significantly improve the performance of programs. GraphBLAS operations are memory bound, and the evaluation shows that parallel nonblocking execution achieves up to 4.11x speedup over the corresponding parallel blocking execution as a result of better cache utilization.

Read the Original

This page is a summary of: Design and implementation for nonblocking execution in GraphBLAS: tradeoffs and performance, ACM Transactions on Architecture and Code Optimization, September 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3561652.
You can read the full text:



The following have contributed to this page