What is it about?
In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harness the computational power of accelerator cards, in particular Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks, but AMD and Intel have since been increasing their shares of the GPU market. This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel.

In this paper, we compare the competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. We investigate these frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, i.e., GPUs from NVIDIA, AMD, and Intel and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also focus our comparison on the differences between the nd_range kernel formulation and the SYCL-specific hierarchical kernels (a simplified sketch of both formulations follows below).

Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for the four previously mentioned programming frameworks for Least Squares Support Vector Machines (LS-SVMs). Using it as an example, we show which of the frameworks is best suited for a standard workload frequently employed in scientific computing and AI, depending on the target hardware: the most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes.
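To illustrate the two kernel styles compared in the paper, here is a minimal, hypothetical SYCL sketch (not the actual PLSSVM code) that writes the same dense matrix-vector product y = A * x once as an nd_range kernel and once as a hierarchical kernel. The problem size, work-group size, and buffer setup are illustrative assumptions.

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>
#include <vector>

int main() {
    // Illustrative problem size and work-group size (assumptions).
    constexpr std::size_t n = 1024;
    constexpr std::size_t wg = 128;
    std::vector<float> A(n * n, 1.0f), x(n, 1.0f), y(n, 0.0f);

    sycl::queue q;
    {
        sycl::buffer<float, 1> A_buf(A.data(), sycl::range<1>(n * n));
        sycl::buffer<float, 1> x_buf(x.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> y_buf(y.data(), sycl::range<1>(n));

        // 1) nd_range formulation: one work-item per output row,
        //    with the global and local index space given explicitly.
        q.submit([&](sycl::handler &cgh) {
            sycl::accessor A_acc(A_buf, cgh, sycl::read_only);
            sycl::accessor x_acc(x_buf, cgh, sycl::read_only);
            sycl::accessor y_acc(y_buf, cgh, sycl::write_only);
            cgh.parallel_for(
                sycl::nd_range<1>(sycl::range<1>(n), sycl::range<1>(wg)),
                [=](sycl::nd_item<1> item) {
                    const std::size_t row = item.get_global_id(0);
                    float sum = 0.0f;
                    for (std::size_t col = 0; col < n; ++col)
                        sum += A_acc[row * n + col] * x_acc[col];
                    y_acc[row] = sum;
                });
        });

        // 2) SYCL hierarchical formulation: an explicit work-group scope
        //    with a nested loop over the work-items of each group.
        q.submit([&](sycl::handler &cgh) {
            sycl::accessor A_acc(A_buf, cgh, sycl::read_only);
            sycl::accessor x_acc(x_buf, cgh, sycl::read_only);
            sycl::accessor y_acc(y_buf, cgh, sycl::write_only);
            cgh.parallel_for_work_group(
                sycl::range<1>(n / wg), sycl::range<1>(wg),
                [=](sycl::group<1> grp) {
                    grp.parallel_for_work_item([&](sycl::h_item<1> item) {
                        const std::size_t row = item.get_global_id(0);
                        float sum = 0.0f;
                        for (std::size_t col = 0; col < n; ++col)
                            sum += A_acc[row * n + col] * x_acc[col];
                        y_acc[row] = sum;
                    });
                });
        });
    } // leaving the scope destroys the buffers and copies y back to the host
}
```

In the nd_range version the developer manages the index space explicitly, while the hierarchical version expresses work-group and work-item scopes as nested constructs; the paper compares how these two formulations behave across hipSYCL and DPC++.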
Featured Image
Photo by Nana Dua on Unsplash
Why is it important?
We hope that our findings will help developers decide which framework to use when targeting specific hardware platforms.
Read the Original
This page is a summary of: A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware, May 2022, ACM (Association for Computing Machinery).
DOI: 10.1145/3529538.3529980
Resources
PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine
Machine learning algorithms must be able to cope efficiently with massive data sets. Therefore, they have to scale well on any modern system and be able to exploit the computing power of accelerators independent of their vendor. In the field of supervised learning, Support Vector Machines (SVMs) are widely used. However, even modern and optimized implementations such as LIBSVM or ThunderSVM do not scale well for large, non-trivial, dense data sets on cutting-edge hardware: most SVM implementations are based on Sequential Minimal Optimization, an optimized though inherently sequential algorithm. Hence, they are not well-suited for highly parallel GPUs. Furthermore, we are not aware of a performance-portable implementation that supports CPUs and GPUs from different vendors. We have developed the PLSSVM library to solve both issues. First, we resort to the formulation of the SVM as a least squares problem. Training an SVM then boils down to solving a system of linear equations, for which highly parallel algorithms are known. Second, we provide a hardware-independent yet efficient implementation: PLSSVM uses different interchangeable backends (OpenMP, CUDA, OpenCL, and SYCL) supporting modern hardware from various vendors like NVIDIA, AMD, or Intel on multiple GPUs. PLSSVM can be used as a drop-in replacement for LIBSVM. We observe speedups of up to 10x over LIBSVM on CPUs and of up to 14x over ThunderSVM on GPUs. Our implementation scales on many-core CPUs with a parallel speedup of 74.7 on up to 256 CPU threads and on multiple GPUs with a parallel speedup of 3.71 on four GPUs.
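Since training an LS-SVM boils down to solving a linear system, and the summary above names the Conjugate Gradient (CG) method as the solver, here is a minimal, hypothetical host-side C++ sketch of CG for a symmetric positive definite system A * x = b. This is not the PLSSVM implementation (which offloads the matrix-vector product to its GPU backends); the function name and tolerance are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the Conjugate Gradient method for A * x = b, with A a dense,
// symmetric positive definite n x n matrix stored in row-major order.
std::vector<float> conjugate_gradient(const std::vector<float> &A,
                                      const std::vector<float> &b,
                                      std::size_t n,
                                      float eps = 1e-6f) {
    auto dot = [n](const std::vector<float> &u, const std::vector<float> &v) {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i) s += u[i] * v[i];
        return s;
    };
    // The matrix-vector product: the computationally dominant step and the
    // part that PLSSVM parallelizes on the accelerator.
    auto matvec = [&A, n](const std::vector<float> &v) {
        std::vector<float> w(n, 0.0f);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                w[i] += A[i * n + j] * v[j];
        return w;
    };

    std::vector<float> x(n, 0.0f);
    std::vector<float> r = b;  // residual r = b - A * x, with x = 0
    std::vector<float> p = r;  // initial search direction
    float rr = dot(r, r);

    for (std::size_t k = 0; k < n && std::sqrt(rr) > eps; ++k) {
        const std::vector<float> Ap = matvec(p);
        const float alpha = rr / dot(p, Ap);      // optimal step length
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];                 // update the iterate
            r[i] -= alpha * Ap[i];                // update the residual
        }
        const float rr_new = dot(r, r);
        const float beta = rr_new / rr;           // direction correction
        for (std::size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];            // next search direction
        rr = rr_new;
    }
    return x;
}
```

Each CG iteration needs exactly one matrix-vector product, which costs O(n^2) versus O(n) for all the vector updates, so parallelizing that product is what makes the solver fast on GPUs.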
PLSSVM GitHub Repository
Public PLSSVM GitHub Repository: Implementation of a parallel least squares support vector machine using multiple backends for different GPU vendors.