What is it about?

Compiling is the process of translating program source code into machine code. There is a long history of research on how the individual instructions of a program must be scheduled on hardware to optimize code for speed and efficiency. Scheduling single instructions has well-known limits imposed by combinatorial complexity. Modern hardware usually supports parallel execution of instructions for different application domains: pipelined and multicore CPUs for general-purpose code, GPUs for graphics and data-intensive computation, and computer clusters for high-performance computing. Exploiting the parallel capabilities of a platform by parallelizing software is still a complex and time-consuming task that requires considerable knowledge and experience.

This article introduces an approach called Latency Optimized Code Segmentation (LACOS). This novel method groups instructions in the physically most efficient way (along true dependencies) and introduces data transfers between these groups where needed. By applying the method, the parallel groups of instructions become differentiable, so the groups can be scheduled to different units by numerical methods. The approach is applicable to code without special annotations and without using parallel frameworks during coding. LACOS makes use of otherwise neglected data dependencies between instructions (Read-after-Read), which provide additional information about Instruction Level Parallelism (ILP). In first proof-of-concept studies, the method was able to exploit more parallelism in code at compile time than any other known method. LACOS is therefore a novel compilation approach towards a generic form of auto-parallelization of software.

Furthermore, when LACOS is applied to a code, the resulting segments always form a sequence of “compute -> transfer -> compute -> transfer” segments. This makes it possible to compile the input code for different hardware types, which improves the portability of the code and enables the exploitation of different processor types (e.g. cores of a multicore CPU and GPUs) without additional effort during coding.
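The core intuition can be sketched in a few lines of code. The example below is not the LACOS implementation (the paper's algorithm and data structures are not reproduced here); it is a minimal, hypothetical illustration of the underlying idea: instructions linked by true (read-after-write) dependencies must stay in order, while instructions that merely read the same value (a read-after-read dependency) are independent and form separate chains that could run in parallel.

```python
# Hypothetical sketch (not the LACOS algorithm): classify a straight-line
# instruction sequence into chains connected by true (read-after-write)
# dependencies. Instructions related only by read-after-read dependencies
# land in different chains and are candidates for parallel execution.

# Each entry: (result name, set of values the instruction reads).
instructions = [
    ("t1", {"a"}),   # t1 = f(a)   reads a
    ("t2", {"a"}),   # t2 = g(a)   reads a -> only RAR with t1: independent
    ("t3", {"t1"}),  # t3 = h(t1)  true dependency on t1
    ("t4", {"t2"}),  # t4 = k(t2)  true dependency on t2
]

def true_dep_chains(instrs):
    """Union-find over instructions joined by read-after-write edges."""
    parent = {name: name for name, _ in instrs}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    writes = {}  # value name -> instruction that produced it
    for name, reads in instrs:
        for r in reads:
            if r in writes:  # r was written by an earlier instruction: RAW
                parent[find(name)] = find(writes[r])
        writes[name] = name  # each instruction writes its own result
    chains = {}
    for name, _ in instrs:
        chains.setdefault(find(name), []).append(name)
    return sorted(chains.values())

print(true_dep_chains(instructions))  # -> [['t1', 't3'], ['t2', 't4']]
```

The two resulting chains share only the read of `a`, so a scheduler is free to place them on different execution units; only true dependencies force sequencing, which is why tracking the usually discarded read-after-read information exposes extra parallelism.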


Read the Original

This page is a summary of: Using Read-After-Read Dependencies to Control Task-Granularity, June 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3659914.3659921.
