What is it about?

More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system GCD 2 , developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and VLIW pipeline with the notion of soft dependencies. GCD 2 comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD 2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0× speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5×,3.4× and 4.0× speedup, respectively. GCD 2 is also unique in supporting, real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.

Featured Image

Read the Original

This page is a summary of: GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs, October 2022, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/micro56248.2022.00044.
You can read the full text:

Read

Contributors

The following have contributed to this page