What is it about?
Currently, most TinyML devices focus only on inference, as training requires far more hardware resources. In this paper, we introduce SPARK, an efficient hybrid acceleration architecture with run-time sparsity-aware scheduling for TinyML learning. Besides a stand-alone accelerator, an in-pipeline acceleration unit is integrated within the CPU pipeline to support simultaneous forward and backward propagation. To better exploit sparsity and improve hardware utilization, a sparsity-aware acceleration scheduler distributes the workload between the two acceleration units. A unified memory system is also constructed to support transposable data fetch, reducing memory accesses. We implement SPARK in TSMC 22nm technology and evaluate it on different TinyML tasks. Compared with the baseline accelerator, SPARK achieves a 4.1× performance improvement on average with only 2.27% area overhead. SPARK also outperforms off-the-shelf edge devices in performance by 9.4×, with 446.0× higher efficiency.
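To give a flavour of the run-time sparsity-aware scheduling idea, here is a minimal Python sketch of how a scheduler might measure the sparsity of each work tile and route it to one of two execution units. This is an illustration only; the threshold, function names, and dispatch policy (SPARSITY_THRESHOLD, dispatch_tile, and so on) are assumptions for the example and are not taken from the SPARK paper.

```python
# Hypothetical sketch of run-time sparsity-aware dispatch between two units.
# Names and policy are illustrative, not the paper's actual implementation.
import numpy as np

SPARSITY_THRESHOLD = 0.7  # assumed cut-off; the real policy may differ


def tile_sparsity(tile: np.ndarray) -> float:
    """Fraction of zero elements in a tile of activations or gradients."""
    return 1.0 - np.count_nonzero(tile) / tile.size


def dispatch_tile(tile: np.ndarray) -> str:
    """Pick an execution unit for this tile based on its measured sparsity.

    Highly sparse tiles carry little useful work, so they are kept on the
    lightweight in-pipeline unit inside the CPU; dense tiles are sent to
    the stand-alone accelerator, keeping both units busy.
    """
    if tile_sparsity(tile) >= SPARSITY_THRESHOLD:
        return "in-pipeline unit"
    return "stand-alone accelerator"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense_tile = rng.standard_normal((16, 16))
    sparse_tile = dense_tile * (rng.random((16, 16)) > 0.9)  # ~90% zeros
    print(dispatch_tile(dense_tile))   # -> stand-alone accelerator
    print(dispatch_tile(sparse_tile))  # -> in-pipeline unit
```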
Read the Original
This page is a summary of: SPARK: An Efficient Hybrid Acceleration Architecture with Run-Time Sparsity-Aware Scheduling for TinyML Learning, June 2024, ACM (Association for Computing Machinery). DOI: 10.1145/3649329.3657369.