What is it about?
Today’s AI applications - such as object detection in photos or videos - need to move huge amounts of data in and out of memory very quickly. If the memory system can’t keep up, the processor sits idle and wastes time and energy. Our work introduces DCMA, a new way of fetching and reusing data that combines the best of two established methods: DMA and cache. We evaluated DCMA on an FPGA board and tested it with an image-recognition CNN, showing up to 17x faster processing.
Featured Image
Photo by muxin alkayis on Unsplash
Why is it important?
What makes our work special is that it brings together two memory-access approaches that have traditionally lived in separate worlds - image-processing DMAs (with their 2D transfers and padding features) and processor caches (with their automatic data reuse). DCMA is an architecture that combines these strengths in a multi-port design tailored specifically for modern AI workloads.
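To make the DMA side of this combination concrete, the following is a minimal sketch of what an image-processing DMA's 2D transfer with padding does: fetch a rectangular tile from a larger image in a single request, filling in zeros wherever the tile extends past the image borders (common at the edges of a CNN's input). The function name and interface here are purely illustrative, not DCMA's actual API.

```python
# Hypothetical sketch of a 2D DMA transfer with zero-padding (illustrative
# names; not the actual DCMA interface). A real DMA engine would move the
# tile in hardware; here we model the behavior in plain Python.

def dma_2d_load(image, row0, col0, tile_h, tile_w):
    """Copy a tile_h x tile_w window starting at (row0, col0) of `image`,
    zero-padding any element that falls outside the image bounds."""
    img_h, img_w = len(image), len(image[0])
    tile = [[0] * tile_w for _ in range(tile_h)]
    for r in range(tile_h):
        for c in range(tile_w):
            src_r, src_c = row0 + r, col0 + c
            if 0 <= src_r < img_h and 0 <= src_c < img_w:
                tile[r][c] = image[src_r][src_c]
    return tile

image = [[1, 2],
         [3, 4]]
# A 3x3 tile anchored one pixel above and left of the image:
# the top row and left column come back as padding zeros.
tile = dma_2d_load(image, -1, -1, 3, 3)
```

The cache half of DCMA then handles the complementary concern: if two such tiles overlap, the shared data is served from fast local storage instead of being fetched from main memory twice.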
Read the Original
This page is a summary of: DCMA: Accelerating Parallel DMA Transfers with a Multi-Port Direct Cached Memory Access in a Massive-Parallel Vector Processor, ACM Transactions on Architecture and Code Optimization, June 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3730582.
You can read the full text:
Contributors
The following have contributed to this page