What is it about?

The rapid development of Large Language Models (LLMs) calls for highly efficient hardware that can run them directly on edge devices, such as smartphones and IoT devices, for tasks like real-time inference and continual learning. However, building a single, powerful chip that serves all of these diverse tasks is challenging. We introduce 3D-CIMlet, a co-design framework that instead uses multiple small, specialized chips (chiplets) integrated in 2.5D or 3D packages. This approach allows us to combine different memory technologies in a single system. Our framework helps designers explore and map the different LLM workloads, such as inference and continual learning, to the most suitable chiplets in a reliability-aware and thermal-aware manner. With this heterogeneous, multi-chiplet approach, our designs achieve significantly better energy efficiency and performance than traditional single-chip solutions.


Why is it important?

This work addresses the urgent and underexplored challenge of designing hardware for Large Language Model (LLM) inference and continual learning at the edge. Prior efforts often overlooked the distinct hardware requirements of these different LLM modes. The significant findings and contributions are that: a) we present a comprehensive, thermal-aware co-design framework that enables large-scale design space exploration for multi-die architectures supporting both inference and continual learning; b) 2.5D/3D designs with heterogeneous RRAM and eDRAM chiplets dramatically improve energy efficiency and reduce latency; c) the framework develops memory-reliability-aware chiplet mapping strategies that optimally partition weights and attention activations to support both inference and continual learning. This research is vital for the future deployment of adaptable and energy-efficient LLMs on edge devices, paving the way for scalable, high-performance AI systems in resource-constrained environments.
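To give a flavor of what a memory-reliability-aware mapping might look like, here is a toy sketch. It is not the actual 3D-CIMlet algorithm: the tensor names, chiplet properties, threshold, and scoring rule are all assumptions for illustration. The idea it captures is that mostly-static inference weights favor read-efficient RRAM, while write-heavy data such as attention activations or weights updated during continual learning favor high-endurance eDRAM.

```python
# Illustrative only: a toy reliability-aware mapping heuristic, not the
# framework's real mapping strategy. All names and numbers are made up.
from dataclasses import dataclass


@dataclass
class Chiplet:
    name: str
    tech: str               # "RRAM" or "eDRAM"
    write_endurance: float  # tolerable writes before wear-out (relative)
    read_energy: float      # relative energy per read


@dataclass
class Tensor:
    name: str
    write_rate: float       # writes per step (relative)


def map_tensor(tensor: Tensor, chiplets: list[Chiplet]) -> Chiplet:
    """Place write-heavy tensors (e.g. attention activations, weights
    updated in continual learning) on high-endurance eDRAM, and
    mostly-static inference weights on energy-efficient RRAM."""
    if tensor.write_rate > 0.1:  # threshold chosen arbitrarily here
        candidates = [c for c in chiplets if c.tech == "eDRAM"]
    else:
        candidates = [c for c in chiplets if c.tech == "RRAM"]
    # Among eligible chiplets, prefer the lowest read energy.
    return min(candidates, key=lambda c: c.read_energy)


chiplets = [Chiplet("rram0", "RRAM", 1e6, 1.0),
            Chiplet("edram0", "eDRAM", 1e15, 2.5)]

print(map_tensor(Tensor("ffn_weights", write_rate=0.0), chiplets).name)       # rram0
print(map_tensor(Tensor("attn_activations", write_rate=1.0), chiplets).name)  # edram0
```

A real mapper would also fold in thermal constraints and chiplet capacity, which this sketch omits.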

Perspectives

Developing the 3D-CIMlet framework was especially rewarding, as it directly tackles the core challenge of deploying powerful Large Language Models (LLMs) on resource-constrained edge devices. This work allowed us to systematically explore the optimal integration of heterogeneous memory technologies, such as RRAM and eDRAM, within advanced 2.5D/3D multi-chiplet systems. Our findings strongly validate the necessity of a heterogeneous, multi-chiplet approach for scalable and energy-efficient edge AI acceleration. I am confident that this co-design framework provides practical guidelines that hardware architects can adopt when designing the next generation of smart, adaptive edge systems.

Shuting Du
Purdue University

Read the Original

This page is a summary of: 3D-CIMlet: A Chiplet Co-Design Framework for Heterogeneous In-Memory Acceleration of Edge LLM Inference and Continual Learning, June 2025, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/dac63849.2025.11133077.
You can read the full text:

