Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning

Kaniz Mishty, Mehdi Sadi
2024-06-03
Abstract: Modern Artificial Intelligence (AI) workloads demand computing systems with large silicon area to sustain throughput and competitive performance. However, prohibitive manufacturing costs and yield limitations at advanced tech nodes and die-size reaching the reticle limit restrain us from achieving this. With the recent innovations in advanced packaging technologies, chiplet-based architectures have gained significant attention in the AI hardware domain. However, the vast design space of chiplet-based AI accelerator design and the absence of system and package-level co-design methodology make it difficult for the designer to find the optimum design point regarding Power, Performance, Area, and manufacturing Cost (PPAC). This paper presents Chiplet-Gym, a Reinforcement Learning (RL)-based optimization framework to explore the vast design space of chiplet-based AI accelerators, encompassing the resource allocation, placement, and packaging architecture. We analytically model the PPAC of the chiplet-based AI accelerator and integrate it into an OpenAI gym environment to evaluate the design points. We also explore non-RL-based optimization approaches and combine these two approaches to ensure the robustness of the optimizer. The optimizer-suggested design point achieves 1.52X throughput, 0.27X energy, and 0.01X die cost while incurring only 1.62X package cost of its monolithic counterpart at iso-area.
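The yield argument in the abstract can be illustrated with a small sketch. It assumes the textbook Poisson yield model, Y = exp(-A * D0), and a silicon cost proportional to die area divided by yield. This is not the paper's analytical PPAC model: the function names, the defect density, and the areas are illustrative assumptions, and the sketch ignores packaging, die-to-die interface overhead, and wafer-edge effects, so it is not expected to reproduce the 0.01X die cost reported above.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of good dies under the Poisson yield model Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_cost_per_good_die(area_mm2, defects_per_mm2, cost_per_mm2=1.0):
    """Silicon cost of one good die: raw area cost divided by yield."""
    return cost_per_mm2 * area_mm2 / poisson_yield(area_mm2, defects_per_mm2)


def chiplet_vs_monolithic_cost_ratio(total_area_mm2, num_chiplets, defects_per_mm2):
    """Silicon cost of num_chiplets equal dies relative to one monolithic die.

    Both options use the same total area (iso-area). Values below 1.0 mean
    the chiplet option needs less silicon spend per good system.
    """
    mono = silicon_cost_per_good_die(total_area_mm2, defects_per_mm2)
    per_chiplet = silicon_cost_per_good_die(
        total_area_mm2 / num_chiplets, defects_per_mm2
    )
    return num_chiplets * per_chiplet / mono


if __name__ == "__main__":
    # Assumed numbers: an 800 mm^2 design and 0.001 defects per mm^2.
    for n in (1, 2, 4, 8, 16):
        ratio = chiplet_vs_monolithic_cost_ratio(800.0, n, 0.001)
        print(f"{n:2d} chiplets: relative silicon cost {ratio:.3f}")
```

Under this model the ratio simplifies to exp(-A * D0 * (1 - 1/n)), so the benefit of splitting grows with both total area and defect density.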
Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the problem of optimizing chiplet-based AI accelerator design under advanced packaging technologies. Modern AI workloads demand large silicon area, but high manufacturing costs, yield limitations at advanced technology nodes, and die sizes approaching the reticle limit make ever-larger monolithic dies impractical. To overcome these obstacles, the paper proposes Chiplet-Gym, a Reinforcement Learning (RL)-based design space exploration framework for chiplet-based AI accelerators that covers resource allocation, placement, and packaging architecture.

The main contributions of the paper are:

1. **Proposing a co-design methodology**: The methodology jointly considers resource allocation (such as the number of AI chips, memory capacity, and bandwidth), the partitioning and placement of chiplets, and the choice of packaging technology and its attributes (such as bandwidth, bump density, cost, and complexity), in order to optimize the system-level power, performance, area, and cost (PPAC) of chiplet-based AI accelerators.
2. **Establishing an analytical model**: An analytical PPAC model for chiplet-based architectures is developed, enabling rapid evaluation of candidate AI accelerator designs when time and compute resources are limited.
3. **Optimizing design parameters**: The interdependencies between design space parameters are identified, and the optimization problem is formulated as a reinforcement learning problem. Non-RL-based optimization methods (such as simulated annealing) are also explored and combined with the RL approach to make the optimizer more robust.

The authors report that the optimized design outperforms state-of-the-art monolithic GPUs on MLPerf benchmarks, which they present as evidence of the effectiveness and practicality of the Chiplet-Gym framework.
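The summary mentions simulated annealing as one non-RL optimizer explored alongside RL. A minimal sketch of that idea over a toy design space follows. Everything specific here is an assumption made for illustration: the parameter names (`num_chiplets`, `hbm_stacks`, `package`), their value ranges, the bandwidth and cost tables, and the `toy_cost` objective are hypothetical stand-ins, not the paper's action space or its analytical PPAC model.

```python
import math
import random

# Hypothetical design space; names and ranges are illustrative assumptions.
DESIGN_SPACE = {
    "num_chiplets": [1, 2, 4, 8, 16],
    "hbm_stacks": [1, 2, 4, 6, 8],
    "package": ["organic", "silicon_interposer", "3d_stacked"],
}

# Made-up relative link bandwidth and package cost per packaging option.
PACKAGE_BW = {"organic": 1.0, "silicon_interposer": 4.0, "3d_stacked": 8.0}
PACKAGE_COST = {"organic": 1.0, "silicon_interposer": 2.5, "3d_stacked": 4.0}


def toy_cost(design):
    """Toy objective (lower is better): total cost per unit of throughput."""
    n = design["num_chiplets"]
    compute = 16.0  # fixed total compute at iso-area, arbitrary units
    memory_bw = 3.0 * design["hbm_stacks"]
    # A single monolithic die has no die-to-die links to bottleneck it.
    link_bw = float("inf") if n == 1 else 4.0 * PACKAGE_BW[design["package"]]
    throughput = min(compute, memory_bw, link_bw)
    # Poisson-yield-style silicon cost: smaller dies yield better.
    die_cost = 16.0 * math.exp(1.6 / n)
    package_cost = PACKAGE_COST[design["package"]] * (1.0 + 0.1 * n)
    memory_cost = 0.5 * design["hbm_stacks"]
    return (die_cost + package_cost + memory_cost) / throughput


def neighbor(design, rng):
    """Return a copy of the design with one randomly chosen parameter changed."""
    key = rng.choice(sorted(DESIGN_SPACE))
    new_design = dict(design)
    new_design[key] = rng.choice([v for v in DESIGN_SPACE[key] if v != design[key]])
    return new_design


def simulated_annealing(cost_fn, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimize cost_fn over DESIGN_SPACE with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}
    current_cost = cost_fn(current)
    best, best_cost = current, current_cost
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = neighbor(current, rng)
        candidate_cost = cost_fn(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


if __name__ == "__main__":
    design, cost = simulated_annealing(toy_cost)
    print(design, round(cost, 3))
```

An RL formulation would wrap the same kind of cost evaluation in an environment whose actions select design parameters and whose reward is derived from the PPAC estimate. The toy space here has only 75 points and could be enumerated directly; a search heuristic becomes worthwhile once the number of parameters and options grows.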