MFIT: Multi-Fidelity Thermal Modeling for 2.5D and 3D Multi-Chiplet Architectures

Lukas Pfromm, Alish Kanani, Harsh Sharma, Parth Solanki, Eric Tervo, Jaehyun Park, Janardhan Rao Doppa, Partha Pratim Pande, Umit Y. Ogras
2024-10-12
Abstract: Rapidly evolving artificial intelligence and machine learning applications require ever-increasing computational capabilities, while monolithic 2D design technologies approach their limits. Heterogeneous integration of smaller chiplets using a 2.5D silicon interposer and 3D packaging has emerged as a promising paradigm to address this limit and meet performance demands. These approaches offer a significant cost reduction and higher manufacturing yield than monolithic 2D integrated circuits. However, the compact arrangement and high compute density exacerbate the thermal management challenges, potentially compromising performance. Addressing these thermal modeling challenges is critical, especially as system sizes grow and different design stages require varying levels of accuracy and speed. Since no single thermal modeling technique meets all these needs, this paper introduces MFIT, a range of multi-fidelity thermal models that effectively balance accuracy and speed. These multi-fidelity models can enable efficient design space exploration and runtime thermal management. Our extensive testing on systems with 16, 36, and 64 2.5D integrated chiplets and 16x3 3D integrated chiplets demonstrates that these models can reduce execution times from days to mere seconds and milliseconds with negligible loss in accuracy.
Hardware Architecture
What problem does this paper attempt to address?
This paper addresses thermal modeling for 2.5D and 3D multi-chiplet systems. As artificial intelligence and machine learning applications demand ever more computing power, traditional monolithic 2D chip design is approaching its limits. Heterogeneous integration of chiplets through 2.5D silicon interposers and 3D packaging has emerged as a promising way to overcome this limit and meet performance demands. However, the compact layout and high compute density exacerbate thermal management challenges, which can compromise system performance. Addressing these thermal modeling challenges is therefore crucial, especially as systems scale up and different design stages require different levels of accuracy and speed.

Since no single thermal modeling technique meets all these needs, the paper proposes MFIT (Multi-Fidelity Thermal Modeling), a family of multi-fidelity thermal models that balance accuracy and speed and can support efficient design space exploration and runtime thermal management. Specifically, the method systematically abstracts fine-grained finite element models (FEM) into abstract FEM, thermal RC models, and discrete state space (DSS) models, each offering a different trade-off between speed and accuracy. The resulting open-source models cover a wide range of accuracy and speed and suit different stages of the design cycle, from system specification and architectural exploration through logic design, physical design and verification, and manufacturing, to post-silicon optimization and verification.
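To make the step from a thermal RC model to a discrete state space (DSS) model concrete, the following is a minimal sketch and not the MFIT implementation. It assumes a hypothetical three-node chain (chiplet, interposer, heat sink) with made-up capacitances and conductances, and it uses a simple forward-Euler discretization of C dθ/dt = -Gθ + P, where θ is the temperature rise above ambient. The function names `build_dss`, `step`, and `simulate` are illustrative only; the paper may discretize differently.

```python
def build_dss(caps, g_links, g_amb, dt):
    """Build a forward-Euler DSS model for a chain-shaped thermal RC network.

    caps    : thermal capacitance of each node (J/K), assumed values
    g_links : conductance between node i and node i+1 (W/K), assumed values
    g_amb   : conductance from the last node to ambient (W/K)
    dt      : sampling period (s); must be small enough for stability
    Returns (A_d, B_d) with theta[k+1] = A_d @ theta[k] + B_d * P[k],
    where B_d holds the diagonal of the (diagonal) input matrix.
    """
    n = len(caps)
    # Assemble the conductance matrix G of the RC network.
    G = [[0.0] * n for _ in range(n)]
    for i, g in enumerate(g_links):
        G[i][i] += g
        G[i + 1][i + 1] += g
        G[i][i + 1] -= g
        G[i + 1][i] -= g
    G[n - 1][n - 1] += g_amb  # path from the last node to ambient

    # Forward Euler: A_d = I - dt * C^-1 * G, B_d = dt * C^-1.
    A_d = [[(1.0 if i == j else 0.0) - dt * G[i][j] / caps[i]
            for j in range(n)] for i in range(n)]
    B_d = [dt / c for c in caps]
    return A_d, B_d


def step(A_d, B_d, theta, power):
    """Advance the temperature-rise vector by one sampling period."""
    n = len(theta)
    return [sum(A_d[i][j] * theta[j] for j in range(n)) + B_d[i] * power[i]
            for i in range(n)]


def simulate(A_d, B_d, power, steps):
    """Run the DSS model from ambient (theta = 0) under constant power."""
    theta = [0.0] * len(power)
    for _ in range(steps):
        theta = step(A_d, B_d, theta, power)
    return theta


# Hypothetical parameters: chiplet, interposer, heat sink.
A_d, B_d = build_dss(caps=[0.05, 0.2, 2.0], g_links=[2.0, 4.0],
                     g_amb=1.0, dt=0.01)
# 10 W dissipated in the chiplet only, simulated for 50 s.
theta = simulate(A_d, B_d, power=[10.0, 0.0, 0.0], steps=5000)
print([round(t, 3) for t in theta])
```

Each DSS step costs only one small matrix-vector product, which is the kind of saving that makes such models attractive for runtime thermal management. In this toy network the simulated temperature rises should settle near the series-resistance values P/g summed along the path to ambient, i.e. about 17.5 K, 12.5 K, and 10 K above ambient for the three nodes.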