IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management
Pascal Vivet, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, César Fuguet, Ivan Miro-Panades, Guillaume Moritz, Jean Durupt, Christian Bernard, Didier Varreau, Julian Pontes, Sébastien Thuries, David Coriat, Michel Harrand, Denis Dutoit, Didier Lattard, Lucile Arnaud, Jean Charbonnier, Perceval Coudrain, Arnaud Garnier, Frédéric Berger, Alain Gueugnot, Alain Greiner, Quentin L. Meunier, Alexis Farcy, Alexandre Arriordaz, Séverine Chéramy, Fabien Clermidy
DOI: https://doi.org/10.1109/jssc.2020.3036341
2021-01-01
Abstract: In the context of high-performance computing, integrating more computing capability, whether generic cores or dedicated accelerators for artificial intelligence (AI) applications, raises growing challenges. Due to the increasing costs of advanced nodes and the difficulty of shrinking analog circuits and circuit input/output signals (IOs), alternatives to single-die architectures are becoming mainstream. Chiplet-based systems using 3D technologies enable modular and scalable architecture and technology partitioning. Nevertheless, chiplet integration on passive interposers, whether silicon or organic, still has limitations. In this article, we present the first CMOS active interposer, integrating: 1) power management without any external components; 2) distributed interconnects enabling any chiplet-to-chiplet communication; and 3) system infrastructure, design-for-test, and circuit IOs. The IntAct circuit prototype integrates six chiplets in 28-nm FDSOI technology, 3D-stacked onto this active interposer in a 65-nm process, offering a total of 96 computing cores. Full scalability of the computing system is achieved using an innovative scalable cache-coherent memory hierarchy, enabled by distributed networks-on-chip, with 3-Tbit/s/mm² high-bandwidth 3D-plug interfaces using 20-μm-pitch micro-bumps and 0.6-ns/mm low-latency asynchronous interconnects, while the six chiplets are locally power-supplied through the active interposer by dc–dc converters delivering 156 mW/mm² at 82% peak efficiency. Thermal dissipation is also studied, showing the feasibility of such an approach.
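The headline figures in the abstract can be related to one another with simple arithmetic. The sketch below is an illustrative back-of-envelope calculation, not a model from the article, and it rests on assumptions the abstract does not state: micro-bumps laid out on a square grid at the 20-μm pitch, the 3-Tbit/s/mm² bandwidth density averaged over every bump in that area, a hypothetical 10-mm link length chosen only for illustration, and the 82% peak efficiency treated as if it held at the 156-mW/mm² operating point. The function names are hypothetical helpers introduced here.

```python
"""Back-of-envelope figures derived from the IntAct abstract.

Assumptions (not from the article): square micro-bump grid, bandwidth
density averaged over all bumps, a hypothetical link length, and peak
converter efficiency applied at the quoted power density.
"""

# Figures quoted in the abstract.
BUMP_PITCH_UM = 20.0               # micro-bump pitch
BW_DENSITY_TBPS_PER_MM2 = 3.0      # 3D-plug bandwidth density
WIRE_LATENCY_NS_PER_MM = 0.6       # asynchronous interconnect latency
POWER_DENSITY_MW_PER_MM2 = 156.0   # dc-dc converter output power density
PEAK_EFFICIENCY = 0.82             # dc-dc converter peak efficiency


def bumps_per_mm2(pitch_um: float) -> float:
    """Bumps per mm^2 for a square grid of the given pitch (assumed layout)."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm * pitch_mm)


def gbps_per_bump(bw_tbps_per_mm2: float, pitch_um: float) -> float:
    """Average data rate per bump if the bandwidth is spread over every bump."""
    return bw_tbps_per_mm2 * 1000.0 / bumps_per_mm2(pitch_um)


def link_latency_ns(length_mm: float, ns_per_mm: float = WIRE_LATENCY_NS_PER_MM) -> float:
    """Wire latency for a link of the given length at a fixed ns/mm figure."""
    return length_mm * ns_per_mm


def input_power_density(output_mw_per_mm2: float, efficiency: float) -> float:
    """Power drawn by the converter to deliver the given output power."""
    return output_mw_per_mm2 / efficiency


if __name__ == "__main__":
    density = bumps_per_mm2(BUMP_PITCH_UM)
    print(f"bumps per mm^2: {density:.0f}")
    print(f"Gbit/s per bump: {gbps_per_bump(BW_DENSITY_TBPS_PER_MM2, BUMP_PITCH_UM):.1f}")
    # 10 mm is a hypothetical link length, not a dimension from the article.
    print(f"latency over 10 mm: {link_latency_ns(10.0):.1f} ns")
    p_in = input_power_density(POWER_DENSITY_MW_PER_MM2, PEAK_EFFICIENCY)
    print(f"input power density: {p_in:.1f} mW/mm^2")
    print(f"conversion loss: {p_in - POWER_DENSITY_MW_PER_MM2:.1f} mW/mm^2")
```

Under these assumptions, a 20-μm pitch gives about 2500 bumps/mm², so 3 Tbit/s/mm² averages roughly 1.2 Gbit/s per bump; since some bumps in practice carry power, ground, or control rather than data, the per-signal rate would be higher than this average.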
engineering, electrical & electronic