22.1 A 12.4TOPS/W @ 136GOPS AI-IoT System-on-Chip with 16 RISC-V, 2-to-8b Precision-Scalable DNN Acceleration and 30%-Boost Adaptive Body Biasing

L. Benini, O. Montfort, Alfio Di Mauro, G. Ottavi, M. Louvat, M. Eggimann, Pascal Gouedo, Francesco Conti, D. Rossi, Nils Exibard, G. Paulin, Hayate Okuhara, Georg Rutishauser, Angelo Garofalo, Emmanuel Botte, Lionel Jure, V. Huard
DOI: https://doi.org/10.1109/ISSCC42615.2023.10067643
2023-02-19
Abstract: Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) SoCs [1–4] for augmented reality, personalized healthcare and nano-robotics need to run a large variety of tasks within a power envelope of a few tens of mW: compute-intensive but bit-precision-tolerant Deep Neural Networks (DNNs), as well as signal processing and control requiring high-precision floating-point. Performance and energy constraints vary greatly between different applications and even within different stages of the same application. We present Marsellus (Fig. 22.1.1), an all-digital AI-IoT end-node heterogeneous SoC fabricated in GlobalFoundries 22nm FDX that combines three key contributions to enable aggressive scaling of performance and energy: 1) a general-purpose cluster of 16 RISC-V DSP cores attuned for execution of a diverse range of workloads exploiting 4b and 2b arithmetic extensions (XpulpNN), combined with fused MAC & LOAD (M&L) operations and floating-point support; 2) a 2-8b reconfigurable binary engine to accelerate 3×3 and 1×1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Bias (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.
Medicine, Engineering, Computer Science