Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

Jannis Schönleber, Lukas Cavigelli, Renzo Andri, Matteo Perotti, Luca Benini
2023-11-17
Abstract: From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (LUT). Stella Nera is the first Maddness accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators implemented in the same technology. The hash function is a decision tree, which allows for an efficient hardware implementation as the multiply-accumulate operations are replaced by decision tree passes and LUT lookups. The entire Maddness MatMul can be broken down into parts that allow an effective implementation with small computing units and memories, allowing it to reach extreme efficiency while remaining generically applicable for MatMul tasks. In a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9.
Hardware Architecture, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper aims to improve the efficiency of matrix multiplication (MatMul) in deep-learning workloads, in particular the energy efficiency and area efficiency of hardware accelerators. As deep-learning models keep growing, so does the demand for compute throughput and efficiency, and conventional multiplier-based MatMul accelerators are running into energy-efficiency limits. The paper builds on the recently introduced Maddness method, which approximates MatMul without multiplications: a decision-tree hash function maps each input sub-vector to an index, and that index selects precomputed partial results from a look-up table (LUT). The main contributions are:

1. **Stella Nera**: An open-source, fully parameterized hardware accelerator for Maddness, the first of its kind. Implemented in a commercial 14 nm technology and scaled to 3 nm, it reaches an energy efficiency of 161 tera-operations per second per watt (TOp/s/W) at 0.55 V.
2. **Differentiable Maddness**: The first differentiable formulation of Maddness, which allows the method to be used when training deep neural networks (DNNs).
3. **PyTorch implementation**: A well-tested PyTorch implementation of differentiable Maddness linear and convolutional layers.
4. **Experimental results**: A Top-1 accuracy of 92.6% on CIFAR-10 with ResNet9, which differs from the FP32 baseline by only 1.2%.

Together, these contributions target an efficient, general-purpose MatMul approximation that is suitable for large-scale deep-learning tasks and reaches very high energy and area efficiency in hardware.
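The lookup-based idea described above can be illustrated with a small NumPy sketch. This is not the paper's hardware design or its PyTorch code; it is a simplified software model under stated assumptions. The function names (`fit_tree`, `encode`, `maddness_fit`, `maddness_matmul`) are hypothetical, and the split rule used here (highest within-bucket variance, median threshold) is a stand-in for the learning procedure of the original Maddness method. What the sketch does show is the inference structure: after an offline fitting step, the approximate product needs only comparisons, table lookups, and additions.

```python
import numpy as np

def fit_tree(X, depth=4):
    """Fit one balanced decision tree on the rows of X (assumed heuristic).

    Returns one split dimension per level, one threshold per node,
    and one prototype (bucket mean) per leaf.
    """
    n, d = X.shape
    idx = np.zeros(n, dtype=np.int64)      # current bucket of every row
    dims, thresholds = [], []
    for level in range(depth):
        n_nodes = 2 ** level
        score = np.zeros(d)
        for b in range(n_nodes):
            rows = X[idx == b]
            if len(rows) > 1:
                score += rows.var(axis=0) * len(rows)
        dim = int(np.argmax(score))        # one split dimension per level
        th = np.zeros(n_nodes)
        for b in range(n_nodes):
            rows = X[idx == b]
            if len(rows):
                th[b] = np.median(rows[:, dim])
        dims.append(dim)
        thresholds.append(th)
        idx = 2 * idx + (X[:, dim] > th[idx])
    protos = np.zeros((2 ** depth, d))
    for leaf in range(2 ** depth):
        rows = X[idx == leaf]
        if len(rows):
            protos[leaf] = rows.mean(axis=0)
    return dims, thresholds, protos

def encode(X, dims, thresholds):
    """Hash each row to a leaf index using only comparisons."""
    idx = np.zeros(X.shape[0], dtype=np.int64)
    for dim, th in zip(dims, thresholds):
        idx = 2 * idx + (X[:, dim] > th[idx])
    return idx

def maddness_fit(A_train, B, n_codebooks, depth=4):
    """Offline step: one tree and one LUT per column subspace of A."""
    subspaces = np.array_split(np.arange(A_train.shape[1]), n_codebooks)
    trees, luts = [], []
    for cols in subspaces:
        dims, ths, protos = fit_tree(A_train[:, cols], depth)
        trees.append((cols, dims, ths))
        luts.append(protos @ B[cols, :])   # multiplications happen only here
    return trees, luts

def maddness_matmul(A, trees, luts):
    """Online step: approximate A @ B with lookups and additions."""
    out = np.zeros((A.shape[0], luts[0].shape[1]))
    for (cols, dims, ths), lut in zip(trees, luts):
        out += lut[encode(A[:, cols], dims, ths)]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(512, 32))
    B = rng.normal(size=(32, 8))
    trees, luts = maddness_fit(A, B, n_codebooks=4)
    approx = maddness_matmul(A, trees, luts)
    rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
    print(f"relative error: {rel_err:.3f}")
```

With a tree depth of 4, each subspace has 16 prototypes, so each LUT holds 16 rows per codebook. Random Gaussian inputs, as in the usage example, are a hard case for any quantizer; the accuracy reported in the abstract comes from trained networks, not from this toy setup.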