Look-Up Table based Neural Network Hardware

Ovishake Sen, Chukwufumnanya Ogbogu, Peyman Dehghanzadeh, Janardhan Rao Doppa, Swarup Bhunia, Partha Pratim Pande, Baibhab Chatterjee
2024-10-01
Abstract: Traditional digital implementations of neural accelerators are limited by high power and area overheads, while analog and non-CMOS implementations suffer from noise, device mismatch, and reliability issues. This paper introduces a CMOS Look-Up Table (LUT)-based Neural Accelerator (LUT-NA) framework that reduces the power, latency, and area consumption of traditional digital accelerators through pre-computed, faster look-ups while avoiding noise and mismatch of analog circuits. To solve the scalability issues of conventional LUT-based computation, we split the high-precision multiply and accumulate (MAC) operations into lower-precision MACs using a divide-and-conquer-based approach. We show that LUT-NA achieves up to $29.54\times$ lower area with $3.34\times$ lower energy per inference task than traditional LUT-based techniques and up to $1.23\times$ lower area with $1.80\times$ lower energy per inference task than conventional digital MAC-based techniques (Wallace Tree/Array Multipliers) without retraining and without affecting accuracy, even on lottery ticket pruned (LTP) models that already reduce the number of required MAC operations by up to 98%. Finally, we introduce mixed precision analysis in the LUT-NA framework for various LTP models (VGG11, VGG19, Resnet18, Resnet34, GoogleNet) that achieved up to $32.22\times$-$50.95\times$ lower area across models with $3.68\times$-$6.25\times$ lower energy per inference than traditional LUT-based techniques, and up to $1.35\times$-$2.14\times$ lower area requirement with $1.99\times$-$3.38\times$ lower energy per inference across models as compared to conventional digital MAC-based techniques with $\sim$1% accuracy loss.
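The divide-and-conquer idea in the abstract can be illustrated with a minimal Python sketch. It assumes unsigned 8-bit operands split into 4-bit chunks, with every 4-bit by 4-bit product pre-computed in a small table; the chunk width, the names `LUT4`, `split`, `lut_multiply`, and `lut_mac`, and the unsigned-only handling are illustrative assumptions, and the paper's actual partitioning and hardware datapath may differ.

```python
# Hypothetical sketch: high-precision multiply built from low-precision look-ups.
CHUNK_BITS = 4                      # assumed chunk width, not taken from the paper
MASK = (1 << CHUNK_BITS) - 1

# Pre-computed table: LUT4[a][b] == a * b for all 4-bit a and b (16 x 16 entries).
LUT4 = [[a * b for b in range(1 << CHUNK_BITS)] for a in range(1 << CHUNK_BITS)]


def split(x, n_bits=8):
    """Split an unsigned n_bits value into CHUNK_BITS-wide chunks, least significant first."""
    return [(x >> shift) & MASK for shift in range(0, n_bits, CHUNK_BITS)]


def lut_multiply(w, x, n_bits=8):
    """Multiply two unsigned n_bits values using only table look-ups, shifts, and adds."""
    total = 0
    for i, w_chunk in enumerate(split(w, n_bits)):
        for j, x_chunk in enumerate(split(x, n_bits)):
            # Each partial product is looked up, then shifted to its bit position.
            total += LUT4[w_chunk][x_chunk] << (CHUNK_BITS * (i + j))
    return total


def lut_mac(weights, activations, n_bits=8):
    """Multiply-and-accumulate over paired weights and activations."""
    return sum(lut_multiply(w, x, n_bits) for w, x in zip(weights, activations))
```

For example, `lut_multiply(200, 123)` performs four look-ups in the 256-entry table and combines them with shifts and adds, giving the same result as `200 * 123`.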
Hardware Architecture
What problem does this paper attempt to address?
The paper targets the high power consumption, large chip area, and low energy efficiency of traditional neural network accelerators when running deep neural networks (DNNs). Specifically:

1. **Limitations of existing implementations**:
   - Traditional digital neural accelerators struggle to handle complex DNN workloads efficiently because of their high power and chip area overheads.
   - Analog and non-CMOS implementations can improve energy efficiency, but they are vulnerable to noise, device mismatch, and reliability issues.
2. **Scalability and energy-efficiency challenges**:
   - As neural network models grow more complex, fast processing and efficient memory usage become more critical.
   - Efficient neural network computation is especially important on resource-constrained nodes such as biomedical implants and wearable devices.

To address these problems, the authors propose a CMOS look-up table (LUT)-based neural accelerator framework (LUT-NA) that reduces power consumption, latency, and chip area through pre-computed, fast look-ups while avoiding the noise and mismatch problems of analog circuits.

### Main contributions

1. **A programmable and scalable LUT-NA framework**:
   - A novel divide-and-conquer (D&C) method is used to implement LUT-NA, making the LUT architecture scalable across multiple DNN models and bit resolutions.
2. **Mixed-precision analysis and approximate computing**:
   - Mixed-precision analysis and approximate computing further reduce energy and area consumption at a cost of about 1% accuracy.
3. **Lottery ticket pruning (LTP) combined with LUT-NA**:
   - LUT-NA is applied to LTP models, which already reduce the number of required MAC operations by up to 98%, further improving its scalability.
4. **Hardware efficiency analysis**:
   - The energy and area consumption per inference of LUT-NA and of approximate/mixed-precision LUT-NA are analyzed for different deep learning models.

Through these methods, the LUT-NA framework improves the energy efficiency and scalability of neural network accelerators across multiple deep learning models, which makes it suitable for resource-constrained environments.
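A back-of-the-envelope calculation shows why a monolithic LUT scales poorly and why splitting operands helps. The sketch below only counts raw table storage in bits, assuming one table that holds every product of two unsigned operands; the function names are hypothetical, and the resulting ratios are not the area or energy figures reported in the paper, which come from hardware evaluation.

```python
# Hypothetical storage count for product look-up tables (not the paper's area model).

def product_lut_bits(operand_bits):
    """Bits needed to store every operand_bits x operand_bits unsigned product."""
    entries = 1 << (2 * operand_bits)      # one entry per (a, b) operand pair
    entry_width = 2 * operand_bits         # a product needs twice the operand width
    return entries * entry_width


def lookups_per_multiply(operand_bits, chunk_bits):
    """Partial-product look-ups needed when both operands are split into chunks."""
    chunks = -(-operand_bits // chunk_bits)  # ceiling division
    return chunks * chunks


monolithic = product_lut_bits(8)   # single 8-bit x 8-bit table
chunked = product_lut_bits(4)      # single 4-bit x 4-bit table, reused per chunk pair
print(monolithic, chunked, lookups_per_multiply(8, 4))
```

Under these assumptions, the 8-bit table needs 1,048,576 bits while the 4-bit table needs 2,048 bits, at the price of four look-ups plus shift-and-add logic per multiplication.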