Spiking Transformer Hardware Accelerators in 3D Integration

Boxun Xu, Junyoung Hwang, Pruek Vanna-iampikul, Sung Kyu Lim, Peng Li
2024-11-12
Abstract: Spiking neural networks (SNNs) are powerful models of spatiotemporal computation and are well suited for deployment on resource-constrained edge devices and neuromorphic hardware due to their low power consumption. Leveraging attention mechanisms similar to those found in their artificial neural network counterparts, recently emerged spiking transformers have showcased promising performance and efficiency by capitalizing on the binary nature of spiking operations. Recognizing the current lack of dedicated hardware support for spiking transformers, this paper presents the first work on 3D spiking transformer hardware architecture and design methodology. We present an architecture and physical design co-optimization approach tailored specifically for spiking transformers. Through memory-on-logic and logic-on-logic stacking enabled by 3D integration, we demonstrate significant energy and delay improvements compared to conventional 2D CMOS integration.
Neural and Evolutionary Computing, Hardware Architecture
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of dedicated hardware support for deploying spiking transformers on resource-constrained edge devices and neuromorphic hardware. Specifically:

1. **Requirements for low power consumption and high efficiency**: Spiking neural networks (SNNs) are well suited to resource-constrained edge devices and neuromorphic hardware because of their low power consumption. However, existing hardware architectures are not optimized for spiking transformers, so the performance and efficiency of these models cannot be fully exploited.
2. **Advantages of 3D integration**: To bridge this gap, the authors propose a hardware accelerator architecture based on 3D integration. Through memory-on-logic and logic-on-logic stacking, the architecture significantly reduces energy consumption and latency while improving data processing speed and area utilization.
3. **Specific challenges and contributions**:
   - **Challenges**: Existing hardware accelerators mainly target conventional non-spiking artificial neural networks (ANNs) or simpler SNNs (such as spiking CNNs) and are not suited to the more complex spiking transformers.
   - **Contributions**:
     1. The first 3D accelerator architecture dedicated to spiking transformers, which supports spiking computation through spatiotemporal weight reuse.
     2. The first 3D memory-on-logic and logic-on-logic interconnection scheme, which significantly reduces energy consumption and latency, provides an efficient spiking neural computing system, and reduces area overhead.
4. **Performance improvement**: Compared with conventional 2D CMOS integration, the 3D accelerator achieves a 7.0% improvement in effective frequency, a 50% reduction in area, a 7.8% reduction in power consumption, a 68.3% reduction in memory access latency, and a 69.5% reduction in memory access power consumption on spiking MLP workloads. Similar improvements are also achieved for spiking self-attention workloads.

In summary, the paper aims to accelerate spiking transformer workloads with a dedicated 3D hardware architecture, achieving more efficient computation and lower energy consumption, especially in resource-constrained environments.
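The abstract's remark about the binary nature of spiking operations, and the summary's mention of weight reuse, can be illustrated with a small sketch. It is not the paper's accelerator design. It only shows the underlying arithmetic: when inputs are 0/1 spikes, a matrix-vector product reduces to adding up the weights at the positions that spiked, and the same weight matrix can be reused at every time step. All names and toy values (`spike_matvec`, `run_timesteps`, `WEIGHTS`, `SPIKE_TRAIN`) are hypothetical and chosen for illustration.

```python
def dense_matvec(weights, x):
    """Conventional multiply-accumulate: one multiplication per weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def spike_matvec(weights, spikes):
    """Accumulate-only product for binary inputs (each spike is 0 or 1)."""
    out = []
    for row in weights:
        acc = 0
        for w, s in zip(row, spikes):
            if s:         # a spike arrived on this input
                acc += w  # add the weight; no multiplication is needed
        out.append(acc)
    return out


def run_timesteps(weights, spike_train):
    """Apply the same weight matrix to every time step (temporal reuse)."""
    return [spike_matvec(weights, spikes) for spikes in spike_train]


# Toy assumption: 3 output neurons, 4 inputs, 2 time steps, integer weights.
WEIGHTS = [[1, -2, 3, 0],
           [4, 0, -1, 2],
           [0, 5, 5, -3]]
SPIKE_TRAIN = [[1, 0, 1, 0],
               [0, 1, 1, 1]]

OUTPUTS = run_timesteps(WEIGHTS, SPIKE_TRAIN)
print(OUTPUTS)  # [[4, 3, 5], [1, 1, 7]]
```

For binary inputs, `spike_matvec` gives the same result as `dense_matvec` while skipping every multiplication and every zero input. That is the kind of saving dedicated spiking hardware can exploit.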