Abstract: Spiking neural networks (SNNs) are powerful models of spatiotemporal computation and are well suited for deployment on resource-constrained edge devices and neuromorphic hardware due to their low power consumption. Leveraging attention mechanisms similar to those found in their artificial neural network counterparts, recently emerged spiking transformers have showcased promising performance and efficiency by capitalizing on the binary nature of spiking operations. Recognizing the current lack of dedicated hardware support for spiking transformers, this paper presents the first work on 3D spiking transformer hardware architecture and design methodology. We present an architecture and physical design co-optimization approach tailored specifically for spiking transformers. Through memory-on-logic and logic-on-logic stacking enabled by 3D integration, we demonstrate significant energy and delay improvements compared to conventional 2D CMOS integration.

What problem does this paper attempt to address?

The main problem this paper addresses is the lack of dedicated hardware support for deploying spiking transformers on resource-constrained edge devices and neuromorphic hardware. Specifically:

1. **Requirements for low power consumption and high efficiency**: Spiking neural networks (SNNs) are well suited to resource-constrained edge devices and neuromorphic hardware because of their low power consumption. However, existing hardware architectures are not optimized for spiking transformers, so their potential performance and efficiency are not fully realized.
2. **Advantages of 3D integration**: To bridge this gap, the authors propose a new hardware accelerator architecture based on 3D integration. Through memory-on-logic and logic-on-logic stacking, the architecture significantly reduces energy consumption and latency while improving processing speed and area utilization.
3. **Specific challenges and contributions**:
   - **Challenges**: Existing hardware accelerators mainly target conventional non-spiking artificial neural networks (ANNs) or simple SNNs (such as spiking CNNs) and are not suited to the more complex spiking transformers.
   - **Contributions**:
     1. Proposes the first 3D accelerator architecture dedicated to spiking transformers, which supports spiking computation through spatio-temporal weight reuse.
     2. Implements the first 3D memory-on-logic and logic-on-logic interconnection scheme, which significantly reduces energy consumption and latency, provides an efficient spiking neural computing system, and reduces area overhead.
4. **Performance improvement**: Compared with conventional 2D CMOS integration, the 3D accelerator achieves a 7.0% improvement in effective frequency, a 50% reduction in area, a 7.8% reduction in power consumption, a 68.3% reduction in memory access latency, and a 69.5% reduction in memory access power consumption on spiking MLP workloads. Similar improvements are achieved on spiking self-attention workloads.

In summary, this paper aims to accelerate spiking transformer workloads with a dedicated 3D hardware architecture, achieving more efficient computation and lower energy consumption, especially in resource-constrained environments.
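To make the ideas of binary spiking operations and weight reuse across time steps more concrete, the following is a minimal software sketch of a spiking fully connected layer. It is not the paper's accelerator dataflow: the function name `spiking_linear`, the leaky integrate-and-fire neuron model, and the `threshold` and `leak` parameters are all assumptions chosen for illustration. It shows only that the same weight matrix can serve every time step, and that 0/1 input spikes turn each synaptic operation into an addition rather than a multiply-accumulate.

```python
from typing import List, Tuple


def spiking_linear(
    spikes: List[List[int]],
    weights: List[List[float]],
    threshold: float = 1.0,
    leak: float = 0.5,
) -> Tuple[List[List[int]], int]:
    """Illustrative leaky integrate-and-fire layer (assumed model, not the paper's).

    spikes:  T x N_in matrix of 0/1 input spikes, one row per time step.
    weights: N_out x N_in matrix, read once and reused at every time step.
    Returns the T x N_out output spikes and the number of additions performed.
    """
    n_out = len(weights)
    membrane = [0.0] * n_out  # membrane potential carried across time steps
    out_spikes: List[List[int]] = []
    additions = 0

    for spike_row in spikes:
        # Only inputs that actually spiked contribute, so work scales with sparsity.
        active = [i for i, s in enumerate(spike_row) if s]
        out_row = []
        for j in range(n_out):
            current = 0.0
            for i in active:
                current += weights[j][i]  # accumulate only, no multiplication
                additions += 1
            membrane[j] = leak * membrane[j] + current
            if membrane[j] >= threshold:
                out_row.append(1)
                membrane[j] = 0.0  # hard reset after firing
            else:
                out_row.append(0)
        out_spikes.append(out_row)
    return out_spikes, additions


# Hypothetical example: 3 time steps, 2 inputs, 2 outputs.
weights = [[0.6, 0.6], [0.2, 0.1]]
spikes = [[1, 0], [1, 1], [0, 0]]
out, n_add = spiking_linear(spikes, weights)
dense_macs = len(spikes) * len(weights) * len(weights[0])
print(out, n_add, dense_macs)
```

In this toy input, the layer performs 6 additions where a dense non-spiking layer of the same shape would perform 12 multiply-accumulates over the three time steps. A hardware design would exploit the same property with accumulate-only datapaths and on-chip weight buffers, but the specifics of the paper's architecture are not described in this summary.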

Spiking Transformer Hardware Accelerators in 3D Integration

Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers

Darwin: A Neuromorphic Hardware Co-Processor Based on Spiking Neural Networks

Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers

Boosting Throughput and Efficiency of Hardware Spiking Neural Accelerators using Time Compression Supporting Multiple Spike Codes

Spike Trains Encoding Optimization for Spiking Neural Networks Implementation in FPGA

Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training

Design Space Exploration of Hardware Spiking Neurons for Embedded Artificial Intelligence

PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning

Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

A Digital Neuromorphic Hardware for Spiking Neural Network

To Spike or Not To Spike: A Digital Hardware Perspective on Deep Learning Acceleration

30.2 A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity-Adaptive In-Memory Computing

Enabling Efficient On-Edge Spiking Neural Network Acceleration with Highly Flexible FPGA Architectures

A Reconfigurable FPGA-based Spiking Neural Network Accelerator

An Efficient Software-Hardware Design Framework for Spiking Neural Network Systems

SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception

A Low Power and Low Latency FPGA-Based Spiking Neural Network Accelerator

Multi-core ARM-based Hardware-Accelerated Computation for Spiking Neural Networks