Abstract: Spiking Neural Networks (SNNs) have attracted significant attention in recent years due to their low power consumption. Meanwhile, Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across domains such as natural language processing and computer vision. Despite the advantages of each, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging. Specifically, while the sparse computation of SNNs reduces energy consumption, traditional attention mechanisms depend on dense matrix computations and complex softmax operations, which are difficult to execute efficiently in low-power scenarios. Given the tremendous success of Transformers in deep learning, exploring the integration of SNNs and Transformers to harness the strengths of both is a necessary step. In this paper, we propose a novel model architecture, Spike Aggregation Transformer (SAFormer), that integrates the low-power characteristics of SNNs with the high-performance advantages of Transformer models. The core contribution of SAFormer is the Spike Aggregated Self-Attention (SASA) mechanism, which simplifies computation by calculating attention weights using only the spike matrices query and key, thereby reducing energy consumption. Additionally, we introduce a Depthwise Convolution Module (DWC) to enhance feature extraction, further improving overall accuracy. We evaluate SAFormer and show that it outperforms state-of-the-art SNNs in both accuracy and energy consumption, highlighting its advantages for low-power, high-performance computing.
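To illustrate why attention scores built from spike matrices are cheaper than dense attention, the following is a minimal sketch and not the SASA mechanism itself, whose exact formulation the abstract does not give. It assumes the query and key are binary (0/1) spike matrices, in which case each entry of the query-key product is simply a count of coinciding spikes and can be computed with additions only, without multiplications or softmax. The function names, matrix sizes, and firing rate are hypothetical choices made for illustration.

```python
import random


def spike_matrix(rows, cols, rate, rng):
    """Random binary (0/1) spike matrix; `rate` is an assumed firing probability."""
    return [[1 if rng.random() < rate else 0 for _ in range(cols)]
            for _ in range(rows)]


def dense_scores(q, k):
    """Reference: ordinary Q @ K^T computed with multiply-accumulate."""
    return [[sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
            for q_row in q]


def spike_scores(q, k):
    """Same scores without multiplications: count positions where both spike."""
    # Indices of active (spiking) features for every key token.
    k_active = [{j for j, s in enumerate(k_row) if s} for k_row in k]
    scores = []
    for q_row in q:
        # Only active query features can contribute, so sparsity saves work.
        q_active = [j for j, s in enumerate(q_row) if s]
        scores.append([sum(1 for j in q_active if j in ka) for ka in k_active])
    return scores


rng = random.Random(0)
Q = spike_matrix(4, 8, 0.3, rng)  # 4 tokens, 8 features (hypothetical sizes)
K = spike_matrix(4, 8, 0.3, rng)

# On binary inputs the accumulate-only version matches the dense product.
assert spike_scores(Q, K) == dense_scores(Q, K)
```

The work in `spike_scores` scales with the number of active spikes rather than with the full feature dimension, which is the general reason sparse, binary activations lower energy cost compared with dense matrix products followed by softmax.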

Energy efficient spike transformer accelerator at the edge

Spiking Transformer Hardware Accelerators in 3D Integration

Spike Trains Encoding Optimization for Spiking Neural Networks Implementation in FPGA

SpikingMiniLM: Energy-Efficient Spiking Transformer for Natural Language Understanding

Trimming Down Large Spiking Vision Transformers via Heterogeneous Quantization Search

Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers

Towards Energy-Preserving Natural Language Understanding with Spiking Neural Networks

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge

Combining Aggregated Attention and Transformer Architecture for Accurate and Efficient Performance of Spiking Neural Networks

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training

Transformer Inference Acceleration in Edge Computing Environment

Masked Spiking Transformer

Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers

A Reconfigurable FPGA-based Spiking Neural Network Accelerator

Towards High-performance Spiking Transformers from ANN to SNN Conversion

Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

Boosting Throughput and Efficiency of Hardware Spiking Neural Accelerators using Time Compression Supporting Multiple Spike Codes

Enabling Efficient On-Edge Spiking Neural Network Acceleration with Highly Flexible FPGA Architectures

Spikeformer: Training high-performance spiking neural network with transformer

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence