Attention-free Spikformer: Mixing Spike Sequences with Simple Linear Transforms

Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu

2023-08-17

Abstract: By integrating the self-attention capability and the biological properties of Spiking Neural Networks (SNNs), Spikformer applies the flourishing Transformer architecture to SNN design. It introduces a Spiking Self-Attention (SSA) module to mix sparse visual features using spike-form Query, Key, and Value, achieving State-Of-The-Art (SOTA) performance on numerous datasets compared to previous SNN-like frameworks. In this paper, we demonstrate that the Spikformer architecture can be accelerated by replacing the SSA with an unparameterized Linear Transform (LT) such as the Fourier and Wavelet transforms. These transforms are used to mix spike sequences, reducing the quadratic time complexity to log-linear time complexity. They alternate between the frequency and time domains to extract sparse visual features, showing strong performance and efficiency. We conduct extensive experiments on image classification using both neuromorphic and static datasets. The results indicate that, compared to the SOTA Spikformer with SSA, Spikformer with LT achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e., CIFAR-10 and CIFAR-100). Furthermore, Spikformer with LT achieves approximately 29-51% improvement in training speed and 61-70% improvement in inference speed, and it reduces memory usage by 4-26% because it requires no learnable parameters.
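The core idea of the abstract, replacing SSA with a parameter-free linear transform that mixes tokens, can be illustrated with a minimal NumPy sketch. It assumes an FNet-style mixer that keeps the real part of a 2D discrete Fourier transform over a single time step's spike matrix of shape (N tokens, D channels). It also uses a one-level orthonormal Haar transform along the token axis as a stand-in for the wavelet variant. The function names `fourier_mix` and `haar_mix`, the choice of axes, and the normalization are illustrative assumptions, not the paper's exact implementation. Neither function has learnable weights, which is the property the abstract credits for the memory savings.

```python
import numpy as np


def fourier_mix(x: np.ndarray) -> np.ndarray:
    """Parameter-free token mixing via a 2D DFT (FNet-style; an assumption here).

    x has shape (N, D): N tokens with D channels, e.g. binary spikes at one time step.
    Returns the real part of the DFT taken over both the token and channel axes.
    """
    # np.fft.fft2 transforms the last two axes; cost is O(N*D*log(N*D)), no weights.
    return np.real(np.fft.fft2(x))


def haar_mix(x: np.ndarray) -> np.ndarray:
    """One-level orthonormal Haar wavelet transform along the token axis.

    Hypothetical stand-in for the paper's wavelet mixer; requires an even N.
    Returns approximation and detail coefficients stacked back to shape (N, D).
    """
    if x.shape[0] % 2:
        raise ValueError("token count must be even for a one-level Haar transform")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass: scaled pairwise sums
    detail = (even - odd) / np.sqrt(2.0)  # high-pass: scaled pairwise differences
    return np.concatenate([approx, detail], axis=0)


# Usage: a sparse binary "spike" matrix with 8 tokens and 4 channels.
rng = np.random.default_rng(0)
spikes = (rng.random((8, 4)) < 0.2).astype(float)
print(fourier_mix(spikes).shape, haar_mix(spikes).shape)  # (8, 4) (8, 4)
```

Both mixers preserve the (N, D) shape, so either could be dropped into the slot SSA occupies. In a full Spikformer block, the real-valued output would presumably be passed to spiking neurons before the next layer. That step is omitted here.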

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The main objective of this paper is to explore whether simpler sequence-mixing mechanisms, such as the Fourier transform or the wavelet transform, can completely replace the relatively complex Spiking Self-Attention (SSA) sublayer in the Spikformer architecture. The study finds that these simple linear transforms, which have no learnable parameters, achieve higher Top-1 accuracy than SSA on neuromorphic datasets and comparable accuracy on static datasets. They also improve computational efficiency and reduce memory usage, making training and inference approximately 29-51% and 61-70% faster, respectively. Specifically, the main contributions of the paper include:
1. Demonstrating that simple linear transforms such as the Fourier and wavelet transforms can effectively extract sparse visual features, a surprising result suggesting that SSA may not be the key factor driving Spikformer's performance.
2. Introducing a new Spikformer variant that uses the Fourier or wavelet transform for sequence mixing, and providing a comprehensive analysis of the time complexity of different sequence-mixing mechanisms.
3. Extensive experiments validating that the proposed Spikformer with LT achieves higher Top-1 accuracy than the original Spikformer with SSA on neuromorphic datasets and comparable performance on static datasets, while also significantly improving computational efficiency and reducing memory usage by 4-26%.
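A back-of-the-envelope version of the time-complexity comparison follows. It assumes N tokens with D channels (Q, K, V each of shape N x D), SSA evaluated in the usual (QK^T)V order, and an FFT applied along the token axis for each channel. The value N = 64 is purely illustrative and not taken from the paper. Because SSA has no softmax, the product could in principle be reordered as Q(K^T V) at O(ND^2) cost. The sketch follows the quadratic-in-N framing of the abstract.

```latex
\begin{aligned}
\text{SSA:}\quad & \underbrace{QK^{\top}}_{\mathcal{O}(N^{2}D)} \text{ then } \underbrace{(QK^{\top})V}_{\mathcal{O}(N^{2}D)}
  \;\Rightarrow\; \mathcal{O}(N^{2}D) \\
\text{FFT mixing:}\quad & D \text{ FFTs of length } N \;\Rightarrow\; \mathcal{O}(ND\log N) \\
\text{Ratio:}\quad & \frac{N^{2}D}{ND\log_{2}N} = \frac{N}{\log_{2}N},
  \qquad N = 64:\; \frac{64}{6} \approx 10.7
\end{aligned}
```

This asymptotic ratio ignores constant factors and the rest of the network, such as the MLP and patch-embedding layers. It should not be read as a prediction of the end-to-end training and inference speed-ups reported above.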


Spikformer: When Spiking Neural Network Meets Transformer

Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification

TE-Spikformer: Temporal-enhanced spiking neural network with transformer

Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket

SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer

Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network

Spikeformer: Training high-performance spiking neural network with transformer

Spiking Wavelet Transformer

Auto-Spikformer: Spikformer architecture search

Spiking Transformer with Spatial-Temporal Attention

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation

IM-LIF: Improved Neuronal Dynamics with Attention Mechanism for Direct Training Deep Spiking Neural Network

Toward Efficient Processing and Learning with Spikes: New Approaches for Multispike Learning

Spectral Transform Forms Scalable Transformer

Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training

Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers

Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers