Abstract: Spiking Neural Networks (SNNs) have shown remarkable promise in computer vision as a low-energy alternative to traditional Artificial Neural Networks (ANNs). However, SNNs face several challenges: i) existing SNNs are not purely additive and involve a substantial amount of floating-point computation, which contradicts their original design goal of running on neuromorphic chips; ii) placing convolutional and pooling layers incorrectly relative to spiking layers reduces accuracy; iii) Leaky Integrate-and-Fire (LIF) neurons have a limited capability to represent local information, which is a disadvantage for downstream visual tasks such as semantic segmentation. To address these challenges, i) we introduce Pure Sparse Self-Attention (PSSA) and the Dynamic Spiking Membrane Shortcut (DSMS), which together tackle the problem of floating-point computation; ii) we propose Spiking Precise Gradient downsampling (SPG-down) for accurate gradient transmission; iii) we introduce the Group-LIF neuron, which enables LIF neurons to represent local information both horizontally and vertically and thus makes them better suited to semantic segmentation. These three solutions are integrated into the Powerful Sparse-Spike-Driven Transformer (PSSD-Transformer), which handles semantic segmentation while addressing the challenges inherent in SNNs. Experimental results show that our model outperforms previous results on standard classification datasets and also performs commendably on semantic segmentation datasets. To date, PSSD is the first SNN model to perform semantic segmentation on large datasets. The code will be made publicly available after the paper is accepted for publication.
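The first challenge rests on a general property of spiking layers. When a layer's inputs are binary spikes, each weight is either skipped or added, so multiply-accumulate operations reduce to accumulate-only ones. Any layer that feeds real-valued activations forward breaks this property. The sketch below illustrates both ideas under stated assumptions. It uses a standard discrete-time LIF neuron with hard reset, which is not necessarily the neuron, attention, or shortcut used in PSSD. The functions `lif_forward`, `spike_linear`, and `dense_linear` and the parameters `tau`, `v_th`, and `v_reset` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def lif_forward(inputs: List[float], tau: float = 2.0, v_th: float = 1.0,
                v_reset: float = 0.0) -> Tuple[List[int], List[float]]:
    """Discrete-time LIF neuron with hard reset (an assumed textbook form,
    not the paper's Group-LIF). Returns spikes and pre-reset membrane values."""
    v = v_reset
    spikes: List[int] = []
    membrane: List[float] = []
    for x in inputs:
        # Leaky integration: decay toward v_reset while adding the input current.
        v = v + (x - (v - v_reset)) / tau
        s = 1 if v >= v_th else 0
        spikes.append(s)
        membrane.append(v)  # recorded before any reset
        if s:
            v = v_reset  # hard reset after firing
    return spikes, membrane


def spike_linear(spikes: List[int], weights: List[List[float]]) -> List[float]:
    """Computes W^T s using additions only: since each s_i is 0 or 1,
    every output just accumulates the weight rows of neurons that fired."""
    out = [0.0] * len(weights[0])
    for i, s in enumerate(spikes):
        if s not in (0, 1):
            raise ValueError("spike inputs must be binary")
        if s:
            for j, w in enumerate(weights[i]):
                out[j] += w  # accumulate only, no multiplication
    return out


def dense_linear(x: List[float], weights: List[List[float]]) -> List[float]:
    """Reference multiply-accumulate version for real-valued inputs."""
    return [sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(len(weights[0]))]


if __name__ == "__main__":
    spikes, mem = lif_forward([1.5, 1.5, 0.0, 3.0])
    W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [0.25, 4.0]]
    print(spikes, mem)                # [0, 1, 0, 1] [0.75, 1.125, 0.0, 1.5]
    print(spike_linear(spikes, W))    # [0.75, 3.0]
    print(dense_linear(spikes, W))    # [0.75, 3.0]
```

With binary spikes, the accumulate-only path and the dense path produce the same result. If a real-valued tensor such as a floating-point membrane potential or attention score were passed to `spike_linear` instead, it would be rejected, because that case needs true multiplications. This is the kind of floating-point computation the abstract says existing SNNs still contain.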

PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation