RepSViT: an Efficient Vision Transformer Based on Spiking Neural Networks for Object Recognition in Satellite On-Orbit Remote Sensing Images

Yanhua Pang, Libo Yao, Yiping Luo, Chengguo Dong, Qinglei Kong, Bo Chen
DOI: https://doi.org/10.1109/tgrs.2024.3367709
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: On-orbit computing for satellites is transitioning from a backup measure to a primary function. However, the limited computing resources available on satellites make it difficult to deploy advanced models with large parameter counts. Satellite on-orbit computing also requires high speed and accuracy, which poses significant challenges for developing suitable models. To overcome these challenges, we propose an efficient vision transformer, RepSViT, for satellite on-orbit computing. RepSViT introduces spiking neural networks (SNNs), which offer high biological plausibility, event-driven operation, and low power consumption, into remote sensing image processing and satellite on-orbit computing for the first time, and it incorporates structural reparameterization. Specifically, we design a dynamic dilated spiking convolution (D²SC) based on SNNs to improve the feature extraction capability and efficiency of RepSViT. We also develop a spiking guided attention module (SGAM) that makes RepSViT pay more attention to object-related features at lower computational cost. Furthermore, we design an efficient coupled fine-coarse-grained block (ECFC) to enhance the model's capability to extract both coarse- and fine-grained features. To balance effective feature extraction, inference speed, and computational cost, we design a reparameterized feed-forward network (RepFFN). RepSViT achieves an inference latency of 8.33 ms and a recognition accuracy of 95% on an embedded GPU, using 3.77 million parameters at a computational cost of 0.6 GFLOPs.
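The abstract does not describe how RepFFN is built, so the following is only a generic sketch of the idea behind structural reparameterization, not the authors' method. It assumes a hypothetical training-time block made of two parallel linear branches plus a skip connection, and shows that the branches can be folded into a single weight matrix for inference without changing the output. All names (`multi_branch`, `merge_branches`, `W1`, `W2`) are illustrative.

```python
# Generic illustration of structural reparameterization (an assumption-based
# sketch; this is NOT the RepFFN described in the paper).
# Training-time block:  y = W1 x + W2 x + x   (two linear branches plus a skip)
# Inference-time block: y = W_merged x, where W_merged = W1 + W2 + I


def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def multi_branch(W1, W2, x):
    """Evaluate the multi-branch form: two matrix products and a skip add."""
    a = matvec(W1, x)
    b = matvec(W2, x)
    return [ai + bi + xi for ai, bi, xi in zip(a, b, x)]


def merge_branches(W1, W2):
    """Fold both branches and the identity skip into one weight matrix."""
    n = len(W1)
    return [
        [W1[i][j] + W2[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


W1 = [[0.5, -1.0], [2.0, 0.25]]
W2 = [[1.5, 0.0], [-1.0, 0.75]]
x = [2.0, 4.0]

y_train = multi_branch(W1, W2, x)            # three operations per input
y_infer = matvec(merge_branches(W1, W2), x)  # one matrix product per input
print(y_train, y_infer)
```

Because every branch is linear, the merge is exact: the single-matrix form does less work per input while producing the same result, which is the general reason reparameterized blocks are attractive for latency-constrained deployment.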