MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye

2024-06-24

Abstract: Recently, infrared small target detection (ISTD) has made significant progress, thanks to the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, they also inherit the transformer's main disadvantage, i.e., computational complexity that is quadratic in the sequence length. Inspired by Mamba, a recent basic model with linear complexity for long-distance modeling, we explore the potential of this state space model for the ISTD task in terms of effectiveness and efficiency in this paper. However, directly applying Mamba achieves suboptimal performance because it does not sufficiently harness local features, which are imperative for detecting small targets. Instead, we tailor a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient ISTD. It consists of Outer and Inner Mamba blocks to adeptly capture both global and local features. Specifically, we treat the local patches as "visual sentences" and use the Outer Mamba to explore the global information. We then decompose each visual sentence into sub-patches as "visual words" and use the Inner Mamba to further explore the local information among words in the visual sentence with negligible computational costs. By aggregating the visual word and visual sentence features, our MiM-ISTD can effectively explore both global and local information. Experiments on NUAA-SIRST and IRSTD-1k show the superior accuracy and efficiency of our method. Specifically, MiM-ISTD is $8 \times$ faster than the SOTA method and reduces GPU memory usage by 62.2$\%$ when testing on $2048 \times 2048$ images, overcoming the computation and memory constraints on high-resolution infrared images.
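To make the "linear complexity" claim concrete, here is a minimal sketch of a plain discrete state space recurrence, $h_t = A h_{t-1} + B x_t$, $y_t = C h_t$. It is not Mamba's selective scan (where the parameters depend on the input) and not the authors' code; the function name `ssm_scan` and the toy matrices are assumptions chosen for illustration. The point is only that each token costs one fixed-size state update, so the work grows linearly with sequence length, whereas self-attention compares every token with every other token.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a 1-D input sequence.

    Illustrative, non-selective state space recurrence (hypothetical helper).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # one fixed-cost state update per token
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

# Toy parameters (assumed values, chosen so the output is easy to check by hand).
A = np.array([[0.5, 0.0],
              [0.0, 0.25]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])

# Impulse input: the output is the decaying response of the two state channels.
y = ssm_scan(np.array([1.0, 0.0, 0.0]), A, B, C)
print(y)
```

The loop body does not depend on how long the sequence is, so doubling the number of tokens doubles the number of state updates rather than quadrupling the work.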

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses infrared small target detection (ISTD), a binary segmentation task widely used in remote sensing and military tracking systems. Existing approaches fall into traditional algorithms and deep learning methods. Convolutional Neural Networks (CNNs) improve performance but capture global information poorly, so small targets are easily missed. Methods that combine CNNs with Transformers handle long-range dependencies, but their computational complexity is quadratic in the sequence length.

The paper proposes a nested model structure called Mamba-in-Mamba (MiM-ISTD) for effective and efficient ISTD. It consists of Outer and Inner Mamba blocks, which together capture both global and local features. The image is divided into local patches, treated as "visual sentences", and each sentence is further decomposed into sub-patches, treated as "visual words". The Outer Mamba block explores global information across the visual sentences, while the Inner Mamba block explores local information among the words within each sentence at negligible additional computational cost. The visual word and visual sentence features are then aggregated.

Experiments on the NUAA-SIRST and IRSTD-1k datasets show superior accuracy and efficiency. When testing on 2048 × 2048 images, MiM-ISTD is 8 times faster than the SOTA method and uses 62.2% less GPU memory. In summary, the paper adapts the linear-complexity Mamba model to ISTD, improving detection accuracy for small targets while reducing computation and memory consumption.
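The sentence/word decomposition described above can be sketched with plain array reshaping. This is only an illustration under stated assumptions: the helper `split_into_patches`, the 16 × 16 toy image, and the patch sizes (8 for sentences, 2 for words) are hypothetical and are not taken from the paper, and no Mamba block is implemented here.

```python
import numpy as np

def split_into_patches(img, size):
    """Split an (H, W) array into non-overlapping (size, size) patches.

    Returns an array of shape (H//size * W//size, size, size), ordered
    row by row. Hypothetical helper, not the authors' implementation.
    """
    h, w = img.shape
    assert h % size == 0 and w % size == 0, "image must tile evenly"
    return (img.reshape(h // size, size, w // size, size)
               .swapaxes(1, 2)
               .reshape(-1, size, size))

# Toy single-channel "infrared image" (assumed size).
image = np.arange(16 * 16, dtype=np.float32).reshape(16, 16)

# Outer level: local patches act as "visual sentences".
sentences = split_into_patches(image, 8)

# Inner level: each sentence is decomposed into sub-patches, the "visual words".
words = np.stack([split_into_patches(s, 2) for s in sentences])

print(sentences.shape)  # (num_sentences, 8, 8)
print(words.shape)      # (num_sentences, words_per_sentence, 2, 2)
```

In this layout an outer sequence model would run over the first axis of `sentences`, and an inner one over the second axis of `words` for each sentence, which mirrors the global/local split the summary describes.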

Multilevel Interactive Enhanced Network for Infrared Small-Target Detection

4DST-BTMD: an Infrared Small Target Detection Method Based on 4-D Data-Sphered Space

ABMNet: Coupling Transformer with CNN Based on Adams-Bashforth-Moulton Method for Infrared Small Target Detection

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

MPANet: Multi-Patch Attention For Infrared Small Target object Detection

IMNN-LWEC: A Novel Infrared Small Target Detection Based on Spatial–Temporal Tensor Model

5-D Spatial-Temporal Information-Based Infrared Small Target Detection in Complex Environments

Cross-Layer Feature Guided Multiscale Infrared Small Target Detection

SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection

Feature Preservation and Shape Cues Assist Infrared Small Target Detection

ABC: Attention with Bilinear Correlation for Infrared Small Target Detection

IRSTFormer: A Hierarchical Vision Transformer for Infrared Small Target Detection

Infrared Dim and Small Target Detection Based on Superpixel Segmentation and Spatiotemporal Cluster 4D Fully-Connected Tensor Network Decomposition

Local Convergence Index-Based Infrared Small Target Detection against Complex Scenes

YOLO-ISTD: An infrared small target detection method based on YOLOv5-S

Infrared Small Target Detection Based on Tensor Tree Decomposition and Self-Adaptive Local Prior

An infrared small target detection model via Gather-Excite attention and normalized Wasserstein distance

Multiscale Interactive Attention Network for Infrared Small Target Detection

Guided Attention and Joint Loss for Infrared Dim Small Target Detection

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification