Abstract: Infrared target detection has important applications in rescue and Earth observation. However, the low signal-to-clutter ratio and severe background noise of infrared imaging make detecting dim infrared targets challenging. Most algorithms extract features only from the spatial domain, and the lack of temporal information leads to unsatisfactory detection performance when the target differs little from the background. Although some methods exploit temporal information during detection, these non-learning-based methods fail to adapt to complex and changing backgrounds and require parameters to be tuned for each input. To tackle this problem, we propose the Spatio-Temporal Differential Multiscale Attention Network (STDMANet), a learning-based method for multiframe infrared small target detection. STDMANet first uses a temporal multiscale feature extractor to obtain spatiotemporal (ST) features at multiple time scales and then passes them to a spatial multiscale feature refiner, which enhances the semantics of the ST features while preserving the position information of small targets. Finally, unlike other learning-based networks that require binary masks for training, we design a mask-weighted heatmap loss that trains the network with only center-point annotations. The proposed loss also balances missed detections against false alarms, trading off finding targets and suppressing the background. Extensive quantitative experiments on public datasets show that the proposed STDMANet improves the $F_{1}$ score to as high as 0.9744, surpassing the state-of-the-art baseline by 0.1682. Qualitative experiments show that the proposed method stably extracts moving foreground targets from video sequences with various backgrounds while reducing the false alarm rate more effectively than other recent baseline methods.
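
As a rough illustration of why the $F_{1}$ score reflects the balance between missed detections and false alarms, the following minimal sketch computes it from detection counts. The function name `f1_score` and the example counts are hypothetical and are not taken from the paper; the rule for matching a detection to a ground-truth target is also left out and would have to follow the evaluation protocol of the dataset in use.

```python
def f1_score(true_positives: int, false_alarms: int, missed: int) -> float:
    """Return the F1 score from detection counts.

    true_positives: detections matched to a ground-truth target
    false_alarms:   detections with no matching target (false positives)
    missed:         ground-truth targets with no detection (false negatives)
    """
    denominator = 2 * true_positives + false_alarms + missed
    if denominator == 0:
        # No targets and no detections: the score is undefined.
        # Returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * true_positives / denominator


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed targets.
balanced = f1_score(true_positives=8, false_alarms=2, missed=2)

# Same number of correct detections, but many more false alarms.
noisy = f1_score(true_positives=8, false_alarms=12, missed=2)

print(balanced, noisy)
```

Because both false alarms and missed targets appear in the denominator, a detector cannot raise this score by suppressing one kind of error at the expense of the other.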

LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection

A Fourier-Transform-Based Framework with Asymptotic Attention for Mobile Thermal InfraRed Object Detection

LEC-MTNN: a Novel Multi-Frame Infrared Small Target Detection Method Based on Spatial-Temporal Patch-Tensor

IMNN-LWEC: A Novel Infrared Small Target Detection Based on Spatial–Temporal Tensor Model

5-D Spatial-Temporal Information-Based Infrared Small Target Detection in Complex Environments

Spatial-Temporal Tensor Representation Learning with Priors for Infrared Small Target Detection

Infrared Small Target Detection Based on Local Contrast Vector and Signed Normalization

Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds

Infrared Small Target Detection Based on Multiscale Local Contrast Measure Using Local Energy Factor

Multiscale Progressive Fusion Filter Network for Infrared Small Target Detection

Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

SSTNet: Sliced Spatio-Temporal Network With Cross-Slice ConvLSTM for Moving Infrared Dim-Small Target Detection

Multi-Scale Direction-Aware Network for Infrared Small Target Detection

STDMANet: Spatio-Temporal Differential Multiscale Attention Network for Small Moving Infrared Target Detection

Strengthened Local Feature-based Spatial-Temporal Tensor Model for Infrared Dim and Small Target Detection

Improved Dense Nested Attention Network Based on Transformer for Infrared Small Target Detection

Exploring reliable infrared object tracking with spatio-temporal fusion transformer

LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection

Multi-Stage Multi-Scale Local Feature Fusion for Infrared Small Target Detection

Infrared Moving Small Target Detection Based on Space–Time Combination in Complex Scenes

Multi-scale feature fusion attention network for infrared small target detection