Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye
DOI: https://doi.org/10.1109/TGRS.2024.3452175
2024-09-05
Abstract: As a sub-field of object detection, moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Existing methods rely primarily on features extracted from the spatio-temporal domain alone. The frequency domain has received little attention so far, although it is widely used in image processing. To extend the feature source domains and enhance feature representation, we propose a new Triple-domain Strategy (Tridos) with frequency-aware memory enhancement on the spatio-temporal domain for infrared small target detection. In this scheme, a local-global frequency-aware module based on the Fourier transform detaches and enhances frequency features. Inspired by the human visual system, our memory enhancement is designed to capture the spatial relations of infrared targets among video frames. Furthermore, the scheme encodes temporal motion features via differential learning and residual enhancing. Additionally, we design a residual compensation to reconcile possible cross-domain feature mismatches. To the best of our knowledge, the proposed Tridos is the first work to explore infrared target feature learning comprehensively in the spatio-temporal-frequency domains. Extensive experiments on three datasets (i.e., DAUB, ITSDT-15K and IRDST) validate that our triple-domain infrared feature learning scheme is often clearly superior to state-of-the-art ones. Source code is available at https://github.com/UESTC-nnLab/Tridos.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to address the challenges in **Moving Infrared Small Target Detection (MISTD)**. Specifically, the main problems faced by MISTD include:

1. **Tiny target size**: Because infrared targets are very small, they usually lack obvious visual features such as shape and texture.
2. **Low contrast and low signal-to-noise ratio (SNR)**: In a large-scale detection area, targets often have low contrast and low SNR, which makes detection more difficult.
3. **Fast movement and background noise interference**: Fast movement of the target and the background, together with background noise interference, can lead to severe infrared background fluctuations and blurry target boundaries.

Existing methods mainly rely on features extracted from the spatio-temporal domain while ignoring the information in the frequency domain. However, the frequency domain has been widely used in image processing and provides frequency information that helps reduce image noise and interference. Therefore, this paper proposes a new Triple-domain Strategy (Tridos), which combines feature learning in the spatio-temporal and frequency domains to improve infrared small-target detection.

### Main contributions of the paper

1. **Triple-domain feature learning**: A triple-domain scheme is proposed that extends traditional spatio-temporal feature learning with frequency-domain features, fusing and enhancing spatio-temporal-frequency features.
2. **Local-Global Frequency-aware Module**: Based on the Fourier transform, a Local-Global Frequency-aware Module (LGFM) is developed to extract comprehensive frequency features from local and global perception patterns.
3. **Memory-enhanced Spatial Relationship Module**: Inspired by the human visual system, a Memory-enhanced Spatial Relationship Module (MSR) is designed to model the spatial relationships of small targets between different frames.
4. **Residual Compensation Unit**: A Residual Compensation Unit (RCU) is constructed to eliminate possible feature mismatches between different domains and assist in fusing and enhancing spatio-temporal-frequency features.
5. **Two-view regression loss function**: A two-view regression loss function is redefined and optimized for model training on the infrared small-target detection task.

With these contributions, the experimental results on multiple datasets show that Tridos is significantly superior to existing state-of-the-art methods.
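
To make the frequency and temporal ideas above more concrete, here is a minimal NumPy sketch. It is not the Tridos implementation: LGFM, MSR and RCU are learned network modules, whereas this sketch uses fixed, hand-written filters. It assumes that "global" means a Fourier high-pass filter over the whole frame, that "local" means the same filter applied per patch, and that temporal dynamics can be approximated by plain frame differencing. All function names, the patch size, the cutoff and the synthetic sequence are hypothetical choices made for illustration.

```python
import numpy as np


def high_pass_fft(image, cutoff):
    """Zero out Fourier components within `cutoff` bins of the DC term."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moves to the centre
    rows, cols = image.shape
    yy, xx = np.ogrid[:rows, :cols]
    dist = np.hypot(yy - rows // 2, xx - cols // 2)
    spectrum[dist <= cutoff] = 0  # drop the mean and slow background variation
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real


def local_high_pass(image, patch, cutoff):
    """Apply the same high-pass filter independently to each patch."""
    rows, cols = image.shape
    out = np.zeros(image.shape, dtype=float)
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            block = image[r:r + patch, c:c + patch]
            out[r:r + patch, c:c + patch] = high_pass_fft(block, cutoff)
    return out


def frame_difference(frames):
    """Absolute difference between consecutive frames (a crude motion cue)."""
    return np.abs(np.diff(frames, axis=0))


def make_sequence(num_frames=4, size=16, amplitude=5.0):
    """Synthetic clip: smooth static background plus a one-pixel moving target."""
    cols = np.arange(size)
    # Hypothetical background: one cosine period across the image width.
    background = 10.0 + np.cos(2 * np.pi * cols / size) * np.ones((size, 1))
    frames = np.stack([background.copy() for _ in range(num_frames)])
    positions = [(5, 4 + t) for t in range(num_frames)]
    for t, (r, c) in enumerate(positions):
        frames[t, r, c] += amplitude
    return frames, positions


def peak(image):
    """Return the (row, col) of the largest value as plain Python ints."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(image), image.shape))


if __name__ == "__main__":
    frames, positions = make_sequence()
    print("true target position:", positions[0])
    print("global high-pass peak:", peak(high_pass_fft(frames[0], cutoff=1)))
    print("local high-pass peak:", peak(local_high_pass(frames[0], patch=8, cutoff=0)))
    print("changed pixels between frames 0 and 1:",
          int(np.count_nonzero(frame_difference(frames)[0])))
```

On this synthetic clip the smooth background sits entirely in the lowest frequencies, so removing them leaves the point target as the strongest response. Real infrared clutter is far less well behaved, which is why the paper learns these features rather than fixing them by hand.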