LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection

Yuanxin Huang, Xiyang Zhi, Jianming Hu, Lijian Yu, Qichao Han, Wenbin Chen, Wei Zhang
DOI: https://doi.org/10.1109/tgrs.2024.3502663
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In temporal infrared small target detection, it is crucial to leverage the disparities in spatio-temporal characteristics between the target and the background to distinguish the former. However, remote imaging and the relative motion between the detection platform and the background cause significant coupling of spatio-temporal characteristics, making target detection highly challenging. To address these challenges, we propose a network named LMAFormer. First, we introduce a local motion-aware spatio-temporal attention mechanism that aligns and enhances multi-frame features to extract local spatio-temporal salient features of targets while avoiding interference from moving backgrounds. Second, we employ a multi-scale fusion transformer encoder that computes self-attention weights across and within scales during encoding to establish multi-scale correlations among different regions of temporal images, enabling motion background modeling. Lastly, we propose a multi-frame joint query decoder. The shallowest feature map after multi-scale feature propagation is mapped to initial query weights, which are refined through grouped convolutions to generate grouped query vectors. These are jointly optimized to encapsulate rich multi-frame details, which strengthens motion background modeling and target feature representation and improves prediction accuracy. Experimental results on the NUDT-MIRSDT, IRDST, and the established TSIRMT datasets demonstrate that our network outperforms state-of-the-art (SOTA) methods. Our code and dataset will be available at https://github.com/lifier/LMAFormer.
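To make the phrase "self-attention weights across and within scales" concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that feature maps from different scales are flattened into tokens, concatenated, and passed through plain scaled dot-product self-attention with identity projections (no learned weights). The function names `flatten_scale`, `multi_scale_self_attention`, and `random_map` are hypothetical. Under these assumptions, each row of the attention matrix contains both within-scale weights (tokens from the same map) and cross-scale weights (tokens from other maps).

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_scale(fmap):
    """Flatten an H x W x C feature map (nested lists) into H*W tokens."""
    return [pixel for row in fmap for pixel in row]


def multi_scale_self_attention(feature_maps):
    """Joint self-attention over tokens gathered from every scale.

    Assumption: queries, keys, and values are the raw tokens
    (identity projections), which keeps the sketch dependency-free.
    Returns (outputs, weights, scale_ids).
    """
    tokens, scale_ids = [], []
    for s, fmap in enumerate(feature_maps):
        flat = flatten_scale(fmap)
        tokens.extend(flat)
        scale_ids.extend([s] * len(flat))

    dim = len(tokens[0])
    weights = []
    for q in tokens:
        # Scaled dot product of this query against all tokens of all scales.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights.append(softmax(scores))

    # Each output token is the attention-weighted sum of all value tokens.
    outputs = [
        [sum(w * v[c] for w, v in zip(w_row, tokens)) for c in range(dim)]
        for w_row in weights
    ]
    return outputs, weights, scale_ids


def random_map(h, w, c, rng):
    """Hypothetical stand-in for a backbone feature map."""
    return [[[rng.gauss(0, 1) for _ in range(c)] for _ in range(w)] for _ in range(h)]


rng = random.Random(0)
maps = [random_map(4, 4, 8, rng), random_map(2, 2, 8, rng)]  # two scales
outputs, weights, scale_ids = multi_scale_self_attention(maps)

# Attention mass that the first token (scale 0) places on tokens of scale 1.
cross_scale_mass = sum(w for w, s in zip(weights[0], scale_ids) if s == 1)
```

Here the 4x4 and 2x2 maps contribute 16 and 4 tokens respectively, so the attention matrix is 20 x 20, and `cross_scale_mass` measures how much one fine-scale token attends to the coarse scale. A real encoder would add learned projections, multiple heads, and positional or scale embeddings.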
What problem does this paper attempt to address?