Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework

Xinyi Ying, Li Liu, Zaipin Lin, Yangsi Shi, Yingqian Wang, Ruojing Li, Xu Cao, Boyang Li, Shilin Zhou
2024-10-04
Abstract: Multi-frame infrared small target (MIRST) detection in satellite videos is a long-standing, fundamental, yet challenging task, and the challenges can be summarized as follows. First, extremely small target size, highly complex clutter and noise, and various satellite motions result in limited feature representation, high false alarms, and difficult motion analyses. Second, the lack of a large-scale, publicly available MIRST dataset for satellite videos greatly hinders algorithm development. To address these challenges, in this paper, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory and intensity, which can provide a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. For the baseline method, RFR is designed to be attached to existing powerful CNN-based methods for long-term temporal dependency exploitation and integrated motion compensation and MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation and refinement. Extensive experiments have been conducted to demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses multi-frame infrared small target (MIRST) detection in satellite videos, a long-standing, fundamental, and challenging task. The challenges can be summarized as follows:

1. **Extremely small target size**: Because of long-distance imaging and optical diffraction, targets usually appear as small dots or diffraction spots and lack geometric features such as contours, shapes, and textures.
2. **Highly complex clutter and noise**: Satellite videos contain complex background clutter (such as earth, star, and cloud clutter) and severe sensor noise (such as random thermal noise and non-uniform device noise). The target therefore has a low signal-to-clutter ratio (SCR) and is easily submerged in clutter and noise.
3. **Diverse compound satellite motions**: Satellite motions such as translation, pitch, yaw, roll, jitter, and scheduling greatly reduce imaging quality, weakening target intensity and blurring contours. In addition, the coupled motion of the target and the background makes motion extraction and utilization more difficult.
4. **Lack of open-source datasets**: The lack of large-scale, publicly available datasets for MIRST detection in satellite videos seriously hinders algorithm development.

To address these challenges, the authors take the following measures:

- **Construct the IRSatVideo-LEO dataset**: This is the first large-scale dataset for MIRST detection in satellite videos, comprising 200 sequences with a total of 91,366 frames and mask annotations. The dataset is semi-simulated: it is based on real satellite images with synthesized satellite motions, target appearances, trajectories, and intensities.
- **Propose the recurrent feature refinement (RFR) framework**: As a baseline method, RFR is designed to be attached to existing powerful CNN-based methods to exploit long-term temporal dependency and to integrate motion compensation with MIRST detection. It contains a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module to achieve effective and efficient feature alignment, propagation, aggregation, and refinement.

Through these measures, the paper aims to provide a standard toolbox for satellite video generation and a reliable evaluation platform to promote algorithm development, thereby improving the performance of MIRST detection in satellite videos.

### Formulas

The formulas involved in the paper are as follows:

1. **Global background sequence generation**:
   \[ I_{GB}^t = I_{GB}\otimes H(\alpha_t,\beta_t,\gamma_t) \]
   where \(I_{GB}\) is the global background image, \(I_{GB}^t\) is the \(t\)-th frame of the global background sequence, \(\alpha_t,\beta_t,\gamma_t\) are the pitch, yaw, and roll angles of the \(t\)-th frame respectively, \(H(\cdot)\) represents the homography transformation, and \(\otimes\) represents matrix multiplication.
2. **Local background sequence cropping**:
   \[ I_{LB}^t=\text{Crop}(I_{GB}^t,x_B^t,y_B^t,H_0,W_0) \]
   where \(I_{LB}^t\) is the \(t\)-th frame of the local background sequence, and \(\text{Crop}\) represents cropping the global background frame \(I_{GB}^t\) according to the 2D satellite position \(x_B^t,y_B^t\) and the predefined field of view \(H_0,W_0\).
3. **Target template sequence generation**:
   \[ I_{nt}^{\text{int}} = G_{nt}\odot E_{nt} \]
   \[ E_{nt}=[\text{scr}\times\sigma(T_{LB}^1)+\mu(T_{LB}^1)]\times(1 + a_{nt}) \]
   where \(G_{nt}\) is the target appearance sequence, \(E_{nt}\) is the target intensity sequence, and \(\mu(T_{LB}^1)\) and \(\sigma(T_{LB}^1)\) are the mean and standard deviation of the target local background in the first frame respectively.
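The cropping and target-intensity formulas can be illustrated with a minimal NumPy sketch. It is not the authors' toolbox, and it makes several assumptions that the summary does not confirm: the satellite position is treated as the top-left corner of the crop window, the target appearance is a hypothetical 2D Gaussian spot, `scr` is read as the desired signal-to-clutter ratio, `a` is read as a relative intensity fluctuation, and the product of appearance and intensity is taken element-wise. All function names are illustrative. The homography step is omitted because it would require camera parameters that are not given here.

```python
import numpy as np


def crop_local_background(global_bg, x_b, y_b, h0, w0):
    """Crop an h0 x w0 window from one global background frame.

    Assumption: (x_b, y_b) is the top-left corner in (column, row) order.
    """
    return global_bg[y_b:y_b + h0, x_b:x_b + w0]


def gaussian_appearance(size=5, sigma=1.0):
    """Hypothetical target appearance G: a 2D Gaussian spot with peak value 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def target_intensity(local_patch, scr, a=0.0):
    """E = [scr * sigma(T) + mu(T)] * (1 + a) for a local background patch T."""
    return (scr * local_patch.std() + local_patch.mean()) * (1.0 + a)


def target_template(appearance, intensity):
    """I = G * E, read here as an element-wise product (E is a scalar)."""
    return appearance * intensity


# Synthetic stand-in for a global background frame (not real satellite data).
rng = np.random.default_rng(0)
global_bg = rng.normal(100.0, 5.0, size=(512, 512))

local_bg = crop_local_background(global_bg, x_b=40, y_b=60, h0=256, w0=256)
patch = local_bg[100:111, 100:111]          # local background around the target
E = target_intensity(patch, scr=3.0, a=0.1)
template = target_template(gaussian_appearance(5, 1.0), E)
print(local_bg.shape, template.shape)
```

Because the Gaussian appearance peaks at 1, the brightest pixel of the template equals the intensity `E`, so the peak sits `scr` standard deviations above the local background mean before the `(1 + a)` scaling is applied.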