Abstract: Multi-frame infrared small target (MIRST) detection in satellite videos is a long-standing, fundamental, yet challenging task. The challenges can be summarized as follows. First, extremely small target size, highly complex clutter and noise, and various satellite motions result in limited feature representation, high false alarm rates, and difficult motion analysis. Second, the lack of a large-scale publicly available MIRST dataset for satellite videos greatly hinders algorithm development. To address these challenges, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory, and intensity, which provides a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. As the baseline method, RFR is designed to be combined with existing powerful CNN-based methods to exploit long-term temporal dependency and to integrate motion compensation with MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation, and refinement. Extensive experiments demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.

What problem does this paper attempt to address?

This paper addresses the long-standing, fundamental, and challenging problem of multi-frame infrared small target (MIRST) detection in satellite videos. The challenges can be summarized as follows:

1. **Extremely small target size**: Because of long-distance imaging and optical diffraction, targets usually appear as small dots or diffraction spots and lack geometric features such as contours, shapes, and textures.
2. **Highly complex clutter and noise**: Satellite videos contain complex background clutter (such as earth, star, and cloud clutter) and severe sensor noise (such as random thermal noise and non-uniform device noise). The resulting signal-to-clutter ratio (SCR) is low, so the target is easily submerged in clutter and noise.
3. **Diverse compound satellite motions**: Satellite motions such as translation, pitch, yaw, roll, jitter, and scheduling greatly reduce imaging quality, weakening target intensity and blurring contours. In addition, the coupled motion of the target and the background makes motion extraction and utilization harder.
4. **Lack of open-source datasets**: The lack of large-scale publicly available datasets for MIRST detection in satellite videos seriously hinders algorithm development.

To address these challenges, the authors take the following measures:

- **Construct the IRSatVideo-LEO dataset**: This is the first large-scale dataset for MIRST detection in satellite videos. It contains 200 sequences with a total of 91,366 frames and mask annotations. The dataset is semi-simulated: it is built from real satellite images combined with synthesized satellite motions, target appearances, trajectories, and intensities.
- **Propose the recurrent feature refinement (RFR) framework**: As a baseline method, RFR can be combined with existing powerful CNN-based methods to exploit long-term temporal dependency, and it integrates motion compensation with MIRST detection. The framework contains a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module to achieve effective feature alignment, propagation, aggregation, and refinement.

Through these measures, the paper aims to provide a standard toolbox for satellite video generation and a reliable evaluation platform for algorithm development, thereby improving the performance of MIRST detection in satellite videos.

### Formula display

The formulas involved in the article are as follows:

1. **Global background sequence generation**:
   \[ I_{GB}^t = I_{GB} \otimes H(\alpha_t, \beta_t, \gamma_t) \]
   where \(I_{GB}\) is the global background image, \(I_{GB}^t\) is the global background sequence, \(\alpha_t, \beta_t, \gamma_t\) are the pitch, yaw, and roll angles of the \(t\)-th frame respectively, \(H(\cdot)\) represents the homography transformation, and \(\otimes\) represents matrix multiplication.
2. **Local background sequence cropping**:
   \[ I_{LB}^t = \text{Crop}(I_{GB}^t, x_B^t, y_B^t, H_0, W_0) \]
   where \(I_{LB}^t\) is the local background sequence, and \(\text{Crop}\) crops the global background sequence \(I_{GB}^t\) according to the 2D satellite position \((x_B^t, y_B^t)\) and the predefined field of view \((H_0, W_0)\).
3. **Target template sequence generation**:
   \[ I_{nt}^{\text{int}} = G_{nt} \odot E_{nt} \]
   \[ E_{nt} = [\text{scr} \times \sigma(T_{LB}^1) + \mu(T_{LB}^1)] \times (1 + a_{nt}) \]
   where \(G_{nt}\) is the target appearance sequence, \(E_{nt}\) is the target intensity sequence, and \(\mu(T_{LB}^1)\) and \(\sigma(T_{LB}^1)\) are the mean and standard deviation of the target local background in the first frame respectively.
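The three generation formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' toolbox, and several details are assumptions made only for illustration: the homography is taken to be rotation-only, \(H = K R K^{-1}\), with a hypothetical pinhole intrinsic matrix \(K\) and a roll-yaw-pitch composition order; \(\otimes\) is interpreted as warping the image by \(H\) with nearest-neighbour sampling; the satellite position is treated as the top-left corner of the crop; and the target appearance \(G_{nt}\) is modelled as a 2D Gaussian spot. All function and parameter names (`rotation_homography`, `warp`, `crop`, `target_template`, `focal`, `sigma_psf`) are hypothetical.

```python
import numpy as np


def rotation_homography(pitch, yaw, roll, focal, cx, cy):
    """Assumed form of H(alpha, beta, gamma): rotation-only homography K R K^-1."""
    ca, sa = np.cos(pitch), np.sin(pitch)
    cb, sb = np.cos(yaw), np.sin(yaw)
    cg, sg = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])  # pitch about x
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])  # yaw about y
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])  # roll about the optical axis
    k = np.array([[focal, 0, cx], [0, focal, cy], [0, 0, 1.0]])  # assumed intrinsics
    return k @ rz @ ry @ rx @ np.linalg.inv(k)


def warp(image, h):
    """Warp by inverse mapping with nearest-neighbour sampling; unmapped pixels are 0."""
    rows, cols = image.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(rows * cols)])
    src = np.linalg.inv(h) @ dst  # source location of every output pixel
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < cols) & (sy >= 0) & (sy < rows)
    out = np.zeros(rows * cols, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(rows, cols)


def crop(image, x, y, height, width):
    """Crop(I, x, y, H0, W0), assuming (x, y) is the top-left corner of the window."""
    return image[y:y + height, x:x + width]


def target_template(size, sigma_psf, scr, local_bg, a):
    """I = G * E (element-wise), with E = (scr * std(bg) + mean(bg)) * (1 + a)."""
    r = np.arange(size) - (size - 1) / 2
    # Assumed appearance G: Gaussian spot with peak value 1 at the centre (odd size).
    g = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2 * sigma_psf ** 2))
    e = (scr * local_bg.std() + local_bg.mean()) * (1 + a)
    return g * e


rng = np.random.default_rng(0)
global_bg = rng.normal(100.0, 5.0, size=(64, 64))  # stand-in for a real background image
h = rotation_homography(0.0, 0.0, np.deg2rad(2.0), focal=500.0, cx=32.0, cy=32.0)
frame = crop(warp(global_bg, h), x=16, y=16, height=32, width=32)
target = target_template(size=5, sigma_psf=1.0, scr=3.0, local_bg=frame[10:15, 10:15], a=0.1)
```

In this sketch `frame` plays the role of one local background frame \(I_{LB}^t\), and `target` is a single target template whose peak equals the intensity \(E_{nt}\) computed from the local background statistics. Nearest-neighbour sampling keeps the example short; an interpolating warp would likely be preferable for actual sequence generation.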

Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework

MRF3Net: Infrared Small Target Detection Using Multi-Receptive Field Perception and Effective Feature Fusion

One-Stage Cascade Refinement Networks for Infrared Small Target Detection

Infrared Small Target Detection Via Modified Random Walks

Infrared Small Target Detection Based on Multiscale Local Contrast Measure Using Local Energy Factor

An infrared small target detection method using coordinate attention and feature fusion

Dense Nested Attention Network for Infrared Small Target Detection

CCRANet: A Two-Stage Local Attention Network for Single-Frame Low-Resolution Infrared Small Target Detection

Infrared Small Target Detection Based on Local Contrast Vector and Signed Normalization

Infrared Small Target Detection based on Adjustable Sensitivity Strategy and Multi-Scale Fusion

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Single-Point Supervised High-Resolution Dynamic Network for Infrared Small Target Detection

Multi-Stage Multi-Scale Local Feature Fusion for Infrared Small Target Detection

Exploring Feature Compensation and Cross-level Correlation for Infrared Small Target Detection

Single-Frame Infrared Small Target Detection by High Local Variance, Low-Rank and Sparse Decomposition

Toward Dense Moving Infrared Small Target Detection: New Datasets and Baseline

ISPANet: A Pyramid Self-Attention Network for Single-Frame High-Resolution Infrared Small Target Detection With a Large-Scale Dataset SHR-IRST

SFFNet: Shallow Feature Fusion Network Based on Detection Framework for Infrared Small Target Detection

Robust Unsupervised Multifeature Representation for Infrared Small Target Detection

Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset

Multilevel Interactive Enhanced Network for Infrared Small-Target Detection