Highly Efficient and Unsupervised Framework for Moving Object Detection in Satellite Videos

C. Xiao, W. An, Y. Zhang, Z. Su, M. Li, W. Sheng, M. Pietikäinen, L. Liu
DOI: https://doi.org/10.1109/TPAMI.2024.3409824
2024-11-25
Abstract: Moving object detection in satellite videos (SVMOD) is a challenging task due to the extremely dim and small target characteristics. Current learning-based methods extract spatio-temporal information from multi-frame dense representation with labor-intensive manual labels to tackle SVMOD, which incurs high annotation costs and contains tremendous computational redundancy due to the severe imbalance between foreground and background regions. In this paper, we propose a highly efficient unsupervised framework for SVMOD. Specifically, we propose a generic unsupervised framework for SVMOD, in which pseudo labels generated by a traditional method can evolve with the training process to promote detection performance. Furthermore, we propose a highly efficient and effective sparse convolutional anchor-free detection network by sampling the dense multi-frame image form into a sparse spatio-temporal point cloud representation and skipping the redundant computation on background regions. Coupling these two designs, we can achieve both high efficiency (label and computation efficiency) and effectiveness. Extensive experiments demonstrate that our method can not only process 98.8 frames per second on 1024x1024 images but also achieve state-of-the-art performance. The relabeled dataset and code are available at https://github.com/ChaoXiao12/Moving-object-detection-in-satellite-videos-HiEUM.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses three key problems in Satellite Video Moving Object Detection (SVMOD):

1. **Real-time requirements**: Satellite video is large in volume and temporally redundant, so processing it in real time to detect moving objects is crucial. Existing methods struggle with the temporal redundancy of the background and the high sparsity of the foreground, which results in low efficiency.
2. **High-quality requirements**: Moving objects in satellite videos are usually small and low in contrast, carry little shape or texture information, and are sensitive to noise. These characteristics make it hard to learn high-quality (accurate and robust) object representations, so an effective framework must balance recall and precision.
3. **High labeling cost**: Because of these object and video characteristics, manual labeling requires repeated inspection, is prone to noisy labels, and is costly. Large amounts of accurately labeled training data are therefore hard to obtain, which makes a label-efficient solution valuable for practical applications.

To address these challenges, the authors propose an efficient, unsupervised framework named HiEUM (Highly Efficient and Unsupervised Moving object detection), with the following aims:

- **Reduce labeling cost**: Generate initial pseudo-labels with a traditional method and update them continuously during training (pseudo-label self-evolution), thereby reducing the need for manual labeling.
- **Improve computational efficiency**: Design a sparse convolutional anchor-free detection network. Exploiting the sparsity of moving targets and the high redundancy of the background, it samples the dense multi-frame image representation into a sparse spatio-temporal point cloud representation and skips the redundant computation on background regions.
- **Enhance detection performance**: Combine sparse representation with long-term temporal modeling to significantly improve detection efficiency and accuracy.

The main contributions of the paper are:

- An unsupervised framework with self-evolving pseudo-labels, in which the pseudo-labels are continuously refined during training.
- A sparse convolutional anchor-free detection network, which is a first attempt to handle SVMOD through a sparse spatio-temporal point cloud representation.
- A relabeled small-scale moving vehicle dataset and a new benchmark for evaluating detection of small, dim moving targets.

In summary, the paper aims to develop an SVMOD method that is both efficient and effective, addressing the high labeling cost, wasted computation, and insufficient detection performance of existing methods.
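The idea of turning a dense multi-frame clip into a sparse spatio-temporal point set can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say how candidate points are chosen, so the sketch assumes a per-pixel temporal median as the background estimate and a fixed residual threshold. The function names `sample_sparse_points` and `make_clip`, the threshold value, and the synthetic clip are all hypothetical and exist only for illustration.

```python
from statistics import median


def sample_sparse_points(frames, threshold):
    """Convert a dense clip (T x H x W nested lists) into sparse (t, y, x, value) points.

    Assumption (not from the paper): the per-pixel temporal median approximates
    the static background, and a pixel is kept only if it differs from that
    background by more than `threshold`.
    """
    num_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])

    # Background estimate: median intensity of each pixel over time.
    background = [
        [median(frames[t][y][x] for t in range(num_frames)) for x in range(width)]
        for y in range(height)
    ]

    points = []
    for t in range(num_frames):
        for y in range(height):
            for x in range(width):
                residual = abs(frames[t][y][x] - background[y][x])
                if residual > threshold:
                    # Keep only candidate foreground pixels; background is skipped.
                    points.append((t, y, x, frames[t][y][x]))
    return points


def make_clip(num_frames=5, height=8, width=8):
    """Synthetic clip: flat background (10) with one bright pixel (60) moving right on row 3."""
    clip = [
        [[10 for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]
    for t in range(num_frames):
        clip[t][3][t + 1] = 60
    return clip


clip = make_clip()
points = sample_sparse_points(clip, threshold=20)
dense_size = len(clip) * len(clip[0]) * len(clip[0][0])
print(f"kept {len(points)} of {dense_size} pixels")
for point in points:
    print(point)
```

In this toy clip only the moving pixel survives in each frame, so a downstream network would operate on a handful of (t, y, x) points rather than the full frame stack. In a real pipeline, those points would presumably be passed to a sparse convolution library instead of being processed with Python loops.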