Lost in UNet: Improving Infrared Small Target Detection by Underappreciated Local Features

Wuzhou Quan, Wei Zhao, Weiming Wang, Haoran Xie, Fu Lee Wang, Mingqiang Wei
2024-06-19
Abstract: Many targets are often very small in infrared images due to the long-distance imaging mechanism. UNet and its variants, as popular detection backbone networks, downsample the local features early and cause the irreversible loss of these local features, leading to both missed and false detections of small targets in infrared images. We propose HintU, a novel network to recover the local features lost by various UNet-based methods for effective infrared small target detection. HintU has two key contributions. First, it introduces the "Hint" mechanism for the first time, i.e., leveraging the prior knowledge of target locations to highlight critical local features. Second, it improves the mainstream UNet-based architecture to preserve target pixels even after downsampling. HintU can shift the focus of various networks (e.g., vanilla UNet, UNet++, UIUNet, MiM+, and HCFNet) from the irrelevant background pixels to a more restricted area from the beginning. Experimental results on three datasets (NUDT-SIRST, SIRSTv2, and IRSTD1K) demonstrate that HintU enhances the performance of existing methods with only an additional 1.88 ms cost (on an RTX Titan). Additionally, the explicit constraints of HintU enhance the generalization ability of UNet-based methods. Code is available at https://github.com/Wuzhou-Quan/HintU.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulty of detecting small targets in infrared images. In particular, when UNet and its variants serve as the detection backbone, early downsampling discards local features, which leads to missed detections and false detections of small targets. The paper highlights three challenges:

1. **Small size**: Due to the long-distance imaging mechanism, targets in infrared images usually occupy only a few to a few dozen pixels. In most infrared small-target detection datasets, more than 50% of the targets occupy less than 0.05% of the input image's pixels.
2. **Cluttered background**: Infrared images often contain heavy noise and background clutter, so targets are easily masked.
3. **Unpredictable types**: Target types are dynamic, and their shapes and sizes vary greatly across scenarios.

To address these challenges, the paper proposes HintU, a new network architecture that aims to recover the local features lost in various UNet-based methods, thereby improving infrared small-target detection. The main contributions of HintU are:

1. **Introducing the "Hint" mechanism**: It uses prior knowledge of target locations to highlight key local features.
2. **Improving the mainstream UNet architecture**: It preserves target pixels even after downsampling, so the network focuses on a more restricted area from the very beginning instead of on irrelevant background pixels.

With these changes, HintU improves the performance of existing methods on three datasets (NUDT-SIRST, SIRSTv2, and IRSTD1K) at an additional cost of only 1.88 ms (on an RTX Titan). In addition, the explicit constraints of HintU enhance the generalization ability of UNet-based methods.
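To make the 0.05% size figure and the downsampling loss concrete, here is a small illustrative sketch in plain Python. It is not the paper's code or the HintU method. The 256×256 input resolution, the 8×8 toy image, and the use of 2×2 average pooling are all assumptions chosen for illustration; the networks discussed in the paper may downsample differently (for example with max pooling or strided convolutions).

```python
def target_pixel_budget(height, width, fraction=0.0005):
    """Return how many pixels correspond to `fraction` of an image.

    fraction=0.0005 is the 0.05% threshold quoted in the summary.
    """
    return height * width * fraction


def avg_pool_2x2(img):
    """2x2 average pooling with stride 2 on a list-of-lists image.

    Assumes even height and width. Average pooling is an illustrative
    choice, not necessarily what any given UNet variant uses.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


# 0.05% of an assumed 256x256 input is only about 33 pixels.
budget = target_pixel_budget(256, 256)

# Toy image: one bright target pixel on an empty 8x8 background.
image = [[0.0] * 8 for _ in range(8)]
image[3][4] = 1.0

# Track the strongest response after each of three downsampling stages.
peaks = []
current = image
for _ in range(3):
    current = avg_pool_2x2(current)
    peaks.append(max(max(row) for row in current))

print(f"pixel budget at 256x256: {budget:.1f}")
print("peak response per stage:", peaks)
```

In this toy setting the single-pixel target's peak response drops by a factor of four at every stage, which illustrates why a target of a few pixels can fade into the background after early downsampling.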