UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation

Pengfei Zhang, Chang Li, Yongjun Zhang, Rongjun Qin
2024-10-31
Abstract: Remotely sensed imagery interpretation (RSII) faces three major problems: (1) objective representation of spatial distribution patterns; (2) edge uncertainty caused by the downsampling encoder and intrinsic edge noise (e.g., mixed pixels and edge occlusion); and (3) false detection caused by geometric registration error in change detection. To solve these problems, the uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is proposed for the first time, with the following advantages: (1) a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP) is proposed to enhance the interaction of spatially frequency-wise stationary and non-stationary features and yield high-fidelity edge extraction results. Inspired by HRFormer, SHCP replaces the high-resolution-wise stream in HRFormer with a high-frequency-wise stream that runs through the whole encoder-decoder process with parallel frequency-wise high-to-low streams, so it improves edge extraction accuracy by continuously retaining high-frequency information; (2) a mask-and-geo-knowledge-based uncertainty diffusion module (MUDM), a self-supervised learning strategy, is proposed to improve the edge accuracy of extraction and change detection by gradually removing the simulated spectral noise based on geo-knowledge and the generated diffused spectral noise; (3) a frequency-wise semi-pseudo-Siamese UDHF2-Net is proposed for the first time to balance accuracy and complexity for change detection. Besides handling the aforementioned spectral noise in semantic segmentation, MUDM is also a self-supervised learning strategy that effectively reduces false change detection at edges from generated imagery with geometric registration error.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve three main problems in remotely sensed imagery interpretation (RSII):

1. **Objective Representation of Spatial Distribution Patterns**: The problem is how to represent spatial distribution patterns accurately when spatial stationarity and non-stationarity coexist. This requires capturing and processing the features of different image regions so that both global and local information are fully expressed.
2. **Edge Uncertainty Problem**: This problem is caused mainly by the downsampling operation of the encoder and by inherent edge noise (such as mixed pixels and edge occlusion). Downsampling loses edge detail, and edge noise makes edges harder to extract, resulting in positional and semantic uncertainty.
3. **False Detection Problem in Change Detection**: This problem is caused by geometric registration errors. When bi-temporal images are not precisely registered, false changes appear, which raises the false detection rate and lowers change detection accuracy.

To solve these problems, the paper proposes the uncertainty-diffusion-model-based High-Frequency TransFormer network (UDHF2-Net), with the following advantages:

1. **Spatially-Stationary-and-Non-Stationary High-Frequency Connection Paradigm (SHCP)**: SHCP enhances the interaction between spatially stationary and non-stationary features to achieve high-fidelity edge extraction results. By running parallel frequency-wise high-to-low streams through the whole encoder-decoder process, SHCP continuously retains high-frequency information and improves edge extraction accuracy.
2. **Mask-and-Geo-Knowledge-Based Uncertainty Diffusion Module (MUDM)**: MUDM is a self-supervised learning strategy. By gradually removing both the spectral noise simulated from geographical knowledge and the generated diffused spectral noise, it improves the edge accuracy of extraction and change detection.
3. **Frequency-Wise Semi-Pseudo-Siamese UDHF2-Net**: A frequency-wise semi-pseudo-Siamese UDHF2-Net architecture is proposed for the first time to balance accuracy and complexity in change detection. Beyond reducing spectral noise in semantic segmentation, MUDM also reduces the false edge change detections caused by geometric registration errors through gradual denoising.

Together, these innovations improve the accuracy of remotely sensed imagery interpretation, especially in edge extraction and change detection.
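
The false detection problem caused by geometric registration error can be illustrated with a small synthetic example. The sketch below is not the paper's method and does not implement UDHF2-Net or MUDM; it only shows why a misregistered but otherwise unchanged scene produces spurious changes concentrated at object edges. The function names (`make_scene`, `misregister`, `naive_change_map`), the 32x32 scene, the one-pixel shift, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import numpy as np


def make_scene(size=32, lo=8, hi=24):
    """Synthetic single-band image: a bright square on a dark background."""
    img = np.zeros((size, size), dtype=np.float32)
    img[lo:hi, lo:hi] = 1.0
    return img


def misregister(img, dx=1, dy=0):
    """Simulate a registration error as a whole-pixel translation.

    np.roll wraps around the border; that is harmless here because the
    border of the synthetic scene is uniform background.
    """
    return np.roll(img, shift=(dy, dx), axis=(0, 1))


def naive_change_map(t1, t2, threshold=0.5):
    """Pixel-wise differencing, the simplest possible change detector."""
    return np.abs(t1 - t2) > threshold


# The scene itself does not change between the two dates;
# only the second acquisition is shifted by one pixel to the right.
t1 = make_scene()
t2 = misregister(t1, dx=1)

change = naive_change_map(t1, t2)
false_changes = int(change.sum())
changed_columns = np.flatnonzero(change.any(axis=0)).tolist()

print(false_changes)    # 32
print(changed_columns)  # [8, 24]
```

Every detected change here is false, and all 32 flagged pixels lie on the left and right edges of the square, while its interior and the background are unaffected. This is the edge-localized false detection pattern that the summary attributes to imprecise bi-temporal registration.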