P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation

Qi Zhang, Guohua Geng, Longquan Yan, Pengbo Zhou, Zhaodi Li, Kang Li, Qinglin Liu
2024-07-24
Abstract: Diffusion models and multi-scale features are essential components in semantic segmentation of remote-sensing images. They contribute to sharper segmentation boundaries and supply significant contextual information. U-Net-like architectures are frequently employed in diffusion models for segmentation tasks. These designs include dense skip connections that can make intermediate features hard to interpret, so they may not convey semantic information efficiently across the layers of the encoder-decoder architecture. To address these challenges, we propose a new model for semantic segmentation: a diffusion model with parallel multi-scale branches. The model consists of Parallel Multi-Scale Diffusion modules (P-MSDiff) and a Cross-Bridge Linear Attention mechanism (CBLA). P-MSDiff enhances the understanding of semantic information across multiple levels of granularity and detects repetitively distributed data by integrating recursive denoising branches. It further facilitates the fusion of data by connecting the relevant branches to the primary framework, which enables concurrent denoising. Within the interconnected transformer architecture, the Linear Attention (LA) module is replaced by the CBLA module. CBLA integrates a semidefinite matrix linked to the query into the dot-product computation of keys and values, which allows the queries to be adapted within the LA framework. This adjustment improves the structure of the multi-head attention computation and leads to better network performance. CBLA is also a plug-and-play module. Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets, showing improvements of 1.60% and 1.40% over strong baseline models, respectively.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses semantic segmentation of remote sensing images. Specifically:

1. **Main Problem**: The paper focuses on improving existing diffusion models for remote sensing image segmentation. U-Net-like diffusion models rely on dense skip connections, which can make intermediate features hard to interpret and may hinder the flow of semantic information between encoder and decoder levels.
2. **Proposed Method**: The authors propose a diffusion model with parallel multi-scale branches, built from a Parallel Multi-Scale Diffusion module (P-MSDiff) and a Cross-Bridge Linear Attention mechanism (CBLA). P-MSDiff integrates recursive denoising branches to capture semantic information at different levels of granularity and to detect repetitively distributed data. CBLA integrates a query-linked semidefinite matrix into the key-value dot product of linear attention, which adapts the queries and improves the multi-head attention computation.
3. **Experimental Results**: On the UAVid and Vaihingen Building datasets, the model improves on strong baseline models by 1.60% and 1.40%, respectively, based on the J1 metric.

In summary, the main goal of the paper is to improve remote sensing image segmentation by introducing a parallel multi-scale diffusion structure and an improved attention mechanism.
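
The CBLA idea described in the abstract (a query-linked semidefinite matrix folded into the key-value product of linear attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the matrix is constructed or exactly where it enters the computation. The sketch assumes a Gram matrix of the feature-mapped queries, which is positive semidefinite by construction, and applies it between the queries and the key-value summary. The names `feature_map`, `query_bridge`, and `linear_attention` are hypothetical, and the `elu(x) + 1` feature map is a common choice in linear attention rather than one confirmed by the source.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: strictly positive, a common kernel feature map for
    # linear attention (assumed here, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def query_bridge(q):
    # Hypothetical query-linked matrix: the Gram matrix of the mapped
    # queries, shape (d, d). A Gram matrix is symmetric positive semidefinite.
    q_f = feature_map(q)
    return q_f.T @ q_f / q.shape[0]


def linear_attention(q, k, v, bridge=None, eps=1e-6):
    # q, k: (n, d); v: (n, d_v). Cost is linear in n because the
    # (d, d_v) key-value summary is computed once and shared by all queries.
    q_f, k_f = feature_map(q), feature_map(k)
    if bridge is not None:
        # Assumed placement: adapt the queries with the (d, d) matrix
        # before they meet the key-value summary.
        q_f = q_f @ bridge
    kv = k_f.T @ v              # (d, d_v) summary of keys and values
    k_sum = k_f.sum(axis=0)     # (d,) normalizer term
    numerator = q_f @ kv        # (n, d_v)
    denominator = q_f @ k_sum + eps  # (n,)
    return numerator / denominator[:, None]
```

Because the assumed matrix has non-negative entries and the feature map is strictly positive, the normalizer stays positive, so each output row remains a weighted average of the value rows. Passing `bridge=None` recovers plain linear attention, which mirrors the plug-and-play claim in the abstract.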