UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Lulin Li, Ben Chen, Xuechao Zou, Junliang Xing, Pin Tao
2024-09-06
Abstract: Owing to the diverse geographical environments, intricate landscapes, and high-density settlements, automatically identifying urban village boundaries in remote sensing images is a highly challenging task. This paper proposes a novel and efficient neural network model, UV-Mamba, for accurate boundary detection in high-resolution remote sensing images. UV-Mamba incorporates deformable convolutions (DCN) to mitigate the memory loss problem in long sequence modeling, which arises in state space models (SSMs) as image size increases. Its architecture follows an encoder-decoder framework: an encoder with four deformable state space augmentation (DSSA) blocks performs efficient multi-level semantic extraction, and a decoder integrates the extracted semantic information. We conducted experiments on the Beijing and Xi'an datasets, and the results show that UV-Mamba achieves state-of-the-art performance. Specifically, our model achieves 73.3% and 78.1% IoU on the Beijing and Xi'an datasets, respectively, improvements of 1.2% and 3.4% IoU over the previous best model, while also being 6x faster in inference speed and 40x smaller in parameter count. Source code and pre-trained models are available in the supplementary material.
Computer Vision and Pattern Recognition
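The memory loss problem mentioned in the abstract can be illustrated with a toy model. The sketch below assumes a scalar, time-invariant linear SSM discretized with zero-order hold; this is a deliberate simplification, not Mamba's input-dependent selective scan and not UV-Mamba's code, and the names `discretize`, `ssm_scan`, `first_token_influence`, and `flatten_image` are hypothetical. It shows that when an H×W image is flattened into a sequence of length H·W, the first token's influence on the last output shrinks geometrically with sequence length whenever |Ā| < 1.

```python
import math


def discretize(a: float, b: float, delta: float) -> tuple[float, float]:
    """Zero-order-hold discretization of the scalar SSM h'(t) = a*h(t) + b*x(t).

    Toy assumption: a single scalar state with fixed (input-independent) a, b, delta.
    """
    if a == 0.0:
        raise ValueError("this closed form requires a != 0")
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(xs, a=-1.0, b=1.0, c=1.0, delta=0.1):
    """Run h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t over a 1-D sequence."""
    a_bar, b_bar = discretize(a, b, delta)
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


def first_token_influence(seq_len, **kw):
    """Last output when only the first token is non-zero (an impulse input)."""
    impulse = [1.0] + [0.0] * (seq_len - 1)
    return ssm_scan(impulse, **kw)[-1]


def flatten_image(img):
    """Row-major flattening of an H x W grid into a sequence of length H*W."""
    return [px for row in img for px in row]


if __name__ == "__main__":
    # Larger images -> longer flattened sequences -> weaker memory of early tokens.
    for side in (8, 16, 32, 64):
        seq_len = side * side
        print(side, seq_len, first_token_influence(seq_len))
```

With `a=-1.0` and `delta=0.1`, `a_bar = exp(-0.1) ≈ 0.905`, so the first token's contribution after L steps is `b_bar * a_bar**(L - 1)`, and a 64×64 grid already flattens to L = 4096. The toy model leaves out the DCN component that UV-Mamba adds to counter this effect.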
What problem does this paper attempt to address?
This paper addresses the automatic identification of urban village boundaries in high-resolution remote sensing images. Urban villages combine distinctive architectural features (high building density, narrow streets, and diverse building forms) with complex geographical environments, which makes automatically identifying their boundaries extremely challenging. Traditional information collection relies mainly on manual field surveys, which are time-consuming and labor-intensive. To overcome these challenges, the paper proposes UV-Mamba, a novel and efficient neural network model for precise boundary detection in high-resolution remote sensing images. UV-Mamba enhances the state space model (SSM) with deformable convolutions (DCN) to address the memory loss issue in long sequence modeling. The model uses an encoder-decoder framework: the encoder's four deformable state space augmentation (DSSA) blocks efficiently extract multi-level semantic information, and the decoder integrates this information. Experiments on the Beijing and Xi'an datasets show that UV-Mamba achieves state-of-the-art performance, reaching 73.3% and 78.1% Intersection over Union (IoU), improvements of 1.2% and 3.4% IoU over the previous best model, while running 6x faster at inference with 40x fewer parameters.
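Deformable convolution, the component UV-Mamba adds to the SSM, lets each kernel tap sample the input at a learned fractional offset instead of a fixed grid position. Below is a minimal single-channel sketch in the style of the original deformable convolution (no modulation masks), assuming the offsets are supplied directly; in a real DCN layer they are predicted from the input by a parallel convolution. The function names `bilinear` and `deform_conv_at` are hypothetical, and this is not the paper's implementation, whose exact DCN variant the summary does not specify.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample a 2-D grid at fractional (y, x); out-of-range pixels read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    # Weighted sum of the four surrounding integer pixels.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def deform_conv_at(img, weights, offsets, y, x):
    """3x3 deformable convolution output at integer location (y, x).

    weights: 3x3 nested list of kernel weights.
    offsets: 9 (dy, dx) shifts, one per kernel tap in row-major order
             (hypothetical: given as input here rather than predicted).
    """
    out = 0.0
    k = 0
    for i in range(-1, 2):
        for j in range(-1, 2):
            dy, dx = offsets[k]
            # Regular grid position (y+i, x+j) shifted by this tap's offset.
            out += weights[i + 1][j + 1] * bilinear(img, y + i + dy, x + j + dx)
            k += 1
    return out


if __name__ == "__main__":
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    ones = [[1.0] * 3 for _ in range(3)]
    print(deform_conv_at(img, ones, [(0.0, 0.0)] * 9, 1, 1))  # plain 3x3 window sum
    print(deform_conv_at(img, ones, [(0.5, 0.25)] * 9, 1, 1))  # shifted, interpolated
```

With all offsets set to `(0.0, 0.0)`, `deform_conv_at` reduces to an ordinary zero-padded 3×3 convolution (cross-correlation) at that location; non-integer offsets blend neighbouring pixels through `bilinear`, which is what allows the sampling grid to follow irregular shapes such as village boundaries.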