ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer

Hongruixuan Chen, Cuiling Lan, Jian Song, Clifford Broni-Bediako, Junshi Xia, Naoto Yokoya
DOI: https://doi.org/10.1109/TGRS.2024.3410389
2024-06-26
Abstract: Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for change detection (CD). Previous studies have focused on using the information in OSM data to aid CD on optical high-resolution images. This paper pioneers the direct detection of land-cover changes from paired OSM data and optical imagery, thereby expanding the scope of CD tasks. To this end, we propose an object-guided Transformer (ObjFormer) that naturally combines the object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. This combination can significantly reduce the computational overhead of the self-attention module without adding extra parameters or layers. ObjFormer has a hierarchical pseudo-siamese encoder, built from object-guided self-attention modules, that extracts multi-level heterogeneous features from OSM data and optical images; a decoder built from object-guided cross-attention modules then recovers land-cover changes from the extracted heterogeneous features. Beyond basic binary change detection, this paper introduces a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize negative samples, which substantially improves performance on this task. A large-scale benchmark dataset called OpenMapCD, containing 1,287 samples covering 40 regions on six continents, is constructed for detailed experiments. The results show the effectiveness of our methods on this new kind of CD task. Additionally, case studies in Japanese cities demonstrate the framework's generalizability and practical potential.
The OpenMapCD dataset and source code are available at https://github.com/ChenHongruixuan/ObjFormer.
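The abstract says that combining OBIA with self-attention lowers cost without extra parameters or layers. The pure-Python sketch below illustrates one way this can work, under an assumption that is ours and not confirmed by the text above: every pixel keeps its own query, while keys and values are mean-pooled inside each OBIA object, so the score matrix shrinks from N x N to N x M for N pixels and M objects. Learned Q/K/V projections, multi-head splitting and the hierarchical encoder are omitted, and the function names `object_pool` and `object_guided_attention` are hypothetical.

```python
import math
from collections import defaultdict


def object_pool(features, object_ids):
    """Mean-pool pixel feature vectors within each object (segment).

    features:   list of N feature vectors (lists of floats), one per pixel.
    object_ids: list of N ints giving the object each pixel belongs to.
    Returns (sorted object ids, list of M pooled vectors).
    """
    sums = {}
    counts = defaultdict(int)
    for vec, oid in zip(features, object_ids):
        if oid not in sums:
            sums[oid] = [0.0] * len(vec)
        sums[oid] = [s + v for s, v in zip(sums[oid], vec)]
        counts[oid] += 1
    ids = sorted(sums)
    return ids, [[s / counts[oid] for s in sums[oid]] for oid in ids]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def object_guided_attention(queries, kv_features, object_ids):
    """Each pixel query attends to M object tokens instead of N pixel tokens.

    Self-attention: pass the same features as `queries` and `kv_features`.
    Cross-attention (assumed): queries from one modality, keys/values from
    the other, both indexed by the same object map.
    """
    _, tokens = object_pool(kv_features, object_ids)  # M pooled key/value tokens
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(d)])
    return out, len(tokens)


# Toy usage: 4 pixels grouped into 2 objects.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
objects = [0, 0, 1, 1]
out, m = object_guided_attention(pixels, pixels, objects)
print(len(out), m)  # 4 2
```

In this toy case each of the 4 queries scores only 2 object tokens (8 scores) instead of 4 pixel tokens (16 scores). Because pooling is a fixed averaging operation, it adds no trainable weights, which is consistent with the "no extra parameters" claim; whether ObjFormer pools exactly this way should be checked against the released code.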
Computer Vision and Pattern Recognition, Artificial Intelligence, Computers and Society, Multimedia
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the problem of detecting land-cover changes directly from paired OpenStreetMap (OSM) data and optical high-resolution images. It proposes a new form of change detection (CD): detecting general land-cover changes straight from a paired OSM map and optical image. This comprises a basic supervised binary change detection (BCD) task and a further semi-supervised semantic change detection (SCD) task.

#### Main problems and challenges

1. **Data heterogeneity**:
   - OSM data and optical high-resolution images differ fundamentally in nature: OSM data is a vector-based symbolic representation, whereas optical images are raster grids of continuous reflectance values.
   - This large gap between the two data forms makes change detection difficult.
2. **Unclear temporal order**:
   - Conventional CD compares a pre-event image with a post-event image, so the inputs have a clear temporal order and physical meaning. Here the inputs are an OSM map and an optical image, whose temporal order is not clearly defined.
3. **Computational complexity**:
   - The core of the Transformer architecture is the self-attention mechanism, which effectively models non-local relationships between pixels but incurs a heavy computational cost and GPU memory burden, because its cost grows quadratically with the number of tokens.

#### Solutions

1. **The ObjFormer framework**:
   - ObjFormer combines object-guided self-attention (OGSA), derived from object-based image analysis (OBIA), with an advanced vision Transformer architecture, significantly reducing the cost of the self-attention module without adding parameters or layers.
   - The object-guided design aggregates pixels into objects that serve as the basic units of change analysis, which also alleviates "salt-and-pepper" noise.
2. **Converse cross-entropy (CCE) loss**:
   - Designed for the semi-supervised SCD task, CCE fully exploits negative samples from changed regions, where the OSM label indicates a class that the image pixel does *not* belong to. It substantially improves the performance of different methods.
3. **The large-scale OpenMapCD benchmark dataset**:
   - It contains 1,287 map-image pairs covering 40 regions on six continents and is used for detailed experiments.
   - The dataset is publicly released to support follow-up research.

#### Main contributions

1. **A new form of change detection**:
   - Detecting general land-cover changes directly from paired optical images and OSM data, covering the basic supervised BCD task and the further semi-supervised SCD task.
2. **An object-guided Transformer architecture**:
   - It accurately detects land-cover changes across the two heterogeneous data forms, naturally integrating the self-attention mechanism with OBIA while significantly reducing computational cost.
3. **The converse cross-entropy loss**:
   - It effectively mines information from negative samples and significantly improves the performance of different methods.
4. **A large-scale multimodal benchmark dataset**:
   - OpenMapCD covers 40 regions on six continents and includes seven common land-cover categories, ensuring geographic diversity.

Through these methods and contributions, the paper addresses the main challenges of this new kind of CD task and provides a benchmark and baseline methods for future research.
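The semi-supervised SCD setting above gets its only semantic supervision for the optical image from OSM. The sketch below shows how CCE can turn changed pixels into useful signal, under assumptions that are ours rather than the paper's stated formulation: unchanged pixels treat the OSM class as a positive label (ordinary cross-entropy, `-log(p_c)`), and changed pixels treat it as a negative label (a converse term, `-log(1 - p_c)`). The function names and the exact loss form are hypothetical; the released code is the authority on the real definition.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, cls):
    # Positive label: push the probability of `cls` up.
    return -math.log(probs[cls])


def converse_cross_entropy(probs, cls, eps=1e-12):
    # Negative label (assumed form): the pixel is known NOT to be `cls`,
    # so push the probability of `cls` down.
    return -math.log(max(1.0 - probs[cls], eps))


def image_semantic_loss(image_logits, osm_labels, changed):
    """Mean per-pixel loss for a hypothetical optical-image semantic head.

    image_logits: list of per-pixel class logits from the image branch.
    osm_labels:   OSM land-cover class index per pixel.
    changed:      binary change label per pixel (True = changed).
    """
    total = 0.0
    for logits, cls, is_changed in zip(image_logits, osm_labels, changed):
        probs = softmax(logits)
        if is_changed:
            total += converse_cross_entropy(probs, cls)  # OSM class is a negative sample
        else:
            total += cross_entropy(probs, cls)  # OSM class is a positive sample
    return total / len(osm_labels)


# Toy usage: two pixels whose OSM class is 0; the second lies in a changed region.
logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(image_semantic_loss(logits, [0, 0], [False, True]))
```

With identical logits, the changed pixel contributes a much larger loss than the unchanged one, so predicting the OSM class inside a changed region is penalized. This is how negative samples can carry training signal without any manually annotated land-cover labels.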