Abstract: Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for change detection (CD). Previous studies have mainly used the information in OSM data to aid CD on optical high-resolution images. This paper pioneers the direct detection of land-cover changes from paired OSM data and optical imagery, thereby expanding the scope of CD tasks. To this end, we propose an object-guided Transformer (ObjFormer) that naturally combines the object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. This combination significantly reduces the computational overhead of the self-attention module without adding extra parameters or layers. ObjFormer has a hierarchical pseudo-siamese encoder, built from object-guided self-attention modules, that extracts multi-level heterogeneous features from OSM data and optical images; a decoder built from object-guided cross-attention modules then recovers land-cover changes from these heterogeneous features. Beyond basic binary change detection, this paper introduces a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize negative samples, yielding substantial performance gains on this task. A large-scale benchmark dataset, OpenMapCD, containing 1,287 samples from 40 regions on six continents, is constructed for detailed experiments. The results show the effectiveness of our methods on this new kind of CD task. Additionally, case studies in Japanese cities demonstrate the framework's generalizability and practical potential.
The OpenMapCD dataset and source code are available at https://github.com/ChenHongruixuan/ObjFormer.
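The abstract says that combining OBIA with self-attention cuts the cost of attention without extra parameters, but it does not give the exact formulation. The sketch below assumes one plausible reading, similar in spirit to spatial-reduction attention: every pixel keeps its own query, while keys and values are the mean features of the OBIA objects (segments) produced by some prior segmentation step. Attention then costs N × M score computations for N pixels and M objects instead of N × N. For illustration only: a 256 × 256 tile has N = 65,536 pixels, so full attention compares about 4.29 × 10⁹ pairs, whereas a hypothetical M = 1,000 objects gives about 6.55 × 10⁷ pairs, roughly 65× fewer. The names `object_pool` and `object_guided_attention` are hypothetical, and learned Q/K/V projections, multi-head splitting, and the segmentation algorithm are all omitted; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def object_pool(features, segments):
    """Average the feature vectors of all pixels that share a segment id.

    Returns the sorted object ids and one mean feature vector (object token)
    per object. Mean pooling introduces no learned parameters.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, seg in zip(features, segments):
        if seg not in sums:
            sums[seg] = [0.0] * len(feat)
        sums[seg] = [s + f for s, f in zip(sums[seg], feat)]
        counts[seg] += 1
    object_ids = sorted(sums)
    tokens = [[s / counts[o] for s in sums[o]] for o in object_ids]
    return object_ids, tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def object_guided_attention(features, segments):
    """Hypothetical object-guided attention: N pixel queries attend to
    M object tokens (keys = values = object means), so only N*M scores are
    computed instead of N*N. Q/K/V projections are omitted (identity).
    """
    d = len(features[0])
    _, tokens = object_pool(features, segments)
    outputs = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of the object tokens.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


# Toy example: 6 pixels with 2-D features, grouped into 2 objects,
# so 6 * 2 = 12 attention scores are computed instead of 6 * 6 = 36.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
            [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
segments = [0, 0, 0, 1, 1, 1]
out = object_guided_attention(features, segments)
print(len(out), len(out[0]))  # prints: 6 2
```

Because the object tokens are plain averages, pixels in the same object share the same keys and values, which is one way such a design could also suppress salt-and-pepper noise in the output.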

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the problem of directly detecting land-cover changes from paired OpenStreetMap (OSM) data and optical high-resolution images. Specifically, it proposes a new form of change detection (CD): detecting general land-cover changes directly from paired OSM data and optical images. This comprises a basic supervised binary change detection (BCD) task and a further semi-supervised semantic change detection (SCD) task.

#### Main problems and challenges

1. **Data heterogeneity**:
   - OSM data and optical high-resolution images differ fundamentally in nature. OSM data is a vector-based symbolic representation, whereas optical images consist of raster-based continuous reflectance values.
   - This large gap between data forms makes change detection challenging.
2. **Unclear temporal order**:
   - Traditional CD analyzes pre-event and post-event images, which have a clear temporal order and physical meaning. In this paper's task, the inputs are OSM data and optical high-resolution images, with no clear temporal order between them.
3. **Computational complexity**:
   - The core of the Transformer architecture is the self-attention mechanism, which effectively models non-local relationships between pixels but, because its cost grows quadratically with the number of tokens, incurs a large computational cost and GPU memory burden.

#### Solutions

1. **Propose the ObjFormer framework**:
   - Combine object-guided self-attention (OGSA) with an advanced vision Transformer architecture, significantly reducing the computational cost of the self-attention module without adding parameters or layers.
   - Using the object-guided approach, aggregate pixels into objects that form the basis for subsequent change analysis, which effectively alleviates "salt-and-pepper" noise.
2. **Design the converse cross-entropy (CCE) loss**:
   - Used for the semi-supervised SCD task, it fully exploits the information in negative samples (changed regions), significantly improving the performance of different methods.
3. **Construct the large-scale benchmark dataset OpenMapCD**:
   - It contains 1,287 map-image pairs covering 40 regions on six continents and supports detailed experiments.
   - The dataset is publicly released to promote subsequent research.

#### Main contributions

1. **Propose a new form of change detection**:
   - Directly detect general land-cover changes from paired optical images and OSM data, including the basic supervised BCD task and the further semi-supervised SCD task.
2. **Propose an object-guided Transformer architecture**:
   - Accurately detect land-cover changes across two heterogeneous data forms by naturally integrating the self-attention mechanism with object-based image analysis (OBIA), significantly reducing computational cost.
3. **Design the converse cross-entropy loss**:
   - Effectively mine negative-sample information and significantly improve the performance of different methods.
4. **Construct a large-scale multi-modal benchmark dataset**:
   - OpenMapCD covers 40 regions on six continents and includes seven common land-cover categories, ensuring geographical diversity.

Through these methods and contributions, the paper addresses the main challenges of this new CD setting and provides a public benchmark for future research.
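The summary says CCE exploits negative samples in changed regions but gives no formula. A minimal sketch, assuming the natural reading: in unchanged pixels the OSM class is taken as the true land-cover class of the optical image (standard cross-entropy, -log p), while in changed pixels the OSM class is known to be wrong, so the loss penalizes probability assigned to it (converse cross-entropy, -log(1 - p)). The function name `semi_supervised_scd_loss`, the `change_mask` input (assumed here to come from binary change labels, since the SCD task needs no manual land-cover labels), and the `eps` clamp are illustrative assumptions, not the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semi_supervised_scd_loss(pixel_logits, osm_labels, change_mask, eps=1e-7):
    """Hypothetical per-pixel loss for the optical-image semantic branch.

    - Unchanged pixel: the OSM class is treated as the true class, so use
      standard cross-entropy, -log p[osm].
    - Changed pixel: the OSM class is known to be wrong (a negative sample),
      so use converse cross-entropy, -log(1 - p[osm]), which pushes
      probability mass away from that class.
    """
    total = 0.0
    for logits, label, changed in zip(pixel_logits, osm_labels, change_mask):
        p = softmax(logits)[label]
        if changed:
            total -= math.log(max(1.0 - p, eps))
        else:
            total -= math.log(max(p, eps))
    return total / len(osm_labels)


# A changed pixel whose (outdated) OSM label is class 0.
confident_on_osm_class = semi_supervised_scd_loss([[2.0, 0.0, 0.0]], [0], [True])
confident_elsewhere = semi_supervised_scd_loss([[0.0, 2.0, 0.0]], [0], [True])
print(confident_on_osm_class > confident_elsewhere)  # prints: True
```

The converse term never names the correct class for a changed pixel; it only rules out the OSM class, which is why it can use changed regions without any manual land-cover annotation.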

ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer

UCDFormer: Unsupervised Change Detection Using a Transformer-Driven Image Translation

Robust change detection for remote sensing images based on temporospatial interactive attention module

Object-Level Double Constrained Method for Land Cover Change Detection

A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images

MDAFormer: Multi-level difference aggregation transformer for change detection of VHR optical imagery

Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)

AdaptFormer: An Adaptive Hierarchical Semantic Approach for Change Detection on Remote Sensing Images

Cross-modal change detection using historical land use maps and current remote sensing images

DiFormer: A Difference Transformer Network for Remote Sensing Change Detection

Global-Local Collaborative Learning Network for Optical Remote Sensing Image Change Detection

CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory

GCFormer: Global Context-Aware Transformer for Remote Sensing Image Change Detection

EfficientCD: A New Strategy For Change Detection Based With Bi-temporal Layers Exchanged

Relation Changes Matter: Cross-Temporal Difference Transformer for Change Detection in Remote Sensing Images

Cross attention is all you need: relational remote sensing change detection with transformer

Semantic-aware transformer with feature integration for remote sensing change detection

Separate Segmentation of Multi-Temporal High-Resolution Remote Sensing Images for Object-Based Change Detection in Urban Area

GeoFormer: A Geometric Representation Transformer for Change Detection