Abstract: With the emergence of the Smart City concept, rapid progress in urban three-dimensional (3D) reconstruction has become imperative. Although current 3D reconstruction techniques can generate products such as Digital Surface Models (DSM), shadows, occlusions, and low-texture areas in very-high-resolution remote sensing images remain difficult to handle. Existing stereo matching methods therefore often fail to produce satisfactory disparity maps, which reduces the accuracy of 3D reconstruction. The problem is particularly pronounced in urban scenes, where numerous super high-rise and densely distributed buildings cause large disparity values and occluded regions in stereo image pairs, and in turn a large number of mismatched points in the resulting disparity map. In response to these challenges, this paper proposes a method to refine the disparity in urban scenes based on open-source GIS data. First, we register the GIS data with the epipolar-rectified images, since non-negligible geolocation errors always exist between them. Because buildings of different heights exhibit different offsets during GIS data registration, we perform multi-modal matching for each building separately and merge the results into the final building mask. Subsequently, a two-layer optimization process, comprising global and local optimization, is applied to the initial disparity map based on the building mask. Finally, we perform a post-correction on the building facades to obtain the final refined disparity map, which can be used for high-precision 3D reconstruction. Experimental results on SuperView-1, GaoFen-7, and GeoEye satellite images show that the proposed method can correct the occluded and mismatched areas in the initial disparity maps generated by both hand-crafted and deep-learning stereo matching methods. The DSM generated from the refined disparity reduces the average height error from 2.2 m to 1.6 m, outperforming other disparity refinement methods. Furthermore, the proposed method improves the integrity of the target structures and yields steeper building facades and complete roofs, which are conducive to subsequent 3D model generation.
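To make the idea of mask-guided refinement concrete, the following is a minimal sketch and not the method of the paper. It assumes each building footprint in an already registered label mask has a roughly constant roof disparity (a flat-roof assumption), and it replaces missing or strongly deviating values inside each footprint with that footprint's median. The function name `refine_disparity_with_mask`, the `max_dev` threshold, and the convention that label 0 is background are all hypothetical choices made for illustration.

```python
import numpy as np


def refine_disparity_with_mask(disparity, building_labels, max_dev=3.0):
    """Illustrative mask-guided refinement (assumption: flat roofs).

    disparity       : 2D float array, NaN marks missing/occluded pixels.
    building_labels : 2D int array of the same shape, 0 = background,
                      k > 0 = pixels of building k (hypothetical convention).
    max_dev         : disparities farther than this from the building's
                      median are treated as mismatches (hypothetical value).
    """
    refined = np.asarray(disparity, dtype=np.float64).copy()
    for label in np.unique(building_labels):
        if label == 0:  # leave the background untouched
            continue
        region = building_labels == label
        values = refined[region]
        valid = np.isfinite(values)
        if not valid.any():  # nothing reliable to estimate from
            continue
        median = np.median(values[valid])
        # Deviation from the median; missing pixels are handled via ~valid.
        deviation = np.abs(np.where(valid, values, median) - median)
        bad = ~valid | (deviation > max_dev)
        values[bad] = median
        refined[region] = values
    return refined
```

A per-building median is only one simple robust estimator; sloped roofs would need a plane fit instead, and the two-layer optimization and facade post-correction described in the abstract are not reproduced here.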

Semantic Segmentation of Street Scenes Using Disparity Information

SegStereo: Exploiting Semantic Information for Disparity Estimation

Disparity Estimation Using Multilevel and Global Information

An RGB-D Fusion Based Semantic Segmentation Algorithm Based on Neighborhood Metric Relations

Geometry-Aware Instance Segmentation with Disparity Maps

Semantic Reconstruction based on RGB Image and Sparse Depth

Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation

Selecting Optimal Combination of Data Channels for Semantic Segmentation in City Information Modelling (CIM)

Semantic stereo: Integrating piecewise planar stereo with segmentation and classification

Research on Semantic Segmentation Method of Urban Streetscape Image Based on Deep Learning

Disparity Refinement for Stereo Matching of High-Resolution Remote Sensing Images Based on GIS Data

Fusing Geometrical and Visual Information Via Superpoints for the Semantic Segmentation of 3D Road Scenes

MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching

A Fusion Network for Semantic Segmentation Using RGB-D Data

Semi-Supervised Semantic Segmentation for Light Field Images Using Disparity Information

Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning

Enhancing Feature Fusion with Spatial Aggregation and Channel Fusion for Semantic Segmentation

3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame

SSNet: a joint learning network for semantic segmentation and disparity estimation

Fusion of images and point clouds for the semantic segmentation of large-scale 3D scenes based on deep learning

Transformer-Based Cross-Modal Information Fusion Network for Semantic Segmentation