Abstract: With the development of remote sensing satellite technology for Earth observation, remote sensing stereo images have been used for three-dimensional (3D) reconstruction in various fields, such as urban planning and construction. However, remote sensing images often contain noise, occluded regions, untextured areas, and repeated textures, which reduce the accuracy of stereo matching and degrade the quality of 3D reconstruction results. To reduce the impact of complex scenes in remote sensing images on stereo matching while maintaining both speed and accuracy, we propose a new end-to-end stereo matching network based on convolutional neural networks (CNNs). The proposed network learns features at different scales from the original images and constructs cost volumes at multiple scales to obtain richer scale information. Additionally, when constructing the cost volume, we introduce negative disparity, because remote sensing stereo image pairs commonly contain both negative and non-negative disparities. For cost aggregation, we employ a 3D convolution-based encoder–decoder structure that allows the network to aggregate information adaptively. Before feature aggregation, we also introduce an attention module to retain more valuable feature information, enhance feature representation, and obtain a higher-quality disparity map. Trained on the publicly available US3D dataset, the network achieves an end-point error (EPE) of 1.115 pixels and an error pixel ratio (D1) of 5.32% on the test set, with an inference time of 92 ms. Compared with existing state-of-the-art models, our model achieves higher accuracy, and the network is beneficial for the 3D reconstruction of remote sensing images.
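To illustrate what introducing negative disparity into the cost volume can mean, the following is a minimal NumPy sketch, not the network's actual implementation. It assumes a simple hand-written correlation cost (the abstract does not specify the matching cost), the convention that a left pixel at column x matches the right pixel at column x - d, and a hypothetical function name `build_signed_cost_volume` with hypothetical parameters `d_min` and `d_max`. The only point it shows is that the disparity axis spans a signed range rather than starting at zero.

```python
import numpy as np


def build_signed_cost_volume(left, right, d_min, d_max):
    """Correlation cost volume over disparities d_min .. d_max - 1.

    left, right: feature maps of shape (C, H, W).
    d_min may be negative, so the volume covers both signs of disparity.
    Assumed convention: left pixel x matches right pixel x - d.
    Returns (volume, disparities), where volume has shape (D, H, W).
    """
    _, h, w = left.shape
    disparities = np.arange(d_min, d_max)
    # Positions with no valid counterpart in the right image stay at zero.
    volume = np.zeros((len(disparities), h, w), dtype=np.float32)
    for i, d in enumerate(disparities):
        if d > 0:
            # Left columns d..W-1 pair with right columns 0..W-1-d.
            volume[i, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
        elif d < 0:
            # Left columns 0..W-1+d pair with right columns -d..W-1.
            volume[i, :, :d] = (left[:, :, :d] * right[:, :, -d:]).mean(axis=0)
        else:
            volume[i] = (left * right).mean(axis=0)
    return volume, disparities


# Toy example: one-hot "features" that encode the column index,
# with the right image shifted so that the true disparity is -2.
width = 6
left_feat = np.zeros((width, 2, width), dtype=np.float32)
for x in range(width):
    left_feat[x, :, x] = 1.0
right_feat = np.zeros_like(left_feat)
right_feat[:, :, 2:] = left_feat[:, :, :-2]

cost, disp_values = build_signed_cost_volume(left_feat, right_feat, -3, 4)
# Winner-takes-all disparity per pixel (a learned network would instead
# aggregate the volume with 3D convolutions and regress the disparity).
best_disp = disp_values[cost.argmax(axis=0)]
```

In this toy case, the four leftmost columns of `best_disp` recover the negative disparity of -2; a volume restricted to non-negative disparities could not represent that match at all.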

Coatrsnet: Fully Exploiting Convolution and Attention for Stereo Matching by Region Separation

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

A Joint 2D-3D Complementary Network for Stereo Matching

Group-Based Atrous Convolution Stereo Matching Network

Stereo Matching Using Multi-Level Cost Volume and Multi-Scale Feature Constancy

Stereo Matching Method for Remote Sensing Images Based on Attention and Scale Fusion

Deep Stereo Matching With Hysteresis Attention and Supervised Cost Volume Construction

Edge supervision and multi-scale cost volume for stereo matching

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

CRAR: Accelerating Stereo Matching with Cascaded Residual Regression and Adaptive Refinement

EdgeStereo: an Effective Multi-Task Learning Network for Stereo Matching and Edge Detection

High-Frequency Stereo Matching Network

Multi-Dimensional Cooperative Network for Stereo Matching

EAI-Stereo: Error Aware Iterative Network for Stereo Matching

Multi-scale Cross-form Pyramid Network for Stereo Matching

AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Accurate and Efficient Stereo Matching via Attention Concatenation Volume

Multi-Scale Context Attention Network for Stereo Matching

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Neural Markov Random Field for Stereo Matching

Cascaded Feature Interaction Network for Stereo Matching