Abstract: Despite significant progress in deep learning-based stereo matching, the accuracy of these methods drops markedly in the presence of occlusions, reflections, textureless areas, and scale variations. In this paper, we propose MSCANet, a novel stereo matching network that integrates rich multi-scale feature information with context-aware aggregation to improve matching performance. First, a multi-scale aware fusion module is designed to incorporate more comprehensive global context features at different scales, which improves the model's ability to generalize across images of varying scales. Second, a novel V-shaped encoder/decoder module is developed to exploit the rich feature information. In the encoding stage, a 3D squeeze-and-excitation block is introduced to adaptively recalibrate the learned feature maps. This block suppresses irrelevant features while enhancing useful ones, which improves the efficiency and accuracy of disparity prediction. In the decoding stage, a 3D context-aware decode block is designed to use global context features to restore the original image structure. Moreover, the high-level feature maps can be employed to augment low-level feature maps by incorporating more detailed information, which avoids the side effects caused by the loss of information during the encoding process. Extensive ablation and comparative experiments were conducted on the Scene Flow, KITTI 2012, and KITTI 2015 datasets to validate the effectiveness of each proposed module. The experimental results demonstrate that MSCANet achieves competitive performance while offering a more straightforward and efficient model design and faster inference speed.
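
To make the recalibration step concrete, the following is a minimal NumPy sketch of a generic squeeze-and-excitation operation applied to a 3D feature volume. It is not the authors' implementation: the abstract does not give the block's layout, so the tensor shape `(C, D, H, W)`, the two-layer bottleneck, the reduction ratio, and all function and variable names here are assumptions chosen for illustration.

```python
import numpy as np


def se_block_3d(volume, w1, b1, w2, b2):
    """Channel recalibration of a feature volume shaped (C, D, H, W).

    Hypothetical helper: follows the generic squeeze-and-excitation
    pattern, not the specific block described in the paper.
    """
    # Squeeze: global average over disparity, height, and width -> (C,)
    z = volume.mean(axis=(1, 2, 3))
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel
    hidden = np.maximum(0.0, w1 @ z + b1)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))
    # Scale: broadcast each channel's gate over its (D, H, W) slice
    return volume * gates[:, None, None, None], gates


# Illustrative sizes only (assumed, not taken from the paper)
rng = np.random.default_rng(0)
channels, reduction = 8, 4
volume = rng.standard_normal((channels, 4, 6, 6))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1
b2 = np.zeros(channels)

out, gates = se_block_3d(volume, w1, b1, w2, b2)
```

Each gate lies strictly between 0 and 1, so a channel whose gate is close to 0 is suppressed and one whose gate is close to 1 passes through nearly unchanged. In a trained network the weights would be learned rather than drawn at random as they are here.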

Deep Contextual Structure and Semantic Feature Enhancement Stereo Network

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Deep spatial and discriminative feature enhancement network for stereo matching

SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing

Efficient and Accurate Stereo Matching Via Guided Deformable Aggregation

Multi-Scale Context Attention Network for Stereo Matching

Monocular Contextual Constraint for Stereo Matching with Adaptive Weights Assignment

Exploiting Semantic and Boundary Information for Stereo Matching

EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching

Multi-scale inputs and context-aware aggregation network for stereo matching

Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks

Multi-Dimensional Cooperative Network for Stereo Matching

Superpixel Guided Network for Three-Dimensional Stereo Matching

Depth Estimation Using an Improved Stereo Network

Edge supervision and multi-scale cost volume for stereo matching

Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks

GA-Stereo: A Real-Time Stereo Network Based on the Gradient Flow Shunting Strategy and the Atrous Pyramid Network

Depth-aware Volume Attention for Texture-less Stereo Matching

A Deep Semantic Segmentation Network with Semantic and Contextual Refinements

A Dual Branch Multiscale Stereo Matching Network for High-Resolution Satellite Remote Sensing Images

Global Matching-Optimization Network for Stereo Depth Estimation