Abstract: In recent years, stereo matching methods based on convolutional neural networks (CNNs) have achieved significant gains over conventional methods in both speed and accuracy. However, current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources, which makes them poorly suited to edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching. It consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation, and a 2D CNN with color guidance for disparity refinement. We adopt MobileNetV2 as the backbone for feature extraction. A channel attention module improves the representational capacity of the features, and multi-resolution information is adaptively incorporated into the cost volume via cross-scale connections. In addition, instead of regular 3D convolutions, we use pseudo 3D convolutions in the 3D U-Net to aggregate the cost volume, which gives a better balance between computational cost and accuracy. Further, we introduce a left-right consistency check and color guidance, and design a robust disparity refinement network with skip connections and dilated convolutions to capture global context information and further improve disparity estimation accuracy at little additional computational and memory cost. Depth-wise separable convolutions replace all standard convolutions in the disparity refinement stage, which reduces computational complexity and the number of parameters without a significant loss of accuracy. Extensive experiments on the Scene Flow, KITTI 2015, and KITTI 2012 benchmarks demonstrate that the proposed LWNet achieves competitive accuracy compared with state-of-the-art stereo matching methods.
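
The parameter savings claimed for depth-wise separable and pseudo 3D convolutions can be illustrated with a short weight count. The sketch below is not taken from the paper: the channel widths (32 in, 32 out) and kernel size (3) are hypothetical, biases are ignored, and the pseudo 3D layer is assumed to be one common factorization (a 1 x k x k spatial convolution followed by a k x 1 x 1 convolution along the disparity axis), which may differ from the variant used in LWNet.

```python
def conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k 2D convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv that mixes channels."""
    return c_in * k * k + c_in * c_out


def conv3d_params(c_in, c_out, k):
    """Weights in a standard k x k x k 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3


def pseudo3d_params(c_in, c_out, k):
    """Weights in an assumed pseudo 3D factorization:
    a 1 x k x k spatial conv (c_in -> c_out), then
    a k x 1 x 1 conv along the disparity axis (c_out -> c_out)."""
    return c_in * c_out * k * k + c_out * c_out * k


if __name__ == "__main__":
    c_in, c_out, k = 32, 32, 3  # hypothetical layer sizes, for illustration only

    std2d = conv2d_params(c_in, c_out, k)
    dws2d = depthwise_separable_params(c_in, c_out, k)
    std3d = conv3d_params(c_in, c_out, k)
    p3d = pseudo3d_params(c_in, c_out, k)

    print(f"2D standard:            {std2d} weights")
    print(f"2D depth-wise separable: {dws2d} weights ({std2d / dws2d:.2f}x fewer)")
    print(f"3D standard:            {std3d} weights")
    print(f"3D pseudo (assumed):    {p3d} weights ({std3d / p3d:.2f}x fewer)")
```

Under these assumptions the depth-wise separable layer needs roughly seven times fewer weights than the standard 2D layer, and the factorized 3D layer needs 2.25 times fewer than the full 3D layer. The actual savings in LWNet depend on its real layer widths, which the abstract does not state.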
