Abstract: In recent years, stereo matching methods based on convolutional neural networks have achieved significant gains over conventional methods in both speed and accuracy. However, current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources, which makes them poorly suited to edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching. It consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation, and a color-guided 2D convolutional neural network (CNN) for disparity refinement. We adopt MobileNetV2 as the backbone for feature extraction. A channel attention module is applied to improve the representational capacity of the features, and multi-resolution information is adaptively incorporated into the cost volume via cross-scale connections. In addition, instead of regular 3D convolutions, we use pseudo 3D convolutions in the 3D U-Net architecture to aggregate the cost volume, which gives a better balance between computational cost and accuracy. Further, we introduce a left-right consistency check and color guidance, and design a robust disparity refinement network with skip connections and dilated convolutions to capture global context information and further improve disparity estimation accuracy at little additional computational and memory cost. Depth-wise separable convolutions replace all standard convolutions in the disparity refinement stage, which reduces computational complexity and the number of parameters without a significant loss of accuracy. Extensive experiments on the Scene Flow, KITTI 2015, and KITTI 2012 benchmarks demonstrate that the proposed LWNet achieves competitive accuracy compared with state-of-the-art stereo matching methods.
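
The parameter savings mentioned in the abstract can be illustrated with simple weight counts. The sketch below is not taken from the paper: the channel width (32) and kernel size (3) are hypothetical, biases are ignored, and the pseudo 3D convolution is assumed to be a 1 x k x k spatial convolution followed by a k x 1 x 1 convolution along the disparity axis, which is a common factorization but is not specified in the abstract.

```python
def conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k 2D convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depth-wise k x k conv (one filter per input channel) plus a 1 x 1 point-wise conv."""
    return c_in * k * k + c_in * c_out


def conv3d_params(c_in, c_out, k):
    """Weights in a regular k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def pseudo3d_params(c_in, c_out, k):
    """Assumed factorization: 1 x k x k spatial conv, then k x 1 x 1 conv over disparity."""
    return c_in * c_out * k * k + c_out * c_out * k


if __name__ == "__main__":
    c, k = 32, 3  # hypothetical channel width and kernel size
    print("standard 2D conv:        ", conv2d_params(c, c, k))               # 9216
    print("depth-wise separable 2D: ", depthwise_separable_params(c, c, k))  # 1312
    print("regular 3D conv:         ", conv3d_params(c, c, k))               # 27648
    print("pseudo 3D conv (assumed):", pseudo3d_params(c, c, k))             # 12288
```

Under these assumptions, the depth-wise separable layer uses roughly 7 times fewer weights than the standard 2D layer, and the factorized 3D layer uses 2.25 times fewer than the regular 3D layer. The actual savings in LWNet depend on its real layer configuration.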

Holistic and Contextual Evidential Stereo-LiDAR Fusion for Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

Reliable Fusion of ToF and Stereo Data Based on Joint Depth Filter

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

Expanding Sparse LiDAR Depth and Guiding Stereo Matching for Robust Dense Depth Estimation

LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing

SLFNet: A Stereo and LiDAR Fusion Network for Depth Completion

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization

Sparse LIDAR Measurement Fusion with Joint Updating Cost for Fast Stereo Matching

Real-time depth completion based on LiDAR-stereo for autonomous driving

ELFNet: Evidential Local-global Fusion for Stereo Matching

Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

K-nearest Neighborhood Based Integration of Time-of-flight Cameras and Passive Stereo for High-Accuracy Depth Maps

Robust and accurate depth estimation by fusing LiDAR and Stereo

Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation

Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Joint Classification of Hyperspectral and LiDAR Data Using Height Information Guided Hierarchical Fusion-and-Separation Network

A Light-Weight Network with Multi-Scale Features Fusion and Color Guidance for Stereo Matching

FastFusion: Deep stereo-LiDAR fusion for real-time high-precision dense depth sensing