Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching.

Haihua Lu, Hai Xu, Li Zhang, Yong Zhao
DOI: https://doi.org/10.1109/VCIP.2018.8698637
2018-01-01
Abstract: Convolutional neural networks (CNNs) have been shown to outperform conventional algorithms for stereo estimation. Many CNN-based methods focus on pixel-wise matching cost computation, which is an important building block of many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields, and they either rely on traditional methods for cost aggregation or ignore cost aggregation altogether. In this paper, we propose a novel architecture, the cascaded multi-scale and multi-dimension network (MSMD), that addresses both limitations. First, we propose a new multi-scale matching cost computation sub-network in which two receptive fields of different sizes are implemented in parallel. In this way, the network can use both to balance the trade-off between a larger receptive field and the loss of detail. Second, we show that our multi-dimension aggregation sub-network, which contains both 2D and 3D convolution operations, provides rich context and semantic information for estimating an accurate initial disparity. Finally, experiments on the challenging KITTI stereo benchmark demonstrate that the proposed method achieves competitive results even without any additional post-processing.
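To make the pipeline in the abstract concrete, here is a minimal, non-learned sketch of the same three stages on a single scanline: matching costs computed at two window sizes in parallel, an aggregation step, and a winner-take-all disparity estimate. This is not the MSMD network. The hand-written absolute-difference cost stands in for the learned matching cost, a box filter stands in for the 2D/3D convolutional aggregation, and all function names and parameters (`sad_cost`, `multi_scale_cost_volume`, `aggregate`, `winner_take_all`, `radii`) are hypothetical and chosen for illustration only.

```python
def sad_cost(left, right, x, d, radius):
    """Mean absolute difference between a window around left[x] and right[x - d].

    Assumption: a hand-crafted cost used as a stand-in for a learned one.
    Indices are clamped at the scanline borders.
    """
    n = len(left)
    total = 0.0
    for k in range(-radius, radius + 1):
        xl = min(max(x + k, 0), n - 1)
        xr = min(max(x + k - d, 0), n - 1)
        total += abs(left[xl] - right[xr])
    # Normalize by window size so costs from different scales are comparable.
    return total / (2 * radius + 1)


def multi_scale_cost_volume(left, right, max_disp, radii=(1, 3)):
    """Build cost[x][d] by averaging costs from two window sizes.

    The small window keeps detail; the large one sees more context.
    Averaging is an assumption; a network would learn how to fuse them.
    """
    volume = []
    for x in range(len(left)):
        row = []
        for d in range(max_disp + 1):
            costs = [sad_cost(left, right, x, d, r) for r in radii]
            row.append(sum(costs) / len(costs))
        volume.append(row)
    return volume


def aggregate(volume, radius=1):
    """Smooth each disparity slice along x with a box filter.

    Assumption: a fixed filter standing in for learned cost aggregation.
    """
    n = len(volume)
    num_disp = len(volume[0])
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n - 1, x + radius)
        width = hi - lo + 1
        out.append([
            sum(volume[i][d] for i in range(lo, hi + 1)) / width
            for d in range(num_disp)
        ])
    return out


def winner_take_all(volume):
    """Pick, for every pixel, the disparity with the lowest aggregated cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in volume]


# Toy scanlines: the left view is the right view shifted by 2 pixels.
right_line = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0, 7, 3]
left_line = [right_line[max(x - 2, 0)] for x in range(len(right_line))]

cost = aggregate(multi_scale_cost_volume(left_line, right_line, max_disp=4))
disparity = winner_take_all(cost)
print(disparity)
```

On this toy input the shift of 2 is recovered at pixels well inside the scanline (for example pixels 6 and 7), where both windows fit without border clamping. Pixels near the borders may differ because of the clamping.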