PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching

Hengli Wang, Rui Fan, Peide Cai, Ming Liu
DOI: https://doi.org/10.1109/LRA.2021.3068108
2021-03-12
Abstract: Supervised learning with deep convolutional neural networks (DCNNs) has seen huge adoption in stereo matching. However, the acquisition of large-scale datasets with well-labeled ground truth is cumbersome and labor-intensive, making supervised learning-based approaches often hard to implement in practice. To overcome this drawback, we propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo. Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution; while our PVM can generate reliable semi-dense disparity images, which can be employed to supervise OptStereo training. Furthermore, we publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes. Extensive experimental results demonstrate the effectiveness and efficiency of our self-supervised stereo matching approach on the KITTI Stereo benchmarks and our HKUST-Drive dataset. PVStereo, our best-performing implementation, greatly outperforms all other state-of-the-art self-supervised stereo matching approaches. Our project page is available at http://sites.google.com/view/pvstereo.
Computer Vision and Pattern Recognition, Robotics
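
The abstract says that the PVM produces semi-dense disparity images that supervise OptStereo training. The sketch below illustrates that general idea only and is not the authors' PVM: it assumes, purely for illustration, that several candidate disparity maps are available at a common resolution, keeps the pixels where they agree within a tolerance, and evaluates an L1 loss on those pixels alone. The function names `vote_semi_dense` and `masked_l1_loss`, the agreement rule, and the `tol` parameter are all hypothetical.

```python
import numpy as np


def vote_semi_dense(disparities, tol=1.0):
    """Fuse candidate disparity maps into a semi-dense pseudo label.

    ASSUMPTION: `disparities` is a list of (H, W) arrays that were already
    resampled to the same resolution and disparity scale. A pixel is kept
    only if all candidates agree within `tol` pixels. This agreement rule
    is a stand-in, not the voting scheme from the paper.
    """
    stack = np.stack(disparities, axis=0)            # shape (K, H, W)
    spread = stack.max(axis=0) - stack.min(axis=0)   # per-pixel disagreement
    mask = spread <= tol                             # True where reliable
    fused = np.median(stack, axis=0)                 # robust per-pixel estimate
    return np.where(mask, fused, 0.0), mask


def masked_l1_loss(pred, target, mask):
    """Mean absolute error over the pixels where `mask` is True."""
    n_valid = int(mask.sum())
    if n_valid == 0:
        return 0.0                                   # nothing to supervise
    return float(np.abs(pred - target)[mask].sum() / n_valid)


# Toy 2x2 example: three candidates disagree strongly at pixel (0, 1).
a = np.array([[10.0, 20.0], [30.0, 40.0]])
b = np.array([[10.5, 25.0], [30.0, 40.2]])
c = np.array([[10.2, 20.0], [29.8, 40.0]])
pseudo, mask = vote_semi_dense([a, b, c], tol=1.0)

# A hypothetical network prediction, scored only on the reliable pixels.
pred = np.array([[11.0, 0.0], [30.0, 41.0]])
loss = masked_l1_loss(pred, pseudo, mask)
print(mask)
print(loss)
```

Restricting the loss to the masked pixels is what lets a semi-dense label supervise a dense prediction: unreliable pixels contribute no gradient instead of contributing a wrong one.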