Stereo Matching with Local Cost Volume Refinement Network

Mingzhu Wan, Lingbao Kong
DOI: https://doi.org/10.1117/12.2653259
2023-01-01
Abstract: Deep learning methods have been widely used in recent years for stereo matching, a key step in machine vision measurement. State-of-the-art methods are three-dimensional (3D) end-to-end networks that form a cost volume by concatenating extracted features and process it with 3D modules. Despite their strong accuracy, 3D networks mostly have high computational cost, heavy memory usage, and long run-time. This paper proposes the Local Cost Volume Refinement Network (LCRN), a two-dimensional (2D) end-to-end network composed of feature extraction, disparity initialization, disparity refinement, and disparity merging modules. LCRN initializes disparity maps using a correlation layer and residual blocks, and refines them using local cost volumes, residual blocks, and disparity regression. Local cost volumes are constructed by warping the right features and applying a small disparity shift. To verify the effectiveness of LCRN, the network was pre-trained on the SceneFlow dataset and fine-tuned on the ROBI dataset. The network was evaluated on the ROBI test set for robotic bin-picking. Experimental results show that LCRN maintains competitive accuracy while running fast and requiring less memory.
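The refinement idea in the abstract (warp the right features with the current disparity, apply a small shift, then regress a correction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the dot-product matching score, the softmax-based regression, and the `radius` and `temperature` parameters are all assumptions made for illustration, and the residual blocks that LCRN applies are omitted.

```python
import numpy as np

def warp_right_to_left(right_feat, disparity):
    """Sample right features at x - d(x), linearly interpolated along width.

    right_feat: (C, H, W) feature map; disparity: (H, W) in pixels.
    """
    C, H, W = right_feat.shape
    xs = np.arange(W, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, W - 1)                 # clamp samples to the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    w1 = xs - x0                               # weight of the right neighbor
    rows = np.arange(H)[:, None]
    return (1 - w1) * right_feat[:, rows, x0] + w1 * right_feat[:, rows, x1]

def local_cost_volume(left_feat, right_feat, disparity, radius=2):
    """Matching scores for a few shifts around the current disparity.

    Assumption: the score is the channel-wise dot product (a correlation),
    so a higher value means a better match. Returns a (2*radius+1, H, W)
    volume and the offsets it covers.
    """
    offsets = np.arange(-radius, radius + 1, dtype=np.float64)
    volume = np.stack([
        (left_feat * warp_right_to_left(right_feat, disparity + s)).sum(axis=0)
        for s in offsets
    ])
    return volume, offsets

def regress_residual(volume, offsets, temperature=1.0):
    """Softmax-weighted expected offset (a soft-argmax style regression)."""
    logits = volume / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(offsets, prob, axes=(0, 0))      # shape (H, W)

# Hypothetical usage: refine an initial disparity map by the regressed residual.
# volume, offsets = local_cost_volume(left_feat, right_feat, init_disp)
# refined_disp = init_disp + regress_residual(volume, offsets)
```

Because the volume covers only `2 * radius + 1` candidate shifts rather than the full disparity range, its size is independent of the maximum disparity, which is consistent with the memory and run-time savings the abstract reports.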