CGFNet: 3D Convolution Guided and Multi-scale Volume Fusion Network for fast and robust stereo matching

Qingyu Wang, Hao Xing, Yibin Ying, Mingchuan Zhou
DOI: https://doi.org/10.1016/j.patrec.2023.07.012
IF: 4.757
2023-07-29
Pattern Recognition Letters
Abstract: Although convolutional neural networks have brought significant progress, accurate and robust stereo matching in real time remains difficult. In this article, we study how to achieve more accurate and robust disparity estimation under a real-time constraint. To this end, a Multi-scale Volume Fusion (MVF) module is proposed and embedded to improve matching accuracy. A stacked hourglass 3D convolution branch is appended to the real-time model for guidance during training and discarded for lightweight inference. Based on these two structures, we design an end-to-end stereo matching method called 3D Convolution Guided and Multi-scale Cost Volume Fusion Network (CGFNet). Experimental results show that CGFNet generalizes better on cross-domain datasets, achieving more accurate disparity estimation in challenging regions without an additional fine-tuning process. On the KITTI benchmark, CGFNet reaches D1-all = 1.98%, a substantial improvement among state-of-the-art (SOTA) real-time models, and processes a pair of images within 38 ms (26 fps). The results are notable when considering both matching accuracy and real-time performance.
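For readers unfamiliar with the reported numbers, the following is a minimal sketch of how a D1-style outlier rate and a frame rate are typically computed. It is not the authors' evaluation code. It assumes the common KITTI 2015 convention that a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity, and that ground-truth values of 0 or less mark pixels without annotation. The function names `d1_error` and `fps_from_latency_ms` are hypothetical.

```python
def d1_error(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Return the fraction of valid pixels that are disparity outliers.

    pred, gt: flat sequences of disparities in pixels.
    Assumption: gt <= 0 marks a pixel with no ground truth (skipped).
    Assumption: outlier means error > abs_thresh AND error > rel_thresh * gt.
    """
    outliers = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground truth for this pixel
        valid += 1
        err = abs(p - g)
        if err > abs_thresh and err > rel_thresh * g:
            outliers += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return outliers / valid


def fps_from_latency_ms(latency_ms):
    """Convert per-pair latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Toy data: the last pixel has no ground truth and is ignored.
    pred = [10.0, 20.0, 50.0, 0.0]
    gt = [10.0, 24.0, 100.0, 0.0]
    print(f"D1 = {100 * d1_error(pred, gt):.2f}%")
    print(f"{fps_from_latency_ms(38):.1f} fps at 38 ms per pair")
```

Under these assumptions, a latency of 38 ms per image pair corresponds to roughly 26 fps, matching the figure quoted in the abstract.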
computer science, artificial intelligence