HPM-TDP: an Efficient Hierarchical PatchMatch Depth Estimation Approach Using Tree Dynamic Programming
Mao Tian,Bisheng Yang,Chi Chen,Ronggang Huang,Liang Huo
DOI: https://doi.org/10.1016/j.isprsjprs.2019.06.015
IF: 12.7
2019-01-01
ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: Accurate and efficient estimation of dense depth information from a pair of stereo images is a key step in many applications, such as digital surface model production, 3D reconstruction and visualization, autonomous driving, and robotic navigation. Although stereo matching has progressed greatly over the past decade, matching in poorly textured and repetitively textured regions remains difficult. To address these shortcomings, this paper proposes HPM-TDP, an efficient hierarchical PatchMatch depth estimation approach that integrates a coarse-to-fine image pyramid strategy with a continuous Markov random field (MRF)-based global energy optimization framework, and minimizes the energy function by combining a hierarchical PatchMatch (HPM) framework with local α-expansion based tree dynamic programming (TDP). First, the coarse-to-fine image pyramid strategy is integrated with the PatchMatch filter algorithm to quickly generate a hierarchical disparity plane prior, which initializes each pixel’s disparity plane for the energy function optimization. Second, a multi-resolution cost aggregation strategy is adopted to make the matching cost function more robust in poorly textured and repetitively textured areas. Finally, the HPM framework and local α-expansion based TDP are used to solve the non-submodular energy optimization problem, yielding a globally optimized disparity plane map. Three benchmark datasets (Middlebury 3.0, KITTI 2015, and Vaihingen) were used to test the performance of HPM-TDP. The experimental results show that HPM-TDP performs well on all three datasets: the (“Out-Noc”, “Avg-Noc”, “Out-All”, “Avg-All”) scores are (15.45%, 4.16px, 24.26%, 12.14px) on the Middlebury 3.0 training dataset and (5.46%, 1.20px, 6.55%, 1.54px) on the KITTI 2015 training dataset, and the (“Out-All”, “Avg-All”) scores are (26.32%, 4.04px) on the Vaihingen dataset.
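The reported scores can be read more easily with a small worked example. The sketch below is not the authors' evaluation code; it assumes the common benchmark convention that "Out" is the percentage of evaluated pixels whose absolute disparity error exceeds a threshold, "Avg" is the mean absolute disparity error in pixels, "All" covers every pixel with ground truth, and "Noc" covers only non-occluded pixels. The function name `disparity_errors`, the 1.0 px threshold, and the toy arrays are all hypothetical; the abstract does not state the threshold used for each dataset.

```python
import numpy as np

def disparity_errors(pred, gt, mask, threshold=1.0):
    """Return (outlier percentage, mean absolute error in px) over masked pixels.

    threshold is an assumed value; benchmarks define their own.
    """
    err = np.abs(pred - gt)[mask]          # absolute error on evaluated pixels only
    if err.size == 0:
        raise ValueError("mask selects no pixels")
    out_pct = 100.0 * float(np.mean(err > threshold))
    avg_px = float(np.mean(err))
    return out_pct, avg_px

# Toy 2x2 disparity maps (hypothetical values, in pixels).
gt = np.array([[10.0, 12.0], [8.0, 6.0]])
pred = np.array([[10.5, 15.0], [8.0, 9.0]])

has_gt = np.ones_like(gt, dtype=bool)                    # "All": every pixel with ground truth
non_occluded = np.array([[True, True], [True, False]])   # "Noc": assumed occlusion mask

out_all, avg_all = disparity_errors(pred, gt, has_gt)
out_noc, avg_noc = disparity_errors(pred, gt, has_gt & non_occluded)

print(f"Out-All={out_all:.2f}%  Avg-All={avg_all:.3f}px")
print(f"Out-Noc={out_noc:.2f}%  Avg-Noc={avg_noc:.3f}px")
```

Under these assumptions, the "Noc" scores are never computed on more pixels than the "All" scores, which is consistent with the "All" errors in the abstract being higher than the "Noc" errors on both Middlebury 3.0 and KITTI 2015.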