VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors

Markus Plack, Hannah Dröge, Leif Van Holland, Matthias B. Hullin
2024-06-05
Abstract: We present a stereo-matching method for depth estimation from high-resolution images using visual hulls as priors, and a memory-efficient technique for the correlation computation. Our method uses object masks extracted from supplementary views of the scene to guide the disparity estimation, effectively reducing the search space for matches. This approach is specifically tailored to stereo rigs in volumetric capture systems, where an accurate depth plays a key role in the downstream reconstruction task. To enable training and regression at high resolutions targeted by recent systems, our approach extends a sparse correlation computation into a hybrid sparse-dense scheme suitable for application in leading recurrent network architectures. We evaluate the performance-efficiency trade-off of our method compared to state-of-the-art methods, and demonstrate the efficacy of the visual hull guidance. In addition, we propose a training scheme for a further reduction of memory requirements during optimization, facilitating training on high-resolution data.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses depth estimation by stereo matching on high-resolution images, and in particular how to reduce memory and compute cost while maintaining accuracy. The authors use the visual hull as a prior to guide a hybrid sparse-dense iterative stereo-matching network, which improves both the efficiency and the accuracy of stereo matching.

### Core Problems of the Paper

1. **Stereo Matching of High-Resolution Images**: Existing methods based on dense correlation volumes run into memory and compute bottlenecks when processing high-resolution images.
2. **Reducing the Search Space**: Traditional stereo-matching methods search for matches over the full disparity range, which results in high computational complexity. The paper reduces the search space by using the visual hull as a prior.
3. **Efficient Correlation Calculation**: To handle high-resolution data, the paper proposes a hybrid sparse-dense correlation calculation that significantly reduces memory requirements while maintaining accuracy.

### Overview of the Solution

- **Visual Hull Prior**: Object masks extracted from auxiliary views are used to build a visual hull, which serves as a prior for the initial disparity estimation. This reduces the search space and also improves the accuracy of the initial disparity estimate.
- **Sparse-Dense Hybrid Method**: The initial disparity is first computed with a sparse method and then refined with a memory-efficient dense method. This significantly reduces memory usage without sacrificing accuracy.
- **Iterative Optimization**: A ConvGRU (Convolutional Gated Recurrent Unit) network refines the estimate iteratively, gradually improving the accuracy of the disparity.

### Specific Steps of the Method

1. **Feature Extraction**: Encode the input stereo image pair into feature representations.
2. **Initial Disparity Estimation**: Using the visual hull prior, select the most likely disparity candidates from the sparse correlation volume and generate an initial disparity map.
3. **Iterative Refinement**: Use the ConvGRU network together with a local dense correlation calculation to iteratively refine the initial disparity map.
4. **Memory-Efficient Training**: Reduce memory usage during training through phased forward and backward propagation while keeping the gradient information accurate.

### Experimental Results

The paper verifies the effectiveness of the proposed method through several experiments, including:

- A benchmark evaluation on the SceneFlow test set, where the method is competitive on metrics such as average end-point error (EPE).
- A performance evaluation on high-resolution datasets, showing the advantages of the method in large-disparity cases.
- Ablation experiments that analyze the role of the visual hull prior and confirm its importance for the performance improvement.

In conclusion, the paper proposes a stereo-matching method that combines a visual hull prior with a hybrid sparse-dense correlation calculation. It addresses the memory and compute bottlenecks of high-resolution stereo matching and improves the accuracy and efficiency of depth estimation.
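The initial disparity estimation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `hull_guided_topk`, the per-pixel bounds `d_min` and `d_max` (assumed here to have been derived from the visual hull beforehand), the dot-product correlation, and the choice of `k` are all assumptions made for illustration. The sketch scans disparities one at a time and keeps only the `k` best in-range candidates per pixel, so it never stores a full dense correlation volume.

```python
import numpy as np

def hull_guided_topk(feat_left, feat_right, d_min, d_max, max_disp, k=2):
    """Keep the k best disparity candidates per pixel inside [d_min, d_max].

    feat_left, feat_right: (H, W, C) feature maps (hypothetical encoder output).
    d_min, d_max: (H, W) integer bounds, assumed to come from a visual hull.
    """
    H, W, _ = feat_left.shape
    top_scores = np.full((H, W, k), -np.inf, dtype=np.float32)
    top_disps = np.zeros((H, W, k), dtype=np.int64)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d; columns x < d have no match.
        score = np.full((H, W), -np.inf, dtype=np.float32)
        score[:, d:] = np.einsum("hwc,hwc->hw", feat_left[:, d:], feat_right[:, : W - d])
        # Discard candidates outside the range allowed by the prior.
        score[(d < d_min) | (d > d_max)] = -np.inf
        # Merge the new candidate into the running top-k (stable sort keeps earlier ties first).
        all_scores = np.concatenate([top_scores, score[..., None]], axis=-1)
        all_disps = np.concatenate([top_disps, np.full((H, W, 1), d, dtype=np.int64)], axis=-1)
        order = np.argsort(-all_scores, axis=-1, kind="stable")[..., :k]
        top_scores = np.take_along_axis(all_scores, order, axis=-1)
        top_disps = np.take_along_axis(all_disps, order, axis=-1)
    return top_disps, top_scores

# Toy example: one-hot features per column, left image shifted by 3 pixels.
W = 16
feat_right = np.tile(np.eye(W, dtype=np.float32)[None], (4, 1, 1))
feat_left = np.zeros_like(feat_right)
feat_left[:, 3:] = feat_right[:, :-3]
d_min = np.full((4, W), 2)
d_max = np.full((4, W), 5)
disps, scores = hull_guided_topk(feat_left, feat_right, d_min, d_max, max_disp=8)
print(disps[0, 5])  # best candidate first; it recovers the true shift of 3
```

In this toy setting the memory held per pixel is `k` scores and `k` indices rather than one score per disparity, which is the kind of saving the sparse stage is meant to provide. An iterative network would then refine the best candidate using local correlation lookups around it.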