SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang
2024-01-12
Abstract: In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle challenges in the 3D reconstruction of textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes, and we further leverage these constraints for pixelwise patch deformation in both matching cost and propagation. Concurrently, we propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals with a pixelwise search interval on depths, significantly improving the completeness of the reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregated matching cost and hyperparameters, effectively mitigating the problem of parameters being excessively dependent on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method can achieve state-of-the-art results with less time consumption.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization" addresses the difficulties that Multi-View Stereo (MVS) faces when reconstructing 3D geometry in textureless regions. Specifically, the paper identifies the following main issues:

1. **Inaccurate Depth Estimation in Textureless Regions**:
   - Traditional MVS methods struggle to estimate depth in textureless regions because there is little texture for photometric matching to rely on.
   - Earlier methods have tried to mitigate this with techniques such as planar priors and superpixel segmentation, but their performance in large textureless areas remains unsatisfactory.
2. **Parameter Dependence on Empirical Tuning**:
   - In current MVS methods, parameters often have to be tuned by hand, which is time-consuming and can lead to suboptimal results.
3. **High Memory and Time Consumption**:
   - Learning-based MVS methods can improve reconstruction quality, but they often incur high time and memory costs, which limits their practical application.
4. **Insufficient Utilization of Edge Information**:
   - Edge information is crucial in image processing, yet existing methods make insufficient use of it. This is especially true in complex scenes, where shadows and occlusions weaken the correspondence between image edges and depth boundaries.

### Solutions

To address these issues, the paper proposes a new method, **Segmentation-Driven Deformation Multi-View Stereo (SD-MVS)**, with the following main contributions:

1. **Instance Segmentation-Based Adaptive Patch Deformation**:
   - Uses the Segment Anything Model (SAM) for instance segmentation, extracting fine edge information while remaining largely unaffected by strong lighting.
   - Deforms matching patches adaptively according to the segmentation, making better use of image edge information and improving the accuracy of both matching cost and propagation.
2. **Spherical Gradient Refinement**:
   - Combines a spherical coordinate system with gradient descent to refine normals, and uses a pixelwise search interval to refine depths.
   - Randomly selects two orthogonal unit vectors for perturbation and then optimizes the perturbation direction with gradient descent, improving the accuracy of each hypothesis.
3. **EM Algorithm-Based Hyperparameter Optimization**:
   - Employs the Expectation-Maximization (EM) algorithm to alternately optimize the aggregated matching cost and the hyperparameters. This tunes parameters automatically and balances the contributions of the different cues.
4. **Multi-Scale Consistency Architecture**:
   - Introduces a multi-scale consistency architecture to reduce memory consumption and improve runtime efficiency.
   - Loads images at different scales in parallel instead of using the traditional cascade architecture, reducing data transfer time between the CPU and GPU.

### Experimental Results

The paper evaluates SD-MVS on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset. The results show that SD-MVS achieves state-of-the-art performance with lower time consumption.

### Conclusion

By combining instance segmentation, spherical gradient refinement, and EM-based optimization, the paper tackles three problems in MVS: inaccurate depth estimation in textureless regions, dependence on empirically tuned parameters, and high memory and time consumption. These contributions offer new insights for the development of multi-view stereo technology.
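The segmentation-driven patch deformation can be illustrated roughly with the masking step of segmentation-constrained patch sampling. The sketch keeps only those patch samples that fall inside the same instance as the center pixel. It makes two assumptions:
- The instance label map is stored as a 2D list of integer IDs (for example, one rasterized from SAM masks).
- The helper `deform_patch` and its masking rule are hypothetical simplifications, not the deformation scheme actually used in SD-MVS.

```python
from typing import List, Tuple

Offset = Tuple[int, int]  # (dy, dx) relative to the patch center


def deform_patch(labels: List[List[int]], center: Offset,
                 offsets: List[Offset]) -> List[Offset]:
    """Hypothetical masking rule: keep only sample offsets that stay inside
    the image and inside the same instance as the center pixel."""
    h, w = len(labels), len(labels[0])
    cy, cx = center
    instance = labels[cy][cx]
    kept = []
    for dy, dx in offsets:
        y, x = cy + dy, cx + dx
        # Drop samples outside the image or across an instance boundary.
        if 0 <= y < h and 0 <= x < w and labels[y][x] == instance:
            kept.append((dy, dx))
    return kept


if __name__ == "__main__":
    # Two instances separated by a vertical boundary between columns 2 and 3.
    labels = [[0, 0, 0, 1, 1] for _ in range(5)]
    # A sparse 3x3 sampling pattern with stride 2.
    square = [(dy, dx) for dy in (-2, 0, 2) for dx in (-2, 0, 2)]
    print(deform_patch(labels, (2, 2), square))
    # prints [(-2, -2), (-2, 0), (0, -2), (0, 0), (2, -2), (2, 0)]
```

A fuller deformation could go further and move the discarded samples deeper into the center pixel's instance, so that the patch keeps enough support in large textureless regions. This sketch shows only the masking step.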
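The spherical part of the refinement can be illustrated with a small sketch that applies gradient descent to a normal parameterized in spherical coordinates:
- The normal is written in terms of a polar angle θ and an azimuth φ.
- Both angles are updated by plain gradient descent on a cost, with gradients estimated by central finite differences.

The cost is a hypothetical toy (one minus the cosine to a known target normal) standing in for the real photometric matching cost. The names `normal_from_angles` and `refine_normal` are illustrative. The tangent directions ∂n/∂θ and ∂n/∂φ are orthogonal, so each update moves the normal along two orthogonal directions. The paper, by contrast, draws its two orthogonal perturbation vectors at random.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def normal_from_angles(theta: float, phi: float) -> Vec3:
    """Unit vector for polar angle theta and azimuth phi (radians)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def refine_normal(cost: Callable[[Vec3], float], theta: float, phi: float,
                  step: float = 0.5, eps: float = 1e-4,
                  iters: int = 200) -> Tuple[float, float]:
    """Plain gradient descent on (theta, phi); gradients are estimated
    with central finite differences of the cost."""
    def f(t: float, p: float) -> float:
        return cost(normal_from_angles(t, p))

    for _ in range(iters):
        g_theta = (f(theta + eps, phi) - f(theta - eps, phi)) / (2 * eps)
        g_phi = (f(theta, phi + eps) - f(theta, phi - eps)) / (2 * eps)
        theta -= step * g_theta
        phi -= step * g_phi
    return theta, phi


if __name__ == "__main__":
    # Toy cost: 1 - cos(angle to a known target normal); minimum at the target.
    target = normal_from_angles(0.6, 1.2)
    cost = lambda n: 1.0 - sum(a * b for a, b in zip(n, target))
    print(refine_normal(cost, theta=0.3, phi=0.5))  # converges near (0.6, 1.2)
```

Working in (θ, φ) keeps every hypothesis a unit vector without renormalization. The trade-off is that the φ direction becomes poorly conditioned near the pole (θ close to 0), where sin θ is small.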
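The EM-based hyperparameter optimization alternates two steps, and the toy sketch below mirrors that alternation. It assumes, purely for illustration, that per-view matching costs follow a mixture of two distributions:
- an inlier distribution, modeled as a half-normal with scale σ (the hyperparameter being tuned);
- an outlier distribution, modeled as uniform on [0, c_max].

In each iteration:
1. The E-step computes a per-view inlier weight.
2. The M-step re-estimates σ and the inlier ratio in closed form.

The aggregated cost is then the inlier-weighted mean. The mixture model, the function `em_aggregate`, and the parameter `c_max` are all assumptions, not the formulation used in SD-MVS.

```python
import math
from typing import List, Tuple


def em_aggregate(costs: List[float], sigma: float = 0.5, w: float = 0.5,
                 c_max: float = 2.0, iters: int = 50
                 ) -> Tuple[float, float, float, List[float]]:
    """Toy EM over per-view matching costs with a half-normal inlier model
    (scale sigma) and a uniform outlier model on [0, c_max].

    Returns (aggregated cost, sigma, inlier ratio w, per-view weights)."""
    p_out = 1.0 / c_max  # uniform outlier density
    r: List[float] = [0.0] * len(costs)
    for _ in range(iters):
        # E-step: posterior probability that each view is an inlier.
        for i, c in enumerate(costs):
            p_in = math.sqrt(2.0 / math.pi) / sigma * math.exp(-c * c / (2.0 * sigma * sigma))
            num = w * p_in
            den = num + (1.0 - w) * p_out
            r[i] = num / den if den > 0.0 else 0.0
        # M-step: closed-form maximum-likelihood updates of the hyperparameters.
        total = sum(r)
        w = total / len(costs)
        weighted_sq = sum(ri * c * c for ri, c in zip(r, costs))
        sigma = max(math.sqrt(weighted_sq / max(total, 1e-12)), 1e-6)
    # Aggregated cost: inlier-weighted mean using the last E-step weights.
    aggregated = sum(ri * c for ri, c in zip(r, costs)) / max(sum(r), 1e-12)
    return aggregated, sigma, w, r


if __name__ == "__main__":
    # Four consistent views and two occluded views with high cost.
    costs = [0.1, 0.12, 0.08, 0.11, 1.8, 1.9]
    agg, sigma, w, weights = em_aggregate(costs)
    print(round(agg, 3), round(sigma, 3), [round(x, 3) for x in weights])
```

Because σ is re-estimated from the data in every iteration, it no longer has to be hand-tuned. The two high-cost views receive near-zero weight and barely affect the aggregated cost.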