Abstract: Significant progress has been made in learning-based Multi-view Stereo (MVS) under supervised and unsupervised settings. To combine their respective merits in accuracy and completeness while reducing the demand for expensive labeled data, this paper explores learning-based MVS in a semi-supervised setting in which only a tiny part of the MVS data comes with dense depth ground truth. However, the huge variation of scenarios and flexible view settings may break the basic assumption of classic semi-supervised learning, namely that unlabeled and labeled data share the same label space and data distribution. We call this the semi-supervised distribution-gap ambiguity in the MVS problem. To handle this issue, we propose a novel semi-supervised distribution-augmented MVS framework, namely SDA-MVS. For the simple case in which the basic assumption holds in MVS data, consistency regularization encourages the model predictions to be consistent between the original sample and a randomly augmented sample. For the more troublesome case in which the basic assumption is violated, we propose a novel style consistency loss to alleviate the negative effect of the distribution gap. The visual style of an unlabeled sample is transferred to a labeled sample to shrink the gap, and the model prediction on the generated sample is further supervised with the label of the original labeled sample. Experimental results in semi-supervised settings on multiple MVS datasets show the superior performance of the proposed method. With the same backbone network settings, the proposed SDA-MVS outperforms its fully-supervised and unsupervised baselines.

What problem does this paper attempt to address?

This paper addresses how, in the Multi-view Stereo (MVS) task, to combine the advantages of supervised and unsupervised learning so that less expensive labeled data is needed, given that only a small fraction of the data has dense depth ground-truth labels. Specifically, it studies MVS in a semi-supervised setting and proposes a framework for the case where the assumptions of traditional semi-supervised learning fail because of large scene variations.

### Problem Background

1. **Advantages and Disadvantages of Supervised and Unsupervised Learning**:
   - **Supervised Learning**: It yields more accurate 3D reconstructions, but it requires a large amount of dense depth ground-truth labels, which are expensive and time-consuming to collect.
   - **Unsupervised Learning**: It uses a photometric consistency loss instead of depth ground truth, which provides a supervision signal on all pixels and improves the completeness of the 3D reconstruction, but its accuracy is limited.
2. **Challenges in Semi-supervised Learning**:
   - In the MVS problem, labeled and unlabeled data may come from different distributions. This violates the basic assumption of traditional semi-supervised learning, namely that labeled and unlabeled data share the same label space and data distribution.
   - The paper calls this distribution difference "semi-supervised distribution-gap ambiguity". It may lead to unstable training or degraded performance.

### The Method Proposed in the Paper

To solve these problems, the paper proposes a semi-supervised distribution-augmented MVS framework (SDA-MVS), which consists of the following parts:

1. **Basic Framework**:
   - Labeled samples are supervised with a common supervised loss.
   - Unlabeled samples are supervised with a photometric consistency loss.
   - No additional self-supervised loss extensions are introduced, which keeps the pipeline simple.
2. **Consistency Regularization**:
   - For simple scenes, where the basic assumption holds, a consistency regularization loss minimizes the difference between the depth predictions for an original sample and for its randomly augmented version.
   - Through data augmentation and proximity in the latent space, this enforces low-density separation boundaries between classes and propagates the prior of the labeled data to the unlabeled data.
3. **Style Consistency Loss**:
   - For complex scenes, where the basic assumption is violated, the paper proposes a style consistency loss built from a Style Transfer Module (STM) and a Geometry Preservation Module (GPM).
   - The STM transfers the visual style of unlabeled samples to labeled samples to narrow the distribution gap.
   - The GPM uses a Spatial Propagation Network (SPN) to counter the loss of geometric detail that style transfer may cause, keeping the generated images geometrically consistent.

### Experimental Results

The paper runs experiments on multiple MVS datasets (such as DTU, BlendedMVS, GTA-SFM, and Tanks & Temples) to verify the effectiveness of the proposed method. The results show that, in the semi-supervised setting, SDA-MVS outperforms its fully-supervised and unsupervised baselines.

### Summary

The main contributions of the paper are:

1. Proposing a semi-supervised MVS framework, SDA-MVS, for the setting where only a small amount of data has dense depth ground-truth labels.
2. Introducing a style consistency loss to deal with the distribution gap between labeled and unlabeled data.
3. Demonstrating superior performance on multiple MVS datasets and extending the method to semi-supervised domain adaptation tasks.

Through these contributions, the paper combines the advantages of supervised and unsupervised learning, reduces the need for expensive labeled data, and improves the quality of 3D reconstruction.
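To make the loss structure described above more concrete, here is a minimal NumPy sketch under stated assumptions. It is not the paper's implementation. The function names, the L1 form of the losses, and the weights `w_photo`, `w_cons`, and `w_style` are all hypothetical. In particular, `transfer_channel_stats` is a simple per-channel mean and standard-deviation matching used as a stand-in for style transfer. It is not the paper's Style Transfer Module, and it has no counterpart to the Geometry Preservation Module. The photometric loss is assumed to be computed elsewhere and passed in as a number.

```python
import numpy as np


def masked_l1(pred, target, mask):
    """Mean absolute depth error over pixels where mask is True."""
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        return 0.0
    return float(np.abs(pred[mask] - target[mask]).mean())


def consistency_loss(depth_orig, depth_aug, valid):
    """Penalize disagreement between the depth predicted for the original
    sample and for its augmented version. In a real training framework,
    depth_orig would typically be treated as a fixed target (assumption)."""
    return masked_l1(depth_aug, depth_orig, valid)


def transfer_channel_stats(content, style, eps=1e-6):
    """Stand-in style transfer (assumption): rescale each channel of
    `content` (H, W, C) to the per-channel mean and std of `style`."""
    c_mean = content.mean(axis=(0, 1), keepdims=True)
    c_std = content.std(axis=(0, 1), keepdims=True)
    s_mean = style.mean(axis=(0, 1), keepdims=True)
    s_std = style.std(axis=(0, 1), keepdims=True)
    return (content - c_mean) / (c_std + eps) * s_std + s_mean


def total_loss(l_sup, l_photo, l_cons, l_style,
               w_photo=1.0, w_cons=1.0, w_style=1.0):
    """Weighted sum of the four terms; the weights are hypothetical."""
    return l_sup + w_photo * l_photo + w_cons * l_cons + w_style * l_style


# Tiny demonstration on random data.
rng = np.random.default_rng(0)
labeled_img = rng.random((8, 8, 3))
unlabeled_img = rng.random((8, 8, 3)) * 0.5 + 0.4
# Labeled image re-rendered with the unlabeled image's channel statistics.
stylized_img = transfer_channel_stats(labeled_img, unlabeled_img)
```

In this sketch, the style consistency term would be `masked_l1` between the depth predicted on `stylized_img` and the ground-truth depth of the original labeled sample, since the stylized image keeps the labeled sample's geometry.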

Semi-supervised Deep Multi-view Stereo

Self-supervised Multi-view Stereo Via Inter and Intra Network Pseudo Depth

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic Segmentation

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Unsupervised multi-view stereo network based on multi-stage depth estimation

Digging into Uncertainty in Self-supervised Multi-view Stereo

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

Learning-based Multi-View Stereo: A Survey

A contrastive learning based unsupervised multi-view stereo with multi-stage self-training strategy

CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning

RobustMVS: Single Domain Generalized Deep Multi-view Stereo

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Semi-Stereo: A Universal Stereo Matching Framework for Imperfect Data Via Semi-supervised Learning

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

Rethinking the Multi-view Stereo from the Perspective of Rendering-based Augmentation

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Adaptive Learning for Multi-view Stereo Reconstruction

A Multitask Network for Multiview Stereo Reconstruction: When Semantic Consistency-Based Clustering Meets Depth Estimation Optimization

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo