BSI-MVS: multi-view stereo network with bidirectional semantic information

Ruiming Jia,Jun Yu,Zhenghui Hu,Fei Yuan

DOI: https://doi.org/10.1038/s41598-024-55612-6

IF: 4.6

2024-03-23

Scientific Reports

Abstract: The basic principle of multi-view stereo (MVS) is to perform 3D reconstruction by extracting depth information from multiple views. Most current SOTA MVS networks are based on Vision Transformer, which usually means high computational complexity. To reduce computational complexity and improve depth map accuracy, we propose an MVS network with Bidirectional Semantic Information (BSI-MVS). First, we design a Multi-Level Spatial Pyramid module that generates multiple levels of feature maps for extracting multi-scale information. Then we propose a 2D Bidirectional-LSTM module to capture bidirectional semantic information at different time steps in the horizontal and vertical directions; this information is rich in depth cues. Finally, cost volumes are built from the various levels of feature maps to optimize the final depth map. We experiment on the DTU and BlendedMVS datasets. The results show that, in terms of overall metrics, our network surpasses TransMVSNet, CasMVSNet, CVP-MVSNet, and AACVP-MVSNet by 17.84%, 36.42%, 14.96%, and 4.86%, respectively, with a noticeable improvement in both objective metrics and visualizations.

multidisciplinary sciences

What problem does this paper attempt to address?

The paper attempts to address two issues in multi-view stereo (MVS) reconstruction: high computational complexity and insufficient depth map accuracy. Specifically:

1. **Computational Complexity**: Most state-of-the-art MVS networks are based on Vision Transformer. Although this architecture performs well in feature extraction, its high computational complexity makes it inefficient on high-resolution images and slow to converge.
2. **Depth Map Accuracy**: Traditional MVS methods tend to produce holes and texture mixing when dealing with complex geometric structures or textureless regions, which degrades reconstruction quality.

To solve these problems, the authors propose a new MVS network, BSI-MVS (Bidirectional Semantic Information Multi-View Stereo). Its main innovations are:

- **Multi-Level Spatial Pyramid Module (MLSP)**: Generates multiple levels of feature maps to extract multi-scale information, improving the network's adaptability to different spatial structures.
- **2D Bidirectional LSTM Module (BiLSTM)**: Captures bidirectional semantic information in both the horizontal and vertical directions. This information is rich in depth cues and improves the model's generalization ability and depth map accuracy.
- **Cost Volume Construction**: Builds cost volumes from feature maps of different levels to optimize the final depth map.

According to the reported experiments on the DTU and BlendedMVS datasets, BSI-MVS outperforms TransMVSNet, CasMVSNet, CVP-MVSNet, and AACVP-MVSNet, improving the overall metric by 17.84%, 36.42%, 14.96%, and 4.86%, respectively.
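To make the idea of bidirectional scanning in the horizontal and vertical directions more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the paper uses LSTM cells on learned feature maps, whereas this sketch swaps in a single-channel `tanh` recurrence with a fixed `decay` weight. The function names `scan`, `bidirectional_scan`, and `bidirectional_2d` are hypothetical and chosen only for illustration.

```python
import math


def scan(seq, decay=0.5):
    """Simplified recurrence h_t = tanh(x_t + decay * h_{t-1}).

    Assumption: this stands in for an LSTM cell, which is far richer.
    """
    h = 0.0
    out = []
    for x in seq:
        h = math.tanh(x + decay * h)
        out.append(h)
    return out


def bidirectional_scan(seq, decay=0.5):
    """Return (forward_state, backward_state) for every position of seq."""
    fwd = scan(seq, decay)
    # Scan the reversed sequence, then flip back so indices line up.
    bwd = scan(seq[::-1], decay)[::-1]
    return list(zip(fwd, bwd))


def bidirectional_2d(feature_map, decay=0.5):
    """Scan each row and each column in both directions.

    Each pixel gets a 4-tuple of states:
    (left-to-right, right-to-left, top-to-bottom, bottom-to-top).
    """
    height, width = len(feature_map), len(feature_map[0])
    rows = [bidirectional_scan(row, decay) for row in feature_map]
    cols = [bidirectional_scan(list(col), decay) for col in zip(*feature_map)]
    return [
        [rows[i][j] + cols[j][i] for j in range(width)]
        for i in range(height)
    ]


# A single active pixel in the top-left corner of a 2x2 map.
states = bidirectional_2d([[1.0, 0.0], [0.0, 0.0]])
```

In this toy input, the pixel to the right of the active one receives a non-zero left-to-right state but a zero right-to-left state, which shows how each scan direction carries context from a different side of the pixel. A real network would concatenate or fuse such directional states per feature channel before building cost volumes.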

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

Bi-ClueMVSNet: Learning Bidirectional Occlusion Clues for Multi-View Stereo

A Multitask Network for Multiview Stereo Reconstruction: When Semantic Consistency-Based Clustering Meets Depth Estimation Optimization

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

Visibility-Aware Point-Based Multi-View Stereo Network

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Enhanced multi view 3D reconstruction with improved MVSNet

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Unsupervised multi-view stereo network based on multi-stage depth estimation

Attention-enhanced multi-source cost volume multi-view stereo

RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction