Abstract: Recent advancements in deep learning have significantly improved performance in the multi‐view stereo (MVS) domain, yet balancing reconstruction efficiency and quality remains challenging for learning‐based MVS methods. To address this, we introduce MFE‐MVSNet, designed for more effective and precise depth estimation. Our model incorporates a pyramid feature extraction network with efficient multi‐scale attention and multi‐scale feature enhancement modules. These components capture pixel‐level pairwise relationships and semantic features with long‐range contextual information, enhancing feature representation. Additionally, we propose a lightweight 3D UNet regularization network based on depthwise separable convolutions to reduce computational costs. This network employs bi‐directional skip connections, which establish a fluid relationship between encoders and decoders and enable cyclic reuse of building blocks without adding learnable parameters. Extensive qualitative and quantitative experiments on the DTU dataset validate our model's competitiveness, demonstrating approximately 33% and 12% relative improvements in overall score compared to MVSNet and CasMVSNet, respectively. By integrating these components, MFE‐MVSNet balances reconstruction quality and efficiency more effectively than other MVS networks.
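To make the cost argument for depthwise separable convolutions concrete, the following minimal sketch compares weight counts for a standard 3D convolution and for a depthwise plus pointwise pair. The function names, the channel sizes, and the 3×3×3 kernel are illustrative assumptions and are not taken from the paper; biases are omitted.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weight count of a standard 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k=3):
    """Weight count of a depthwise separable 3D convolution (bias omitted).

    The depthwise step applies one k*k*k filter per input channel;
    the pointwise step is a 1x1x1 convolution that mixes channels.
    """
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical channel sizes, chosen only for illustration.
    c_in, c_out = 8, 16
    standard = conv3d_params(c_in, c_out)
    separable = separable_conv3d_params(c_in, c_out)
    print(f"standard:  {standard} weights")
    print(f"separable: {separable} weights")
    print(f"ratio:     {separable / standard:.3f}")
```

Under these assumptions the ratio of separable to standard weights is 1/c_out + 1/k³, so the saving grows with the number of output channels and with the kernel size.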

Hierarchical MVSNet with Cost Volume Separation and Fusion Based on U-shape Feature Extraction

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Attention-enhanced multi-source cost volume multi-view stereo

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Enhanced multi view 3D reconstruction with improved MVSNet

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

Multi-View Stereo Network with attention thin volume

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

Transformer-guided Feature Pyramid Network for Multi-View Stereo

Uanet: uncertainty-aware cost volume aggregation-based multi-view stereo for 3D reconstruction

Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement