MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud
2024-09-28
Abstract: Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D geometry of a scene given a set of images, remains a hard problem with still many open challenges despite decades of significant progress. The traditional solution for SfM consists of a complex pipeline of minimal solvers which tends to propagate errors and fails when images do not sufficiently overlap, have too little motion, etc. Recent methods have attempted to revisit this paradigm, but we empirically show that they fall short of fixing these core issues. In this paper, we propose instead to build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches. We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system. We further show that such foundation models can serve as efficient image retrievers without any overhead, reducing the overall complexity from quadratic to linear. Overall, our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not. Extensive experiments on multiple benchmarks show that our method provides steady performance across diverse settings, especially outperforming existing methods in small- and medium-scale settings.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses **Structure-from-Motion (SfM) in unconstrained scenarios**. SfM is the task of jointly recovering camera poses and 3D scene geometry from a set of images. Despite decades of progress, it remains difficult, particularly when the input images overlap too little or the camera moves too little between views. Traditional methods rely on a complex pipeline of minimal solvers, which tends to propagate errors from one stage to the next and fails in exactly those low-overlap or low-motion cases. Some recent methods have revisited this paradigm, but the paper's experiments show that they fall short of fixing these core issues. The paper therefore proposes **MASt3R-SfM**, a new SfM pipeline built on a recently released foundation model for 3D vision that robustly produces local 3D reconstructions and accurate matches. It introduces a low-memory method for accurately aligning these local reconstructions in a global coordinate system. It also shows that the same foundation model can serve as an efficient image retriever without additional overhead, reducing the overall complexity from quadratic to linear in the number of images. Overall, the pipeline is simple, scalable, fast, and truly unconstrained: it can handle any collection of images, ordered or not. Extensive experiments on multiple benchmarks show that the method performs steadily across diverse settings and outperforms existing methods especially in small- and medium-scale settings.
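To make the quadratic-to-linear claim concrete, the following is a minimal sketch of how retrieval can shrink the set of image pairs that need expensive pairwise processing. It is not the paper's actual graph construction: the per-image descriptors, the cosine-similarity ranking, the fixed neighbor count `k`, and the function names `all_pairs` and `knn_pairs` are all assumptions made for illustration. Note that the similarity matrix below is still computed for every pair, but that step is cheap compared with running a network on each pair.

```python
import numpy as np


def all_pairs(n: int) -> int:
    """Number of unordered image pairs when every image is matched with every other."""
    return n * (n - 1) // 2


def knn_pairs(features: np.ndarray, k: int) -> set:
    """Select at most n * k unordered pairs by linking each image to its k most similar images.

    `features` is a hypothetical (n, d) array holding one descriptor per image.
    """
    # Normalize rows so the dot product is cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Exclude self-matches by pushing the diagonal to the bottom of the ranking.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar images for each row.
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    pairs = set()
    for i in range(f.shape[0]):
        for j in nbrs[i]:
            j = int(j)
            pairs.add((min(i, j), max(i, j)))
    return pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 10
    feats = rng.normal(size=(n, 32))  # stand-in descriptors, not real image features
    print("exhaustive pairs:", all_pairs(n))
    print("retrieved pairs (at most n * k):", len(knn_pairs(feats, k)))
```

With `k` held fixed, the number of retrieved pairs grows at most as `n * k`, whereas exhaustive matching grows as `n * (n - 1) / 2`.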