Abstract: We introduce a new system for Multi-Session SLAM, which tracks camera motion across multiple disjoint videos under a single global reference. Our approach couples the prediction of optical flow with solver layers to estimate camera pose. The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. The full system can connect disjoint sequences, perform visual odometry, and global optimization. Compared to existing approaches, our design is accurate and robust to catastrophic failures. Code is available at

What problem does this paper attempt to address?

This paper addresses Multi-Session Simultaneous Localization and Mapping (Multi-Session SLAM). Specifically, the researchers propose a new system that tracks camera motion across multiple discontinuous video streams and unifies the results in a single global reference frame. The key points described in the paper are:

1. **Challenges of Multi-Session SLAM**:
   - Traditional SLAM assumes that the input is one continuous video stream.
   - In practice, video data often consists of multiple discontinuous segments. The split may be intentional (such as collaborative 3D reconstruction) or caused by visual discontinuities in the stream (such as camera failure, extreme parallax, sharp turns, auto-exposure lag, dark areas, or severe occlusion by dynamic objects).
2. **Limitations of Existing Methods**:
   - Most existing Multi-Session SLAM methods rely on additional sensor data to eliminate the scale degree of freedom and simplify tracking.
   - Only a few methods (such as CCM-SLAM and ORB-SLAM3) support Multi-Session SLAM using only monocular RGB video, but they are built on classical feature descriptors and have lower average accuracy.
   - Other deep-learning methods (such as DROID-SLAM and DPVO) perform well on a single continuous video but cannot handle large-baseline matching or non-local optimization, so they are not suitable for Multi-Session SLAM.
3. **The Method Proposed in This Paper**:
   - A new differentiable solver layer estimates the camera pose by minimizing the Symmetric Epipolar Distance (SED) of bidirectional optical flow.
   - A unified backbone architecture handles both large-baseline relative pose estimation and visual odometry.
   - By iteratively updating the optical flow and camera pose, the method can connect multiple discontinuous video streams, perform visual odometry, and run global optimization.
4. **Experimental Verification**:
   - The system was evaluated on challenging real-world datasets such as EuRoC MAV and ETH3D, and the results show that it is more accurate and robust than existing methods.
   - The two-view pose estimation method was evaluated separately on the ScanNet and MegaDepth datasets, and the results show performance comparable to Transformer-based matching networks, especially when matching views that are far apart.

In conclusion, this paper aims to solve the key problem in Multi-Session SLAM: how to perform accurate camera pose estimation and global optimization across multiple discontinuous video segments.
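The summary names the Symmetric Epipolar Distance but does not define it. As a rough illustration, the sketch below computes the standard textbook form of the squared SED for a set of point correspondences: the squared distance from each point to its epipolar line, summed over both images. It is not the paper's differentiable solver layer, and it makes several assumptions: the function names (`skew`, `essential_from_pose`, `symmetric_epipolar_distance`) are hypothetical, points are in normalized image coordinates, and the pose convention is `X2 = R @ X1 + t`.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x of a 3-vector."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R, assuming X2 = R @ X1 + t."""
    return skew(t) @ R

def symmetric_epipolar_distance(F, x1, x2):
    """Squared SED per correspondence, assuming x2^T F x1 = 0 for inliers.

    F is 3x3; x1 and x2 are (N, 2) arrays of matched points.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])            # homogeneous points, image 1
    p2 = np.hstack([x2, ones])            # homogeneous points, image 2
    l2 = p1 @ F.T                         # rows are F @ p1: lines in image 2
    l1 = p2 @ F                           # rows are F^T @ p2: lines in image 1
    algebraic = np.sum(p2 * l2, axis=1)   # x2^T F x1 for each match
    inv_norm2 = 1.0 / (l2[:, 0] ** 2 + l2[:, 1] ** 2)
    inv_norm1 = 1.0 / (l1[:, 0] ** 2 + l1[:, 1] ** 2)
    return algebraic ** 2 * (inv_norm1 + inv_norm2)

# Synthetic check: exact correspondences give (near) zero SED,
# perturbed correspondences give a clearly larger value.
rng = np.random.default_rng(0)
angle = 0.3
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
X2 = X1 @ R.T + t                         # same points in camera 2
x1 = X1[:, :2] / X1[:, 2:]                # normalized image coordinates
x2 = X2[:, :2] / X2[:, 2:]
E = essential_from_pose(R, t)
clean = symmetric_epipolar_distance(E, x1, x2)
noisy = symmetric_epipolar_distance(E, x1, x2 + 0.01)
```

In a flow-based setting, `x2` would presumably come from adding the predicted flow to `x1`, and a pose solver would adjust `R` and `t` to reduce the summed distance. Because the measure depends only on the epipolar lines, it does not change when `E` is rescaled, which matches the fact that monocular two-view geometry cannot recover translation scale.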

Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization

Multi-Session Slam over Low Dynamic Workspace Using Rgbd Sensor

Self-supervised Visual-LiDAR Odometry with Flip Consistency

Local Optimized and Scalable Frame-to-model SLAM

MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

Omnidirectional Dense SLAM for Back-to-back Fisheye Cameras

Robust Monocular SLAM in Dynamic Environments

Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles

Redesigning SLAM for Arbitrary Multi-Camera Systems

Robust Monocular SLAM for Egocentric Videos

MAVIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3) Based Exact IMU Pre-integration

Asynchronous Multi-View SLAM

Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation

Hybrid Camera Pose Estimation With Online Partitioning for SLAM

Fast Direct Stereo Visual SLAM

Active Visual SLAM with Independently Rotating Camera

MultiCol-SLAM - A Modular Real-Time Multi-Camera SLAM System

Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation

Multi-object Monocular SLAM for Dynamic Environments

Design and Evaluation of a Generic Visual SLAM Framework for Multi-Camera Systems

A SLAM Pose Graph Optimization Method Using Dual Visual Odometry