DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction

Jenny Seidenschwarz,Qunjie Zhou,Bardienus Duisterhof,Deva Ramanan,Laura Leal-Taixé

2024-09-04

Abstract: Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allows for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches require either offline processing or multi-view camera setups, both of which are unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input by introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling the emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper primarily addresses the following issues:

1. **Online Monocular Point Tracking**:
   - Proposes **DynOMo**, a method for online 2D and 3D point tracking from monocular video captured with unposed cameras (camera poses are not given as input).
   - Uses dynamic online monocular Gaussian reconstruction to track points in dynamic scenes as frames arrive.
2. **Geometric Reconstruction and Camera Localization**:
   - Uses 3D Gaussian splatting to reconstruct the dynamic scene and estimate camera movement, so point tracking does not rely on multi-view camera setups.
3. **No Correspondence Supervision Needed**:
   - Point trajectories emerge without correspondence-level supervision, through robust image feature reconstruction, depth supervision signals, and a novel similarity-enhanced regularization term.
4. **Online Processing and Real-World Applications**:
   - Unlike existing methods that require offline processing or multi-view camera setups, DynOMo targets online applications such as robot navigation and mixed reality.

### Summary

This paper develops a method for online point tracking from monocular video input, addressing the offline-processing and multi-view requirements that limit earlier approaches in practice. It tracks points in dynamic scenes without prior knowledge of camera poses, and the abstract reports performance on par with existing methods.
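To make the idea of obtaining 2D tracks from a 3D reconstruction concrete, here is a minimal sketch, assuming a standard pinhole camera model. It is not DynOMo's implementation. The names `Camera`, `project`, and `track_2d` are hypothetical, and the per-frame Gaussian centers and camera poses are made-up inputs standing in for whatever an online reconstruction would estimate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Pinhole camera for one frame (hypothetical container, not from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3  # world-to-camera rotation
    t: Vec3  # world-to-camera translation


def project(point_world: Vec3, cam: Camera) -> Optional[Tuple[float, float]]:
    """Project a 3D world point to pixel coordinates; None if not in front of the camera."""
    # Transform into the camera frame: p_cam = R @ p_world + t
    x, y, z = (
        sum(cam.R[i][j] * point_world[j] for j in range(3)) + cam.t[i]
        for i in range(3)
    )
    if z <= 0.0:
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def track_2d(centers_per_frame: List[Vec3], cams: List[Camera]):
    """Turn one Gaussian's per-frame 3D centers into a 2D pixel trajectory."""
    return [project(p, c) for p, c in zip(centers_per_frame, cams)]


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Assumed toy data: a point moving right at constant depth, static camera at the origin.
cams = [Camera(100.0, 100.0, 50.0, 50.0, IDENTITY, (0.0, 0.0, 0.0)) for _ in range(3)]
centers = [(0.0, 0.0, 2.0), (0.2, 0.0, 2.0), (0.4, 0.0, 2.0)]
trajectory = track_2d(centers, cams)
print(trajectory)
```

In this toy setup the 3D trajectory is the primary quantity and the 2D track is simply its projection under each frame's estimated camera, which is one way to read "lifting tracking directly into 3D".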

Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting

PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching

Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone

Object-Level Pseudo-3D Lifting for Distance-Aware Tracking

Optical coherence tomography findings in multiple evanescent white dot syndrome.

D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

Tracking Objects as Points

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB

BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data

A Region-based Gauss-Newton Approach to Real-Time Monocular Multiple Object Tracking

Joint Monocular 3D Vehicle Detection and Tracking

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos

DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction

PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking

6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields

DynPoint: Dynamic Neural Point For View Synthesis