SADG: Segment Any Dynamic Gaussian Without Object Trackers

Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers
2024-11-29
Abstract: Understanding dynamic 3D scenes is fundamental for various applications, including extended reality (XR) and autonomous driving. Effectively integrating semantic information into 3D reconstruction enables holistic representation that opens opportunities for immersive and interactive applications. We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs. In contrast to existing works, we do not rely on supervision based on object identities to enable consistent segmentation of dynamic 3D objects. To this end, we propose to learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. The learned Gaussian features can be effectively clustered without further post-processing. This enables fast computation for further object-level editing, such as object removal, composition, and style transfer by manipulating the Gaussians in the scene. We further extend several dynamic novel-view datasets with segmentation benchmarks to enable testing of learned feature fields from unseen viewpoints. We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes along with its effectiveness for further downstream editing tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to achieve multi-view-consistent semantic segmentation in dynamic 3D scenes without relying on object trackers. Specifically, the authors propose SADG (Segment Any Dynamic Gaussian Without Object Trackers), which combines a dynamic Gaussian Splatting representation with semantic information to segment dynamic 3D objects consistently.

### Main problems

1. **Semantic segmentation in dynamic 3D scenes**: Existing methods usually rely on object trackers to provide consistent object mask IDs, but tracker output is prone to inconsistencies across views, which can cause the optimization pipeline to fail or produce sub-optimal results.
2. **Efficient, real-time interactive editing**: Existing methods are computationally intensive on dynamic scenes and must re-render multiple views to keep edits consistent, which limits their use in real-time interactive tasks.

### Solutions

- **Semantic segmentation without trackers**: SADG avoids object trackers by introducing a new contrastive learning objective and using masks generated by SAM (Segment Anything Model) to learn semantically aware features.
- **Efficient feature learning and clustering**: SADG uses compact 32-dimensional Gaussian features and clusters them with the DBSCAN algorithm, so the features can be grouped without further post-processing.
- **Extended datasets**: To evaluate the learned feature fields, the authors extend several dynamic novel-view datasets with segmentation benchmarks, so that the model can be tested on unseen viewpoints.

### Key contributions

1. **The SADG framework**: Achieves multi-view-consistent segmentation of dynamic scenes without tracking supervision.
2. **A new contrastive learning objective**: Uses hard positive and negative sample mining to learn semantically aware latent representations from 2D masks.
3. **Extensive experimental verification**: Reports experiments in single-view and multi-view settings, demonstrating superior performance on five dynamic novel-view benchmarks.
4. **Application to downstream tasks**: Demonstrates the versatility of the learned feature space on editing tasks such as object removal, style transfer, and scene composition.
5. **User-friendly interaction interface**: Provides tools for editing scenes in real time through simple mouse clicks or text prompts.

Through these contributions, SADG addresses the challenges of semantic segmentation in dynamic 3D scenes and provides technical support for real-time interactive applications.
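To make the clustering step concrete, the following is a minimal, dependency-free sketch of DBSCAN applied to per-Gaussian feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the toy features are 4-dimensional rather than 32-dimensional, the values of `eps` and `min_samples` are arbitrary, and all function and variable names are hypothetical. In practice a library implementation of DBSCAN would likely be used instead.

```python
import math
from typing import List, Sequence

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # internal marker for points not yet processed


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if euclidean(points[idx], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


# Toy stand-in for learned Gaussian features (assumption: 4-D instead of 32-D).
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [1.0, 0.1, 0.1, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 1.0, 0.1, 0.0],
    [5.0, 5.0, 5.0, 5.0],  # isolated feature, expected to be labeled noise
]

labels = dbscan(features, eps=0.3, min_samples=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1, -1]
```

Each resulting label can be read as an object ID: selecting all Gaussians with the same label gives the set to remove, recolor, or move in an editing step. Because DBSCAN does not need the number of clusters in advance, it suits scenes where the object count is unknown.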