Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis

Diwen Wan, Yuxiang Wang, Ruijie Lu, Gang Zeng
2024-12-07
Abstract: While novel view synthesis for dynamic scenes has made significant progress, capturing skeleton models of objects and re-posing them remains a challenging task. To tackle this problem, in this paper, we propose a novel approach to automatically discover the associated skeleton model for dynamic objects from videos without the need for object-specific templates. Our approach utilizes 3D Gaussian Splatting and superpoints to reconstruct dynamic objects. Treating superpoints as rigid parts, we can discover the underlying skeleton model through intuitive cues and optimize it using the kinematic model. Besides, an adaptive control strategy is applied to avoid the emergence of redundant superpoints. Extensive experiments demonstrate the effectiveness and efficiency of our method in obtaining re-posable 3D objects. Not only can our approach achieve excellent visual fidelity, but it also allows for the real-time rendering of high-resolution images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses re-posing and novel view synthesis for dynamic scenes without object-specific templates or category-specific prior knowledge. It reconstructs dynamic objects by automatically discovering their associated skeleton models from videos, so that the objects can be re-posed and rendered in real time while maintaining high visual quality.

### Main Problem Description

1. **Re-posing in Dynamic Scenes**:
   - Current methods mainly reproduce the motion observed in the video: they can render novel views only within the captured time range, and they cannot easily re-pose or explicitly control the motion of individual objects in the scene.
   - For some specific categories (such as the human body or human head), existing methods rely on category-specific templates to support manipulation of the reconstructed object, but this approach is hard to generalize to large-scale natural scenes or man-made articulated objects.
2. **High-Quality and Real-Time Rendering**:
   - Some existing methods achieve high visual quality but cannot render in real time, which limits their practical use.
   - Other methods render in real time but fall short in visual quality and flexibility.

### Solutions Proposed in the Paper

To address the above challenges, the paper proposes the following:

- **3D Gaussian Splatting and Superpoints**:
  - Use 3D Gaussian Splatting and superpoints to represent the appearance, skeleton model, and motion of objects. Each superpoint binds Gaussians with similar motion, thereby decomposing the object into multiple rigid parts.
- **Automatic Skeleton Discovery**:
  - Automatically discover and optimize the skeleton model from the motion cues of superpoints in videos, without any category-specific templates or pose annotations.
- **Adaptive Control Strategy**:
  - Adopt an adaptive control strategy and a regularization loss to reduce redundant superpoints, simplify the skeleton model, and avoid overfitting.
- **Real-Time Rendering**:
  - Achieve high-quality novel view synthesis with real-time rendering (> 100 FPS) on various datasets.

### Summary

The main contribution of the paper is a general method, requiring neither templates nor pose annotations, that reconstructs re-posable articulated objects from multi-view videos and renders them in real time. This opens new possibilities for virtual reality, augmented reality, game production, and other fields.
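To make the "superpoints as rigid parts" idea concrete, the following is a minimal sketch of re-posing with a kinematic tree. It is an illustration only, not the paper's implementation: the function names (`forward_kinematics`, `repose_centers`), the list-based data layout, and the restriction to rotations about the z-axis are all assumptions made for brevity. Each Gaussian center is bound to one rigid part, and each part rotates about the joint that connects it to its parent.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis (a simplification chosen for this sketch)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def forward_kinematics(parents, joints, local_rots):
    """Compose per-part rigid transforms down a kinematic tree.

    parents[i]    : index of the parent part, or -1 for the root
                    (parents must be listed before their children)
    joints[i]     : rest-pose location of the joint that part i rotates about
    local_rots[i] : 3x3 rotation of part i relative to its parent
    Returns a list of (R, t) with x_posed = R @ x_rest + t for each part.
    """
    transforms = []
    for i, parent in enumerate(parents):
        R_loc, j = local_rots[i], joints[i]
        # Rotating about joint j: x -> R_loc (x - j) + j = R_loc x + (j - R_loc j)
        Rj = matvec(R_loc, j)
        t_loc = [j[k] - Rj[k] for k in range(3)]
        if parent < 0:
            transforms.append((R_loc, t_loc))
        else:
            R_par, t_par = transforms[parent]
            R = matmul(R_par, R_loc)
            Rt = matvec(R_par, t_loc)
            transforms.append((R, [Rt[k] + t_par[k] for k in range(3)]))
    return transforms

def repose_centers(centers, part_of, transforms):
    """Move each Gaussian center with the rigid part (superpoint) it is bound to."""
    posed = []
    for x, p in zip(centers, part_of):
        R, t = transforms[p]
        Rx = matvec(R, x)
        posed.append([Rx[k] + t[k] for k in range(3)])
    return posed

# Hypothetical two-part object: a root part and one child hinged at (1, 0, 0).
parents = [-1, 0]
joints = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
centers = [[0.5, 0.0, 0.0], [2.0, 0.0, 0.0]]   # one Gaussian center per part
part_of = [0, 1]

# Bend only the child joint by 90 degrees.
pose = [rot_z(0.0), rot_z(math.pi / 2)]
posed = repose_centers(centers, part_of, forward_kinematics(parents, joints, pose))
```

In this toy pose the root Gaussian stays in place while the child Gaussian swings around the hinge. A full pipeline would also rotate each Gaussian's covariance and would optimize the joint locations and rotations from video; those steps are omitted here.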