Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics

Zhirui Gao, Renjiao Yi, Yuhang Huang, Wei Chen, Chenyang Zhu, Kai Xu
2024-08-20
Abstract: Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, humans usually perceive 3D objects or scenes at a higher level, as compositions of parts or structures rather than points or voxels. Representing 3D content as semantic parts can benefit further understanding and applications. We aim to solve part-aware 3D reconstruction, which parses objects or scenes into semantic parts. In this paper, we introduce a hybrid representation of superquadrics and 2D Gaussians to extract 3D structural cues from multi-view image inputs. Accurate structured geometry reconstruction and high-quality rendering are achieved at the same time. We incorporate parametric superquadrics in mesh form into 2D Gaussians by attaching Gaussian centers to the faces of the meshes. During training, the superquadric parameters are iteratively optimized, and the Gaussians are deformed accordingly, resulting in an efficient hybrid representation. On the one hand, this hybrid representation inherits the ability of superquadrics to represent different shape primitives, supporting flexible part decomposition of scenes. On the other hand, 2D Gaussians model complex texture and geometry details, ensuring high-quality rendering and geometry reconstruction. The reconstruction is fully unsupervised. We conduct extensive experiments on data from the DTU and ShapeNet datasets, in which the method decomposes scenes into reasonable parts and outperforms existing state-of-the-art approaches.
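The abstract's core mechanism, a superquadric converted to a mesh with 2D Gaussian centers attached to its faces, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The surface uses the standard superquadric parametrization with axis lengths (a1, a2, a3) and shape exponents (ε1, ε2). The (η, ω) grid resolution, the triangulation, the choice of one Gaussian per face at fixed barycentric weights, and all function names (`superquadric_mesh`, `attach_gaussian_centers`, `to_world`) are hypothetical. Only Gaussian centers are modeled; Gaussian rotations, scales, opacities, and colors are omitted.

```python
import numpy as np


def fexp(base, exponent):
    """Signed power sign(base) * |base|**exponent, as used in superquadric formulas."""
    return np.sign(base) * np.abs(base) ** exponent


def superquadric_mesh(scale, eps, n_eta=16, n_omega=32):
    """Sample a superquadric surface on an (eta, omega) grid and triangulate it.

    scale = (a1, a2, a3) are the axis lengths; eps = (e1, e2) are the shape exponents.
    Grid sizes are hypothetical. Returns vertices (V, 3) in the canonical frame
    and triangle faces (F, 3) as vertex indices.
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    # Stay slightly away from the poles so that no face collapses to zero area.
    eta = np.linspace(-np.pi / 2 + 1e-3, np.pi / 2 - 1e-3, n_eta)
    omega = np.linspace(-np.pi, np.pi, n_omega, endpoint=False)
    eta, omega = np.meshgrid(eta, omega, indexing="ij")  # both (n_eta, n_omega)
    x = a1 * fexp(np.cos(eta), e1) * fexp(np.cos(omega), e2)
    y = a2 * fexp(np.cos(eta), e1) * fexp(np.sin(omega), e2)
    z = a3 * fexp(np.sin(eta), e1)
    verts = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # row i*n_omega + j

    faces = []
    for i in range(n_eta - 1):
        for j in range(n_omega):
            jn = (j + 1) % n_omega  # wrap around in omega
            v00, v01 = i * n_omega + j, i * n_omega + jn
            v10, v11 = (i + 1) * n_omega + j, (i + 1) * n_omega + jn
            faces.append((v00, v10, v11))
            faces.append((v00, v11, v01))
    return verts, np.array(faces)


def attach_gaussian_centers(verts, faces, bary):
    """Place one Gaussian center per face from fixed barycentric weights bary (F, 3)."""
    tri = verts[faces]  # (F, 3 vertices, 3 coordinates)
    return np.einsum("fk,fkc->fc", bary, tri)


def to_world(points, rotation, translation):
    """Apply the block pose that the superquadric and its Gaussians share."""
    return points @ rotation.T + translation


# Usage: one Gaussian at each face centroid (a hypothetical choice of weights).
verts, faces = superquadric_mesh(scale=(1.0, 0.5, 0.8), eps=(0.5, 1.0))
bary = np.full((len(faces), 3), 1.0 / 3.0)
centers = to_world(attach_gaussian_centers(verts, faces, bary), np.eye(3), np.zeros(3))
```

Because each center is a fixed barycentric combination of mesh vertices, regenerating the mesh after a superquadric parameter update moves every attached Gaussian with it, and a single pose (rotation, translation) places both in the world. Keeping the weights fixed is what ties the Gaussians to the primitive; relaxing that tie is a separate design decision.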
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses **part-aware 3D reconstruction**: recovering a 3D object or scene from multi-view images as a set of semantic parts. The authors propose a hybrid representation that fuses 2D Gaussians with superquadrics to parse and reconstruct these parts. Conventional methods reconstruct 3D objects or scenes with low-level representations such as point clouds, voxels, or meshes, which do not match how humans perceive 3D content: as compositions of parts or structures rather than individual points or voxels. The goal is therefore a method that decomposes a 3D scene into semantic parts, which in turn supports tasks such as scene manipulation, editing, and scene graph generation.

### Main Contributions

1. **Novel Hybrid Representation Method**: Superquadrics and 2D Gaussians are combined in one representation. Superquadrics model different shape primitives, while 2D Gaussians capture complex textures and geometric details, enabling high-quality rendering and geometry reconstruction.
2. **End-to-End Unsupervised Pipeline**: A fully unsupervised, end-to-end pipeline performs part-aware reconstruction at both the block level and the point level, with new regularization terms that jointly optimize the superquadrics and the 2D Gaussians.
3. **Extensive Experimental Validation**: Experiments on the DTU and ShapeNet datasets show that the method outperforms existing state-of-the-art approaches in part-aware reconstruction, particularly in part segmentation and geometric detail modeling.

### Method Overview

1. **Hybrid Representation**: Superquadrics and 2D Gaussians are combined into a compact hybrid representation.
   Each superquadric block is initialized with random parameters and progressively optimized during training. The 2D Gaussians are attached to the surfaces of the superquadrics and share their pose parameters, which keeps the representation efficient.
2. **Optimization Process**: The hybrid representation is optimized by minimizing a rendering loss over the multi-view images. For stability and accuracy, several regularization terms are added: coverage, overlap, simplicity, and opacity entropy.
3. **Stage-wise Optimization**:
   - **Block-level Optimization**: The image rendering loss and the regularization terms optimize the position and shape of each block, so that the blocks cover meaningful regions without overlapping one another.
   - **Point-level Optimization**: Starting from the block-level result, the constraints that bind the 2D Gaussians to the superquadric surfaces are released, letting the Gaussians move freely to fill complex regions and model geometric details more accurately.

### Experimental Results

On the DTU and ShapeNet datasets, the method decomposes 3D scenes into reasonable parts while also capturing fine geometric details, outperforming existing state-of-the-art methods. It also performs well on real data, suggesting potential for practical applications.
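The regularizers named in the Method Overview (coverage, overlap, simplicity, opacity entropy) are listed without formulas here, so the following is only one plausible form of each, written as a sketch. The inside-outside function is the standard superquadric formula (F < 1 inside, F = 1 on the surface, F > 1 outside). Everything else is an assumption, not the paper's definition: the sigmoid soft occupancy, the specific loss formulas, the `sharpness` and `tolerance` parameters, the weight keys in `total_loss`, and all function names are hypothetical. NumPy only evaluates the values; a training pipeline would compute them in an autodiff framework so gradients reach the superquadric and Gaussian parameters.

```python
import numpy as np


def inside_outside(points, scale, eps):
    """Standard superquadric inside-outside function F for points (N, 3).

    Points must already be in the block's canonical frame (inverse pose applied).
    """
    a = np.asarray(scale, dtype=float)
    e1, e2 = eps
    x, y, z = (np.abs(points) / a).T
    return (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)


def soft_occupancy(points, scale, eps, sharpness=10.0):
    """Hypothetical smooth occupancy: close to 1 inside a block, close to 0 outside."""
    f = inside_outside(points, scale, eps)
    z = np.clip(sharpness * (1.0 - f), -50.0, 50.0)  # clip to avoid exp overflow
    return 1.0 / (1.0 + np.exp(-z))


def coverage_loss(occ):
    """occ: (K blocks, N points sampled on the object). Penalize points no block covers."""
    return np.mean(np.maximum(0.0, 1.0 - occ.max(axis=0)))


def overlap_loss(occ, tolerance=1.0):
    """Penalize points whose summed occupancy over all blocks exceeds `tolerance`."""
    return np.mean(np.maximum(0.0, occ.sum(axis=0) - tolerance))


def simplicity_loss(block_opacity):
    """Hypothetical L1 sparsity on per-block opacities so unused blocks fade out."""
    return np.mean(np.abs(block_opacity))


def opacity_entropy_loss(alpha, eps=1e-6):
    """Binary entropy of Gaussian opacities; smallest when each alpha is 0 or 1."""
    a = np.clip(alpha, eps, 1.0 - eps)
    return np.mean(-(a * np.log(a) + (1.0 - a) * np.log(1.0 - a)))


def total_loss(render_loss, occ, block_opacity, alpha, weights):
    """Rendering loss plus weighted regularizers; `weights` keys are hypothetical."""
    return (render_loss
            + weights["cov"] * coverage_loss(occ)
            + weights["over"] * overlap_loss(occ)
            + weights["simp"] * simplicity_loss(block_opacity)
            + weights["ent"] * opacity_entropy_loss(alpha))
```

Coverage and overlap pull in opposite directions: the first rewards blocks for claiming every object point, the second penalizes points claimed by several blocks, which together push toward the "cover meaningful regions without overlapping" behavior described for block-level optimization. The entropy term drives opacities toward fully opaque or fully transparent, so that a Gaussian or block is either clearly used or clearly unused.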