Planar Gaussian Splatting

Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli
DOI: https://doi.org/10.48550/arXiv.2412.01931
2024-12-03
Abstract: This paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene, directly from multiple RGB images. PGS leverages Gaussian primitives to model the scene and employs a hierarchical Gaussian mixture approach to group them. Similar Gaussians are progressively merged probabilistically in the tree-structured Gaussian mixtures to identify distinct 3D plane instances and form the overall 3D scene geometry. To enable the grouping, the Gaussian primitives contain additional parameters, such as surface normals and plane descriptors derived by lifting 2D masks from a general 2D segmentation model. Experiments show that the proposed PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision. In contrast to existing supervised methods, which have limited generalizability and struggle under domain shift, PGS maintains its performance across datasets thanks to its neural rendering and scene-specific optimization mechanism, while also being significantly faster than existing optimization-based approaches.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the problem of learning and reconstructing the planar structure of a 3D scene from multi-view RGB images without 3D plane labels or depth supervision. Specifically, the authors propose **Planar Gaussian Splatting (PGS)**, a neural rendering method that learns the 3D geometry of a scene and parses its 3D planes directly from unlabeled multi-view RGB images.

#### Main problems and challenges:

1. **Limitations of existing methods**:
   - Existing supervised methods rely on 2D or 3D plane labels, which are costly and difficult to obtain at scale.
   - These models generalize poorly across datasets and perform especially badly under domain shift.
   - Implicit representations (such as NeRF and its variants) produce high-quality novel view synthesis (NVS), but extracting explicit planar structures from them remains challenging.
2. **Objectives**:
   - Achieve efficient 3D plane reconstruction from multi-view RGB images without 3D plane labels or depth supervision.
   - Improve generalization and speed through neural rendering and scene-specific optimization.

#### Solutions:

- **Gaussian primitives**: The scene is modeled with Gaussian primitives, which are grouped by a hierarchical Gaussian mixture model (HGMM) to identify distinct 3D plane instances.
- **Plane descriptors**: Besides a surface normal, each primitive carries a plane descriptor that is lifted from masks produced by a general 2D segmentation model (such as SAM) and optimized jointly with the primitive's other parameters.
- **Probabilistic grouping**: Similar primitives are progressively merged, probabilistically, into a tree-structured Gaussian mixture model, which separates the individual 3D planes and forms the overall scene geometry.
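
The grouping idea in the Solutions list can be illustrated with a small, self-contained sketch. It is not the authors' HGMM: it substitutes a simpler greedy agglomerative clustering in which the two most similar components are repeatedly replaced by one moment-matched Gaussian, so every internal node of the resulting tree is itself a Gaussian summarizing its subtree. The `Node` fields, the `affinity` score (descriptor cosine times sign-invariant normal cosine times a spatial kernel), and the `threshold` and `sigma` values are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]  # 3x3 covariance, row-major


@dataclass
class Node:
    """A component of the hypothetical merge tree (leaf = one Gaussian primitive)."""
    weight: float                   # mixture weight, e.g. accumulated opacity
    mean: Vec                       # 3D center
    cov: Mat                        # 3x3 covariance
    normal: Vec                     # unit surface normal
    desc: Vec                       # plane descriptor (as if lifted from 2D masks)
    children: Tuple[int, ...] = ()  # indices of the two merged children; () for leaves


def _unit(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def _cos(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))


def affinity(a: Node, b: Node, sigma: float) -> float:
    """Hypothetical merge score in [0, 1]: descriptor agreement * normal agreement
    (sign-invariant) * a Gaussian kernel on the distance between centers."""
    d2 = sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))
    desc_sim = max(0.0, _cos(a.desc, b.desc))
    normal_sim = abs(_cos(a.normal, b.normal))
    return desc_sim * normal_sim * math.exp(-d2 / (2.0 * sigma ** 2))


def moment_match(a: Node, b: Node, ia: int, ib: int) -> Node:
    """Replace two Gaussians by one with the same total weight, mean and covariance."""
    w = a.weight + b.weight
    mu = [(a.weight * x + b.weight * y) / w for x, y in zip(a.mean, b.mean)]
    cov = [[0.0] * 3 for _ in range(3)]
    for g in (a, b):
        d = [x - m for x, m in zip(g.mean, mu)]
        for r in range(3):
            for c in range(3):
                cov[r][c] += g.weight * (g.cov[r][c] + d[r] * d[c]) / w
    sign = 1.0 if _cos(a.normal, b.normal) >= 0 else -1.0  # flip b's normal if opposed
    normal = _unit([a.weight * x + sign * b.weight * y for x, y in zip(a.normal, b.normal)])
    desc = [(a.weight * x + b.weight * y) / w for x, y in zip(a.desc, b.desc)]
    return Node(w, mu, cov, normal, desc, (ia, ib))


def build_tree(leaves: List[Node], threshold: float = 0.5, sigma: float = 1.0):
    """Greedily merge the most similar pair of active nodes until no pair scores
    above `threshold`. Returns all nodes (leaves first, then parents) and the
    root indices; each root is read as one plane instance."""
    nodes = list(leaves)
    active = list(range(len(nodes)))
    while len(active) > 1:
        best, pair = threshold, None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                s = affinity(nodes[active[i]], nodes[active[j]], sigma)
                if s > best:
                    best, pair = s, (active[i], active[j])
        if pair is None:
            break
        nodes.append(moment_match(nodes[pair[0]], nodes[pair[1]], *pair))
        active = [k for k in active if k not in pair] + [len(nodes) - 1]
    return nodes, active


def leaves_of(nodes: List[Node], idx: int) -> List[int]:
    """Indices of the original primitives grouped under node `idx`."""
    kids = nodes[idx].children
    return [idx] if not kids else [i for k in kids for i in leaves_of(nodes, k)]


def leaf(mean: Vec, normal: Vec, desc: Vec, s: float = 0.1) -> Node:
    """Leaf Gaussian with unit weight and isotropic covariance s^2 * I."""
    cov = [[s * s if r == c else 0.0 for c in range(3)] for r in range(3)]
    return Node(1.0, mean, cov, normal, desc)


# Toy scene: three floor Gaussians (normal +z) and two wall Gaussians (normal +x),
# with descriptors as if lifted from two different 2D masks.
floor = [leaf([x, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0]) for x in (0.0, 0.5, 1.0)]
wall = [leaf([1.5, y, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0]) for y in (0.0, 0.5)]
nodes, roots = build_tree(floor + wall)
print({r: sorted(leaves_of(nodes, r)) for r in roots})  # {6: [3, 4], 7: [0, 1, 2]}
```

Moment matching is used here because it preserves the first and second moments of the merged pair, so each parent remains a valid Gaussian that can itself be compared and merged further. The exhaustive pairwise search is only for readability; a scene with many thousands of primitives would need spatial indexing or a genuinely probabilistic grouping scheme such as the one the paper describes.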

#### Experimental results:

- PGS achieves state-of-the-art performance on benchmark datasets such as ScanNet and Replica, particularly in 3D plane instance segmentation, where it significantly outperforms existing supervised and optimization-based methods.
- Besides improving accuracy, PGS runs more than 60% faster than recent optimization-based methods such as NMF.

In conclusion, this paper proposes an unsupervised method for 3D plane reconstruction from multi-view RGB images that outperforms existing methods in both accuracy and speed.