On Scaling Up 3D Gaussian Splatting Training

Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
2024-06-27
Abstract: 3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. As each Gaussian affects a small, dynamic subset of rendered pixels, Grendel employs sparse all-to-all communication to transfer the necessary Gaussians to pixel partitions and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training with multiple views. We explore various optimization hyperparameter scaling strategies and find that a simple sqrt(batch size) scaling rule is highly effective. Evaluations using large-scale, high-resolution scenes show that Grendel enhances rendering quality by scaling up 3DGS parameters across multiple GPUs. On the Rubble dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU. Grendel is an open-source project available at: https://github.com/nyu-systems/Grendel-GS
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily addresses the issue of single GPU memory limitations faced by 3D Gaussian Splatting (3DGS) technology when handling large-scale, high-resolution 3D reconstruction tasks. Specifically, the research team designed a distributed training system named Grendel, aimed at extending the training capabilities of 3DGS through multi-GPU parallel computing. The main problems addressed by the paper include:

1. **Memory Limitation**: Current 3DGS training is typically constrained by the memory capacity of a single GPU, limiting its ability to handle high-resolution and large-scale scenes.
2. **Computational Efficiency**: Due to the computational bottleneck of a single GPU, the processing efficiency for large scenes is low.
3. **Support for Batch Training**: Traditional 3DGS training methods process only one view image at a time, whereas Grendel supports batch training, processing multiple view images simultaneously to improve efficiency.

To overcome these challenges, Grendel employs the following strategies:

- **Distributed Parameter Storage**: Distributes the parameters of 3DGS (such as position, shape, etc.) across multiple GPUs.
- **Hybrid Parallelism**: Uses different parallel strategies at different stages, such as Gaussian-based parallelism and pixel-based parallelism.
- **Sparse All-to-All Communication**: Utilizes the spatial locality characteristics of 3DGS to transmit only the Gaussian splats related to specific pixel blocks, reducing communication overhead.
- **Dynamic Load Balancing**: Reallocates pixels based on the computation time from previous training iterations to balance the workload across different GPUs.
- **Batch Training Optimization**: Proposes a simple square root rule to adjust learning rate and momentum parameters, maintaining good training performance even with increased batch sizes.
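
The square root rule in the last strategy can be made concrete with a small sketch. It assumes hyperparameters were tuned for a batch size of 1 and that the optimizer is Adam-style with two momentum coefficients. The function name `scale_adam_hyperparams` is hypothetical, and raising each momentum coefficient to the power of the batch size is an assumption made here for illustration; the summary above only states that learning rate and momentum are adjusted.

```python
import math


def scale_adam_hyperparams(base_lr, base_betas, batch_size):
    """Scale hyperparameters tuned for batch size 1 to a larger batch.

    Hypothetical helper, not taken from the Grendel code base.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")

    # Square root rule: the learning rate grows with sqrt(batch size).
    lr = base_lr * math.sqrt(batch_size)

    # Assumption: one batched step stands in for `batch_size` single-view
    # steps, so each momentum coefficient is raised to that power.
    beta1, beta2 = base_betas
    betas = (beta1 ** batch_size, beta2 ** batch_size)
    return lr, betas


# Example: going from batch size 1 to batch size 4 doubles the learning rate.
lr, betas = scale_adam_hyperparams(1e-3, (0.9, 0.999), 4)
```

With a batch size of 1 the function returns its inputs unchanged, so a single configuration can serve both single-view and batched training.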
Through these methods, Grendel scales 3DGS training across multiple GPUs, improving rendering quality on large-scale, high-resolution scenes and handling larger scenes than a single GPU can manage. Experimental results show that on the "Rubble" dataset, distributing 40.4 million Gaussians across 16 GPUs yields a test peak signal-to-noise ratio (PSNR) of 27.28, compared with 26.28 for 11.2 million Gaussians on a single GPU. Additionally, even for smaller scenes (such as the "Train" scene), Grendel achieves training speedups without reducing test PSNR.
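
The dynamic load balancing idea described above (reassigning pixels according to the computation time measured in earlier iterations) might look roughly like the following sketch. It is not Grendel's implementation: the function name `rebalance_blocks`, the choice of contiguous pixel-block ranges, and the prefix-sum split are all assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def rebalance_blocks(block_times, num_gpus):
    """Split pixel blocks into contiguous ranges of roughly equal cost.

    block_times[i] is the time block i took in the previous iteration.
    Returns one half-open (start, end) index range per GPU.
    Hypothetical helper, not taken from the Grendel code base.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")

    # prefix[i] is the total time of blocks 0..i inclusive.
    prefix = list(accumulate(block_times))
    total = prefix[-1] if prefix else 0.0

    bounds = [0]
    for k in range(1, num_gpus):
        # Cut right after the first block whose cumulative time
        # reaches k/num_gpus of the total.
        target = total * k / num_gpus
        cut = bisect_left(prefix, target) + 1
        # Keep the boundaries non-decreasing and within range.
        cut = max(bounds[-1], min(cut, len(block_times)))
        bounds.append(cut)
    bounds.append(len(block_times))

    return [(bounds[i], bounds[i + 1]) for i in range(num_gpus)]
```

For instance, `rebalance_blocks([1, 1, 1, 1, 2, 2], 2)` returns `[(0, 4), (4, 6)]`: the first GPU receives four cheap blocks and the second receives two expensive ones, so each handles 4 units of measured time. The split is greedy, so the resulting ranges are only approximately balanced when block costs are very uneven.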