DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments

Mahmud A. Mohamad, Gamal Elghazaly, Arthur Hubert, Raphael Frank
2024-09-16
Abstract: This paper presents DENSER, an efficient and effective approach leveraging 3D Gaussian splatting (3DGS) for the reconstruction of dynamic urban environments. Several methods for photorealistic scene representation, both implicit ones using neural radiance fields (NeRF) and explicit ones using 3DGS, have shown promising results in reconstructing relatively complex dynamic scenes. However, modeling the dynamic appearance of foreground objects tends to be challenging, which limits the ability of these methods to capture subtleties and details of the scene, especially for distant dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models their appearance in the driving scene. Instead of directly using Spherical Harmonics (SH) to model the appearance of dynamic objects, we introduce and integrate a new method that dynamically estimates SH bases using wavelets, resulting in a better representation of the appearance of dynamic objects in both space and time. Besides object appearance, DENSER enhances object shape representation by densifying each object's point cloud across multiple scene frames, resulting in faster convergence of model training. Extensive evaluations on the KITTI dataset show that the proposed approach outperforms state-of-the-art methods by a wide margin. Source code and models will be uploaded to this repository: https://github.com/sntubix/denser
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses scene reconstruction in dynamic urban environments, and in particular how to represent and model the appearance of dynamic objects effectively. Existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) show promising results on relatively complex dynamic scenes, but they still struggle to model the appearance of dynamic foreground objects. This limits their ability to capture scene details and subtleties, especially for dynamic objects at long distances. To overcome these limitations, the authors propose the DENSER framework.

### Main contributions of DENSER include:

1. **Enhanced dynamic object representation**: DENSER introduces a new method that dynamically estimates the spherical harmonic (SH) bases using a wavelet transform, and thus better represents the appearance of dynamic objects in space and time. This captures changes in the appearance of dynamic objects more effectively than using spherical harmonics directly.
2. **Improved object shape representation**: DENSER improves the accuracy of object shape representation by densifying point clouds across multiple scene frames, which speeds up the convergence of model training.
3. **Scene graph representation**: DENSER adopts a scene graph representation that decomposes the scene into a background node and dynamic object nodes and optimizes each node separately. The background node is optimized directly in the world reference frame, while each dynamic object node is optimized in its own object reference frame and can be transformed into the world reference frame through a transformation matrix.
4. **Optimization method**: DENSER is optimized using a composite loss function, including color loss, depth loss, and cumulative loss, to ensure the consistency and realism of the scene's appearance, geometry, and occupancy probability.
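The object reference frame and the multi-frame densification described above can be illustrated with a small sketch. It is not the authors' implementation: the names `Pose`, `object_to_world`, `world_to_object`, `densify`, and `half_extent` are hypothetical, and the pose is assumed to be a translation plus a yaw angle only, which is a simplification of a general transformation matrix. The idea shown is that points observed in the world frame at different times are mapped into the object's own frame using that frame's pose, so observations of the same object accumulate into one denser cloud.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Pose:
    """Hypothetical object pose in the world frame: translation plus yaw about z."""
    tx: float
    ty: float
    tz: float
    yaw: float


def object_to_world(p: Point, pose: Pose) -> Point:
    """Rigid transform x_world = R(yaw) @ x_obj + t."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    x, y, z = p
    return (c * x - s * y + pose.tx, s * x + c * y + pose.ty, z + pose.tz)


def world_to_object(p: Point, pose: Pose) -> Point:
    """Inverse transform x_obj = R(yaw)^T @ (x_world - t)."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    dx, dy, dz = p[0] - pose.tx, p[1] - pose.ty, p[2] - pose.tz
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def densify(frames: list[tuple[list[Point], Pose]], half_extent: Point) -> list[Point]:
    """Merge world-frame points from several frames into one object-frame cloud.

    Only points that fall inside the object's axis-aligned box
    (given by its half extents in the object frame) are kept.
    """
    merged = []
    for points, pose in frames:
        for p in points:
            q = world_to_object(p, pose)
            if all(abs(v) <= h for v, h in zip(q, half_extent)):
                merged.append(q)
    return merged


# Two frames observing the same vehicle at different poses (made-up values).
frames = [
    ([(10.5, 0.0, 0.2), (30.0, 0.0, 0.0)], Pose(10.0, 0.0, 0.0, 0.0)),
    ([(20.0, 5.5, 0.2)], Pose(20.0, 5.0, 0.0, math.pi / 2)),
]
# Two of the three points fall inside the box; both map to roughly (0.5, 0.0, 0.2).
cloud = densify(frames, half_extent=(2.0, 1.0, 1.0))
```

Because the merged cloud lives in the object frame, it can be placed back into any frame of the scene with `object_to_world` and that frame's pose, which is the direction used when composing object nodes with the background.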
### Experimental results

The paper conducts extensive evaluations on the KITTI dataset, and the results show that DENSER significantly outperforms existing methods in dynamic scene reconstruction. Specifically, DENSER achieves the best performance on Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS).

### Applications and extensions

DENSER not only performs well in dynamic scene reconstruction but also supports photorealistic scene editing, such as vehicle swapping, translation, and rotation, as well as trajectory modification. These capabilities are important for improving the performance of autonomous driving systems and for handling complex real-world conditions.

In conclusion, DENSER significantly improves the quality and efficiency of scene reconstruction in dynamic urban environments by dynamically estimating the spherical harmonic bases and by densifying object point clouds. Future work will focus on extending the method to deformable dynamic objects, such as pedestrians and cyclists.
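For readers unfamiliar with the reported metrics, PSNR is the simplest of the three to compute. The sketch below uses the standard definition, 10 · log10(MAX² / MSE), on images given as flat sequences of pixel values. The function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the paper's evaluation code. Higher PSNR and SSIM indicate a closer match to the reference image, while lower LPIPS is better.

```python
import math
from collections.abc import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel values in [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of about 0.01.
score = psnr([0.0, 1.0], [0.1, 0.9])
```

SSIM and LPIPS need more machinery (local window statistics and a pretrained network, respectively), so they are usually taken from an existing library rather than written by hand.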