Abstract: This paper presents DENSER, an efficient and effective approach leveraging 3D Gaussian splatting (3DGS) for the reconstruction of dynamic urban environments. Several methods for photorealistic scene representation, both implicit ones using neural radiance fields (NeRF) and explicit ones using 3DGS, have shown promising results in reconstructing relatively complex dynamic scenes. However, modeling the dynamic appearance of foreground objects tends to be challenging, which limits the ability of these methods to capture subtleties and details of the scene, especially for distant dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models their appearance in the driving scene. Instead of directly using Spherical Harmonics (SH) to model the appearance of dynamic objects, we introduce and integrate a new method that dynamically estimates SH bases using wavelets, resulting in a better representation of dynamic object appearance in both space and time. Besides object appearance, DENSER enhances object shape representation through densification of its point cloud across multiple scene frames, resulting in faster convergence of model training. Extensive evaluations on the KITTI dataset show that the proposed approach outperforms state-of-the-art methods by a wide margin. Source code and models will be uploaded to this repository: https://github.com/sntubix/denser

What problem does this paper attempt to address?

This paper addresses scene reconstruction in dynamic urban environments, in particular how to represent and model the appearance of dynamic objects effectively. Existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) show promising results on relatively complex dynamic scenes, but they still struggle to model the appearance of dynamic foreground objects. This limits their ability to capture scene details and subtleties, especially for dynamic objects at long distances. To overcome these limitations, the authors propose the DENSER framework.

### Main contributions of DENSER include:

1. **Enhanced dynamic object representation**: DENSER introduces a new method that dynamically estimates the spherical harmonic (SH) bases using a wavelet transform, which better represents the appearance of dynamic objects in space and time. This captures changes in the appearance of dynamic objects more effectively than using spherical harmonics directly.
2. **Improved object shape representation**: DENSER improves the accuracy of object shape representation by densifying point clouds across multiple scene frames, which also speeds up the convergence of model training.
3. **Scene graph representation**: DENSER adopts a scene graph representation that decomposes the scene into background nodes and dynamic object nodes and optimizes each node separately. Background nodes are optimized directly in the world reference frame, while dynamic object nodes are optimized in their own object reference frame and can be transformed into the world reference frame through a transformation matrix.
4. **Optimization method**: DENSER is optimized using a composite loss function, including color loss, depth loss, and cumulative loss, to ensure the consistency and realism of the scene's appearance, geometry, and occupancy probability.
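
Contributions 2 and 3 can be illustrated together with a small sketch. It is not the authors' implementation: it only assumes that each frame provides the world-frame points belonging to one rigid object plus that object's pose (a rotation `R` and translation `t`), and it merges the points into the object reference frame so the object accumulates a denser point cloud over time. The names `world_to_object`, `object_to_world`, and `densify_object` are hypothetical, and the code uses plain Python lists so it runs without extra dependencies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical pose type: (3x3 rotation matrix R, translation vector t),
# meaning world = R @ object + t.
Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]


def world_to_object(p: Point, pose: Pose) -> Point:
    """Map a world-frame point into the object frame: R^T (p - t)."""
    R, t = pose
    d = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))


def object_to_world(q: Point, pose: Pose) -> Point:
    """Map an object-frame point into the world frame: R q + t."""
    R, t = pose
    return tuple(sum(R[r][c] * q[c] for c in range(3)) + t[r] for r in range(3))


def densify_object(frames: List[Tuple[List[Point], Pose]]) -> List[Point]:
    """Merge per-frame world points of one rigid object into its object frame."""
    merged: List[Point] = []
    for points, pose in frames:
        merged.extend(world_to_object(p, pose) for p in points)
    return merged
```

Because every frame is mapped through the inverse of that frame's pose, points observed while the object moves end up in one shared coordinate system. Applying `object_to_world` with the pose of any chosen frame places the merged cloud back into the scene at that time step.
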
### Experimental results

The paper conducts extensive evaluations on the KITTI dataset, and the results show that DENSER significantly outperforms existing methods in dynamic scene reconstruction. Specifically, DENSER achieves the best performance in metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS).

### Applications and extensions

DENSER not only performs well in dynamic scene reconstruction but also supports photorealistic scene editing, such as vehicle swapping, translation, and rotation, as well as trajectory modification. These functions are crucial for improving the performance of autonomous driving systems and dealing with complex real-world conditions.

In conclusion, DENSER significantly improves the quality and efficiency of scene reconstruction in dynamic urban environments by introducing the method of dynamically estimating the spherical harmonic basis and the point cloud densification technique. Future work will focus on extending this method to handle deformable dynamic objects, such as pedestrians and cyclists.
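
Of the metrics named in the experimental results, PSNR is simple enough to show directly. The sketch below uses the standard definition, 10 · log10(MAX² / MSE), on flat lists of pixel intensities. The function name `psnr` and the assumption that intensities lie in [0, `max_value`] are illustrative choices, not details taken from the paper.

```python
import math
from typing import Sequence


def psnr(rendered: Sequence[float], reference: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Both inputs are flat sequences of pixel intensities assumed to lie in
    [0, max_value]. Higher values mean the rendering is closer to the reference.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For PSNR and SSIM a higher score is better, while for LPIPS a lower score indicates closer perceptual similarity.
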

DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments

Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction

Dynamic 3D Gaussian Fields for Urban Areas

The Potential of Neural Radiance Fields and 3D Gaussian Splatting for 3D Reconstruction from Aerial Imagery

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction

GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction

Enhanced 3D Urban Scene Reconstruction and Point Cloud Densification using Gaussian Splatting and Google Earth Imagery

SUDS: Scalable Urban Dynamic Scenes

GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction

DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing

AAGS: Appearance-Aware 3D Gaussian Splatting with Unconstrained Photo Collections

Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects

WildGaussians: 3D Gaussian Splatting in the Wild

Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

Unbounded-GS: Extending 3D Gaussian Splatting with Hybrid Representation for Unbounded Large-Scale Scene Reconstruction

DENSER Cities: A System for Dense Efficient Reconstructions of Cities

S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving