Abstract: We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, differing from NeRF's dense view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous 'floaters'. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper "UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video" addresses inverse rendering for large-scale, unbounded urban scenes from a single video. Specifically, it proposes a novel inverse graphics model, UrbanIR (Urban Scene Inverse Rendering), which can generate realistic free-viewpoint renderings under various lighting conditions from a single video.

### Main Contributions

1. **Inverse Rendering and Relighting**:
   - UrbanIR can recover intrinsic properties of the scene such as shape, albedo, visibility, and sun and sky illumination from a single video.
   - Compared to existing methods, UrbanIR significantly reduces artifacts in sparse view settings, especially when dealing with geometric structures like rooftops.
2. **Nighttime Simulation**:
   - A physics-based nighttime simulation framework is proposed, capable of generating controllable, realistic, physically plausible, and consistent nighttime scene simulations from a single daytime video.
   - This is the first method to achieve free-viewpoint nighttime simulation from a single daytime video.
3. **Large-Scale Scene Handling**:
   - UrbanIR is the first method capable of inverse rendering and relighting from a single monocular video in large-scale outdoor scenes.
   - Unlike other methods, UrbanIR does not require additional information such as multiple light sources, depth sensing, or LiDAR.

### Method Overview

- **Scene Representation**:
  - Uses a spatial hash-based voxel NeRF representation (Instant-NGP) to encode the scene's albedo, normals, semantics, and visibility.
  - The scene model implicitly encodes the geometry of the scene, providing compact and consistent appearance modeling.
- **Lighting Model**:
  - Employs a parameterized sun-sky model to encode outdoor lighting, including sun color, azimuth, and zenith angles, as well as ambient light and sky texture.
- **Rendering**:
  - Projects the intrinsic properties and visibility maps of the scene onto the image plane through volumetric rendering, then generates the final result through a shading process.
  - Uses the Blinn-Phong model combined with sun and sky illumination terms for shading, ensuring realistic shadow effects under different lighting conditions.
- **Inverse Graphics**:
  - Trains the scene model and lighting model by jointly optimizing all properties with a combined objective (rendering loss, visibility loss, normal loss, semantic loss, and regularization loss).
  - Visibility loss guides the recovery of geometric structures, improving shadow synthesis.

### Applications

- **Changing Sun Angle**: The position of the sun can be adjusted to generate scene images at different times.
- **Day-Night Transition**: New light sources (such as street lamps and car headlights) can be inserted at night to generate realistic nighttime scenes.
- **Object Insertion**: Virtual objects can be inserted into the scene, generating realistic shadow effects.

### Experimental Results

- **Datasets**: Evaluations were conducted on KITTI-360 and Waymo Open Dataset.
- **Baseline Comparison**: UrbanIR was compared with methods such as FEGR, Instruct NeRF2NeRF, NeRF-OSR, and RelightNet.
- **Quantitative Evaluation**: UrbanIR outperformed other methods in PSNR, SSIM, and LPIPS metrics under novel view synthesis and novel lighting conditions.
- **Qualitative Evaluation**: Demonstrated the effectiveness of UrbanIR in removing existing shadows, changing shadows on building surfaces, and modifying sky textures.

### Limitations

- The optimization process may be affected by noisy predictions from prior models, requiring careful adjustment of the loss functions.
- In some cases, handling shadows remains challenging, especially when dealing with complex urban driving sequences.

Overall, UrbanIR has made significant progress in the field of inverse rendering and relighting, particularly excelling in handling large-scale urban scenes.
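
To make the shading step from the Method Overview more concrete, here is a minimal per-pixel sketch of a Blinn-Phong style shader driven by a sun direction, a visibility value, and a sky term. It is not the paper's actual formulation. The angle convention in `sun_direction`, the treatment of the sky as a constant ambient color, and the names `shade`, `k_spec`, and `shininess` are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    # Return the vector unchanged if it has zero length (avoids division by zero).
    return tuple(x / n for x in v) if n > 0.0 else tuple(v)


def sun_direction(azimuth_deg, zenith_deg):
    """Unit vector pointing toward the sun.

    Assumed convention: z is up, zenith is measured from +z,
    azimuth is measured from +x toward +y.
    """
    az = math.radians(azimuth_deg)
    ze = math.radians(zenith_deg)
    return (math.sin(ze) * math.cos(az),
            math.sin(ze) * math.sin(az),
            math.cos(ze))


def shade(albedo, normal, view_dir, sun_dir, sun_color, sky_color,
          visibility, k_spec=0.0, shininess=32.0):
    """Hypothetical per-pixel shading.

    color = albedo * (visibility * sun * max(n.l, 0) + sky)
            + visibility * sun * specular
    `visibility` is 1.0 where the sun is unoccluded and 0.0 in full shadow.
    """
    n = normalize(normal)
    l = normalize(sun_dir)
    v = normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    # Blinn-Phong half vector between the light and view directions.
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    spec = k_spec * max(dot(n, h), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    return tuple(
        a * (visibility * sc * n_dot_l + sk) + visibility * sc * spec
        for a, sc, sk in zip(albedo, sun_color, sky_color)
    )


if __name__ == "__main__":
    ground = dict(albedo=(0.5, 0.5, 0.5), normal=(0.0, 0.0, 1.0),
                  view_dir=(0.0, 0.0, 1.0), sun_color=(1.0, 1.0, 1.0),
                  sky_color=(0.2, 0.2, 0.2))
    # Same surface point, sunlit versus shadowed, with the sun overhead.
    print(shade(sun_dir=sun_direction(0.0, 0.0), visibility=1.0, **ground))
    print(shade(sun_dir=sun_direction(0.0, 0.0), visibility=0.0, **ground))
```

In a sketch like this, relighting amounts to changing the arguments of `sun_direction` and recomputing `visibility` for the new sun position, while a shadowed pixel keeps only the sky contribution.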

UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video

Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes

IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images

PhyIR: Physics-based Inverse Rendering for Panoramic Indoor Images

Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF from a Single Image

SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes

An Infrared Image Synthesis Model For Large-Scale Complex Urban Scene

Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

A Global Infrared Image Synthesis Model for Large-Scale Complex Urban Scene

IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes

GS-IR: 3D Gaussian Splatting for Inverse Rendering

Scalable Image-Based Indoor Scene Rendering with Reflections

Neural Inverse Rendering of an Indoor Scene from a Single Image

SIR: Multi-view Inverse Rendering with Decomposable Shadow for Indoor Scenes

Computer Graphics in China: A practical approach for real-time illumination estimation of outdoor videos

GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering

ReN Human: Learning Relightable Neural Implicit Surfaces for Animatable Human Rendering

MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation

Online Illumination Estimation of Outdoor Scenes Based on Videos Containing No Shadow Area

Point-Based Neural Scene Rendering for Street Views