Abstract: Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computations for optimization. In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and illumination jointly, thereby accelerating the inverse rendering significantly. To be specific, we propose to encode a scene into a latent volumetric representation, based on which the geometry, materials and illumination can be readily learned via lightweight neural networks in a unified manner. Particularly, an essential design of UniVoxel is that we leverage local Spherical Gaussians to represent the incident light radiance, which enables the seamless integration of modeling illumination into the unified voxelization framework. Such novel design enables our UniVoxel to model the joint effects of direct lighting, indirect lighting and light visibility efficiently without expensive multi-bounce ray tracing. Extensive experiments on multiple benchmarks covering diverse scenes demonstrate that UniVoxel boosts the optimization efficiency significantly compared to other methods, reducing the per-scene training time from hours to 18 minutes, while achieving favorable reconstruction quality. Code is available at https://github.com/freemantom/UniVoxel.
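
The abstract's central idea, a single latent voxel grid whose features are decoded into scene properties by lightweight networks, can be illustrated with the minimal NumPy sketch below. It is not the UniVoxel implementation: the grid resolution, the feature width `FEAT`, the one-hidden-layer decoders, and the choice of heads (`sdf`, `albedo`, `roughness`) are hypothetical placeholders, and the weights are random rather than trained. It only shows the data flow: trilinearly interpolate a per-voxel feature at each query point, then decode every property from that shared feature with a small head.

```python
import numpy as np

def trilinear_sample(grid, pts):
    """Trilinearly interpolate a feature grid of shape (X, Y, Z, C)
    at continuous points pts of shape (N, 3), given in voxel units.
    Assumes every grid dimension is at least 2."""
    X, Y, Z, C = grid.shape
    # Clamp so the 8 neighbouring corners always stay inside the grid.
    upper = np.array([X - 1, Y - 1, Z - 1], dtype=float) - 1e-6
    pts = np.clip(pts, 0.0, upper)
    i0 = np.floor(pts).astype(int)  # lower-corner indices, (N, 3)
    t = pts - i0                    # fractional offsets in [0, 1)
    out = np.zeros((pts.shape[0], C))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Product of 1D linear weights for this corner.
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += w[:, None] * corner
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tiny_mlp(x, W1, b1, W2, b2):
    """One ReLU hidden layer: a stand-in for a 'lightweight' decoder."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

rng = np.random.default_rng(0)
FEAT = 8                                        # hypothetical feature width
latent = rng.normal(size=(32, 32, 32, FEAT))    # hypothetical 32^3 latent grid

# Hypothetical heads: separate small decoders sharing one latent grid.
heads = {
    name: (rng.normal(size=(FEAT, 16)) * 0.1, np.zeros(16),
           rng.normal(size=(16, d)) * 0.1, np.zeros(d))
    for name, d in [("sdf", 1), ("albedo", 3), ("roughness", 1)]
}

def decode(pts):
    """Query all scene properties from a single grid lookup."""
    f = trilinear_sample(latent, pts)
    sdf = tiny_mlp(f, *heads["sdf"])[:, 0]
    albedo = sigmoid(tiny_mlp(f, *heads["albedo"]))        # kept in (0, 1)
    roughness = sigmoid(tiny_mlp(f, *heads["roughness"]))[:, 0]
    return sdf, albedo, roughness
```

Because every property is read from the same interpolated feature, one grid lookup serves all heads. Replacing several deep MLPs with an explicit grid plus small decoders is the general reason designs of this kind tend to optimize faster; the exact networks and outputs UniVoxel uses are specified in the paper and code, not here.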

What problem does this paper attempt to address?

The paper addresses inverse rendering: estimating the geometry, materials, and illumination of a 3D scene from multi-view 2D images, a core problem in computer vision and graphics. Existing inverse rendering methods typically model these properties with implicit neural representations, whose optimization is computationally expensive. Specifically, they model each property separately with deep MLPs and rely on costly multi-bounce ray tracing to model illumination, so optimizing a single scene can take several hours or even days. To address this, the paper proposes UniVoxel, a unified voxelization framework that learns an explicit scene representation and models geometry, materials, and illumination jointly, which substantially accelerates inverse rendering. Its main contributions are:

1. **Unified Voxelization Framework**: UniVoxel encodes the scene into a latent volumetric representation from which all key scene properties (geometry, materials, and illumination) are learned efficiently with lightweight neural networks.
2. **Spherical Gaussian-Based Illumination Modeling**: UniVoxel represents the local incident light radiance with Spherical Gaussians (SGs). This lets it model the joint effects of direct lighting, indirect lighting, and light visibility without multi-bounce ray tracing, integrates illumination modeling seamlessly with the other scene properties, and markedly improves training efficiency.
3. **Experimental Verification**: Extensive experiments on multiple benchmarks show that UniVoxel substantially improves optimization efficiency, reducing per-scene training time from several hours to 18 minutes while maintaining favorable reconstruction quality.

Together, these designs address the low optimization efficiency of prior inverse rendering methods and make per-scene inverse rendering considerably more practical.
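
The Spherical Gaussian idea can be sketched as follows, under stated assumptions. A single SG lobe is G(ω; ξ, λ, μ) = μ · exp(λ(ω·ξ − 1)), with unit lobe axis ξ, sharpness λ > 0, and (here RGB) amplitude μ; a sum of K lobes gives a smooth incident-radiance function over directions. In a "local" variant like the one described above, the lobe parameters would be predicted per point from the voxel features; in this sketch they are random, and `K = 4` and the helper names are hypothetical. One practical appeal of SGs is that some integrals have closed forms: a single lobe integrates over the whole sphere to 2πμ/λ · (1 − e^(−2λ)), which the code checks against a Monte Carlo estimate.

```python
import numpy as np

def eval_sg(dirs, lobe, sharpness, amplitude):
    """Evaluate a sum of K spherical Gaussians at unit directions.
    dirs: (N, 3) unit vectors; lobe: (K, 3) unit lobe axes xi_k;
    sharpness: (K,) lambda_k > 0; amplitude: (K, 3) RGB mu_k.
    Returns (N, 3) radiance."""
    cos = dirs @ lobe.T                                  # (N, K) dot products
    weights = np.exp(sharpness[None, :] * (cos - 1.0))   # (N, K), peak 1 at axis
    return weights @ amplitude                           # (N, 3)

def sg_integral(sharpness, amplitude):
    """Closed-form integral of the SG mixture over the whole sphere."""
    factor = 2.0 * np.pi / sharpness * (1.0 - np.exp(-2.0 * sharpness))  # (K,)
    return factor @ amplitude                            # (3,)

def random_unit(n, rng):
    """Directions drawn uniformly on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
K = 4                                         # hypothetical lobes per point
lobe = random_unit(K, rng)
sharpness = rng.uniform(2.0, 20.0, size=K)
amplitude = rng.uniform(0.0, 1.0, size=(K, 3))

# Monte Carlo check: uniform sphere sampling has pdf 1 / (4*pi).
dirs = random_unit(200_000, rng)
mc = 4.0 * np.pi * eval_sg(dirs, lobe, sharpness, amplitude).mean(axis=0)
exact = sg_integral(sharpness, amplitude)
```

The closed-form integral replaces a sampled estimate that would otherwise need many directions per point; this kind of analytic shortcut is what makes SG lighting attractive as an alternative to tracing extra bounces. How UniVoxel combines its SG lighting with visibility and BRDF terms is defined in the paper and is not reproduced by this sketch.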

UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

Joint Optimization of Triangle Mesh, Material, and Light from Neural Fields with Neural Radiance Cache

VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis

UniVision: A Unified Framework for Vision-Centric 3D Perception

Scalable Neural Indoor Scene Rendering

Unified Gaussian Primitives for Scene Representation and Rendering

V4D: Voxel for 4D Novel View Synthesis

VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources

Fast Neural Representations for Direct Volume Rendering

Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF from a Single Image

Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

NeuVV: Neural Volumetric Videos with Immersive Rendering and Editing

ProbIBR: Fast Image-Based Rendering with Learned Probability-Guided Sampling

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

UniRender: Reconstructing 3D Surfaces from Aerial Images with a Unified Rendering Scheme

GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes

Uni3D: Exploring Unified 3D Representation at Scale

Voxel-Mesh Hybrid Representation for Real-Time View Synthesis

Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes