Abstract: We formulate SVBRDF estimation from photographs as a diffusion task. To model the distribution of spatially varying materials, we first train a novel unconditional SVBRDF diffusion backbone model on a large set of 312,165 synthetic spatially varying material exemplars. This SVBRDF diffusion backbone model, named MatFusion, can then serve as a basis for refining a conditional diffusion model to estimate the material properties from a photograph under controlled or uncontrolled lighting. Our backbone MatFusion model is trained using only a loss on the reflectance properties, and therefore refinement can be paired with more expensive rendering methods without the need for backpropagation during training. Because the conditional SVBRDF diffusion models are generative, we can synthesize multiple SVBRDF estimates from the same input photograph, from which the user can select the one that best matches their expectations. We demonstrate the flexibility of our method by refining different SVBRDF diffusion models conditioned on different types of incident lighting, and show that for a single photograph under colocated flash lighting our method achieves equal or better accuracy than existing SVBRDF estimation methods.

What problem does this paper attempt to address?

The paper addresses the estimation of SVBRDF (Spatially Varying Bidirectional Reflectance Distribution Function) parameters of spatially varying materials from a single photograph, so that the visual appearance of real-world materials can be reproduced. Specifically, it proposes a diffusion-based approach, called MatFusion, for generating and estimating SVBRDF parameter maps.

### Background and Motivation

1. **Challenges of SVBRDF Estimation**:
   - Estimating SVBRDF parameters from a single photograph is difficult because it must balance several objectives: ease of capture, robustness, accuracy of reproduction, and suitability for post-editing.
   - Existing methods are convenient and produce plausible SVBRDFs, but the problem is ambiguous and these methods offer no way to adjust the result, so in some cases they cannot produce the material properties the user expects.
   - These methods are typically trained for a specific lighting condition, so the network must be retrained when the input lighting changes.
2. **Advantages of Diffusion Models**:
   - Inspired by the recent success of diffusion models in image processing tasks, the paper formulates SVBRDF estimation as a diffusion task.
   - A diffusion model starts from pure random noise and gradually denoises it into SVBRDF parameter maps. Because the model is generative, the ambiguity of single-image capture is handled by sampling several plausible solutions rather than committing to a single one.
   - From the multiple candidate SVBRDFs, users can select the result that best meets their expectations.

### Method Overview

1. **Unconditional SVBRDF Diffusion Model**:
   - The paper first trains an unconditional SVBRDF diffusion model, called MatFusion, to generate SVBRDF parameter maps (diffuse albedo, specular albedo, specular roughness, and normal maps).
   - This model is trained on a large-scale synthetic SVBRDF dataset containing 312,165 unique training samples.
2. **Conditional SVBRDF Diffusion Models**:
   - Starting from the MatFusion backbone, the paper refines three conditional SVBRDF diffusion models, each for a different type of input lighting:
     - **Colocated Flash Lighting**: a single photograph taken with a flash colocated with the camera.
     - **Natural Lighting**: a single photograph taken under natural lighting.
     - **Flash/No-Flash Pair**: a pair of photographs, one with and one without flash.
3. **Generating Diverse SVBRDFs**:
   - Changing the random seed yields a different candidate SVBRDF, allowing users to select the result that best meets their expectations.
   - The paper considers three selection strategies: fixed seed, rendering-error selection, and manual selection.

### Experimental Results

1. **Synthetic Data Results**:
   - The paper presents estimation results of the three conditional diffusion models on synthetic materials, showing the performance of each model under its lighting condition.
   - The colocated flash model gives the most consistent results, since the lighting is known, but it sometimes fails or produces unexpected texture changes on small features.
   - The natural lighting model shows greater variability in accuracy but still generates plausible SVBRDFs.
   - The flash/no-flash model benefits from an input without strong specular highlights and recovers diffuse textures better, but it may underestimate diffuse albedo or specular roughness in some cases.
2. **Comparison with Existing Methods**:
   - The paper compares the colocated flash model with existing adversarial direct inference methods under the different selection strategies. According to the abstract, it achieves equal or better accuracy for a single photograph under colocated flash lighting.

### Conclusion

The paper proposes a diffusion-based SVBRDF estimation method that can generate multiple candidate SVBRDFs from a single photograph and lets users select the result that best meets their expectations. The method performs well under different lighting conditions and addresses the ambiguity and lack of adjustability of existing methods.
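
The rendering-error selection strategy mentioned in the method overview can be pictured as follows: sample several candidates with different seeds, re-render each one under the capture lighting, and keep the candidate whose rendering is closest to the input photograph. The code below is only a minimal sketch under stated assumptions, not the paper's implementation. The 10-channel map layout, the simplified colocated point-light GGX renderer, the RMSE error metric, and the `toy_sampler` function (a hypothetical stand-in for seeded sampling of the conditional diffusion model) are all assumptions introduced for illustration.

```python
import numpy as np

# Assumed channel layout for this sketch (not taken from the paper):
# diffuse RGB [0:3], specular RGB [3:6], roughness [6:7], normal XYZ [7:10].

def make_flat_svbrdf(size=16):
    """Build a flat, uniform test material facing +z."""
    maps = np.zeros((size, size, 10))
    maps[..., 0:3] = 0.5   # diffuse albedo
    maps[..., 3:6] = 0.04  # specular albedo
    maps[..., 6] = 0.4     # roughness
    maps[..., 9] = 1.0     # normal z component
    return maps

def render_colocated(svbrdf, height=2.0):
    """Simplified rendering under a point light colocated with the camera.

    Uses a Lambertian term plus a GGX lobe with the shadowing term omitted.
    """
    h, w, _ = svbrdf.shape
    diffuse, specular = svbrdf[..., 0:3], svbrdf[..., 3:6]
    rough = np.clip(svbrdf[..., 6:7], 0.05, 1.0)
    normal = svbrdf[..., 7:10]
    normal = normal / np.linalg.norm(normal, axis=-1, keepdims=True)

    # Surface points lie on the plane z = 0, spanning [-1, 1] x [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    to_light = np.stack([-xs, -ys, np.full_like(xs, height)], axis=-1)
    dist2 = np.sum(to_light ** 2, axis=-1, keepdims=True)
    wi = to_light / np.sqrt(dist2)

    # Colocated light and camera: the half vector equals the light direction.
    n_dot = np.clip(np.sum(normal * wi, axis=-1, keepdims=True), 1e-4, 1.0)
    alpha2 = rough ** 4  # GGX with alpha = roughness^2
    ggx = alpha2 / (np.pi * (n_dot ** 2 * (alpha2 - 1.0) + 1.0) ** 2)
    brdf = diffuse / np.pi + specular * ggx / (4.0 * n_dot ** 2)
    return brdf * n_dot / dist2

def toy_sampler(reference, seed, noise=0.1):
    """Hypothetical stand-in for one seeded sample of a conditional model."""
    rng = np.random.default_rng(seed)
    sample = reference.copy()
    perturbed = sample[..., :7] + noise * rng.standard_normal(sample[..., :7].shape)
    sample[..., :7] = np.clip(perturbed, 0.0, 1.0)  # normals are left untouched
    return sample

def select_by_render_error(photo, candidates):
    """Return the index of the candidate with the lowest re-rendering RMSE."""
    errors = [float(np.sqrt(np.mean((render_colocated(c) - photo) ** 2)))
              for c in candidates]
    return int(np.argmin(errors)), errors

# Usage: one "photograph", several seeded candidates, pick the best match.
reference = make_flat_svbrdf()
photo = render_colocated(reference)
candidates = [toy_sampler(reference, seed=s) for s in range(4)]
best, errors = select_by_render_error(photo, candidates)
print("selected seed:", best, "rmse per seed:", errors)
```

A low re-rendering error only means the candidate explains the one input photograph; it does not guarantee that the individual maps are correct, which is one reason manual selection remains a useful alternative.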

MatFusion: A Generative Diffusion Model for SVBRDF Capture
