Optimized View and Geometry Distillation from Multi-view Diffuser

Youjia Zhang, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang
DOI: https://doi.org/10.48550/arxiv.2312.06198
2023-01-01
Abstract: Generating multi-view images from a single input view using image-conditioned diffusion models is a recent advancement and has shown considerable potential. However, issues such as the lack of consistency in synthesized views and over-smoothing in extracted geometry persist. Previous methods integrate multi-view consistency modules or impose additional supervision to enhance view consistency while compromising on the flexibility of camera positioning and limiting the versatility of view synthesis. In this study, we consider the radiance field optimized during geometry extraction as a more rigid consistency prior, compared to volume and ray aggregation used in previous works. We further identify and rectify a critical bias in the traditional radiance field optimization process through score distillation from a multi-view diffuser. We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model, greatly refining the radiance field fidelity. We leverage the rendered views from the optimized radiance field as the basis and develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising and generating high-quality multi-view images. Finally, we recover faithful geometry and texture directly from the refined multi-view images. Empirical evaluations demonstrate that our optimized geometry and view distillation technique generates comparable results to the state-of-the-art models trained on extensive datasets, all while maintaining freedom in camera positioning. Please see our project page at https://youjiazhang.github.io/USD/.