Enhancing Zero-shot 3D Photography Via Mesh-represented Image Inpainting

Yuejian Fang, Xiaodong Wang
DOI: https://doi.org/10.1109/icme57554.2024.10687811
2024-01-01
Abstract: 3D photography techniques create a consistent 3D video from a single image. Existing methods represent 3D scenes with multi-plane images or layered depth images and then render novel views. However, these methods warp pixels across frames, which easily causes distortions and harms visual coherence, and they do not support controllable generation with textual prompts. Moreover, whether tailored to a specific domain or to the open domain, they require training models on sufficiently large datasets beforehand, which demands substantial computational resources. To address these issues, we propose an enhanced zero-shot 3D photography method, termed Zero-3DP, to render any image into a 3D video at any time. We first integrate meshes into our pipeline to represent 3D scenes and update them along predefined trajectories, ensuring geometric consistency via depth alignment and prior preservation in rendering. To maintain semantic consistency, we fine-tune the diffusion-based inpainting module at test time for each incoming frame. Experiments on two public benchmarks show that, without any prior training and relying only on test-time fine-tuning during inference, Zero-3DP can match or outperform state-of-the-art methods.
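The abstract does not say how depth alignment is computed. As an illustration only, one common approach is to fit a scale and shift by least squares so that a newly predicted depth map agrees with the depth already rendered from the mesh wherever the two overlap. The sketch below assumes that approach; the function names `fit_scale_shift` and `align_depth`, the flat-list depth representation, and the validity mask are all hypothetical and are not taken from the paper.

```python
def fit_scale_shift(pred, ref):
    """Closed-form least-squares fit of s, t minimizing sum((s * p + t - r) ** 2)."""
    n = len(pred)
    if n < 2 or n != len(ref):
        raise ValueError("need at least two paired depth samples")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predicted depth is constant; scale is undetermined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov / var_p
    t = mean_r - s * mean_p
    return s, t


def align_depth(pred, ref, mask):
    """Align a predicted depth map to a reference depth map.

    pred, ref: flat lists of per-pixel depth (assumed layout, same length).
    mask: flat list of bools, True where the reference depth is valid,
          e.g. pixels already covered by the existing mesh (assumption).
    Returns the aligned prediction for all pixels, plus s and t.
    """
    p = [a for a, m in zip(pred, mask) if m]
    r = [b for b, m in zip(ref, mask) if m]
    s, t = fit_scale_shift(p, r)
    return [s * a + t for a in pred], s, t
```

In this sketch the fit uses only the masked overlap region, and the resulting scale and shift are then applied to every pixel, including the newly revealed ones that have no reference depth yet.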