LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
2024-02-08
Abstract: 3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
Computer Science
What problem does this paper attempt to address?
This paper targets two bottlenecks in high-resolution 3D content generation:

1. **Inefficient 3D representation**: Existing representations are too costly to support detailed 3D content at high resolution.
2. **Heavy 3D backbone network**: Existing methods rely on backbone networks with a large number of parameters, which limits the training resolution and the level of generated detail.

To overcome these bottlenecks, the authors propose a framework named **Large Multi-View Gaussian Model (LGM)**. Its main contributions are:

1. **Efficient yet powerful 3D representation**: LGM represents an object with multi-view Gaussian features, which are fused together for differentiable rendering.
2. **High-throughput 3D backbone network**: An asymmetric U-Net operates on multi-view images. These images are produced from a text prompt or a single-view image by multi-view diffusion models.
3. **High-resolution 3D content generation**: LGM generates a 3D object within 5 seconds while raising the training resolution to 512.

Together, these changes aim to improve both the quality and the speed of 3D content generation, and to address the difficulty existing methods have with complex geometry and detailed textures.
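
To make the idea of fusing multi-view Gaussian features more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, purely for illustration, that the backbone emits one feature map per view and that each pixel's feature vector is read as the parameters of one 3D Gaussian. The channel layout (`CHANNELS`), the `Gaussian` class, and the helpers `pixel_to_gaussian`, `fuse_views`, and `make_dummy_view` are all hypothetical names that do not appear in the summary above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed per-pixel channel layout (illustration only, not taken from the text):
# 3 position + 3 scale + 4 rotation quaternion + 1 opacity + 3 RGB color.
CHANNELS = 14

FeatureMap = List[List[List[float]]]  # indexed as [row][column][channel]


@dataclass
class Gaussian:
    position: Tuple[float, ...]
    scale: Tuple[float, ...]
    rotation: Tuple[float, ...]
    opacity: float
    color: Tuple[float, ...]


def pixel_to_gaussian(feat: Sequence[float]) -> Gaussian:
    """Interpret one pixel's feature vector as one Gaussian (assumed layout)."""
    if len(feat) != CHANNELS:
        raise ValueError(f"expected {CHANNELS} channels, got {len(feat)}")
    return Gaussian(
        position=tuple(feat[0:3]),
        scale=tuple(feat[3:6]),
        rotation=tuple(feat[6:10]),
        opacity=feat[10],
        color=tuple(feat[11:14]),
    )


def fuse_views(views: Sequence[FeatureMap]) -> List[Gaussian]:
    """Fuse per-view feature maps by concatenating their per-pixel Gaussians."""
    fused: List[Gaussian] = []
    for feature_map in views:
        for row in feature_map:
            for feat in row:
                fused.append(pixel_to_gaussian(feat))
    return fused


def make_dummy_view(value: float, height: int, width: int) -> FeatureMap:
    """Build a placeholder feature map in which every channel equals `value`."""
    return [[[value] * CHANNELS for _ in range(width)] for _ in range(height)]


# Four views of 2x2 pixels each yield 4 * 2 * 2 = 16 Gaussians.
views = [make_dummy_view(float(i), 2, 2) for i in range(4)]
gaussians = fuse_views(views)
print(len(gaussians))  # 16
```

In this sketch, fusion is simple concatenation: the Gaussians predicted from every view are pooled into one set that a differentiable renderer could then consume. The renderer itself, and any activation functions or coordinate transforms applied to the raw features, are left out because the summary does not describe them.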