LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu

2024-02-08

Abstract: 3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Computer Science

What problem does this paper attempt to address?

This paper targets two main bottlenecks in high-resolution 3D content generation:

1) **Inefficient 3D representation**: Current 3D representations are too costly to retain fine detail at high resolution.

2) **Heavily parameterized 3D backbone network**: Existing methods rely on 3D backbone networks with a large number of parameters, which limits the training resolution and the level of detail that can be generated.

To overcome these challenges, the authors propose a new framework named **Large Multi-View Gaussian Model (LGM)**. Its main contributions are:

1. **Efficient and powerful 3D representation**: LGM uses multi-view Gaussian features as its 3D representation. The features predicted for the individual views can be fused together for differentiable rendering.

2. **High-throughput 3D backbone network**: LGM introduces an asymmetric U-Net as a high-throughput backbone that operates on multi-view images. These images are produced from text or single-view image input by multi-view diffusion models.

3. **High-resolution 3D content generation**: LGM generates 3D objects within 5 seconds while raising the training resolution to 512.

Together, these design choices aim to improve both the quality and the speed of 3D content generation, and to address the difficulty existing methods have with complex geometry and detailed textures.
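The idea of fusing multi-view Gaussian features can be illustrated with a minimal sketch. It assumes that the backbone outputs one feature map per view and that every pixel of that map encodes one 3D Gaussian. The 14-channel layout (position, scale, rotation quaternion, opacity, color), the function name `fuse_multiview_gaussians`, and the tiny map size are illustrative assumptions, not details stated in the summary above. Fusion here is simply the concatenation of all per-pixel Gaussians across views into one set that a differentiable Gaussian renderer could consume.

```python
import numpy as np

# Hypothetical per-pixel channel layout (an assumption for illustration):
# 3 position + 3 scale + 4 rotation quaternion + 1 opacity + 3 RGB = 14 channels.
CHANNELS = {"position": 3, "scale": 3, "rotation": 4, "opacity": 1, "rgb": 3}


def fuse_multiview_gaussians(features: np.ndarray) -> dict:
    """Merge per-view feature maps of shape (V, C, H, W) into one Gaussian set."""
    num_views, num_channels, height, width = features.shape
    expected = sum(CHANNELS.values())
    if num_channels != expected:
        raise ValueError(f"expected {expected} channels, got {num_channels}")

    # Move channels last, then flatten views and pixels into rows: (V*H*W, C).
    flat = features.transpose(0, 2, 3, 1).reshape(-1, num_channels)

    # Slice the channel axis into named Gaussian attributes.
    gaussians = {}
    start = 0
    for name, size in CHANNELS.items():
        gaussians[name] = flat[:, start:start + size]
        start += size

    # Normalize quaternions so that each row describes a valid rotation.
    norms = np.linalg.norm(gaussians["rotation"], axis=1, keepdims=True)
    gaussians["rotation"] = gaussians["rotation"] / np.maximum(norms, 1e-8)
    return gaussians


# Random stand-in for backbone output: 4 views, 14 channels, 8x8 feature maps.
rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((4, 14, 8, 8))
fused = fuse_multiview_gaussians(feature_maps)
print({name: value.shape for name, value in fused.items()})
```

With 4 views and 8x8 maps, the fused set holds 4 × 8 × 8 = 256 Gaussians. The count grows with the number of views and with the feature-map resolution, while the fusion step itself stays a cheap reshape.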

L4GM: Large 4D Gaussian Reconstruction Model

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

GVGEN: Text-to-3D Generation with Volumetric Representation

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

Large Point-to-Gaussian Model for Image-to-3D Generation

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images