LRM: Large Reconstruction Model for Single Image to 3D

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan

2024-03-09

Abstract: We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and produce high-quality 3D reconstructions from various testing inputs, including real-world in-the-wild captures and images created by generative models. Video demos and interactable 3D meshes can be found on our LRM project webpage: https://yiconghong.me/LRM.

Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics, Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper proposes a new method called the Large Reconstruction Model (LRM), aimed at quickly reconstructing high-quality 3D models from a single input image. Specifically, LRM has the following features:

1. **Efficiency**: LRM can complete 3D reconstruction within 5 seconds.
2. **Large-scale Model Architecture**: It adopts a Transformer-based encoder-decoder architecture, containing 500 million learnable parameters.
3. **Large-scale Training Dataset**: It is trained end-to-end on a large-scale multi-view dataset containing approximately 1 million objects, including both synthetic renderings and real captured data.
4. **High Generality**: It can handle various test inputs, including real-world in-the-wild captured images and images created by generative models.

Through these designs, LRM addresses several key issues present in traditional methods:

- Early learning methods typically performed well only on specific categories because they leveraged category-specific data priors to infer overall shapes.
- Recent methods rely on complex parameter tuning and regularization, and are limited by pre-trained 2D generative models.
- Some methods require optimizing 3D geometry one by one, which is often slow and impractical.

By combining a large-scale model architecture with large-scale training data, LRM achieves efficient and highly general 3D reconstruction.
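
The summary above says LRM predicts a neural radiance field (NeRF) but does not describe how that field is turned into pixels. As a general illustration only, the following minimal sketch shows the standard NeRF volume-rendering quadrature for a single ray: each sample's density is converted to an opacity, and colors are accumulated front to back. The function name `composite_ray` and its arguments are hypothetical and are not taken from the LRM paper or its code.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    densities: per-sample volume density (sigma), ordered front to back
    colors:    per-sample RGB triples
    deltas:    distance between consecutive samples
    Returns the rendered RGB and the transmittance left after the last sample.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Illustrative values: an empty first sample followed by a nearly opaque red one.
rgb, remaining = composite_ray(
    densities=[0.0, 1e6],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[0.5, 0.5],
)
```

In this toy ray the first sample has zero density and contributes nothing, so the rendered color comes entirely from the opaque second sample.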

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention

Real3D: Scaling Up Large Reconstruction Models with Real-World Images

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model

Large Spatial Model: End-to-end Unposed Images to Semantic 3D

LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

FaceLift: Single Image to 3D Head with View Generation and GS-LRM

3D-LFM: Lifting Foundation Model

Fast Radiance Field Reconstruction from Sparse Inputs