Abstract: Recent advances in 2D/3D generative techniques have made it possible to generate dynamic 3D objects from monocular videos. Previous methods mainly rely on implicit neural radiance fields (NeRF) or explicit Gaussian Splatting as the underlying representation, and struggle to achieve satisfactory spatial-temporal consistency and surface appearance. Drawing inspiration from modern 3D animation pipelines, we introduce DreamMesh4D, a novel framework that combines a mesh representation with a geometric skinning technique to generate high-quality 4D objects from a monocular video. Instead of using a classical texture map for appearance, we bind Gaussian splats to the triangle faces of the mesh, which allows differentiable optimization of both the texture and the mesh vertices. In particular, DreamMesh4D begins with a coarse mesh obtained through an image-to-3D generation procedure. Sparse points are then uniformly sampled across the mesh surface and used to build a deformation graph that drives the motion of the 3D object, which improves computational efficiency and provides an additional constraint. At each step, the transformations of the sparse control points are predicted by a deformation network, and the mesh vertices as well as the surface Gaussians are deformed via a novel geometric skinning algorithm, a hybrid approach that combines LBS (linear blending skinning) and DQS (dual-quaternion skinning) and mitigates the drawbacks associated with each. The static surface Gaussians, the mesh vertices, and the deformation network are learned in two stages using a reference-view photometric loss, a score distillation loss, and other regularizers. Extensive experiments demonstrate the superior performance of our method. Furthermore, our method is compatible with modern graphics pipelines, showcasing its potential in the 3D gaming and film industries.
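To make the two skinning ingredients named in the abstract concrete, here is a minimal sketch of textbook LBS and DQS applied to a single point. It is not the paper's hybrid algorithm, which the abstract does not specify. The function names (`lbs`, `dqs`, `quat_mul`, `quat_to_matrix`), the (w, x, y, z) quaternion layout, and the per-control-point weights are illustrative assumptions.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def quat_to_matrix(q):
    """Rotation matrix of a quaternion (normalized internally)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def lbs(point, weights, quats, translations):
    """Linear blend skinning: weighted average of the transformed points."""
    out = np.zeros(3)
    for w, q, t in zip(weights, quats, translations):
        out += w * (quat_to_matrix(q) @ point + t)
    return out

def dqs(point, weights, quats, translations):
    """Dual-quaternion skinning: blend dual quaternions, normalize, apply."""
    real = np.zeros(4)
    dual = np.zeros(4)
    pivot = quats[0] / np.linalg.norm(quats[0])
    for w, q, t in zip(weights, quats, translations):
        q = q / np.linalg.norm(q)
        if np.dot(q, pivot) < 0:  # keep all rotations in one hemisphere
            q = -q
        # Dual part encodes the translation: d = 0.5 * (0, t) * q
        d = 0.5 * quat_mul(np.array([0.0, *t]), q)
        real += w * q
        dual += w * d
    norm = np.linalg.norm(real)
    real, dual = real / norm, dual / norm
    conj = real * np.array([1.0, -1.0, -1.0, -1.0])
    t_blend = 2.0 * quat_mul(dual, conj)[1:]  # recover blended translation
    return quat_to_matrix(real) @ point + t_blend

# Two control points: identity and a 90-degree rotation about z, equal weights.
half = np.sqrt(0.5)
quats = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([half, 0.0, 0.0, half])]
translations = [np.zeros(3), np.zeros(3)]
weights = [0.5, 0.5]
p = np.array([1.0, 0.0, 0.0])

p_lbs = lbs(p, weights, quats, translations)
p_dqs = dqs(p, weights, quats, translations)
```

In this toy setup LBS pulls the point toward the rotation axis (its distance from the origin shrinks), while DQS keeps it on the unit circle by blending the rotations themselves. That shrinkage is the well-known LBS volume-loss artifact; DQS has its own known artifacts (such as bulging near joints), which is the usual motivation for hybrid schemes.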

Content-based 3D Mosaics for Dynamic Urban Scenes

Content-Based Dynamic 3D Mosaics

3D Scene Modeling and Understanding from Image Sequences

3D and Moving Target Extraction from Dynamic Pushbroom Stereo Mosaics

Mosaic Based View Enlargement For Moving Objects In Moving Pictures

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Fast Construction of Dynamic and Multi-Resolution 360° Panoramas from Video Sequences

Fast Generation Of Dynamic And Multi-Resolution 360 Degrees Panorama From Video Sequences

An automatic 2D to 3D conversion algorithm using multi-depth cues

Asymmetric representation for 3D panoramic video

Robust 3D Reconstruction with an RGB-D Camera

DP-MVS: Detail Preserving Multi-View Surface Reconstruction of Large-Scale Scenes

Contour Based Automatic Scene Segmentation in Image Sequences

Compression of Concentric Mosaic Scenery with Alignment and 3d Wavelet Transform

Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image

Three-Dimensional Dynamic Compressive Imaging System

DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

A novel no-sensors 3D model reconstruction from monocular video frames for a dynamic environment

Construction 3D Panoramic Model of Natural Scene from Real Image Sequences

3d sequential image mosaicing for underwater navigation and mapping

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos