Abstract: Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes have recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as casual smartphone captures. Our approach represents the scene as a $\textit{dynamic neural point cloud}$, an implicit time-conditioned point distribution that encodes local geometry and appearance in separate hash-encoded neural feature grids for static and dynamic regions. By sampling a discrete point cloud from our model, we can efficiently render high-quality novel views using a fast differentiable rasterizer and a neural rendering network. Similar to recent work, we leverage advances in neural scene analysis by incorporating data-driven priors, such as monocular depth estimation and object segmentation, to resolve the motion and depth ambiguities that originate from monocular captures. In addition to guiding the optimization process, we show that these priors can be exploited to explicitly initialize our scene representation, which drastically improves optimization speed and final image quality. As evidenced by our experimental evaluation, our dynamic point cloud model not only enables fast optimization and real-time frame rates for interactive applications, but also achieves competitive image quality on monocular benchmark sequences. Our project page is available at https://moritzkappel.github.io/projects/dnpc.
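
To make the idea of sampling a discrete point cloud from a time-conditioned point distribution more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy setup in which the distribution is given as explicit per-voxel weights on a unit cube, with one static table and one dynamic table per timestamp. The function name `sample_point_cloud`, its parameters, and the dictionary layout are hypothetical; the actual method uses hash-encoded neural feature grids, a differentiable rasterizer, and a rendering network, none of which are modeled here.

```python
import random


def sample_point_cloud(static_weights, dynamic_weights, t, n_points, grid_res, seed=0):
    """Draw n_points 3D positions in the unit cube for timestamp t.

    static_weights:  {(i, j, k): weight} for time-independent voxels (assumed layout).
    dynamic_weights: {t: {(i, j, k): weight}} for time-dependent voxels (assumed layout).
    At least one voxel must have a positive weight at time t.
    """
    rng = random.Random(seed)

    # Combine the static field with the dynamic field of the requested timestamp.
    weights = dict(static_weights)
    for voxel, w in dynamic_weights.get(t, {}).items():
        weights[voxel] = weights.get(voxel, 0.0) + w

    # Pick voxels proportionally to their weight (sampling with replacement).
    voxels = list(weights)
    chosen = rng.choices(voxels, weights=[weights[v] for v in voxels], k=n_points)

    # Jitter each point uniformly inside its voxel to obtain continuous positions.
    cell = 1.0 / grid_res
    return [tuple((c + rng.random()) * cell for c in voxel) for voxel in chosen]


# Toy scene: one static voxel, plus one voxel that is only occupied at t = 1.
static = {(0, 0, 0): 1.0}
dynamic = {1: {(1, 1, 1): 3.0}}
points_t0 = sample_point_cloud(static, dynamic, t=0, n_points=50, grid_res=2)
points_t1 = sample_point_cloud(static, dynamic, t=1, n_points=50, grid_res=2)
```

In this toy version, the sampled set changes with `t` only through the dynamic table, which mirrors the separation of static and dynamic regions described in the abstract. In a full pipeline, the sampled points would then be passed to a rasterizer together with their features.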

Novel View Synthesis of Human Interactions from Sparse Multi-view Videos

Ivs-Net: Learning Human View Synthesis from Internet Videos

Novel View Synthesis of Dynamic Human with Sparse Cameras

Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans

Novel View Synthesis of Humans using Differentiable Rendering

Novel-View Human Action Synthesis

View Synthesis of Dynamic Scenes based on Deep 3D Mask Volume

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering

Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras

Fast View Synthesis of Casual Videos with Soup-of-Planes

Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views

Reconstructing Close Human Interactions from Multiple Views

View Synthesis from Multi-View RGB Data Using Multilayered Representation and Volumetric Estimation

Novel View Synthesis from only a 6-DoF Camera Pose by Two-stage Networks

Replay: Multi-modal Multi-view Acted Videos for Casual Holography

DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras

D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis

Free-viewpoint video of human actors using multiple handheld Kinects