TPA-Net: Generate A Dataset for Text to Physics-based Animation

Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, Chenfanfu Jiang

DOI: https://doi.org/10.48550/arXiv.2211.13887

2022-11-25

Abstract: Recent breakthroughs in Vision-Language (V&L) joint research have achieved remarkable results in various text-driven tasks. High-quality Text-to-video (T2V), a task that has been long considered mission-impossible, was proven feasible with reasonably good results in latest works. However, the resulting videos often have undesired artifacts largely because the system is purely data-driven and agnostic to the physical laws. To tackle this issue and further push T2V towards high-level physical realism, we present an autonomous data generation technique and a dataset, which intend to narrow the gap with a large number of multi-modal, 3D Text-to-Video/Simulation (T2V/S) data. In the dataset, we provide high-resolution 3D physical simulations for both solids and fluids, along with textual descriptions of the physical phenomena. We take advantage of state-of-the-art physical simulation methods (i) Incremental Potential Contact (IPC) and (ii) Material Point Method (MPM) to simulate diverse scenarios, including elastic deformations, material fractures, collisions, turbulence, etc. Additionally, high-quality, multi-view rendering videos are supplied for the benefit of T2V, Neural Radiance Fields (NeRF), and other communities. This work is the first step towards fully automated Text-to-Video/Simulation (T2V/S). Live examples and subsequent work are at https://sites.google.com/view/tpa-net.

Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Graphics, Image and Video Processing

What problem does this paper attempt to address?

This paper addresses a weakness of text-to-video (T2V) generation: high-quality datasets are lacking, and current systems are purely data-driven with little knowledge of physical laws, so the generated videos often contain unwanted artifacts. To overcome this, the paper proposes an automatic data generation technique and a corresponding dataset, aiming to narrow the gap with a large amount of multi-modal 3D text-to-video/simulation (T2V/S) data and thereby push T2V toward a higher level of physical realism. Specifically, the main contributions of the paper include:

- A method that automatically generates high-quality, physically realistic 3D animations, accompanied by sentences describing the physical phenomena, covering a wide range of common real-world dynamics.
- High-quality T2V and 3D T2S datasets, built with state-of-the-art physical simulation methods and rendering tools and provided for the first time, which are intended to benefit T2I, T2V, T2-3D, T2S and T2A research.

In this way, the paper aims to ease the data shortage in current T2V research and, by introducing physical simulation, to improve the realism and physical consistency of generated videos, laying a foundation for future research and development.


A Recipe for Scaling Up Text-to-Video Generation with Text-free Videos

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data

Text-Animator: Controllable Visual Text Video Generation

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

VideoTetris: Towards Compositional Text-to-Video Generation

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes

Searching Priors Makes Text-to-Video Synthesis Better

PhysMotion: Physics-Grounded Dynamics From a Single Image

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation

TPA3D: Triplane Attention for Fast Text-to-3D Generation

Dynamic 6-Dof Volumetric Video Generation: Software Toolkit and Dataset

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

InTraGen: Trajectory-controlled Video Generation for Object Interactions