TPA-Net: Generate A Dataset for Text to Physics-based Animation

Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, Chenfanfu Jiang
DOI: https://doi.org/10.48550/arXiv.2211.13887
2022-11-25
Abstract: Recent breakthroughs in Vision-Language (V&L) joint research have achieved remarkable results in various text-driven tasks. High-quality Text-to-Video (T2V), a task long considered mission-impossible, has been shown to be feasible with reasonably good results in the latest works. However, the resulting videos often have undesired artifacts, largely because the system is purely data-driven and agnostic to physical laws. To tackle this issue and push T2V further towards physical realism, we present an autonomous data generation technique and a dataset, which are intended to narrow the gap with a large amount of multi-modal, 3D Text-to-Video/Simulation (T2V/S) data. In the dataset, we provide high-resolution 3D physical simulations for both solids and fluids, along with textual descriptions of the physical phenomena. We take advantage of state-of-the-art physical simulation methods, (i) Incremental Potential Contact (IPC) and (ii) the Material Point Method (MPM), to simulate diverse scenarios, including elastic deformations, material fractures, collisions, turbulence, etc. Additionally, high-quality, multi-view rendering videos are supplied for the benefit of the T2V, Neural Radiance Fields (NeRF), and other communities. This work is the first step towards fully automated Text-to-Video/Simulation (T2V/S). Live examples and subsequent work are at https://sites.google.com/view/tpa-net.
Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Graphics, Image and Video Processing
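The abstract notes that multi-view rendering videos are supplied for T2V and NeRF use, but it does not describe the camera setup. A common way to get such views is to place cameras evenly on a ring around the scene, each aimed at its center. The sketch below shows that idea under stated assumptions: a y-up world, a single horizontal ring, and the hypothetical helper `orbit_cameras`. It illustrates the general technique and is not the paper's actual camera configuration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    position: Vec3
    forward: Vec3  # unit vector from the camera toward the look-at target


def orbit_cameras(n_views: int, radius: float, height: float,
                  target: Vec3 = (0.0, 0.0, 0.0)) -> List[Camera]:
    """Hypothetical helper: n_views cameras evenly spaced on a horizontal
    circle (y-up world) of the given radius, raised by `height` above the
    target, all looking at the target."""
    if n_views <= 0 or radius <= 0.0:
        raise ValueError("n_views and radius must be positive")
    tx, ty, tz = target
    cams = []
    for i in range(n_views):
        theta = 2.0 * math.pi * i / n_views  # equal angular spacing
        pos = (tx + radius * math.cos(theta), ty + height, tz + radius * math.sin(theta))
        d = (tx - pos[0], ty - pos[1], tz - pos[2])  # direction to target
        norm = math.sqrt(sum(c * c for c in d))
        cams.append(Camera(pos, tuple(c / norm for c in d)))
    return cams
```

For example, `orbit_cameras(8, radius=4.0, height=1.5)` yields eight viewpoints 45 degrees apart. Keeping every view on a fixed, known orbit makes the camera poses easy to export alongside the frames, which is the kind of metadata NeRF-style reconstruction needs.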
What problem does this paper attempt to address?
The paper addresses a limitation of text-to-video (T2V) generation: generated videos often contain unwanted artifacts, largely because existing systems are purely data-driven, have little awareness of physical laws, and lack high-quality training data that captures physical behavior. To address this, the paper proposes an automatic data generation technique and a corresponding dataset. The goal is to narrow this gap with a large amount of multi-modal 3D text-to-video/simulation (T2V/S) data and thereby push T2V toward greater physical realism. The main contributions are:
- A method that automatically generates high-quality, physically realistic 3D animations, each accompanied by sentences describing the physical phenomena, covering a wide range of common real-world dynamics.
- High-quality T2V and 3D T2S datasets, which the authors describe as the first of their kind, built with state-of-the-art physical simulation methods (IPC and MPM) and rendering tools, and intended to benefit T2I, T2V, T2-3D, T2S, and T2A research.
By pairing text with physics-based simulations, the work aims to ease the data shortage in current T2V research and to lay a foundation for generating videos with greater authenticity and physical consistency. The authors present it as a first step toward fully automated T2V/S.
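The core idea of the first contribution is that each sample pairs a physics simulation with a sentence describing it, and both are produced automatically. The sketch below illustrates that pairing under loudly stated assumptions. The scenario names, material lists, sentence templates, and the assignment of each scenario to IPC or MPM are all hypothetical, chosen only for illustration. The simulation and rendering steps themselves are omitted, so each sample stores only the parameters a simulator run would consume, plus the generated description.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical scenario table: name -> (simulator, allowed materials, sentence template).
# The IPC/MPM assignment is an illustrative assumption, not the paper's configuration.
TEMPLATES = {
    "elastic_drop": ("IPC", ["rubber", "jelly"],
                     "A {material} {shape} falls from {height:.1f} m and bounces off the ground."),
    "fracture": ("MPM", ["glass", "ceramic"],
                 "A {material} {shape} falls from {height:.1f} m and shatters on impact."),
    "granular_collapse": ("MPM", ["sand"],
                          "A {material} {shape} falls from {height:.1f} m and collapses into a heap."),
}
SHAPES = ["cube", "sphere", "bunny"]


@dataclass
class Sample:
    scenario: str
    simulator: str
    params: Dict[str, Union[str, float]]  # what a simulator run would be configured with
    text: str                             # the paired natural-language description


def generate_samples(n: int, seed: int = 0) -> List[Sample]:
    """Draw n random scene configurations and describe each one with a template."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n):
        scenario = rng.choice(sorted(TEMPLATES))
        simulator, materials, pattern = TEMPLATES[scenario]
        params = {
            "material": rng.choice(materials),
            "shape": rng.choice(SHAPES),
            # Rounded so the stored parameter matches the value printed in the text.
            "height": round(rng.uniform(0.5, 3.0), 1),
        }
        samples.append(Sample(scenario, simulator, params, pattern.format(**params)))
    return samples
```

Because the description is rendered from the same parameters that configure the scene, text and simulation cannot drift apart; restricting materials per scenario keeps the sentences physically plausible (glass shatters, rubber bounces).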