Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu
2024-06-17
Abstract: In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the size and diversity of available 3D motion data. To address these limitations, we exploit the extensive availability of 2D motion data. We present $\textbf{Holistic-Motion2D}$, the first comprehensive and large-scale benchmark for 2D whole-body motion generation, which includes over 1M in-the-wild motion sequences, each paired with high-quality whole-body/partial pose annotations and textual descriptions. Notably, Holistic-Motion2D is ten times larger than the previously largest 3D motion dataset. We also introduce a baseline method, featuring innovative $\textit{whole-body part-aware attention}$ and $\textit{confidence-aware modeling}$ techniques, tailored for 2D $\underline{\text T}$ext-driv$\underline{\text{EN}}$ whole-bo$\underline{\text D}$y motion gen$\underline{\text{ER}}$ation, namely $\textbf{Tender}$. Extensive experiments demonstrate the effectiveness of $\textbf{Holistic-Motion2D}$ and $\textbf{Tender}$ in generating expressive, diverse, and realistic human motions. We also highlight the utility of 2D motion for various downstream applications and its potential for lifting to 3D motion. The project page is: https://holistic-motion2d.github.io
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the limited scale and diversity of the data behind existing 3D human motion generation methods. Traditional 3D methods can generate detailed and realistic motions, but they are restricted by the amount and variety of available 3D motion data. These limits stem mainly from the high cost of high-precision 3D motion capture systems and their dependence on indoor environments, which makes large-scale acquisition of 3D motion data difficult. To overcome this, the paper proposes a new direction: generating whole-body human motion in 2D space. It introduces Holistic-Motion2D, the first large-scale benchmark for 2D whole-body motion generation, containing more than 1,000,000 in-the-wild motion sequences, each paired with high-quality whole-body or partial pose annotations and a text description. The paper also proposes a baseline method, Tender, which uses whole-body part-aware attention and confidence-aware modeling and is designed specifically for text-driven 2D whole-body motion generation. With this benchmark and baseline, the paper aims to provide a more accessible and diverse 2D motion generation solution that complements existing 3D methods and supports downstream applications such as robot control, video games, and VR/AR. The paper also emphasizes that 2D motion can benefit 3D motion generation, in particular through its potential to be lifted to 3D.