Retrieval-Based Natural 3D Human Motion Generation

Zehan Tan, Weidong Yang, Shuai Wu
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096666
2023-01-01
Abstract: Generating 3D human motions automatically from text is challenging. Ideally, the generated motions should explore the text-grounded motion space while accurately depicting the content of the given text descriptions. Previous research has addressed this problem with text2length and text2motion training. However, the relationship between motion length and text remains poorly understood. In this work, we propose C-MO, a context-aware retrieval-based approach for predicting motion lengths and generating appropriate 3D motions. Specifically, we train a context-aware encoder-decoder model that uses either the previous output of the decoder or the embedding vector of a ground-truth motion as context, so that the model becomes increasingly aware of previous alignments. Given a text, the trained model is then used to retrieve the most similar motions from the training set. Finally, the retrieved motion guides the probability distribution of the final generated motions. Our method combines the advantages of information retrieval and neural machine translation. We evaluate C-MO on the large-scale KIT dataset, and the experimental results show substantial improvements over the state of the art.
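The retrieve-then-guide idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or how the retrieved motion guides the output distribution, so cosine similarity and linear interpolation are assumptions here, and the names `retrieve_top_k`, `guide_distribution`, and `weight` are hypothetical.

```python
import numpy as np


def retrieve_top_k(text_vec, motion_vecs, k=1):
    """Return indices and scores of the k training motions closest to the text.

    Assumption: text and motions live in a shared embedding space and
    similarity is cosine similarity (the abstract does not say which
    measure is used).
    """
    text_unit = text_vec / np.linalg.norm(text_vec)
    motion_unit = motion_vecs / np.linalg.norm(motion_vecs, axis=1, keepdims=True)
    scores = motion_unit @ text_unit
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


def guide_distribution(model_probs, retrieved_probs, weight=0.5):
    """Blend the decoder's distribution with one derived from the retrieved motion.

    Assumption: guidance is a linear interpolation controlled by `weight`;
    the paper may use a different mechanism.
    """
    mixed = (1.0 - weight) * model_probs + weight * retrieved_probs
    return mixed / mixed.sum()  # renormalize to guard against rounding drift


# Toy data: one text embedding and three training-motion embeddings.
text_vec = np.array([1.0, 0.0])
motion_vecs = np.array([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
top_idx, top_scores = retrieve_top_k(text_vec, motion_vecs, k=1)

# Toy distributions over two candidate outputs.
mixed = guide_distribution(np.array([0.5, 0.5]), np.array([1.0, 0.0]), weight=0.5)
```

In this toy setup the second training motion is retrieved because its embedding points in nearly the same direction as the text embedding, and the blended distribution shifts probability mass toward the output favored by the retrieved motion.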