Multi-person/Group Interactive Video Generation

Wang Zhan, Taiping Yao, Huawei Wei, Shanyan Guan, Bingbing Ni
DOI: https://doi.org/10.1007/978-3-030-00767-6_29
2018-01-01
Abstract: Human motion generation from captions is a fast-growing and promising technique. Recent methods encode the skeletons with the latest hidden states of a recurrent neural network (RNN), which limits them to generating coarse-grained motions. In this work, we propose a novel human motion generation framework that simultaneously considers the temporal coherence of each individual action. Our model consists of two components: a Semantic Extractor and a Motion Generator. The Semantic Extractor maps a caption into semantic guidance for fine-grained motion generation. The Motion Generator models the long-term tendency of each individual action. It also captures both the global location and the local dynamics of each individual action, enabling more fine-grained activity generation. Extensive experiments on two benchmark datasets show that our method outperforms previous methods.
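The abstract states that the Motion Generator captures the "global location and local dynamics" of each action but does not say how these are represented. A common convention splits each skeleton frame into a root-joint position (global location) and root-relative joint offsets (local dynamics). The sketch below adopts that convention as an assumption. It is not the authors' method, and it makes several further assumptions: 2D coordinates, joint index 0 as the root, and the hypothetical helpers `decompose` and `recompose`.

```python
from typing import List, Tuple

# Assumptions for illustration only: 2D joints, root joint at index 0.
Joint = Tuple[float, float]
Frame = List[Joint]


def decompose(sequence: List[Frame], root: int = 0) -> Tuple[List[Joint], List[Frame]]:
    """Hypothetical helper: split each frame into a global root location
    and joint offsets expressed relative to that root."""
    global_path: List[Joint] = []
    local_poses: List[Frame] = []
    for frame in sequence:
        rx, ry = frame[root]
        global_path.append((rx, ry))
        local_poses.append([(x - rx, y - ry) for x, y in frame])
    return global_path, local_poses


def recompose(global_path: List[Joint], local_poses: List[Frame]) -> List[Frame]:
    """Hypothetical inverse of decompose: add the root location back to every offset."""
    return [
        [(rx + dx, ry + dy) for dx, dy in pose]
        for (rx, ry), pose in zip(global_path, local_poses)
    ]


# Toy two-joint skeleton that translates right by 1.0 without changing posture.
seq = [[(1.0, 2.0), (1.5, 3.0)], [(2.0, 2.0), (2.5, 3.0)]]
path, local = decompose(seq)
print(path)   # [(1.0, 2.0), (2.0, 2.0)]
print(local)  # [[(0.0, 0.0), (0.5, 1.0)], [(0.0, 0.0), (0.5, 1.0)]]
```

In this toy sequence, the global path changes while the local pose stays constant. A generator working on such a split could, in principle, model where a person moves separately from how their body articulates.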
What problem does this paper attempt to address?