Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
2024-03-28
Abstract: We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
Computer Vision and Pattern Recognition, Graphics, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the problem of generating a follower's dance movements in a duet so that they respond to the leader's actions and stay synchronized with the background music. Specifically, it proposes a method called "Duolando", which generates follower movements using a model based on GPT (Generative Pre-trained Transformer) and off-policy reinforcement learning.

### Main Issues

1. **Generating Responsive Movements**: How to generate follower dance movements that respond to the leader's actions.
2. **Synchronizing with Music Rhythm**: How to ensure that the generated dance movements are synchronized with the rhythm of the background music.
3. **Handling Unseen Data**: How to generate stable and reasonable dance movements when faced with unseen music or leader actions.

### Solutions

1. **Large-Scale Dataset**: The authors construct a large-scale duet dance dataset named DD100, which includes 10 different dance styles performed by 5 pairs of professional dancers, with a total duration of approximately 117 minutes.
2. **GPT Model**: A GPT model autoregressively predicts subsequent tokenized dance movements, conditioned on music signals, the leader's actions, and the follower's previous movements.
3. **Off-Policy Reinforcement Learning**: An off-policy reinforcement learning strategy lets the model explore viable trajectories from out-of-distribution samples, so that it generates more stable results when faced with unseen music or leader actions. The learning process is guided by a human-defined reward function.

### Contributions

1. **Introducing a New Task**: Proposes a new multimodal task, dance accompaniment, and provides a large-scale and diverse dataset for training and testing.
2. **Establishing Benchmarks**: Establishes a new benchmark based on the collected dataset and proposed method, including multiple carefully designed evaluation metrics.
3. **Improving the Model**: Constructs a GPT-based network capable of generating motion sequences that account for partner coordination, serving as a strong baseline for this task.
4. **Handling Unseen Data**: Introduces an off-policy reinforcement learning strategy to address the challenges posed by unseen music or leader actions and demonstrates its successful application to the task.

Through these methods, the paper aims to provide effective solutions for duet dance accompaniment in fields such as Virtual Reality (VR) and Augmented Reality (AR).
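
To make the autoregressive conditioning concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the vocabulary size, the `toy_logits` scoring rule (a stand-in for the GPT), and the `toy_reward` function are all hypothetical, chosen only to show how each follower token can depend on the music, the leader's token, and the follower's own history, and how a hand-written reward could then score a generated sequence.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical size of the tokenized-motion codebook


def toy_logits(music, leader, follower_so_far, t):
    """Stand-in for the GPT: score each candidate follower token at step t.

    Purely illustrative: favors the leader's current token, scaled by the
    music feature, and mildly favors repeating the follower's last token.
    """
    logits = [0.0] * VOCAB_SIZE
    logits[leader[t]] += 2.0 * music[t]
    if follower_so_far:
        logits[follower_so_far[-1]] += 0.5
    return logits


def sample(logits, rng):
    """Draw one token index from softmax(logits)."""
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate_follower(music, leader, rng):
    """Autoregressively emit one follower token per time step."""
    follower = []
    for t in range(len(leader)):
        follower.append(sample(toy_logits(music, leader, follower, t), rng))
    return follower


def toy_reward(leader, follower):
    """Hypothetical hand-written reward: fraction of steps where tokens match."""
    return sum(a == b for a, b in zip(leader, follower)) / len(leader)


rng = random.Random(0)
music = [1.0, 0.5, 1.0, 0.5, 1.0, 0.5]   # made-up per-step music feature
leader = [3, 3, 1, 4, 4, 2]              # made-up leader motion tokens
follower = generate_follower(music, leader, rng)
print(follower, toy_reward(leader, follower))
```

In a reinforcement learning stage, a score like `toy_reward` would be computed on sampled sequences and used to adjust the generator. The paper's actual reward definitions and update rule are not described in this summary, so they are left out of the sketch.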