Abstract: We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.

What problem does this paper attempt to address?

This paper attempts to address the problem of generating follower dance movements in a duet that can respond to the leader's actions and synchronize with the background music. Specifically, the paper proposes a method called "Duolando," which generates follower dance movements using a model based on GPT (Generative Pre-trained Transformer) and off-policy reinforcement learning.

### Main Issues

1. **Generating Responsive Movements**: How to generate follower dance movements that can respond to the leader's actions.
2. **Synchronizing with Music Rhythm**: How to ensure that the generated dance movements are synchronized with the rhythm of the background music.
3. **Handling Unseen Data**: How to generate stable and reasonable dance movements when faced with unseen music or leader actions.

### Solutions

1. **Large-Scale Dataset**: Constructed a large-scale duet dance dataset named DD100, which includes 10 different dance styles performed by 5 pairs of professional dancers, with a total duration of approximately 117 minutes.
2. **GPT Model**: Uses the GPT model to autoregressively predict subsequent dance movements, conditioned on music signals, the leader's actions, and the follower's previous movements.
3. **Off-Policy Reinforcement Learning**: Introduces an off-policy reinforcement learning strategy to enable the model to generate more stable results when faced with unseen music or leader actions. The learning process is guided by a human-defined reward function.

### Contributions

1. **Introducing a New Task**: Proposes a new multimodal task, dance accompaniment, and provides a large-scale and diverse dataset for training and testing.
2. **Establishing Benchmarks**: Establishes new benchmarks based on the collected dataset and proposed method, including multiple carefully designed evaluation metrics.
3. **Improving the Model**: Constructs a GPT-based network capable of generating motion sequences that consider partner coordination, serving as a strong baseline for this task.
4. **Handling Unseen Data**: Introduces an off-policy reinforcement learning strategy to address the challenges posed by unseen music or leader actions and demonstrates its successful application in the task.

Through these methods, the paper aims to provide effective solutions for duet dance accompaniment tasks in fields such as Virtual Reality (VR) and Augmented Reality (AR).
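As a rough illustration of the autoregressive conditioning summarized above, the following sketch samples one follower motion token per time step from a distribution that depends on the music, the leader's token, and the follower's previous token. It is a toy under stated assumptions, not the paper's implementation: `toy_logits` is a hypothetical hand-written scoring rule standing in for the GPT, and `VOCAB_SIZE`, the feature values, and the token values are invented for the example.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical size of the motion-token codebook


def toy_logits(music_feat, leader_tok, prev_follower_tok):
    """Stand-in for the GPT: returns one score per candidate follower token.

    Purely illustrative: it favours tokens close to a target derived from
    the leader's token and the music feature, and close to the follower's
    previous token.
    """
    target = (leader_tok + round(music_feat)) % VOCAB_SIZE
    return [
        -abs(tok - target) - 0.5 * abs(tok - prev_follower_tok)
        for tok in range(VOCAB_SIZE)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_follower(music, leader, start_tok=0, seed=0):
    """Autoregressively sample one follower token per time step."""
    rng = random.Random(seed)
    follower = []
    prev = start_tok
    for music_feat, leader_tok in zip(music, leader):
        # Condition on music, the leader, and the follower's own history.
        probs = softmax(toy_logits(music_feat, leader_tok, prev))
        prev = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        follower.append(prev)
    return follower


music = [0.0, 1.0, 0.0, 1.0, 2.0]  # toy per-step music features
leader = [1, 2, 3, 4, 5]           # toy leader motion tokens
follower = generate_follower(music, leader)
print(follower)
```

Each sampled token is fed back as `prev` for the next step, which is what makes the loop autoregressive. In the reinforcement learning stage described in the summary, sequences sampled this way would presumably be scored by the human-defined reward, but that step is not sketched here.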

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Bailando++: 3D Dance GPT With Choreographic Memory

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

DeepDance: Music-to-Dance Motion Choreography With Adversarial Learning

Robust Dancer: Long-term 3D Dance Synthesis Using Unpaired Data

Dual Learning Music Composition and Dance Choreography

LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models

Example-Based Automatic Music-Driven Conventional Dance Motion Synthesis

Pose Estimation-Assisted Dance Tracking System Based on Convolutional Neural Network

Music2Dance: DanceNet for Music-Driven Dance Generation

DisCo: Disentangled Control for Realistic Human Dance Generation

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation