Abstract: Dancing to music is an instinctive move by humans. Learning to model the music-to-dance generation process is, however, a challenging problem. It requires significant efforts to measure the correlation between music and dance as one needs to simultaneously consider multiple aspects, such as style and beat of both music and dance. Additionally, dance is inherently multimodal and various following movements of a pose at any moment are equally likely. In this paper, we propose a synthesis-by-analysis learning framework to generate dance from music. In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move. In the synthesis phase, the model learns how to compose a dance by organizing multiple basic dancing movements seamlessly according to the input music. Experimental qualitative and quantitative results demonstrate that the proposed method can synthesize realistic, diverse, style-consistent, and beat-matching dances from music.

What problem does this paper attempt to address?

The problem this paper addresses is music-to-dance generation. Specifically, the authors aim to develop a computational model that automatically generates dance movements matching the style and rhythm of the input music. The problem is challenging because it requires considering several aspects at once, such as the style and beat of both music and dance, as well as the multimodal nature of dance itself. Traditional similarity-based retrieval methods are limited in creativity, so the paper approaches the problem from a generative perspective.

### Main Contributions of the Paper:

1. **Propose a new cross-modal generation task**: Music-to-dance generation.
2. **Design a decomposition-combination framework**: Decompose complex dances into basic dance units and recombine these units based on the music.
3. **Generate realistic and diverse dances**: The generated dances match the style and rhythm of the music well.
4. **Provide a large-scale paired music and dance dataset**: Contains over 360,000 video clips, with a total duration of 71 hours.

### Method to Solve the Problem:

1. **Decomposition Stage**:
   - Use a motion beat detector to extract motion beats from the dance.
   - Normalize the dance sequence into a series of basic dance units.
   - Use DU-VAE (Dance Unit Variational Autoencoder) to encode and decode the basic dance units, decomposing them into an initial pose space and a motion space.
2. **Combination Stage**:
   - Use MM-GAN (Music-to-Movement Generative Adversarial Network) to generate a series of basic dance units based on the input music.
   - Extract the style features of the music using a music style extractor and combine them with noise vectors to generate latent dance codes.
   - Use a recurrent dance decoder to decode the latent dance codes into actual motion sequences.
3. **Testing Stage**:
   - Given a piece of music, first extract the beat and style features of the music.
   - Use the trained model to generate a series of basic dance units and combine these units into a complete dance sequence.
   - Finally, adjust the generated motion sequence to align with the music beats, producing the final dance.

### Experimental Results:

- **Qualitative Comparison**: The generated dances outperform baseline methods in terms of realism, coherence, and diversity.
- **Quantitative Comparison**: Evaluated through user studies and FID metrics, the generated dances perform well in terms of realism and style consistency.
- **Beat Matching**: The generated dances align well with the beats of the music.

In summary, this paper addresses the problem of music-to-dance generation by proposing a decomposition-combination framework. The generated dances are not only realistic and diverse but also match the style and rhythm of the music well.
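The first two steps of the decomposition stage described above (detecting motion beats, then normalizing the dance into fixed-length units) can be illustrated with a small sketch. The summary does not say how the motion beat detector works or how units are normalized, so everything here is an assumption made for illustration: beats are taken as local minima of the mean joint speed, segments between beats are linearly resampled to a fixed length, and the names `detect_motion_beats`, `split_into_units`, `min_gap`, and `unit_len` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def detect_motion_beats(poses, min_gap=4):
    """Return frame indices where the mean joint speed has a local minimum.

    poses: array of shape (T, J, 2), i.e. T frames of J 2D joints.
    Assumption: a "motion beat" is a momentary slowdown of the body;
    the source does not specify the actual detection criterion.
    """
    # Mean over joints of the displacement magnitude between consecutive frames.
    speed = np.linalg.norm(np.diff(poses, axis=0), axis=-1).mean(axis=-1)
    beats = []
    for t in range(1, len(speed) - 1):
        is_local_min = speed[t] < speed[t - 1] and speed[t] <= speed[t + 1]
        # Enforce a minimum spacing so jitter does not create spurious beats.
        if is_local_min and (not beats or t - beats[-1] >= min_gap):
            beats.append(t)
    return np.array(beats, dtype=int)


def split_into_units(poses, beats, unit_len=32):
    """Cut the sequence at the beats and resample each piece to unit_len frames.

    Assumption: linear interpolation over time stands in for the
    normalization step mentioned in the summary.
    """
    joint_shape = poses.shape[1:]
    units = []
    for start, end in zip(beats[:-1], beats[1:]):
        segment = poses[start:end + 1]
        flat = segment.reshape(len(segment), -1)
        src = np.linspace(0.0, 1.0, len(segment))
        dst = np.linspace(0.0, 1.0, unit_len)
        # Interpolate every coordinate independently along the time axis.
        resampled = np.stack(
            [np.interp(dst, src, flat[:, k]) for k in range(flat.shape[1])],
            axis=1,
        )
        units.append(resampled.reshape(unit_len, *joint_shape))
    if not units:
        return np.empty((0, unit_len, *joint_shape))
    return np.stack(units)


if __name__ == "__main__":
    # Toy input: one joint swinging left and right with a 16-frame period.
    t = np.arange(64)
    poses = np.zeros((64, 1, 2))
    poses[:, 0, 0] = np.sin(2 * np.pi * (t - 0.5) / 16)
    beats = detect_motion_beats(poses)
    units = split_into_units(poses, beats)
    print("beat frames:", beats)
    print("units shape:", units.shape)
```

Under these assumptions, each resulting unit has the same length regardless of the tempo of the original movement, which is what would let a model such as the DU-VAE operate on uniformly shaped inputs.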

Dancing to Music

Example-Based Automatic Music-Driven Conventional Dance Motion Synthesis

DanceIt: Music-Inspired Dancing Video Synthesis

Music2Dance: DanceNet for Music-Driven Dance Generation

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Dance with Melody: An LSTM-autoencoder Approach to Music-oriented Dance Synthesis

A deep learning model of dance generation for young children based on music rhythm and beat

Towards 3D Dance Motion Synthesis and Control

ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis

Genre-Conditioned Long-Term 3D Dance Generation Driven by Music

Dance2MIDI: Dance-driven multi-instrument music generation

Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles

Bidirectional Autoregressive Diffusion Model for Dance Generation

Dual Learning Music Composition and Dance Choreography

Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos