Dancing to Music

Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
DOI: https://doi.org/10.48550/arXiv.1911.02001
2019-11-06
Abstract: Dancing to music is an instinctive move by humans. Learning to model the music-to-dance generation process is, however, a challenging problem. It requires significant efforts to measure the correlation between music and dance as one needs to simultaneously consider multiple aspects, such as style and beat of both music and dance. Additionally, dance is inherently multimodal and various following movements of a pose at any moment are equally likely. In this paper, we propose a synthesis-by-analysis learning framework to generate dance from music. In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move. In the synthesis phase, the model learns how to compose a dance by organizing multiple basic dancing movements seamlessly according to the input music. Experimental qualitative and quantitative results demonstrate that the proposed method can synthesize realistic, diverse, style-consistent, and beat-matching dances from music.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses music-to-dance generation. The authors aim to build a computational model that automatically generates dance movements matching the style and rhythm of the input music. The problem is challenging because the model must account for the style and beat of both the music and the dance, as well as the multimodal nature of dance itself: many different movements can plausibly follow a given pose. Similarity-based retrieval methods can only reuse existing movements and are therefore limited in creativity, so the paper approaches the problem from a generative perspective.

### Main Contributions of the Paper:
1. **Propose a new cross-modal generation task**: Music-to-dance generation.
2. **Design a decomposition-combination framework**: Decompose complex dances into basic dance units and recombine these units based on the music.
3. **Generate realistic and diverse dances**: The generated dances match the style and rhythm of the music.
4. **Provide a large-scale paired music and dance dataset**: Contains over 360,000 video clips, with a total duration of 71 hours.

### Method to Solve the Problem:
1. **Decomposition Stage**:
   - Use a motion beat detector to extract motion beats from the dance.
   - Temporally normalize the dance sequence into a series of basic dance units.
   - Use DU-VAE (Dance Unit Variational Autoencoder) to encode and decode the basic dance units, disentangling each unit into an initial pose space and a motion space.
2. **Combination Stage**:
   - Use MM-GAN (Music-to-Movement Generative Adversarial Network) to generate a series of basic dance units based on the input music.
   - Extract the style features of the music using a music style extractor and combine them with noise vectors to generate latent dance codes.
   - Use a recurrent dance decoder to decode the latent dance codes into actual motion sequences.
3. **Testing Stage**:
   - Given a piece of music, first extract the beat and style features of the music.
   - Use the trained model to generate a series of basic dance units and combine these units into a complete dance sequence.
   - Finally, adjust the generated motion sequence to align with the music beats, producing the final dance.

### Experimental Results:
- **Qualitative Comparison**: The generated dances outperform baseline methods in terms of realism, coherence, and diversity.
- **Quantitative Comparison**: Evaluated through user studies and FID metrics, the generated dances perform well in terms of realism and style consistency.
- **Beat Matching**: The beats of the generated dances align with the beats of the music.

In summary, this paper addresses music-to-dance generation with a decomposition-combination framework. The generated dances are realistic and diverse, and they match the style and rhythm of the music.
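The beat-related steps described above (extracting motion beats, normalizing the dance into fixed-length units, and aligning generated units with music beats at test time) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that motion beats can be approximated by local minima of mean joint speed, that each unit spans a single beat interval, and that linear interpolation is adequate for time warping. The function names, the 15 fps frame rate, the 0.25 s minimum beat gap, and the 32-frame unit length are all hypothetical choices made for illustration.

```python
import numpy as np


def detect_motion_beats(poses, fps=15.0, min_gap=0.25):
    """Return frame indices where mean joint speed hits a local minimum.

    poses: array of shape (T, J, 2) with 2D joint positions per frame.
    ASSUMPTION: the speed-minimum heuristic stands in for the paper's
    motion beat detector, whose exact rule is not given in this summary.
    """
    # Mean joint displacement between consecutive frames, shape (T - 1,).
    speed = np.linalg.norm(np.diff(poses, axis=0), axis=-1).mean(axis=-1)
    gap = int(round(min_gap * fps))  # minimum spacing between beats, in frames
    beats = []
    for t in range(1, len(speed) - 1):
        is_local_min = speed[t] < speed[t - 1] and speed[t] <= speed[t + 1]
        if is_local_min and (not beats or t - beats[-1] >= gap):
            beats.append(t)
    return np.array(beats, dtype=int)


def resample(segment, length):
    """Linearly resample a (T, J, 2) segment to `length` frames."""
    src = np.linspace(0.0, 1.0, num=len(segment))
    dst = np.linspace(0.0, 1.0, num=length)
    flat = segment.reshape(len(segment), -1)
    cols = [np.interp(dst, src, flat[:, d]) for d in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape((length,) + segment.shape[1:])


def split_into_units(poses, beats, unit_len=32):
    """Cut the dance at consecutive beats and normalize each piece.

    ASSUMPTION: one beat interval per unit is a simplification.
    """
    units = [
        resample(poses[a:b + 1], unit_len)
        for a, b in zip(beats[:-1], beats[1:])
        if b > a
    ]
    if not units:
        return np.empty((0, unit_len) + poses.shape[1:])
    return np.stack(units)


def warp_to_music_beats(units, music_beat_frames):
    """Stretch each unit to fill one music-beat interval, then concatenate.

    music_beat_frames: increasing frame indices of the music beats.
    """
    intervals = zip(music_beat_frames[:-1], music_beat_frames[1:])
    pieces = [resample(unit, b - a) for unit, (a, b) in zip(units, intervals)]
    return np.concatenate(pieces, axis=0)
```

In a full pipeline, the units passed to `warp_to_music_beats` would come from the trained generator rather than from `split_into_units`, and the music beat frames would come from an audio beat tracker. Both of those components are outside the scope of this sketch.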