POPDG: Popular 3D Dance Generation with PopDanceSet

Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao
2024-05-06
Abstract: Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of dance movements. Moreover, the proposed POPDG model, built within the iDDPM framework, enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. The paper also extends current evaluation metrics. The dataset and code are available at
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses music-driven dance generation: producing dances that are both realistic and closely aligned with the music, which remains a challenging task in the cross-modal field. Specifically:

1. **Limitations of existing datasets**: Although the existing AIST++ dataset contains a large amount of data, it is limited in the diversity of dance and music genres and in the complexity and depth of dance movements. In particular, it lacks dance content suited to the aesthetic preferences of young audiences.
2. **Deficiencies of generation models**:
   - Previous generation models focused mainly on the temporal alignment of music and dance and ignored the spatial physical connections between human joints.
   - Many models suffer from complex training procedures, unstable generation, and limited diversity.
   - Existing evaluation metrics are not comprehensive enough for music-driven dance generation and cannot reasonably assess the quality and diversity of generated dances.

To solve these problems, the paper makes two main contributions:

1. **Constructing the PopDanceSet dataset**:
   - PopDanceSet is the first dataset built specifically around the aesthetics of young audiences. It covers a wider range of dance and music genres and increases the diversity and complexity of dance movements.
   - The dataset selects dance videos that match popular aesthetics through a popularity function, which helps keep the data high in quality and consistent.
2. **Proposing the POPDG model**:
   - POPDG is based on the Improved Denoising Diffusion Probabilistic Model (iDDPM). By introducing the Space Augmentation Algorithm, it strengthens the spatial physical connections between human joints, so that the generated dances gain diversity without sacrificing quality.
   - A simplified Alignment Module encodes the spatio-temporal features of music and dance, significantly improving the rhythmic synchronization between them.
Through these improvements, POPDG achieves state-of-the-art results on two datasets and extends the existing evaluation metrics, making the evaluation of dance generation more comprehensive and objective.
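
The summary does not describe POPDG's diffusion setup in detail. As a rough illustration of the iDDPM family that the model is said to build on, the sketch below shows the cosine noise schedule commonly associated with improved DDPMs and the closed-form forward noising step, applied to a toy pose tensor. The tensor shape `(frames, joints, 3)`, the number of diffusion steps, and all function names are assumptions made for illustration. None of them are taken from the paper or its released code.

```python
import numpy as np


def cosine_betas(num_steps: int, s: float = 0.008, max_beta: float = 0.999) -> np.ndarray:
    """Per-step noise variances beta_t derived from a cosine schedule.

    f(t) = cos(((t / T) + s) / (1 + s) * pi / 2) ** 2, and
    beta_t = 1 - f(t) / f(t - 1), clipped to max_beta for stability.
    """
    t = np.arange(num_steps + 1) / num_steps
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    betas = 1.0 - f[1:] / f[:-1]
    return np.clip(betas, 0.0, max_beta)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns the noised sample and the noise that was added.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


# Hypothetical setup: 1000 diffusion steps and a toy motion clip
# of 150 frames, 24 joints, and 3D coordinates per joint.
num_steps = 1000
betas = cosine_betas(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction per step

rng = np.random.default_rng(0)
toy_motion = rng.standard_normal((150, 24, 3))

# Noise the clip lightly (early step) and heavily (late step).
x_early, _ = q_sample(toy_motion, 10, alpha_bar, rng)
x_late, _ = q_sample(toy_motion, 900, alpha_bar, rng)
print(x_early.shape, x_late.shape)
```

A denoising network would be trained to recover the clean motion (or the added noise) from such noised samples, conditioned on music features. How POPDG conditions on music and where the Space Augmentation Algorithm and Alignment Module enter is not specified in this summary, so the sketch leaves those parts out.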