FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

Ronghui Li, Junfan Zhao, Yachao Zhang, Mingyang Su, Zeping Ren, Han Zhang, Yansong Tang, Xiu Li
2023-08-30
Abstract: Generating full-body and multi-genre dance sequences from given music is a challenging task, due to the limitations of existing datasets and the inherent complexity of fine-grained hand motion and dance genres. To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data, with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate posture. To the best of our knowledge, FineDance is the largest music-dance paired dataset with the most dance genres. Additionally, to address the monotonous and unnatural hand movements produced by previous methods, we propose a full-body dance generation network, FineNet, which uses the diverse generation capabilities of the diffusion model to address monotony, and uses expert nets to address unrealistic motion. To further enhance the genre-matching and long-term stability of generated dances, we propose a Genre&Coherent aware Retrieval Module. Besides, we propose a novel metric named Genre Matching Score to evaluate the genre-matching degree between dance and music. Quantitative and qualitative experiments demonstrate the quality of FineDance and the state-of-the-art performance of FineNet. The FineDance Dataset and more qualitative samples can be found at our website.
Computer Vision and Pattern Recognition, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the challenging task of generating high-quality, multi-style dance sequences from given music. Specifically, it focuses on two core issues:

1. **Insufficient Expression of Hand Movements**: Existing dance datasets typically lack detailed hand motion data, so generated hand movements are unnatural or monotonous. The issue is compounded because hands and bodies occupy different feature spaces, yet existing methods often treat them as a single part.
2. **Multi-Style Dance Generation**: Existing datasets contain few dance styles, so generated dances do not match the variety of music styles well. Additionally, existing methods are limited to coarse dance style classifications and lack effective metrics for evaluating how well a generated dance matches the music style.

To address these issues, the paper makes two main contributions:

- **FineDance Dataset**: A large-scale, professional-grade 3D motion capture dance dataset containing 14.6 hours of music-dance paired data, covering 22 fine-grained dance styles, and featuring accurate body and hand movement information.
- **FineNet Network**: A two-stage generation-synthesis network that uses an expert network and a refinement network to generate expressive full-body dances, and improves the style matching and long-term stability of the generated dances through a cross-modal retrieval module.