Music-oriented Dance Video Synthesis with Pose Perceptual Loss

Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen
DOI: https://doi.org/10.48550/arXiv.1912.06606
2019-12-13
Computer Vision and Pattern Recognition
Abstract: We present a learning-based approach, trained with a pose perceptual loss, for automatic music video generation. Our method produces realistic dance videos that follow the beats and rhythms of almost any given piece of music. To achieve this, we first generate a human skeleton sequence from the music and then apply a learned pose-to-appearance mapping to render the final video. In the skeleton-generation stage, we use two discriminators, each capturing a different aspect of the sequence, and propose a novel pose perceptual loss to produce natural dances. In addition, we propose a cross-modal evaluation that assesses dance quality by estimating the similarity between the music and dance modalities. Finally, a user study demonstrates that dance videos synthesized by our approach are surprisingly realistic. The results are shown in the supplementary video at https://youtu.be/0rMuFMZa_K4
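The abstract does not say how the pose perceptual loss is computed. By analogy with image perceptual losses, one plausible form compares the intermediate features that a fixed pose network produces for a generated skeleton sequence and for the ground-truth sequence. The sketch below assumes that form. Everything in it is hypothetical: the feature extractor is a frozen random MLP applied frame by frame, not the network used in the paper. The 18-joint, 2D layout (36 values per frame) and the names `make_frozen_extractor`, `layer_features`, and `pose_perceptual_loss` are illustrative assumptions.

```python
import math
import random

# Assumption: each frame is a flat list of 2D joint coordinates
# (e.g. 18 joints x 2 = 36 values); a sequence is a list of frames.


def make_frozen_extractor(in_dim, hidden_dims, seed=0):
    """Build a fixed stack of dense layers (hypothetical stand-in for a
    pretrained pose network; its weights are never updated here)."""
    rng = random.Random(seed)
    layers = []
    d = in_dim
    for h in hidden_dims:
        # One weight row per output unit, scaled by 1/sqrt(fan_in).
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(h)]
        layers.append(w)
        d = h
    return layers


def layer_features(layers, frame):
    """Return the ReLU activations of every layer for a single frame."""
    feats = []
    x = frame
    for w in layers:
        x = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]
        feats.append(x)
    return feats


def pose_perceptual_loss(layers, generated, target, weights=None):
    """Per-layer mean absolute feature difference between generated and
    ground-truth frames, weighted, summed over layers, averaged over frames."""
    assert len(generated) == len(target), "sequences must have equal length"
    weights = weights or [1.0] * len(layers)
    total = 0.0
    for g_frame, t_frame in zip(generated, target):
        g_feats = layer_features(layers, g_frame)
        t_feats = layer_features(layers, t_frame)
        for w, gf, tf in zip(weights, g_feats, t_feats):
            total += w * sum(abs(a - b) for a, b in zip(gf, tf)) / len(gf)
    return total / len(generated)


if __name__ == "__main__":
    extractor = make_frozen_extractor(in_dim=36, hidden_dims=[64, 32])
    rng = random.Random(1)
    real = [[rng.uniform(-1.0, 1.0) for _ in range(36)] for _ in range(8)]
    shifted = [[v + 0.1 for v in frame] for frame in real]
    print(pose_perceptual_loss(extractor, real, real))  # 0.0 for identical sequences
    print(pose_perceptual_loss(extractor, shifted, real))
```

Matching features rather than raw coordinates penalizes differences in the structure a pose network responds to, not just per-joint displacement. A model that operates on whole sequences (rather than frame by frame, as assumed here) would also capture temporal motion patterns.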
What problem does this paper attempt to address?