Music2Play: Audio-Driven Instrumental Animation

Ruijian Jia, Shanmin Pang
DOI: https://doi.org/10.1109/cac59555.2023.10450842
2023-01-01
Abstract: Sounds are produced by the vibration of a medium driven by the motion of objects. Inspired by the human ability to visually infer the trajectory of objects from sound sources, we propose the Audio-Driven Instrumental Animation network (ADIA). Given a music clip, ADIA animates the instrument player in the corresponding image and produces a complete instrumental video that matches the music. This task is challenging because it requires dynamically transferring information from noisy, low-dimensional audio signals to high-dimensional visual representations. To this end, we design three modules in ADIA: an audio module, a flow module and an image module. Specifically, the audio module drives the initial pose in an auto-regressive way to generate a sequence of poses, where a novel limb loss is proposed to constrain the location of each key-point of the generated poses. Then, the flow module estimates a dense flow field from a pair of poses. Finally, the image module fuses multi-modal information (audio, flow and image) to synthesize the output frame. In experiments, we demonstrate the effectiveness of our method through comparison with several closely related works. In addition, a user study shows that it can synthesize realistic, diverse and rhythm-matching videos from music. The supplementary video is available at https://youtu.be/F1rZxgu4B_A
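The abstract does not give the form of the limb loss or the audio module's architecture, so the following is only a minimal sketch of the two ideas it names: auto-regressive pose generation and a limb-based constraint on key-points. Everything in it is an assumption made for illustration: the three-point skeleton `LIMBS`, the helper names, the limb-length form of the loss, and `toy_step`, which stands in for a learned network.

```python
import math

# Hypothetical skeleton: pairs of key-point indices that form limbs.
LIMBS = [(0, 1), (1, 2)]


def limb_lengths(pose, limbs=LIMBS):
    """Euclidean length of each limb; a pose is a list of (x, y) key-points."""
    return [math.dist(pose[a], pose[b]) for a, b in limbs]


def limb_loss(pred_pose, ref_pose, limbs=LIMBS):
    """Mean absolute difference in limb lengths (an assumed form, not the paper's)."""
    pred = limb_lengths(pred_pose, limbs)
    ref = limb_lengths(ref_pose, limbs)
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(limbs)


def rollout(initial_pose, audio_features, step_fn):
    """Auto-regressive generation: each new pose depends on the previous
    pose and the current audio feature."""
    poses = [initial_pose]
    for feat in audio_features:
        poses.append(step_fn(poses[-1], feat))
    return poses


def toy_step(pose, feat):
    """Stand-in for a learned model: moves the last key-point vertically
    by the (scalar) audio feature."""
    moved = list(pose)
    x, y = moved[-1]
    moved[-1] = (x, y + feat)
    return moved


initial = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
poses = rollout(initial, [0.5, -0.5], toy_step)
```

In this toy setup, `limb_loss(poses[1], initial)` is positive because moving one key-point stretches the second limb, which is the kind of distortion a limb-based penalty would discourage during training.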