MCLEMCD: multimodal collaborative learning encoder for enhanced music classification from dances

Wenjuan Gong, Qingshuang Yu, Haoran Sun, Wendong Huang, Peng Cheng, Jordi Gonzàlez
DOI: https://doi.org/10.1007/s00530-023-01207-6
IF: 3.9
2024-01-23
Multimedia Systems
Abstract: Music classification is widely applied in the automatic organization of music archives and in intelligent music interfaces. Music is frequently accompanied by other media, such as image sequences. Combining several types of media for a given task is natural for humans but extremely difficult for machines. In this work, we propose a collaborative learning method that combines dancing motions and music cues for music classification, and we apply it to music recommendation from dancing motions. Dancing motions, represented as 3D joint positions, contain cyclic motions synchronized with music beats, and a collaborative autoencoder is designed to fuse music cues into the dancing-motion feature extraction module. The proposed method was evaluated on the MusicToDance data set and on the AIST++ data set. The code to run all experiments is available at https://github.com/wenjgong/musicmotion.
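The abstract's observation that joint trajectories are cyclic and synchronized with music beats can be illustrated with a small sketch. This is not the authors' method or code: it is a plain autocorrelation estimate on a synthetic signal, and the function name `dominant_period`, the lag bounds, the 30-frame cycle, and the 60 fps frame rate are all assumptions chosen for illustration.

```python
import math

def dominant_period(signal, min_lag, max_lag):
    """Return the lag (in frames) with the highest autocorrelation.

    Hypothetical helper: searches only lags in [min_lag, max_lag], which
    stands in for a plausible range of beat periods.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Average product of the signal with a copy of itself shifted by `lag`.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic vertical position of one joint: one motion cycle every 30 frames.
frames_per_cycle = 30   # assumed, for illustration only
fps = 60                # assumed capture frame rate
joint_y = [math.sin(2 * math.pi * t / frames_per_cycle) for t in range(240)]

period = dominant_period(joint_y, min_lag=10, max_lag=50)
bpm = 60 * fps / period  # tempo if one motion cycle corresponds to one beat
print(period, bpm)
```

On real motion-capture data the trajectory would be noisier and one motion cycle need not span exactly one beat, so an estimate like this is at most a rough cue rather than a substitute for a learned feature extractor.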
computer science, information systems, theory & methods