COSMIC: Music Emotion Recognition Combining Structure Analysis and Modal Interaction

Liang Yang, Zhexu Shen, Jingjie Zeng, Xi Luo, Hongfei Lin
DOI: https://doi.org/10.1007/s11042-023-15376-z
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: As a common multi-modal information carrier, music is frequently used to convey emotions through lyrics and melodies. Besides lyrics (text) and melodies (audio), the structure of a song is another indicator of emotion, one that can create strong resonance with listeners. Typically, a pop song is composed of verses and choruses. To improve the performance of existing music emotion recognition models, we first propose a hierarchical model to analyze music structure. Then, we develop a cross-modal interaction method that extracts emotional features from the different modalities and lets them interact. Finally, we perform music emotion recognition by combining music structure analysis with cross-modal interaction. Extensive experiments are conducted on a dataset crawled from Netease Cloud Music, and the results demonstrate the effectiveness of both music structure analysis and cross-modal interaction. The proposed model, COSMIC, achieves state-of-the-art performance on music emotion recognition tasks.
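The abstract does not specify how the structure analysis and cross-modal interaction are implemented. The following is only a minimal illustrative sketch of the general idea, under stated assumptions. It assumes that lyric lines and audio frames are already embedded as vectors of the same size, that cross-modal interaction is done with scaled dot-product cross-attention (lyrics attending to audio), and that song structure enters as weighted pooling over verse and chorus segments. The functions `cross_attend` and `song_representation` and the `chorus_weight` parameter are hypothetical names introduced here; this is not COSMIC's actual architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention (an assumed interaction mechanism):
    each query (e.g. a lyric line) gathers a weighted mix of values (e.g. audio frames)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def song_representation(
    lyrics: List[Vector],
    audio: List[Vector],
    segments: List[Tuple[str, int, int]],
    chorus_weight: float = 2.0,  # hypothetical: up-weights choruses relative to verses
) -> Vector:
    """Hypothetical song-level vector: lyric lines attend to audio frames, each line
    is fused with its audio context, lines are pooled per structural segment
    ('verse' or 'chorus', given as [start, end) lyric-line ranges), and the segment
    vectors are combined with structure-dependent weights."""
    fused = cross_attend(lyrics, audio, audio)
    # Add text and attended audio element-wise (requires equal embedding sizes).
    combined = [[t + a for t, a in zip(line, ctx)] for line, ctx in zip(lyrics, fused)]
    pooled, weights = [], []
    for label, start, end in segments:
        pooled.append(mean_pool(combined[start:end]))
        weights.append(chorus_weight if label == "chorus" else 1.0)
    total = sum(weights)
    return [sum(w * p[j] for w, p in zip(weights, pooled)) / total for j in range(len(pooled[0]))]
```

For example, with three lyric lines split into one verse (lines 0 and 1) and one chorus (line 2), `song_representation(lyrics, audio, [("verse", 0, 2), ("chorus", 2, 3)])` returns one vector that an emotion classifier could consume. Adding rather than concatenating the two modalities keeps the sketch short; a real model would more likely use learned projections and let the structure weights be learned instead of fixed.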
What problem does this paper attempt to address?