Multimodal Music Mood Classification by Fusion of Audio and Lyrics.

Hao Xue, Like Xue, Feng Su
DOI: https://doi.org/10.1007/978-3-319-14442-9_3
2015-01-01
Abstract: Mood analysis of music data has attracted increasing attention in both research and applications in recent years. In this paper, we propose a novel multimodal approach for music mood classification that incorporates audio and lyric information. It consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering through discriminative vocabulary reduction and synonym-based lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
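To make the idea of sentence-level fusion via cross-modal time alignment more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes each lyric sentence carries start and end timestamps in seconds (for example, from time-synced lyrics) and that audio features are frame-level vectors computed at a fixed hop size. The names `LyricSentence`, `frames_for_sentence`, `mean_pool`, and `fuse_sentence_level` are hypothetical. Mean pooling is a swapped-in choice for summarizing the audio frames under each sentence, and the lyric vectors are placeholders for whatever the lyric model produces. The paper feeds fused sentence-level representations to a Hough forest, which is not shown here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LyricSentence:
    start: float  # sentence start time in seconds (assumed available)
    end: float    # sentence end time in seconds
    text: str


def frames_for_sentence(sentence: LyricSentence, hop_seconds: float,
                        n_frames: int) -> Tuple[int, int]:
    """Return the [first, last) indices of frames whose start time lies in [start, end).

    Frame i is assumed to start at i * hop_seconds.
    """
    first = min(n_frames, max(0, math.ceil(sentence.start / hop_seconds)))
    last = min(n_frames, max(first, math.ceil(sentence.end / hop_seconds)))
    return first, last


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a list of equal-length frame vectors dimension by dimension."""
    if not frames:
        raise ValueError("sentence covers no audio frames")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def fuse_sentence_level(audio_frames: Sequence[Sequence[float]],
                        hop_seconds: float,
                        sentences: Sequence[LyricSentence],
                        lyric_vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build one fused vector per lyric sentence: pooled audio features + lyric features."""
    if len(sentences) != len(lyric_vectors):
        raise ValueError("need exactly one lyric vector per sentence")
    fused = []
    for sentence, lyric_vec in zip(sentences, lyric_vectors):
        first, last = frames_for_sentence(sentence, hop_seconds, len(audio_frames))
        audio_vec = mean_pool(audio_frames[first:last])
        # Simple concatenation; the paper's actual fusion happens inside its Hough forest.
        fused.append(audio_vec + list(lyric_vec))
    return fused


if __name__ == "__main__":
    # Six 2-D audio frames at a 0.5 s hop cover 0 to 3 s of audio.
    frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
    lines = [LyricSentence(0.0, 1.0, "first line"), LyricSentence(1.0, 3.0, "second line")]
    print(fuse_sentence_level(frames, 0.5, lines, [[0.1], [0.9]]))
```

Aligning by timestamps lets each fused sample pair the words of one sentence with the audio that was actually playing while they were sung, which is the motivation the abstract gives for fusing at the sentence level rather than over the whole song.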