Multi-scale Context Based Attention for Dynamic Music Emotion Prediction

Ye Ma, Xinxing Li, Mingxing Xu, Jia, Lianhong Cai
DOI: https://doi.org/10.1145/3123266.3123408
2017-01-01
Abstract: Dynamic music emotion prediction aims to recognize the continuous emotion information in music, which is necessary for music retrieval and recommendation. In this paper, we adopt the dimensional valence-arousal (V-A) emotion model to represent the dynamic emotion in music. We argue that music and V-A emotion labels do not have a one-to-one correspondence in the time domain; rather, the emotion expressed at a given moment is the accumulation of the music content over a preceding period of time. We therefore propose a Long Short-Term Memory (LSTM) based sequence-to-one mapping for dynamic music emotion prediction. Using this sequence-to-one mapping, we show that preceding content at different time scales influences the LSTM model's performance, so we further propose Multi-scale Context based Attention (MCA) for dynamic music emotion prediction. We evaluate our proposed method on the database of the Emotion in Music task at MediaEval 2015, and the results show that it outperforms most of the models using the same features and achieves competitive performance with state-of-the-art methods.
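The sequence-to-one idea and the multi-scale attention described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: mean-pooling stands in for the LSTM encoder, and the function names, the scale values, and the scoring vector `score_vec` are all hypothetical choices made for illustration only.

```python
import numpy as np

def context_windows(features, t, scales):
    """Sequence-to-one input: for frame t, return one window of preceding
    frames per scale (each window ends at t and is clipped at frame 0)."""
    return [features[max(0, t - s + 1): t + 1] for s in scales]

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def multi_scale_attention(features, t, scales, score_vec):
    """Fuse per-scale context summaries with attention weights.

    Assumption: each window is summarised by mean-pooling, a placeholder
    for the LSTM encoder mentioned in the abstract. `score_vec` is a
    hypothetical scoring vector that would be learned in a real model.
    """
    windows = context_windows(features, t, scales)
    summaries = np.stack([w.mean(axis=0) for w in windows])  # (n_scales, dim)
    weights = softmax(summaries @ score_vec)                 # one weight per scale
    fused = weights @ summaries                              # (dim,)
    return fused, weights

# Toy data: 60 frames of 4-dimensional features (random, for illustration).
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 4))
score_vec = rng.normal(size=4)

fused, weights = multi_scale_attention(
    features, t=59, scales=(10, 20, 40), score_vec=score_vec
)
```

In a full model, the fused vector would feed a regression layer that outputs the valence and arousal values for frame `t`; that layer is omitted here.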