GlocalEmoNet: An optimized neural network for music emotion classification and segmentation using timbre and chroma features

Yagya Raj Pandeya, Joonwhoan Lee
DOI: https://doi.org/10.1007/s11042-024-18246-4
IF: 2.577
2024-02-16
Multimedia Tools and Applications
Abstract: Music is a powerful language capable of eliciting a variety of emotions in individuals. Understanding and recognizing these emotions is pivotal for applications ranging from personalized music recommendation and music therapy to automatic music composition and affective computing. Deep learning for music emotion recognition is gaining popularity, but current approaches rely primarily on timbre features to capture local spatial information. Other pertinent audio features, and global correlations in the feature space that capture the repetitive temporal structure of music, remain largely untapped for emotion classification. This study introduces GlocalEmoNet, a method that captures both local and global correlations in music, using timbre and chroma audio features for emotion classification and segmentation. The neural network was trained and tested on approximately six thousand music audio samples spanning six music-emotion categories. A genetic algorithm was used to optimize the hyperparameters of the proposed neural networks, with the aim of attaining optimal performance, efficiency, and generalization. The best classifier surpassed previously published results by a margin of approximately 14%, achieving an accuracy of 81.66%, an F1-score of 0.812, and an area under the curve score of 0.956. The classification and segmentation outcomes were also evaluated using visual representations.
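The abstract states that a genetic algorithm tunes the network's hyperparameters but does not give the search space or the operators used. The following is only a minimal sketch of how such a search can work in general. The search space, the crossover and mutation operators, and the names `SEARCH_SPACE`, `toy_fitness`, and `genetic_search` are all illustrative assumptions, not the authors' implementation. In practice the fitness function would train a network and return its validation score; here a toy stand-in is used so the example runs on its own.

```python
import random

# Hypothetical search space: the abstract does not list the real hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "num_filters": [16, 32, 64, 128],
    "dropout": [0.1, 0.3, 0.5],
}


def random_individual(rng):
    """Sample one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each hyperparameter is copied from one parent at random."""
    return {
        name: (parent_a[name] if rng.random() < 0.5 else parent_b[name])
        for name in SEARCH_SPACE
    }


def mutate(individual, rng, rate=0.2):
    """Resample each hyperparameter with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def toy_fitness(individual):
    """Stand-in for validation accuracy (assumption): counts matches to an arbitrary target."""
    target = {"learning_rate": 1e-3, "num_filters": 64, "dropout": 0.3}
    return sum(individual[name] == target[name] for name in target)


def genetic_search(fitness, generations=20, pop_size=12, elite=2, seed=0):
    """Evolve a population of configurations and return the fittest one found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the best configurations survive unchanged into the next generation.
        population = ranked[:elite] + children
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Because the top `elite` configurations are carried over unchanged, the best fitness in the population never decreases from one generation to the next. With a real fitness function, each evaluation is a full training run, so population size and generation count dominate the cost of the search.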
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering