Feature Aggregation with Two-Layer Ensemble Framework for Multilingual Speech Emotion Recognition

Sangho Ough, Sejong Pyo, Taeyong Kim
DOI: https://doi.org/10.1155/2023/8837465
IF: 1.43
2023-12-12
Mathematical Problems in Engineering
Abstract: In this study, we present a framework for improving the accuracy of speech emotion recognition in a multilingual environment. In our prior experiments, machine learning (ML) models trained to predict emotions in Korean and tested in English, and vice versa, showed that emotion recognition depends on language, which resulted in poor accuracy. We suspect that this may be related to spectral differences in certain emotions between Korean and English and to the tendency for different formant values to have different acoustic frequencies. For this study, we investigated several methods, including models trained on mixed databases, models trained on a single database, and bagging, boosting, and voting ML algorithms. Finally, we developed a framework consisting of two branches: one for the aggregation of high-dimensional features from multilingual data and one for a two-layered ensemble framework for emotion classification. In the ensemble framework for Korean and English (EF-KEN), features are extracted and ensemble models are trained, boosted, and evaluated by applying them to the two spoken languages (English and Korean). The final experimental result demonstrates a meaningful improvement in an environment with two different languages.
engineering, multidisciplinary; mathematics, interdisciplinary applications
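
The abstract names feature aggregation followed by a two-layered ensemble but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' EF-KEN method. It assumes that aggregation means concatenating per-utterance feature vectors, that the first layer is a set of bagged base learners, and that the second layer is a hard majority vote. All names (`aggregate_features`, `NearestCentroid`, `BaggedLayer`, `vote`, `predict_emotion`), the nearest-centroid base learner, the stratified resampling, and the toy feature values are hypothetical.

```python
import random
from collections import Counter


def aggregate_features(*feature_groups):
    """Assumed aggregation step: concatenate feature vectors of one utterance."""
    return [value for group in feature_groups for value in group]


class NearestCentroid:
    """Stand-in base learner: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(sorted(self.centroids), key=sq_dist)


class BaggedLayer:
    """Assumed layer 1: base learners fit on class-stratified bootstrap resamples."""

    def __init__(self, n_estimators=5, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)

    def fit(self, X, y):
        by_class = {}
        for xi, yi in zip(X, y):
            by_class.setdefault(yi, []).append(xi)
        self.models = []
        for _ in range(self.n_estimators):
            Xb, yb = [], []
            # Resample with replacement inside each class so every class is present.
            for label, rows in by_class.items():
                for _ in rows:
                    Xb.append(self.rng.choice(rows))
                    yb.append(label)
            self.models.append(NearestCentroid().fit(Xb, yb))
        return self

    def predict_one(self, x):
        return [m.predict_one(x) for m in self.models]


def vote(labels):
    """Assumed layer 2: hard majority vote over the layer-1 predictions."""
    return Counter(labels).most_common(1)[0][0]


def predict_emotion(layer, x):
    return vote(layer.predict_one(x))


# Toy, made-up data: (spectral features, prosodic features) per utterance.
train = [
    (([1.0, 0.9], [0.8]), "happy"), (([0.9, 1.0], [0.9]), "happy"),
    (([1.1, 0.8], [1.0]), "happy"), (([0.1, 0.2], [0.1]), "sad"),
    (([0.0, 0.1], [0.2]), "sad"), (([0.2, 0.0], [0.0]), "sad"),
]
X = [aggregate_features(*groups) for groups, _ in train]
y = [label for _, label in train]
layer = BaggedLayer(n_estimators=5, seed=0).fit(X, y)
print(predict_emotion(layer, aggregate_features([0.95, 0.9], [0.85])))
```

In a cross-lingual setting such as the one the abstract describes, the training rows would come from one language and the test utterances from the other. The toy data here carries no language information and only illustrates the control flow.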