A novel decomposition-based architecture for multilingual speech emotion recognition

Ravi, Sachin Taran
DOI: https://doi.org/10.1007/s00521-024-09577-2
2024-03-03
Neural Computing and Applications
Abstract: Multilingual speech emotion recognition (MLSER) is a significant and demanding research domain for improving the utility of human–computer interaction systems. Identifying emotions from a spoken sentence is challenging because an MLSER system depends on the language being spoken. This study proposes a novel decomposition-based architecture for MLSER. The architecture comprises silence removal, mode tuning, signal reconstruction, feature extraction, feature optimization and classification. In preprocessing, silent segments are removed using short-time energy and spectral centroid. Variational mode decomposition is then applied to decompose the signal, with an improved Bhattacharyya distance used to tune the decomposition modes. The tuned modes are examined for noise removal, and the signal is reconstructed from the denoised modes. Spectral and prosodic features are computed from the reconstructed signal, and an optimized feature subset is selected from them using the ReliefF algorithm. Finally, a fine k-nearest neighbor classifier is applied to the optimized features to identify the emotions. The experiments use three publicly available emotion databases: the English-language Ryerson audio–visual database (RAVDESS), the German-language Berlin emotional speech database (Emo-DB) and the Italian emotional speech database (EMOVO). The proposed method yields 90.7%, 94% and 91.1% accuracy on the English, German and Italian databases, respectively. A multilingual database is created by combining these three databases, and the proposed method yields 93.4% accuracy on it. The proposed framework is more efficient and less language-dependent than available traditional and deep learning-based approaches.
computer science, artificial intelligence
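The silence-removal step named in the abstract (short-time energy plus spectral centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give frame sizes or thresholds, so the 25 ms non-overlapping frames, the mean-relative thresholds (`energy_ratio`, `centroid_ratio`) and all function names below are assumptions chosen for illustration only.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split a 1-D signal into non-overlapping frames (trailing samples are dropped)."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)


def spectral_centroid(frames, sr):
    """Magnitude-weighted mean frequency (Hz) of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    # The small constant avoids division by zero on all-zero frames.
    return (mag @ freqs) / (mag.sum(axis=1) + 1e-12)


def remove_silence(x, sr, frame_ms=25, energy_ratio=0.1, centroid_ratio=0.5):
    """Keep frames whose energy AND centroid both exceed a fraction of their mean.

    The thresholding rule is an assumption for this sketch, not taken from the paper.
    """
    frame_len = int(sr * frame_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len)
    energy = short_time_energy(frames)
    centroid = spectral_centroid(frames, sr)
    keep = (energy > energy_ratio * energy.mean()) & (
        centroid > centroid_ratio * centroid.mean()
    )
    # Concatenate the retained frames back into a 1-D signal.
    return frames[keep].reshape(-1)
```

For example, calling `remove_silence(x, 16000)` on a signal made of half a second of zeros, half a second of a 440 Hz tone and another half second of zeros keeps only the tone segment. Real recordings contain low-level noise rather than exact zeros, so the two ratios would need tuning per corpus.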