Speech emotion recognition using multimodal feature fusion with machine learning approach
Sandeep Kumar Panda, Ajay Kumar Jena, Mohit Ranjan Panda, Susmita Panda
DOI: https://doi.org/10.1007/s11042-023-15275-3
IF: 2.577
2023-04-21
Multimedia Tools and Applications
Abstract: As machine learning advances, speech-based recognition of emotional states is expected to have a significant impact on artificial intelligence. Proper feature selection is critical for emotion recognition. This work therefore proposes a feature fusion approach that aims for high prediction accuracy by emphasizing the extraction of individual features. Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel Spectrogram, Short-Time Fourier Transform (STFT), and Root Mean Square (RMS) features are extracted, and four different feature fusion techniques are applied with five standard machine learning classifiers: XGBoost, Support Vector Machine (SVM), Random Forest, Decision Tree (D-Tree), and K-Nearest Neighbor (KNN). Applying the feature fusion techniques to the proposed classifier yields recognition rates of 99.64% on TESS (female-only dataset), 91% on SAVEE (male-only dataset), and 86% on CREMA-D (male and female speakers). The proposed model shows that effective feature fusion improves the accuracy and applicability of emotion detection systems.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
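
The abstract names ZCR and RMS among the extracted features and describes fusing features before classification, but it does not specify the four fusion techniques. As a rough illustration only, the sketch below assumes the simplest possible scheme: compute each feature per frame, average over the utterance, and concatenate the results into one vector. The function names, the frame length and hop size, and the synthetic sine-wave input are all hypothetical choices made for this example and are not taken from the paper.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def mean(values):
    return sum(values) / len(values)


def fuse_features(signal, frame_len=256, hop=128):
    """Assumed fusion scheme: concatenate utterance-level means of ZCR and RMS."""
    fs = frames(signal, frame_len, hop)
    zcr_vec = [mean([zero_crossing_rate(f) for f in fs])]
    rms_vec = [mean([rms(f) for f in fs])]
    return zcr_vec + rms_vec  # plain concatenation of the two feature groups


# Synthetic stand-in for a speech clip: one second of a 440 Hz tone at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
fused = fuse_features(tone)
```

In a fuller pipeline, the fused vector would also carry MFCC, Mel spectrogram, and STFT statistics and would be passed to one of the classifiers listed in the abstract; those steps are omitted here because they depend on external libraries and on details the abstract does not give.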