A Light-Weight Artificial Neural Network for Speech Emotion Recognition using Average Values of MFCCs and Their Derivatives

Panuwit Nantasri, Ekachai Phaisangittisagul, Jessada Karnjana, Surasak Boonkla, Suthum Keerativittayanun, Anocha Rugchatjaroen, Sasiporn Usanavasin, Takahiro Shinozaki
DOI: https://doi.org/10.1109/ecti-con49241.2020.9158221
2020-06-01
Abstract: Because embedded systems have limited memory and computational power, this work proposes a novel approach to creating a useful set of features for improving speech emotion recognition (SER) systems. Mel Frequency Cepstral Coefficients (MFCCs) are widely used as features in SER systems. To reduce the number of parameters and the computational burden in SER applications, the average values of the MFCCs, concatenated with their delta and delta-delta coefficients, are used as the input features of an artificial neural network (ANN) classifier. The results show that the proposed features are comparable to state-of-the-art methods, with 87.8% on the EmoDB database and 82.3% on the RAVDESS database. Moreover, the number of parameters in the classification model is significantly reduced.
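The feature construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes an MFCC matrix of shape (coefficients, frames) is already available. It computes deltas with the common regression formula (window N = 2, edge frames repeated), because the abstract does not state the paper's delta settings. It averages each stream over time and concatenates the results. The helper names `delta`, `utterance_features`, and `mlp_param_count`, the 13-coefficient/200-frame input, and the hidden layer of 64 units are all hypothetical. The 7 output classes match the seven emotion categories labeled in EmoDB.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis (axis 1).

    features has shape (n_coeffs, n_frames); edge frames are repeated
    so the output keeps the same shape as the input.
    """
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        # n * (c[t+n] - c[t-n]) for every frame t, read from the padded copy
        ahead = padded[:, width + n : width + n + n_frames]
        behind = padded[:, width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def utterance_features(mfcc: np.ndarray) -> np.ndarray:
    """Time-averaged MFCCs, deltas and delta-deltas, concatenated.

    The result has length 3 * n_coeffs regardless of utterance length.
    """
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.concatenate([mfcc.mean(axis=1), d1.mean(axis=1), d2.mean(axis=1)])


def mlp_param_count(layer_sizes: list[int]) -> int:
    """Number of weights plus biases in a fully connected network."""
    return sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))


# Usage with random numbers standing in for a real 13 x 200 MFCC matrix.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((13, 200))
x = utterance_features(mfcc)
print(x.shape)                        # (39,)
print(mlp_param_count([39, 64, 7]))   # 3015
```

Averaging over time gives a fixed 39-dimensional input for any utterance length, so the first layer's size depends only on the number of coefficients. This is one way to see how such features keep a classifier's parameter count small.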