Speech emotion recognition via ensembling neural networks.

Danqing Luo, Yuexian Zou, Dongyan Huang
DOI: https://doi.org/10.1109/APSIPA.2017.8282242
2017-01-01
Abstract: Deep neural network (DNN) based speech emotion recognition (SER) methods have demonstrated competitive performance compared with traditional SER approaches. However, the literature shows that the confusion matrices of different SER methods vary considerably, which indicates that different DNN architectures differ in their ability to model the emotion cues in speech. It also means that a single classifier rarely performs well on all speech emotion categories, possibly because of data imbalance and the limitations of the classifier itself. Motivated by the improvements reported for ensemble learning, this paper investigates an ensemble method for SER that aggregates the results of several base classifiers. Considering the outstanding performance of recurrent neural networks (RNNs) in various speech tasks and of residual networks (ResNets) in image-related classification, we choose RNN and ResNet as the base classifiers. Experiments show that the proposed ensemble SER system outperforms the state-of-the-art single-classifier SER system.
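The idea of aggregating results from several base classifiers can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the paper uses, so weighted averaging of class probabilities (soft voting) is assumed here purely for illustration. The label set, the function names `soft_vote` and `predict`, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

# Hypothetical emotion label set; the abstract does not list the categories.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-class probabilities from several classifiers."""
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weight per base classifier
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict(prob_sets: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest averaged probability."""
    avg = soft_vote(prob_sets, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return EMOTIONS[best]


# Made-up posteriors for one utterance from two base classifiers.
rnn_probs = [0.5, 0.1, 0.3, 0.1]      # an RNN-style model favours "angry"
resnet_probs = [0.2, 0.1, 0.6, 0.1]   # a ResNet-style model favours "neutral"

print(predict([rnn_probs, resnet_probs]))
```

With equal weights the averaged distribution favours "neutral", although one base classifier alone would have chosen "angry". Passing `weights` lets a more trusted classifier dominate, which is one simple way an ensemble could exploit the different strengths the abstract attributes to different architectures.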