Gender-Aware CNN-BLSTM for Speech Emotion Recognition

Linjuan Zhang, Longbiao Wang, Jianwu Dang, Lili Guo, Qiang Yu
DOI: https://doi.org/10.1007/978-3-030-01418-6_76
2018-01-01
Abstract: Gender information has been widely used to improve speech emotion recognition (SER) because men and women express emotion differently. However, conventional methods represent gender with a fixed integer or a one-hot encoding, which does not make full use of gender information. To emphasize gender factors in SER, we propose two types of features for our framework: a distributed-gender feature and a gender-driven feature. The distributed-gender feature is constructed to represent the gender distribution as well as individual differences, while the gender-driven feature is extracted from acoustic signals by a deep neural network (DNN). Each of the two features is used separately to augment the original spectrogram, which then serves as the input to the decision-making network, a hybrid model that combines a convolutional neural network (CNN) with a bi-directional long short-term memory (BLSTM) network. Compared with using the spectrogram alone, adding the distributed-gender feature and the gender-driven feature in the gender-aware CNN-BLSTM improved unweighted accuracy, with relative error reductions of 14.04% and 45.74%, respectively.
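The abstract reports its gains as relative error reduction in unweighted accuracy but does not spell out either formula. The sketch below assumes the usual definitions: unweighted accuracy (UA) as the mean of per-class recalls, and relative error reduction as the share of the baseline error (1 - UA) that the new system removes. The function names, labels, and accuracy values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_error_reduction(ua_baseline, ua_new):
    """Fraction of the baseline error (1 - UA) removed by the new system."""
    baseline_error = 1.0 - ua_baseline
    if baseline_error <= 0:
        raise ValueError("baseline has no error left to reduce")
    return (ua_new - ua_baseline) / baseline_error


# Hypothetical labels for illustration only (not data from the paper).
y_true = ["angry", "angry", "happy", "happy", "happy", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "sad"]
ua = unweighted_accuracy(y_true, y_pred)

# Hypothetical UA values: a baseline of 0.60 and a new system at 0.70.
rer = relative_error_reduction(0.60, 0.70)
print(f"UA = {ua:.3f}, relative error reduction = {rer:.2%}")
```

Under these assumed definitions, a relative error reduction depends on how much error the baseline had to begin with, so the same absolute gain in UA yields a larger relative reduction when the baseline is already strong.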