Speech Emotion Recognition Via Attention-based DNN from Multi-Task Learning

Fei Ma, Weixi Gu, Wei Zhang, Shiguang Ni, Shao-Lun Huang, Lin Zhang
DOI: https://doi.org/10.1145/3274783.3275184
2018-01-01
Abstract: Speech holds great potential for emotion recognition. Highly accurate, real-time understanding of human emotion from speech can assist Human-Computer Interaction. Previous work is often limited either to coarse-grained emotion learning tasks or by low emotion recognition precision. To address these problems, we construct a large-scale, real-world corpus covering four common emotions (anger, happiness, neutral, and sadness). We also propose a multi-task attention-based DNN model (MT-A-DNN) for emotion learning. MT-A-DNN efficiently learns the high-order dependencies and non-linear correlations underlying the audio data. Extensive experiments show that MT-A-DNN outperforms conventional methods in emotion recognition. It could take real-time acoustic emotion recognition a step further in many smart audio devices.
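The abstract does not specify MT-A-DNN's layers, its attention formulation, or which auxiliary tasks it learns. As a rough illustration only, the NumPy sketch below shows one common way to combine the two ideas the abstract names. A shared non-linear frame encoder feeds a softmax attention layer that pools frame vectors into a single utterance vector. Two task-specific heads read that vector and are trained with a weighted sum of cross-entropy losses. The class and function names, layer sizes, the 40-dimensional frame features, the binary auxiliary task, and `aux_weight` are all hypothetical assumptions and are not taken from the paper.

```python
import numpy as np

# The four emotion classes in the paper's corpus.
EMOTIONS = ["anger", "happiness", "neutral", "sadness"]


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MultiTaskAttentionDNN:
    """Hypothetical sketch: shared encoder -> attention pooling -> per-task heads."""

    def __init__(self, n_features, n_hidden, n_aux_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))  # shared encoder weights
        self.b1 = np.zeros(n_hidden)
        self.w_att = rng.normal(0.0, 0.1, n_hidden)  # attention scoring vector
        self.W_emo = rng.normal(0.0, 0.1, (n_hidden, len(EMOTIONS)))  # emotion head
        self.W_aux = rng.normal(0.0, 0.1, (n_hidden, n_aux_classes))  # assumed auxiliary head

    def forward(self, frames):
        # frames: (T, n_features) acoustic features for one utterance (assumed input format)
        h = np.tanh(frames @ self.W1 + self.b1)  # (T, H) shared non-linear frame encoding
        alpha = softmax(h @ self.w_att, axis=0)  # (T,) attention weights over frames
        u = alpha @ h                            # (H,) attention-weighted utterance vector
        p_emo = softmax(u @ self.W_emo)          # (4,) emotion probabilities
        p_aux = softmax(u @ self.W_aux)          # auxiliary-task probabilities
        return p_emo, p_aux, alpha


def multitask_loss(p_emo, y_emo, p_aux, y_aux, aux_weight=0.3):
    # Weighted sum of the two cross-entropies; aux_weight is an assumed hyperparameter.
    return -np.log(p_emo[y_emo]) - aux_weight * np.log(p_aux[y_aux])


# Usage on random stand-in features (120 frames x 40 features, not real audio).
model = MultiTaskAttentionDNN(n_features=40, n_hidden=64, n_aux_classes=2)
frames = np.random.default_rng(1).normal(size=(120, 40))
p_emo, p_aux, alpha = model.forward(frames)
loss = multitask_loss(p_emo, 0, p_aux, 1)
print(EMOTIONS[int(np.argmax(p_emo))], float(loss))
```

In this sketch the encoder and attention layer are shared by both heads, so the auxiliary loss shapes the same utterance representation the emotion head reads. That sharing is what makes the setup multi-task; whether MT-A-DNN shares parameters in this way is not stated in the abstract.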