Inferring Users' Emotions For Human-Mobile Voice Dialogue Applications

Boya Wu, Jia Jia, Tao He, Juan Du, Xiaoyuan Yi, Yishuang Ning
DOI: https://doi.org/10.1109/ICME.2016.7552890
2016-01-01
Abstract: In this paper, we tackle the problem of inferring users' emotions in real-world Voice Dialogue Applications (VDAs, e.g., Siri and Cortana). We first conduct an investigation, which indicates that besides the text of users' queries, the acoustic information and query attributes are very important for inferring emotions in VDAs. To integrate this information, we propose a Hybrid Emotion Inference Model (HEIM), which uses Latent Dirichlet Allocation (LDA) to extract text features and a Long Short-Term Memory (LSTM) network to model the acoustic features. To further improve accuracy, HEIM includes a Recurrent Autoencoder Guided by Query Attributes (RAGQA), which incorporates other emotion-related query attributes and is used to pre-train the LSTM. On a data set of 93,000 utterances collected from Sogou Voice Assistant (the Chinese Siri), HEIM achieves an accuracy of 75.2%, outperforming state-of-the-art methods by 33.5-38.5%. Specifically, we find that on average the acoustic information improves performance by 46.6%, while query attributes improve it by a further 6.5%.