Multi-modal fusion emotion recognition based on HMM and ANN

Chao Xu, Tianyi Cao, Zhiyong Feng, Caichao Dong
DOI: https://doi.org/10.1007/978-3-642-34447-3_48
2013-01-01
Abstract: Emotional states play an important role in Human-Computer Interaction. An emotion recognition framework is proposed that extracts and fuses features from both video sequences and speech signals. The framework consists of two Hidden Markov Models (HMMs), which estimate emotional states from video and audio respectively, and an Artificial Neural Network (ANN), which serves as the overall fusion mechanism. The two key feature-extraction phases for the HMMs are the extraction of Facial Animation Parameters (FAPs) from video sequences, based on an Active Appearance Model (AAM), and the extraction of pitch and energy features from speech signals. Experiments indicate that the proposed approach achieves better performance and robustness than methods using video or audio alone.
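The HMM-plus-ANN pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how the HMMs are parameterized, so this sketch assumes discrete observation symbols, one HMM per emotion per modality, randomly initialized (untrained) parameters, and a one-hidden-layer network as the fusion ANN. The names `log_forward`, `random_hmm`, `modality_scores`, and `fuse` are hypothetical. In a real system the observation symbols would come from quantized FAPs (video) and pitch/energy features (audio).

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM
    (forward algorithm, computed in log space for numerical stability)."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over previous states (axis 0) for each next state.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))

def random_hmm(rng, n_states, n_symbols):
    """Hypothetical helper: an untrained HMM with random, normalized parameters."""
    def log_rows(shape):
        m = rng.random(shape) + 0.1
        return np.log(m / m.sum(axis=-1, keepdims=True))
    return log_rows(n_states), log_rows((n_states, n_states)), log_rows((n_states, n_symbols))

def modality_scores(obs, hmms):
    """Score one sequence against every per-emotion HMM of a modality."""
    return np.array([log_forward(obs, *h) for h in hmms])

def fuse(video_scores, audio_scores, W1, b1, W2, b2):
    """Assumed fusion ANN: concatenate both score vectors, apply one tanh
    hidden layer, and return a softmax distribution over emotions."""
    x = np.concatenate([video_scores, audio_scores])
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_emotions = 4  # assumed number of emotion classes
video_hmms = [random_hmm(rng, 3, 5) for _ in range(n_emotions)]
audio_hmms = [random_hmm(rng, 3, 5) for _ in range(n_emotions)]

# Stand-ins for quantized FAP and pitch/energy symbol sequences.
video_obs = rng.integers(0, 5, size=20)
audio_obs = rng.integers(0, 5, size=30)

# Untrained fusion weights; in practice these would be learned from labeled data.
W1, b1 = rng.normal(size=(8, 2 * n_emotions)), np.zeros(8)
W2, b2 = rng.normal(size=(n_emotions, 8)), np.zeros(n_emotions)

probs = fuse(modality_scores(video_obs, video_hmms),
             modality_scores(audio_obs, audio_hmms),
             W1, b1, W2, b2)
predicted_emotion = int(np.argmax(probs))
```

Feeding per-modality likelihood scores into a learned fusion layer, rather than averaging them, lets the network weight whichever modality is more reliable for a given emotion; this is one plausible reading of "fusion mechanism", not a confirmed detail of the paper.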