Dual-modality video emotion recognition based on facial expression and BVP physiological signal
Fuji Ren, Manli Yu, Min Hu, Yanqiu Li
DOI: https://doi.org/10.11834/jig.170401
2018-01-01
Journal of Image and Graphics
Abstract:

Objective With the continuous development of artificial intelligence, researchers from many fields have become increasingly interested in giving computers the ability to understand the emotions conveyed by human beings and to interact with them naturally. Emotion recognition has therefore become a key research topic in achieving harmonious human-computer interaction. The performance of video emotion recognition algorithms depends critically on the quality of the extracted emotion information. Previous research showed that facial expression is the most direct channel for conveying emotional information, so current works usually rely on facial expressions alone. Most feature extraction methods for facial expression images operate on gray images. However, when color images are converted into gray images, the physiological signals latent in the color information of facial videos, which carry discriminative information for emotion recognition, are lost. To overcome this problem, this study introduces a novel dual-modality video emotion recognition method based on decision fusion, which combines facial expressions with blood volume pulse (BVP) physiological signals that can be extracted from facial videos.

Method First, the video is preprocessed (including face detection and normalization) to obtain a sequence of video frames that contain only the face image. The LBP-TOP feature is an effective local texture descriptor, whereas the HOG-TOP feature is a gradient-based local shape descriptor that compensates for the edge and direction information that LBP-TOP fails to capture. Thus, we extract the LBP-TOP and HOG-TOP features from the video frames and fuse these two facial expression features. We then apply video color amplification technology to the original video and extract the BVP physiological signal from the processed video, from which the emotional features of the physiological signal are extracted. Afterward, the facial expression features and the physiological features are each input into a BP classifier to train two classification models. Finally, the fuzzy integral is used to fuse the posterior probabilities output by the two classifiers to obtain the final emotion recognition result.

Result Because the commonly used video emotion databases cannot satisfy the requirements for extracting the BVP signal, we conduct experimental verification on a self-built facial expression video database. Each group of experiments was cross-validated, and the final results were averaged to increase the credibility of the experiment. The average recognition rates of the single modalities, i.e., facial expression and physiological signal, are 80% and 63.75%, respectively, whereas the recognition rate after fusing the two modalities reaches 83.33%, which is higher than that of either single modality. This finding indicates that the decision fusion algorithm combining facial expression and the BVP physiological signal is effective for emotion recognition. The recognition rates of two other fusion methods, namely, D-S evidence theory and the maximum value rule, are 71% and 80%, respectively, both lower than that of the fuzzy integral method. In addition, the recognition rate of our method is 2% and 2.5% higher than those of two existing video emotion recognition methods.

Conclusion The dual-modality space-time feature fusion method proposed in this study characterizes the emotion information contained in facial videos from two aspects, i.e., facial expression and physiological signals. The experimental results show that the algorithm makes full use of the emotion information in the video and effectively improves the classification performance of video emotion recognition, and they verify its effectiveness in comparison with similar video emotion recognition algorithms. In addition, the fuzzy integral is used to fuse the two modalities at the decision level. Unlike D-S evidence theory and the maximum value rule, with which it is compared, this fusion takes the reliability of the different classifiers into account, which effectively reduces the influence of unreliable decision information on the fused decision. As a result, the proposed fusion method achieves a high recognition accuracy, and the comparison experiments with the other fusion methods confirm its superiority.
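
The decision-level fusion step described above can be illustrated with a small sketch. The abstract does not say which form of the fuzzy integral is used or how the classifier reliabilities are set, so the code below assumes a Sugeno fuzzy integral over two sources, and the function names (`sugeno_fuse`, `predict`), the density values `g_face` and `g_bvp`, and the example posteriors are all hypothetical. With only two sources, the fuzzy measure of the full set is 1 by definition, so no further measure parameters are needed.

```python
# Sketch only: Sugeno fuzzy integral over two classifiers (an assumption;
# the abstract does not specify the integral type or the density values).

def sugeno_fuse(p_face, p_bvp, g_face, g_bvp):
    """Fuse two posterior vectors class by class.

    p_face, p_bvp: per-class posteriors from the two classifiers.
    g_face, g_bvp: fuzzy densities in [0, 1], read as classifier reliability.
    """
    scores = []
    for h_face, h_bvp in zip(p_face, p_bvp):
        # Order the two supports from high to low, keeping each source's density.
        (h_hi, g_hi), (h_lo, _) = sorted(
            [(h_face, g_face), (h_bvp, g_bvp)],
            key=lambda pair: pair[0],
            reverse=True,
        )
        # g({top source}) is its own density; g({both sources}) = 1.
        scores.append(max(min(h_hi, g_hi), min(h_lo, 1.0)))
    return scores


def predict(p_face, p_bvp, g_face=0.8, g_bvp=0.6):
    """Return (index of the winning class, fused scores)."""
    scores = sugeno_fuse(p_face, p_bvp, g_face, g_bvp)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Hypothetical posteriors for three emotion classes.
    label, fused = predict([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
    print(label, fused)
```

In this sketch, a confident vote from a source with a low density is capped at that density, which is one way a fuzzy integral can limit the influence of a less reliable classifier on the fused decision.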