Multimodal Human Facial Emotion Recognition Using Densenet-161 and Image Feature Stabilization Algorithm

ANGELINE R, ALICE NITHYA A
DOI: https://doi.org/10.2139/ssrn.4154901
2022-01-01
SSRN Electronic Journal
Abstract: Human Facial Emotion Recognition (FER) is the technology of predicting a person's emotion from static images and videos in order to infer their emotional state, such as happiness, sadness, frustration, anxiety, surprise, hate, or a neutral state. It is part of affective computing, a collaborative area of research on human emotion. The standard steps of FER are (i) face region-of-interest (RoI) identification, (ii) feature extraction, and (iii) emotion recognition. In this paper, an Image Feature Stabilization Algorithm (IFSA) is proposed to improve the efficiency of FER by implementing a Deep Convolutional Neural Network (DCNN) model using the Transfer Learning (TL) technique. The architecture employs a FER-compatible pre-trained DenseNet-161-based DCNN model, which is then fine-tuned on face emotion data. The dense layer(s) are trained first, followed by fine-tuning of each of the pre-trained DCNN blocks, which improves FER accuracy, particularly for difficult frontal face views such as partial views. Experiments on the CK+ dataset using 10-fold cross-validation with the pre-trained models VGG16, VGG19, ResNet-18, ResNet-34, ResNet-50, ResNet-152, Inception-V3, and DenseNet-161 showed accuracies of 85.9%, 94.6%, 91%, 91%, 91.1%, 95.1%, 97.1%, and 98.7%, respectively. Thus, FER using the fine-tuned DenseNet-161 architecture together with the proposed image feature stabilization algorithm achieved markedly higher accuracy than the other pre-trained models. The proposed DenseNet-161 architecture also showed improved accuracies of 98.78% and 97.52% on other challenging FER datasets such as KDEF and JAFFE.
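The 10-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say whether the folds were stratified or subject-independent, so the sketch assumes a plain shuffled split. The names `k_fold_indices`, `cross_validate`, and the callback `evaluate_fold` are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate_fold, k=10, seed=0):
    """Average a per-fold score over k train/test splits.

    evaluate_fold(train_idx, test_idx) is a hypothetical callback that would
    fine-tune a model on train_idx and return its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i forms the training set for this round.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate_fold(train_idx, test_idx))
    return mean(scores)
```

In use, `evaluate_fold` would wrap the fine-tuning of one pre-trained model on the training indices and return its test accuracy, so that each reported figure is the mean over the ten held-out folds.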