Enhancing masked facial expression recognition with multimodal deep learning

H. M. Shahzad, Sohail Masood Bhatti, Arfan Jaffar, Sheeraz Akram
DOI: https://doi.org/10.1007/s11042-024-18362-1
IF: 2.577
2024-02-14
Multimedia Tools and Applications
Abstract: Facial expression recognition (FER) is essential to intelligent human-computer interaction, but mask wearing during the COVID-19 pandemic has made unimodal techniques less effective. Multimodal approaches, which combine information from several modalities, recognize emotions more robustly, and the need to recognize human emotions accurately from facial expressions remains significant. This study proposes a deep-learning-based multimodal methodology that recognizes emotions from masked facial expressions together with vocal expressions. The approach uses two standard datasets, M-LFW-F and CREMA-D, to capture facial and vocal emotional cues. The resulting dataset was used to train a multimodal neural network with fusion techniques. The proposed approach achieved an accuracy of 79.05%, compared with 68.76% for the unimodal approach, demonstrating that it outperforms unimodal techniques in facial expression recognition under masked conditions. This highlights the potential of multimodal techniques for improving FER in challenging scenarios.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering