Emotion Classification Based on Pulsatile Images Extracted from Short Facial Videos via Deep Learning

Shlomi Talala, Shaul Shvimmer, Rotem Simhon, Michael Gilead, Yitzhak Yitzhaky
DOI: https://doi.org/10.3390/s24082620
IF: 3.9
2024-04-20
Sensors
Abstract: Most human emotion recognition methods depend largely on classifying stereotypical facial expressions that represent emotions. However, such facial expressions do not necessarily correspond to actual emotional states and may instead reflect communicative intentions. In other cases, emotions are hidden, cannot be expressed, or have lower arousal manifested by less pronounced facial expressions, as may occur during passive video viewing. This study improves an emotion classification approach developed in a previous study, which classifies emotions remotely from short facial video data without relying on stereotypical facial expressions or contact-based methods. In this approach, we aim to remotely sense transdermal cardiovascular spatiotemporal facial patterns associated with different emotional states and analyze these data via machine learning. In this paper, we propose several improvements: better remote heart rate estimation via preliminary skin segmentation, an improved heartbeat peak and trough detection process, and better emotion classification accuracy obtained by employing an appropriate deep learning classifier that uses only RGB camera input data. We used the dataset obtained in the previous study, which contains facial videos of 110 participants who passively viewed 150 short videos eliciting five emotion types (amusement, disgust, fear, sexual arousal, and no emotion) while three cameras with different wavelength sensitivities (visible spectrum, near-infrared, and longwave infrared) recorded them simultaneously. From the short facial videos, we extracted unique high-resolution, physiologically affected spatiotemporal features and examined them as input features with different deep learning approaches.
An EfficientNet-B0 model classified participants' emotional states with an overall average accuracy of 47.36% using a single input spatiotemporal feature map obtained from a regular RGB camera.
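The first improvement named in the abstract, heart rate estimation after a preliminary skin segmentation, can be illustrated with a minimal sketch. It assumes RGB frames stored in a NumPy array of shape (T, H, W, 3), uses a fixed YCbCr chroma box as a stand-in skin segmenter, and picks the dominant frequency of the skin-averaged green channel. The helper names (`skin_mask_ycbcr`, `estimate_heart_rate_bpm`), the thresholds, and the 0.7 to 4 Hz search band are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def skin_mask_ycbcr(frame_rgb):
    """Rough skin mask from a fixed YCbCr chroma box (hypothetical thresholds)."""
    rgb = frame_rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Full-range (JFIF) BT.601 chroma components
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)


def estimate_heart_rate_bpm(frames, fps, low_hz=0.7, high_hz=4.0):
    """frames: (T, H, W, 3) uint8 RGB video. Returns an estimated heart rate in BPM."""
    signal = []
    for frame in frames:
        mask = skin_mask_ycbcr(frame)
        if not mask.any():
            # No skin found in this frame: fall back to the whole frame
            mask = np.ones(frame.shape[:2], dtype=bool)
        # The green channel is commonly used because it tends to carry the strongest pulse
        signal.append(frame[..., 1][mask].mean())
    signal = np.asarray(signal)
    signal = signal - signal.mean()
    # Windowed magnitude spectrum of the skin-averaged trace
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    # Search for the dominant frequency only within a plausible heart-rate band
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz
```

Restricting the average to skin pixels keeps background and hair, which carry no pulse, from diluting the trace; that is the intuition behind segmenting the skin before estimating the heart rate.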
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
What problem does this paper attempt to address?
The paper aims to improve the accuracy of emotion classification from facial videos without relying on stereotypical facial expressions or contact-based methods. Specifically, the researchers propose an improved method that classifies emotions remotely using pulsatile images extracted from short facial videos; these pulsatile images are derived from physiological (cardiovascular) signals in the skin regions and are then classified with deep learning. The method aims to capture transdermal cardiovascular spatiotemporal patterns related to different emotional states and to analyze these data through machine learning. The main contributions of the paper are as follows:

1. **Improved heart rate estimation accuracy**: Segmenting the skin area first and focusing on this region of interest yields more reliable heart rate estimates.
2. **Improved detection of heartbeat peaks and troughs**: After a band-pass filter is applied, peaks and troughs are detected more accurately in the extracted pulsatile signal.
3. **Improved emotion classification accuracy with a single input feature map**: Using only a single input spatiotemporal feature map obtained from an ordinary RGB camera, the method achieved an overall average accuracy of 47.36%, compared with the 44% accuracy previously achieved with seven input feature maps from multispectral signals (including thermal and near-infrared cameras).
4. **Increased spatial resolution of the physiological features**: This may help capture information related to micro-expressions, which could improve emotion recognition accuracy.

Overall, the goal of this research is to develop a simpler, lower-cost, and more widely applicable non-contact emotion classification method that overcomes the limitations of traditional approaches relying on stereotypical facial expressions or contact-based sensors.
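The band-pass filtering and peak/trough detection step (contribution 2) can be sketched in NumPy as below. The summary does not specify the filter type, so this sketch swaps in an ideal FFT band-pass (0.7 to 4 Hz, roughly 42 to 240 BPM) and a greedy local-extremum search that enforces a minimum beat spacing. The function names and parameter values are hypothetical; in practice `scipy.signal.butter`, `scipy.signal.filtfilt`, and `scipy.signal.find_peaks` (with its `distance` argument) are common replacements.

```python
import numpy as np


def fft_bandpass(signal, fps, low_hz=0.7, high_hz=4.0):
    """Ideal band-pass: zero all spectral components outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(signal - np.mean(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def local_extrema(x, min_distance):
    """Indices of local maxima at least `min_distance` samples apart (greedy, tallest first)."""
    candidates = [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] >= x[i + 1]]
    kept = []
    for i in sorted(candidates, key=lambda k: x[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def peaks_and_troughs(signal, fps, max_hr_bpm=240.0):
    """Band-pass the pulsatile trace, then locate heartbeat peaks and troughs."""
    filtered = fft_bandpass(signal, fps)
    # Shortest plausible beat-to-beat interval, in samples
    min_distance = int(fps * 60.0 / max_hr_bpm)
    peaks = local_extrema(filtered, min_distance)
    troughs = local_extrema(-filtered, min_distance)  # troughs are maxima of the negated trace
    return filtered, peaks, troughs
```

Filtering before searching for extrema matters because slow illumination or motion drift and high-frequency noise would otherwise create spurious local maxima and minima between true heartbeats.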