Abstract: Depression is a prevalent mental health disorder that significantly impacts individuals' lives and well-being. Early detection and intervention are crucial for effective treatment and management of depression. Recently, many end-to-end deep learning methods have leveraged facial expression features for automatic depression detection. However, most current methods overlook the temporal dynamics of facial expressions. Although very recent 3DCNN methods remedy this gap, they incur higher computational cost because of their CNN-based backbones and redundant facial features. To address these limitations, we consider the temporal correlation of facial expressions and propose a novel framework called FacialPulse, which recognizes depression with high accuracy and speed. The Facial Motion Modeling Module (FMMM) in FacialPulse processes sequences bidirectionally and handles long-term dependencies, allowing it to fully capture temporal features. Because the FMMM supports parallel processing and uses a gate mechanism to mitigate vanishing gradients, it also significantly boosts training speed. In addition, to reduce information redundancy by using facial landmarks in place of the original images, a Facial Landmark Calibration Module (FLCM) is designed to eliminate facial landmark errors and further improve recognition accuracy. Extensive experiments on the AVEC2014 dataset and the MMDA dataset (a depression dataset) demonstrate the superiority of FacialPulse in recognition accuracy and speed: the average MAE (Mean Absolute Error) decreases by 21% compared to baselines, and the recognition speed increases by 100% compared to state-of-the-art methods. Code is released at https://github.com/volatileee/FacialPulse.
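To make the reported metric concrete, the following is a minimal sketch of how MAE and a relative MAE reduction can be computed. It is not taken from the FacialPulse code; the function names and all score values are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|y_i - y_hat_i|) over N samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mae, model_mae):
    """Fractional decrease of model_mae relative to baseline_mae."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical depression scores, for illustration only (not from the paper).
true_scores = [10, 20, 30, 40]
baseline_pred = [14, 15, 36, 35]  # absolute errors: 4, 5, 6, 5
model_pred = [13, 16, 35, 36]     # absolute errors: 3, 4, 5, 4

baseline_mae = mean_absolute_error(true_scores, baseline_pred)
model_mae = mean_absolute_error(true_scores, model_pred)
reduction = relative_reduction(baseline_mae, model_mae)

print(f"baseline MAE = {baseline_mae}, model MAE = {model_mae}")
print(f"relative reduction = {reduction:.0%}")
```

With these made-up values the baseline MAE is 5.0 and the model MAE is 4.0, a 20% relative reduction. The 21% figure in the abstract is an average over the paper's own baselines and datasets and is not reproduced here.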
