Abstract:
Introduction: Real-time evaluation of the severity of depressive symptoms is of great significance for the diagnosis and treatment of patients with major depressive disorder (MDD). In clinical practice, evaluation relies mainly on psychological scales and doctor-patient interviews, which are time-consuming and labor-intensive, and the accuracy of the results depends largely on the subjective judgment of the clinician. With the development of artificial intelligence (AI) technology, a growing number of machine learning methods are being used to diagnose depression from appearance characteristics. Most previous research focused on single-modal data; in recent years, however, many studies have shown that multi-modal data yields better prediction performance than single-modal data. This study aimed to develop a measure of depression severity based on expression and action features and to assess its validity among patients with MDD.
Methods: We proposed a multi-modal deep convolutional neural network (CNN) that evaluates the severity of depressive symptoms in real time by detecting patients' facial expressions and body movements in videos captured by ordinary cameras. We established a behavioral depression degree (BDD) metric, which combines expression entropy and action entropy to measure the depression severity of MDD patients.
Results: We found that information extracted from different modalities, when integrated in appropriate proportions, can significantly improve the accuracy of the evaluation, a finding that has not been reported in previous studies. The method achieved a Pearson similarity of over 74% between the BDD and the self-rating depression scale (SDS), the self-rating anxiety scale (SAS), and the Hamilton depression scale (HAMD). In addition, we tracked and evaluated changes in the BDD of patients at different stages of a course of treatment, and the results agreed with the evaluations from the scales.
Discussion: Based on a patient's expression and action features, the BDD can effectively measure the current state of the patient's depression and how it changes over time. Our model may provide an automatic auxiliary tool for the diagnosis and treatment of MDD.
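
To illustrate how two entropies might be combined into one score and compared against scale ratings, here is a minimal Python sketch. The abstract does not give the actual BDD formula, so everything here is an assumption: Shannon entropy over per-frame labels stands in for "expression entropy" and "action entropy", the `weight` parameter stands in for the "appropriate proportions", and the function names `shannon_entropy`, `behavior_score`, and `pearson` are hypothetical. The direction of the relationship between entropy and severity is also not stated in the abstract and is not implied by this sketch.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) summed over observed categories
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def behavior_score(expression_labels, action_labels, weight=0.5):
    """Hypothetical weighted mix of the two entropies.

    `weight` is an illustrative assumption, not a value from the paper.
    """
    h_expr = shannon_entropy(expression_labels)
    h_act = shannon_entropy(action_labels)
    return weight * h_expr + (1.0 - weight) * h_act


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, one such score would be computed per patient from per-frame expression and action labels, and `pearson` would then compare the list of scores with the corresponding SDS, SAS, or HAMD ratings.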

Dual‐task enhanced global–local temporal–spatial network for depression recognition from facial videos

Automatic Depression Prediction Via Cross-Modal Attention-Based Multi-Modal Fusion in Social Networks

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

TCEDN: A Lightweight Time-Context Enhanced Depression Detection Network

Hybrid Network Feature Extraction for Depression Assessment from Speech

A Deep Multiscale Spatiotemporal Network for Assessing Depression from Facial Dynamics

Depressformer: Leveraging Video Swin Transformer and fine-grained local features for depression scale estimation

An Improved Global-Local Fusion Network for Depression Detection Telemedicine Framework

Neural Architecture Searching for Facial Attributes-based Depression Recognition

FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks

Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction

Two-stage Temporal Modelling Framework for Video-based Depression Recognition using Graph Representation

Transformer-based multimodal feature enhancement networks for multimodal depression detection integrating video, audio and remote photoplethysmograph signals

Depressioner: Facial dynamic representation for automatic depression level prediction

Improving Depression estimation from facial videos with face alignment, training optimization and scheduling

Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction

DepNet: An automated industrial intelligent system using deep learning for video‐based depression analysis

LSCAformer: Long and short-term cross-attention-aware transformer for depression recognition from video sequences

Local Second-Order Gradient Cross Pattern for Automatic Depression Detection

Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level

Measuring depression severity based on facial expression and body movement using deep convolutional neural network