Abstract

Background and Objective. Depression is a widespread global problem that imposes a substantial burden and disability on individuals, families, and society. Deep learning (DL) has emerged as a valuable approach for automatically detecting depression by extracting cues from audiovisual data and producing a diagnosis. The PHQ-8 is regarded as a validated diagnostic tool for depressive disorders in clinical studies, and the objective of this work is to improve the accuracy of PHQ-8 score prediction. In addition, this paper aims to demonstrate the effectiveness of expert knowledge in depression diagnosis and to present a novel multimodal network architecture.

Methods. This paper focuses on multimodal depression analysis and proposes a flexible parallel transformer (FPT) model, FPT-Former, that extracts information from three distinct input streams: one set of video descriptors and two sets of audio descriptors. The model has three parallel paths, each taking the expert-knowledge-based descriptors of one modality as input. In each path, the encoder part of a transformer module represents these descriptors as 32 features, and the features from the three paths are fused to regress the final PHQ-8 score. The Extended Distress Analysis Interview Corpus (E-DAIC), an extension of WOZ-DAIC, comprises semi-clinical interviews intended to support the diagnosis of psychological distress conditions. It includes 275 participants and was used in this study to evaluate the model with 10-fold cross-validation.

Results. The proposed FPT achieved performance comparable to state-of-the-art methods, with a root mean square error (RMSE) of 4.80 and a mean absolute error (MAE) of 4.58. Ablation experiments show that the model fusing all three modalities outperforms the two-modality and single-modality variants. With a PHQ-8 threshold of 10, the depression classification accuracy is 0.79.

Conclusions. By leveraging expert-knowledge-based multimodal measures and a parallel transformer structure, the FPT model shows promising performance in depression detection. The model improved the accuracy of audio- and video-based depression diagnosis and demonstrated the effectiveness of expert knowledge for diagnosing depression. Its flexible structure, high predictive efficiency, and strong privacy protection make it a promising intelligent system for wider adoption in mental healthcare.
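To make the three-path design concrete, here is a minimal, dependency-free sketch of the idea: each path encodes one descriptor stream into 32 features, the three feature vectors are concatenated, and a head regresses a PHQ-8 score. This is not the authors' implementation. `EncoderPath`, `FusionRegressor`, the descriptor sizes, the concatenation fusion, the linear head, and the clamping to the 0 to 24 PHQ-8 range are all hypothetical choices. Each "encoder" is also a deliberately reduced stand-in for a transformer encoder: one single-head self-attention layer with a residual connection and mean pooling, with no feed-forward block, normalisation, or positional encoding. The weights are random and untrained.

```python
import math
import random

FEATURES_PER_PATH = 32  # per-path feature size stated in the abstract


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class EncoderPath:
    """Hypothetical single-modality path: one self-attention layer with a
    residual connection, mean pooling over time, then a linear projection
    to FEATURES_PER_PATH features (a simplified transformer encoder)."""

    def __init__(self, in_dim, rng):
        self.wq = rand_matrix(in_dim, in_dim, rng)
        self.wk = rand_matrix(in_dim, in_dim, rng)
        self.wv = rand_matrix(in_dim, in_dim, rng)
        self.w_out = rand_matrix(in_dim, FEATURES_PER_PATH, rng)
        self.scale = 1.0 / math.sqrt(in_dim)

    def __call__(self, frames):
        # frames: T time steps, each a list of in_dim descriptor values
        q, k, v = matmul(frames, self.wq), matmul(frames, self.wk), matmul(frames, self.wv)
        attn = [softmax([dot(qi, kj) * self.scale for kj in k]) for qi in q]
        context = matmul(attn, v)  # scaled dot-product attention output, T x in_dim
        hidden = [[x + c for x, c in zip(xr, cr)] for xr, cr in zip(frames, context)]
        pooled = [sum(col) / len(hidden) for col in zip(*hidden)]  # mean over time
        return matmul([pooled], self.w_out)[0]  # FEATURES_PER_PATH features


class FusionRegressor:
    """Hypothetical three-path model: concatenate per-path features and apply
    a linear head, clamping the output to the PHQ-8 range 0-24."""

    def __init__(self, in_dims, rng):
        self.paths = [EncoderPath(d, rng) for d in in_dims]
        self.w_head = [rng.gauss(0.0, 0.05) for _ in range(FEATURES_PER_PATH * len(in_dims))]
        self.bias = 0.0

    def fuse(self, inputs):
        fused = []
        for path, frames in zip(self.paths, inputs):
            fused.extend(path(frames))  # concatenation fusion (assumed)
        return fused

    def __call__(self, inputs):
        raw = dot(self.w_head, self.fuse(inputs)) + self.bias
        return min(max(raw, 0.0), 24.0)


# Usage with random, untrained weights and made-up descriptor sizes
rng = random.Random(0)
dims = [16, 12, 10]  # hypothetical: one video and two audio descriptor sets
model = FusionRegressor(dims, rng)
inputs = [[[rng.random() for _ in range(d)] for _ in range(20)] for d in dims]
print(f"untrained PHQ-8 prediction: {model(inputs):.2f}")
```

Keeping the paths independent until the fusion step is what makes this kind of design easy to ablate: dropping one entry from `dims` and `inputs` yields a two-modality variant without touching the other paths.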
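The evaluation reported under Results combines regression metrics (RMSE, MAE), a binary accuracy obtained by thresholding PHQ-8 scores at 10, and 10-fold cross-validation over the 275 E-DAIC participants. The sketch below shows how these quantities are computed. It assumes that a score of 10 or more counts as depressed, and it uses a hypothetical shuffle-and-stride fold assignment; the paper's exact fold split and aggregation are not specified here. The toy scores are invented for illustration and are not results from the paper.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted PHQ-8 scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted PHQ-8 scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def threshold_accuracy(y_true, y_pred, cutoff=10):
    """Binarise scores (assumed: score >= cutoff means depressed) and
    return the fraction of participants whose labels agree."""
    hits = sum((t >= cutoff) == (p >= cutoff) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n
    participants; the shuffle-and-stride scheme is an illustrative choice."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Toy scores, not data from the paper
y_true = [3, 12, 15, 8, 20]
y_pred = [5, 9, 11, 9, 14]
print(rmse(y_true, y_pred), mae(y_true, y_pred), threshold_accuracy(y_true, y_pred))

folds = list(k_fold_indices(275, k=10))  # 275 participants, as in E-DAIC
print([len(test) for _, test in folds])
```

Because every participant appears in exactly one test fold, predictions from the ten folds can be pooled before computing the metrics, which is one common way to report a single cross-validated RMSE and MAE.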

Paying attention to uncertainty: A stochastic multimodal transformers for post-traumatic stress disorder detection using video

A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews

PTSD-MDNN: Late Fusion of Multimodal Deep Neural Networks for Post-Traumatic Stress Disorder Detection

A deep transfer learning approach for improved post-traumatic stress disorder diagnosis

Psychological disorder detection: A multimodal approach using a transformer-based hybrid model

PTSD in the Wild: A Video Database for Studying Post-Traumatic Stress Disorder Recognition in Unconstrained Environments

Transformer-based multimodal feature enhancement networks for multimodal depression detection integrating video, audio and remote photoplethysmograph signals

Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data

Investigating the Perceived Precision and validity of a Field-Deployable Machine Learning-based Tool to Detect Post-Traumatic Stress Disorder (PTSD) Hyperarousal Events

Multimodal temporal machine learning for Bipolar Disorder and Depression Recognition

A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video

FPT-Former: A Flexible Parallel Transformer of Recognizing Depression by Using Audiovisual Expert-Knowledge-Based Multimodal Measures

Multimodal Mental Health Digital Biomarker Analysis from Remote Interviews using Facial, Vocal, Linguistic, and Cardiovascular Patterns

DepMSTAT: Multimodal Spatio-Temporal Attentional Transformer for Depression Detection

Speech-based recognition and estimating severity of PTSD using machine learning

A Multimodal Non-Intrusive Stress Monitoring from the Pleasure-Arousal Emotional Dimensions

Interdisciplinary approach to identify language markers for post-traumatic stress disorder using machine learning and deep learning

Enhancing PTSD Outcome Prediction with Ensemble Models in Disaster Contexts

A transformer-based unified multimodal framework for Alzheimer's disease assessment

Multimodal fusion diagnosis of depression and anxiety based on CNN-LSTM model

Multimodal Spatiotemporal Representation for Automatic Depression Level Detection