Abstract: Depression is a prevalent mental health disorder. The absence of specific biomarkers makes clinical diagnosis highly subjective, so a definitive diagnosis is difficult to reach. Recently, deep learning methods have shown promise for depression detection. However, current methods tend to model connections either within or between audio signals, but not both, which limits the model's ability to recognize depression-related cues in audio signals and degrades its classification performance. To address these limitations, we propose a graph neural network approach for depression recognition that incorporates potential connections both within and between audio signals. Specifically, we first use a gated recurrent unit (GRU) to extract time-series information across the frame-level features of each audio signal. We then construct two graph neural network modules sequentially to explore the potential connections within and between audio signals. The first graph network module constructs a graph using the frame-level features of each audio sample as nodes; after the graph convolution layers, its output is a graph embedding feature vector for that sample. The graph embedding feature vectors produced by the first module are then used as the nodes of the second graph network. The relationships between audio signals are encoded through the propagation of information across node neighborhoods. In addition, we use a pre-trained emotion recognition network to extract emotional features that are highly correlated with depression. A self-attention mechanism further strengthens the connection weights among nodes in the second graph network, providing the model with relevant cues for detecting depression from audio signals. We conducted extensive experiments on three depression datasets: DAIC-WOZ, MODMA, and D-Vlog. The proposed model outperforms all compared algorithms on several evaluation metrics, including accuracy, F1-score, precision, and recall, validating its effectiveness.
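The two-level graph structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how edges are built, so the chain adjacency between neighboring frames, the mean pooling, the scaled dot-product attention used as edge weights, and all function names and dimensions are assumptions made for illustration. The GRU and the pre-trained emotion network are omitted; random arrays stand in for the frame-level features.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    adj = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, adj, weight):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(normalize_adjacency(adj) @ x @ weight, 0.0)


def chain_adjacency(n):
    """Assumption: each frame node is linked to its temporal neighbors."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def intra_audio_embedding(frames, weight):
    """First module: frames of one audio sample are nodes; mean-pool to one vector."""
    hidden = gcn_layer(frames, chain_adjacency(frames.shape[0]), weight)
    return hidden.mean(axis=0)


def attention_adjacency(embeddings):
    """Assumption: scaled dot-product self-attention scores act as edge weights."""
    scores = embeddings @ embeddings.T / np.sqrt(embeddings.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def inter_audio_features(embeddings, weight):
    """Second module: each audio sample's embedding is a node in a shared graph."""
    adj = attention_adjacency(embeddings)
    adj = 0.5 * (adj + adj.T)  # symmetrize before GCN normalization
    return gcn_layer(embeddings, adj, weight)


rng = np.random.default_rng(0)
# Four hypothetical audio samples with different frame counts, 16 features per frame.
audio_batch = [rng.normal(size=(n_frames, 16)) for n_frames in (20, 35, 28, 40)]
w1 = rng.normal(scale=0.1, size=(16, 8))  # untrained weights, illustration only
w2 = rng.normal(scale=0.1, size=(8, 4))

embeddings = np.stack([intra_audio_embedding(f, w1) for f in audio_batch])
node_features = inter_audio_features(embeddings, w2)
print(embeddings.shape, node_features.shape)  # (4, 8) (4, 4)
```

In a trained model, the final node features would feed a classifier that labels each audio sample as depressed or not. Here the weights are random, so only the data flow is meaningful.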

A Deep Learning Method on Audio and Text Sequences for Automatic Depression Detection

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Hybrid Network Feature Extraction for Depression Assessment from Speech

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Fusing Multi-Level Features from Audio and Contextual Sentence Embedding from Text for Interview-Based Depression Detection

A novel study for depression detecting using audio signals based on graph neural network

Deep learning for depression recognition with audiovisual cues: A review

A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video

Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech

Attention-Based Acoustic Feature Fusion Network for Depression Detection

Depression Detection in Speech Using Transformer and Parallel Convolutional Neural Networks

Automatic recognition of depression based on audio and video: A review

Deep learning for Depression Recognition from Speech

Unaligned Multimodal Sequences for Depression Assessment From Speech

Attention guided learnable time-domain filterbanks for speech depression detection

Automated depression analysis using convolutional neural networks from speech

Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

A deep learning-based model for detecting depression in senior population

Depression detection using cascaded attention based deep learning framework using speech data