Abstract: Depression is a prevalent mental health disorder. Because no specific biomarkers exist, clinical diagnosis is highly subjective, which makes a definitive diagnosis difficult. Recently, deep learning methods have shown promise for depression detection. However, current methods tend to focus solely on the connections either within or between audio signals, which limits the model's ability to recognize depression-related cues in audio and degrades classification performance. To address these limitations, we propose a graph neural network approach for depression recognition that incorporates potential connections both within and between audio signals. Specifically, we first use a gated recurrent unit (GRU) to extract temporal information across the frame-level features of each audio signal. We then construct two graph neural network modules sequentially to explore the potential connections within and between audio signals. The first graph network module constructs a graph using the frame-level features of each audio sample as nodes, and its graph convolution layers output a graph embedding vector for that sample. These graph embedding vectors then serve as the nodes of the second graph network, which encodes the relationships between audio signals by propagating information among neighboring nodes. In addition, we use a pre-trained emotion recognition network to extract emotional features that are highly correlated with depression. A self-attention mechanism further strengthens the connection weights among nodes in the second graph network, providing relevant cues for depression detection from audio signals. We conducted extensive experiments on three depression datasets: DAIC-WOZ, MODMA, and D-Vlog. The proposed model outperforms all compared algorithms on several evaluation metrics, including accuracy, F1-score, precision, and recall, validating its effectiveness.
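The two-stage graph design described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The chain-shaped frame graph, the mean-pooling readout, the scaled dot-product attention used for the second graph's edge weights, and all weight values are assumptions chosen for illustration, and every function name is hypothetical. The GRU and the emotion features are omitted: the toy frame features stand in for GRU outputs.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def chain_adjacency(n):
    """Assumed intra-audio graph: each frame is linked to its temporal neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def normalized_adjacency(adj):
    """GCN-style normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def graph_conv(prop, feats, weight):
    """One propagation step: ReLU(P @ X @ W)."""
    out = matmul(matmul(prop, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def mean_pool(feats):
    """Assumed readout: average node features into one graph embedding."""
    return [sum(col) / len(feats) for col in zip(*feats)]

def attention_adjacency(embs):
    """Row-wise softmax over scaled dot products between sample embeddings."""
    scale = math.sqrt(len(embs[0]))
    weights = []
    for row in matmul(embs, transpose(embs)):
        scaled = [s / scale for s in row]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def two_stage_forward(samples, w_intra, w_inter):
    """Stage 1: per-sample frame graph -> embedding. Stage 2: graph over samples."""
    embs = []
    for frames in samples:
        prop = normalized_adjacency(chain_adjacency(len(frames)))
        embs.append(mean_pool(graph_conv(prop, frames, w_intra)))
    # Attention weights act as the (dense) edge weights of the inter-audio graph.
    return graph_conv(attention_adjacency(embs), embs, w_inter)

# Toy input: three audio samples with 2-dimensional frame features
# (stand-ins for GRU outputs; values are arbitrary).
samples = [
    [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]],
    [[0.5, 0.1], [0.4, 0.3]],
    [[0.2, 0.2], [0.1, 0.6], [0.3, 0.3], [0.4, 0.1]],
]
w_intra = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]   # 2 -> 3, arbitrary fixed weights
w_inter = [[0.2, 0.1], [-0.3, 0.5], [0.4, 0.2]]  # 3 -> 2, arbitrary fixed weights
node_out = two_stage_forward(samples, w_intra, w_inter)
```

Here `node_out` holds one feature row per audio sample after inter-sample propagation. In a trained system these rows would feed a classifier, and the weights would be learned rather than fixed.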
