A novel study for depression detecting using audio signals based on graph neural network

Chenjian Sun, Min Jiang, Linlin Gao, Yu Xin, Yihong Dong
DOI: https://doi.org/10.1016/j.bspc.2023.105675
IF: 5.1
2023-11-06
Biomedical Signal Processing and Control
Abstract: Depression is a prevalent mental health disorder. Because it lacks specific biomarkers, clinical diagnosis is highly subjective, which makes a definitive diagnosis difficult. Recently, deep learning methods have shown promise for depression detection. However, current methods tend to focus on connections either within or between audio signals, which limits the model's ability to recognize depression-related cues in audio and degrades its classification performance. To address these limitations, we propose a graph neural network approach for depression recognition that incorporates potential connections both within and between audio signals. Specifically, we first use a gated recurrent unit (GRU) to extract time-series information across the frame-level features of audio signals. We then construct two graph neural network modules sequentially to explore the potential connections within and between audio signals. The first graph network module constructs a graph using the frame-level features of each audio sample as nodes; after the graph convolution layers, its output is a graph embedding feature vector. These graph embedding vectors then serve as the nodes of the second graph network, which encodes the relationships between audio signals through the propagation of node neighborhood information. In addition, we use a pre-trained emotion recognition network to extract emotional features that are highly correlated with depression. A self-attention mechanism further strengthens the connection weights among nodes in the second graph network, providing relevant cues for the model to detect depression from audio signals. We conducted extensive experiments on three depression datasets: DAIC-WOZ, MODMA, and D-Vlog. The proposed model outperforms all compared algorithms on several evaluation metrics, including accuracy, F1-score, precision, and recall, validating its effectiveness.
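The two-level graph idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the GRU, the learned graph convolution weights, and the pre-trained emotion features are omitted. The chain-shaped frame graph, the weight-free mean propagation, the mean readout, and the attention scores computed directly from the embeddings are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops and row-normalize, i.e. D^-1 (A + I)."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    return [[v / sum(row) for v in row] for row in looped]

def propagate(weights, feats):
    """One weight-free propagation step: each node becomes a weighted
    average of the node features, using one row of `weights` per node."""
    dim = len(feats[0])
    return [[sum(w * f[d] for w, f in zip(row, feats)) for d in range(dim)]
            for row in weights]

def chain_adjacency(n):
    """Assumed intra-audio graph: each frame links to its temporal neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

def mean_readout(feats):
    """Assumed readout: average node features into one graph embedding."""
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]

def attention_adjacency(embs):
    """Edge weights between audio samples as a row-wise softmax of scaled
    dot products (a simplified stand-in for the self-attention step)."""
    scale = math.sqrt(len(embs[0]))
    rows = []
    for query in embs:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in embs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def embed_audio(frames):
    """First graph: frames of one audio sample are the nodes."""
    adj = normalize_adjacency(chain_adjacency(len(frames)))
    return mean_readout(propagate(adj, frames))

def refine_across_audio(embs):
    """Second graph: per-sample embeddings are the nodes."""
    return propagate(attention_adjacency(embs), embs)

# Toy frame-level features (made-up numbers, two features per frame).
audio_a = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
audio_b = [[0.0, 1.0], [0.1, 0.9]]

embeddings = [embed_audio(audio_a), embed_audio(audio_b)]
refined = refine_across_audio(embeddings)
```

In this sketch `embeddings` holds one vector per audio sample, and `refined` mixes each vector with the others according to the attention weights. A classifier on top of `refined` would complete the pipeline, but it is left out here.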
engineering, biomedical