Abstract:

BACKGROUND: Depressive disorder, also known as depression, is a common affective disorder characterized by sadness, loss of interest, feelings of guilt or low self-worth, and poor concentration. Because speech can be collected non-invasively and at low cost, many researchers have explored predicting depression from speech, and recognizing depression from speech signals has important practical significance. Existing approaches have limitations: deep neural network methods for speech depression recognition have complex structures, while traditional machine learning methods require manual feature extraction and achieve low recognition rates.

METHODS: This paper proposes a model that combines residual learning with an attention mechanism. First, a depression corpus is designed based on the self-reference effect (SRE), a classic psychological experimental paradigm, and the speech dataset is labeled. Then an attention module is introduced into the residual unit: channel attention learns features along the channel dimension, spatial attention learns features along the spatial dimension, and the two are combined to form an attention residual unit. Finally, these units are stacked to build a speech depression recognition model based on an attention residual network.

RESULTS: Experimental results show that, compared with traditional machine learning methods, the model achieves better depression recognition results and can meet the needs of practical depression recognition applications.

CONCLUSIONS: This study not only predicts whether a person is depressed but also estimates the severity of depression. In the designed corpus, each individual's binary depression label is derived from depression severity as measured by Beck Depression Inventory-II (BDI-II) scores. Experimental results show that spontaneous speech yields better results than automatic speech, and that under negative emotions, classification using speech features from the negative-question task outperforms the other tasks. In addition, recognition accuracy for both male and female subjects is higher under negative emotions than under other emotions.
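
The attention residual unit described in METHODS can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not give layer details, so the pooling choices, the shared two-layer MLP in the channel branch, the scalar mixing in the spatial branch (a convolution is more typical), and all names (`channel_attention`, `spatial_attention`, `attention_residual_unit`, `transform`) are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) for a feature map x of shape (C, H, W).

    Assumption: average- and max-pooled channel descriptors pass through a
    shared two-layer MLP (w1: (C_hidden, C), w2: (C, C_hidden)).
    """
    avg = x.mean(axis=(1, 2))  # (C,)
    mx = x.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]  # (C, 1, 1)


def spatial_attention(x, w_avg=1.0, w_max=1.0, bias=0.0):
    """Per-position weights in (0, 1), shape (1, H, W).

    Assumption: the channel-wise mean and max maps are mixed with scalar
    weights; this stands in for a learned convolution over the two maps.
    """
    avg = x.mean(axis=0)  # (H, W)
    mx = x.max(axis=0)    # (H, W)
    return sigmoid(w_avg * avg + w_max * mx + bias)[None, :, :]


def attention_residual_unit(x, transform, w1, w2):
    """Residual unit whose branch is reweighted by channel, then spatial attention."""
    f = transform(x)                     # hypothetical stand-in for the conv layers
    f = f * channel_attention(f, w1, w2)
    f = f * spatial_attention(f)
    return np.maximum(x + f, 0.0)        # skip connection followed by ReLU


# Toy input: 8 channels over a 16x16 time-frequency patch (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = 0.1 * rng.standard_normal((2, 8))
w2 = 0.1 * rng.standard_normal((8, 2))
out = attention_residual_unit(x, lambda t: 0.5 * t, w1, w2)
print(out.shape)
```

Because the attention weights only rescale the residual branch, the skip connection still passes the input through unchanged; stacking several such units would give a network of the kind the abstract describes, under the assumptions above.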

Hybrid Network Feature Extraction for Depression Assessment from Speech

Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Dynamic Facial Features in Positive-Emotional Speech for Identification of Depressive Tendencies

Automatic Depression Prediction Via Cross-Modal Attention-Based Multi-Modal Fusion in Social Networks

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Automated depression analysis using convolutional neural networks from speech

Attention-Based Acoustic Feature Fusion Network for Depression Detection

Fusing features of speech for depression classification based on higher-order spectral analysis

Multi-feature deep supervised voiceprint adversarial network for depression recognition from speech

Depression Speech Recognition With a Three-Dimensional Convolutional Network

WavDepressionNet: Automatic Depression Level Prediction Via Raw Speech Signals

Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders

Speech depression recognition based on attentional residual network

The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level

A time-frequency channel attention and vectorization network for automatic depression level prediction

Depression Detection in Speech Using Transformer and Parallel Convolutional Neural Networks

Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Fusing Multi-Level Features from Audio and Contextual Sentence Embedding from Text for Interview-Based Depression Detection

Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN