Abstract:
BACKGROUND: Depressive disorder, also known as depression, is a common affective disorder characterized by sadness, loss of interest, feelings of guilt or low self-worth, and poor concentration. Because speech can be collected non-invasively and at low cost, many researchers have explored predicting depression from speech, and recognizing depression from speech signals has important practical significance. This work addresses two problems in speech depression recognition: the complex structure of existing deep neural network methods, and the manual feature extraction and low recognition rate of traditional machine learning methods.
METHODS: This paper proposes a model that combines residual learning with an attention mechanism. First, a depression corpus is designed based on the self-reference effect (SRE), a classic psychological experimental paradigm, and the speech dataset is labeled. Then an attention module is introduced into the residual unit: channel attention learns features along the channel dimension, spatial attention captures features along the spatial dimension, and the two are combined to obtain the attention residual unit. Finally, these units are stacked to construct a speech depression recognition model based on the attention residual network.
RESULTS: Experimental results show that, compared with traditional machine learning methods, this model achieves better results in depression recognition and can meet the needs of practical depression recognition applications.
CONCLUSIONS: In this study, we not only predict whether a person is depressed but also estimate the severity of depression. In the designed corpus, the binary depression label of an individual is assigned based on the severity of depression, which is measured using BDI-II scores. Experimental results show that spontaneous speech yields better results than automatic speech, and that under negative emotions, speech features corresponding to negative questions are classified better than those from other tasks. In addition, the recognition accuracy for both male and female subjects is higher under negative emotions than under other emotions.
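
The attention residual unit described in METHODS can be illustrated with a minimal NumPy sketch. The abstract does not give layer sizes, kernel shapes, or the exact attention formulation, so everything below is an assumption: the channel attention follows a common pooled-descriptor design (average and max pooling fed through a shared two-layer MLP), the spatial attention is simplified to a weighted sum of channel-wise average and max maps instead of a learned convolution, and the residual branch is a single 1x1 channel mix with ReLU. The names `init_params`, `attention_residual_unit`, and `stack_units` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Per-channel weights of shape (C, 1, 1) for a feature map x of shape (C, H, W)."""
    avg = x.mean(axis=(1, 2))   # average-pooled channel descriptor, shape (C,)
    mx = x.max(axis=(1, 2))     # max-pooled channel descriptor, shape (C,)

    def mlp(v):                 # shared two-layer MLP with ReLU (assumed design)
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(x, w_avg, w_max, bias):
    """Per-position weights of shape (1, H, W). Simplified: no convolution."""
    avg = x.mean(axis=0)        # average over channels, shape (H, W)
    mx = x.max(axis=0)          # max over channels, shape (H, W)
    return sigmoid(w_avg * avg + w_max * mx + bias)[None, :, :]

def attention_residual_unit(x, p):
    """Residual branch -> channel attention -> spatial attention -> skip connection."""
    # Assumed residual branch: 1x1 channel mixing followed by ReLU.
    f = np.maximum(np.einsum("oc,chw->ohw", p["w_conv"], x), 0.0)
    f = f * channel_attention(f, p["w1"], p["w2"])
    f = f * spatial_attention(f, p["w_avg"], p["w_max"], p["bias"])
    return x + f                # identity skip connection

def init_params(channels, reduction=2, seed=0):
    """Random illustrative parameters; a real model would learn these."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)
    return {
        "w_conv": rng.normal(0.0, 0.1, (channels, channels)),
        "w1": rng.normal(0.0, 0.1, (hidden, channels)),
        "w2": rng.normal(0.0, 0.1, (channels, hidden)),
        "w_avg": 1.0,
        "w_max": 1.0,
        "bias": 0.0,
    }

def stack_units(x, param_list):
    """Apply several attention residual units in sequence."""
    for p in param_list:
        x = attention_residual_unit(x, p)
    return x

# Example: a 4-channel feature map over an 8 x 10 time-frequency grid.
features = np.random.default_rng(1).normal(size=(4, 8, 10))
units = [init_params(4, seed=s) for s in range(3)]
output = stack_units(features, units)
print(output.shape)
```

Because each unit adds its attended branch to its input, the output keeps the input's shape, which is what allows the units to be stacked. A classification head (for example, global pooling followed by a linear layer) would sit on top of the stack; it is omitted here because the abstract does not describe it.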

Negative Emotion Recognition In Spoken Dialogs

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Deep Spectrum Feature Representations for Speech Emotion Recognition

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

Emotional Speech Clustering Based Robust Speaker Recognition System

Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speech Emotion Recognition Based on Linear Discriminant Analysis and Support Vector Machine Decision Tree

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Deep Learning Based Affective Model for Speech Emotion Recognition

Emotion Recognition From Noisy Speech

Manifolds Based Emotion Recognition in Speech

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition

Deep Learning and SVM-based Emotion Recognition from Chinese Speech for Smart Affective Services

EmoEars: an emotion recognition system for mandarin speech

Speech depression recognition based on attentional residual network

Research on Chinese Speech Emotion Recognition Based on Deep Neural Network and Acoustic Features

Speech Emotion Recognition Based on EMD in Noisy Environments

Towards Robust Deep Neural Networks for Affect and Depression Recognition from Speech

Fuzzy speech emotion recognition considering semantic awareness

An efficient algorithm for recognition of emotions from speaker and language independent speech using deep learning