Detecting Depression In Speech: Comparison And Combination Between Different Speech Types
Hailiang Long, Zhenghao Guo, Xia Wu, Bin Hu, Zhenyu Liu, Hanshu Cai
DOI: https://doi.org/10.1109/BIBM.2017.8217802
2017-01-01
Abstract: Depression is a highly prevalent mental disorder that negatively affects individuals, their families, society, and the economy. In recent years, automatic detection of depression from the speech signal has attracted growing interest. In this paper, a new multiple classifier system for depression recognition was developed and tested. The novel aspect of this methodology is the combination of different speech types and emotions. First, using a sample of 74 subjects (37 depressed patients and 37 healthy controls), we examined the discriminative power of different speech types (interview, picture description, and reading) and speech emotions (positive, neutral, and negative). Voice features including short-time energy, intensity, loudness, zero-crossing rate (ZCR), F0, jitter, shimmer, formants, Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), line spectrum pairs (LSP), and perceptual linear predictive coefficients (PLP) were tested. Then, a new multiple classifier method was proposed to detect depression. The overall recognition rate with interview speech was higher than with picture description speech or reading speech. Furthermore, neutral speech performed better than positive and negative speech. Among the features, short-time energy, ZCR, LPC, MFCC, and LSP were robust, giving high accuracy across the different speech types. Finally, the new approach achieved an accuracy of 78.02%, an encouraging result for detecting depression in speech.
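Two of the features reported as robust, short-time energy and ZCR, have simple textbook definitions, and a minimal sketch may make them concrete. The code below is illustrative only and is not the authors' extraction pipeline: the function names, the 8 kHz sampling rate, the 25 ms frame length, the 10 ms hop, and the synthetic sine input are all assumptions chosen for the example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Assumed toy input: one second of a 100 Hz sine sampled at 8 kHz.
sr = 8000
signal = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]

# Assumed framing: 25 ms frames (200 samples) with a 10 ms hop (80 samples).
frames = frame_signal(signal, frame_len=200, hop=80)
energy = [short_time_energy(f) for f in frames]
zcr = [zero_crossing_rate(f) for f in frames]
```

Each recording then yields one energy value and one ZCR value per frame. A classifier would typically consume summary statistics of these per-frame sequences (for example mean and variance), though the abstract does not state which statistics were used.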