Predicting Individual Depression Symptoms from Acoustic Features During Speech

Sebastian Rodriguez, Sri Harsha Dumpala, Katerina Dikaios, Sheri Rempel, Rudolf Uher, Sageev Oore
2024-06-23
Abstract: Current automatic depression detection systems provide predictions directly without relying on the individual symptoms/items of depression as denoted in the clinical depression rating scales. In contrast, clinicians assess each item in the depression rating scale in a clinical setting, thus implicitly providing a more detailed rationale for a depression diagnosis. In this work, we make a first step towards using the acoustic features of speech to predict individual items of the depression rating scale before obtaining the final depression prediction. For this, we use convolutional (CNN) and recurrent (long short-term memory (LSTM)) neural networks. We consider different approaches to learning the temporal context of speech. Further, we analyze two variants of voting schemes for individual item prediction and depression detection. We also include an animated visualization that shows an example of item prediction over time as the speech progresses.
Sound, Artificial Intelligence, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the problem of predicting individual depression symptoms, that is, the separate items of a clinical depression rating scale, from the acoustic features of speech. Existing automatic depression detection systems typically output an overall prediction directly, without reference to the individual symptom items in clinical depression rating scales. Clinicians, in contrast, assess each item when evaluating depression, which gives a more detailed rationale for the diagnosis. The goal of the paper is therefore to use the acoustic features of speech to first predict each item in the depression rating scale, and only then make the final depression prediction.

The main contributions of the paper are:

1. **Using acoustic features to predict individual symptoms**: The research team uses Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM) to predict each item in the depression rating scale.
2. **Analyzing different temporal context learning methods**: The study explores different approaches to learning the temporal context of speech.
3. **Voting scheme analysis**: The paper analyzes the impact of two different voting schemes (hard voting and soft voting) on individual item prediction and depression detection.
4. **Visualizing prediction results**: An animated visualization shows an example of item prediction over time as the speech progresses.

Through these methods, the research team hopes to better understand the decision-making process of machine learning models in depression detection and to provide more detailed diagnostic evidence.
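The difference between hard voting and soft voting can be illustrated with a small sketch. It assumes that a model emits one class-probability vector per speech segment for a single rating-scale item, and that these segment-level outputs are combined into one item-level prediction. The function names `hard_vote` and `soft_vote`, the tie-breaking rule, and the example probabilities are illustrative assumptions and are not taken from the paper, whose exact aggregation may differ.

```python
from collections import Counter
from typing import Sequence


def hard_vote(segment_probs: Sequence[Sequence[float]]) -> int:
    """Each segment casts one vote for its most likely class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in segment_probs]
    counts = Counter(votes)
    best = max(counts.values())
    # Assumption: ties are broken in favour of the lowest class index.
    return min(c for c, n in counts.items() if n == best)


def soft_vote(segment_probs: Sequence[Sequence[float]]) -> int:
    """Average the class probabilities over all segments, then take the argmax."""
    n_segments = len(segment_probs)
    n_classes = len(segment_probs[0])
    mean = [sum(p[c] for p in segment_probs) / n_segments for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical probabilities for one binary item: (symptom absent, symptom present).
example = [
    [0.55, 0.45],  # segment 1: weakly "absent"
    [0.60, 0.40],  # segment 2: weakly "absent"
    [0.05, 0.95],  # segment 3: strongly "present"
]

print("hard vote:", hard_vote(example))
print("soft vote:", soft_vote(example))
```

In this made-up example the two schemes disagree: hard voting counts two weak "absent" votes against one "present" vote, whereas soft voting lets the single confident segment dominate the averaged probabilities. This kind of divergence is one reason comparing the two schemes can be informative.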