MASON-NLP at eRisk 2023: Deep Learning-Based Detection of Depression Symptoms from Social Media Texts

Fardin Ahsan Sakib, Ahnaf Atef Choudhury, Ozlem Uzuner
2023-10-17
Abstract: Depression is a mental health disorder that has a profound impact on people's lives. Recent research suggests that signs of depression can be detected in the way individuals communicate, both through spoken words and written texts. In particular, social media posts are a rich and convenient text source that we may examine for depressive symptoms. The Beck Depression Inventory (BDI) Questionnaire, which is frequently used to gauge the severity of depression, is one instrument that can aid in this study. We can narrow our study to only those symptoms since each BDI question is linked to a particular depressive symptom. It's important to remember that not everyone with depression exhibits all symptoms at once, but rather a combination of them. Therefore, it is extremely useful to be able to determine if a sentence or a piece of user-generated content is pertinent to a certain condition. With this in mind, the eRisk 2023 Task 1 was designed to do exactly that: assess the relevance of different sentences to the symptoms of depression as outlined in the BDI questionnaire. This report is all about how our team, Mason-NLP, participated in this subtask, which involved identifying sentences related to different depression symptoms. We used a deep learning approach that incorporated MentalBERT, RoBERTa, and LSTM. Despite our efforts, the evaluation results were lower than expected, underscoring the challenges inherent in ranking sentences from an extensive dataset about depression, which necessitates both appropriate methodological choices and significant computational resources. We anticipate that future iterations of this shared task will yield improved results as our understanding and techniques evolve.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of detecting depression symptoms in social media text. Specifically, the research team participated in eRisk 2023 Task 1, which asks systems to assess how relevant individual sentences are to each of the depression symptoms listed in the Beck Depression Inventory (BDI) questionnaire. The broader goal is to help identify people who may be experiencing depression early, so that they can receive timely intervention and treatment.

### Main Issues:
1. **Early Detection of Depression**: Analyze user-generated social media content for language features associated with depression.
2. **Relevance Assessment**: Judge how relevant each sentence is to a specific depression symptom, giving a finer-grained picture of a user's mental health.
3. **Large-scale Data Processing**: Narrow a very large collection of social media sentences down to those related to depression, making detection both more accurate and more efficient.

### Research Background:
- **Impact of Depression**: Depression is a mental health disorder that seriously affects people's lives and affects millions of people each year. However, many do not seek medical help because of a lack of awareness or the stigma associated with mental health.
- **Relationship Between Language and Mental Health**: Studies have shown that language use correlates with mental health, so Natural Language Processing (NLP) techniques can be used to analyze depression symptoms.
- **Role of Social Media**: Social media platforms host large amounts of user-generated content, which can be an important resource for the early detection of depression.

### Research Methods:
- **Datasets**: The research team used two datasets: the official eRisk 2023 dataset and Kaggle's "Depression: Reddit Dataset."
- **Models**: The research team combined three models, MentalBERT, RoBERTa, and LSTM, in a two-stage approach designed to speed up the system by shrinking the set of sentences that reach the final ranking step.
  - **First Stage**: A RoBERTa model selected 830,151 depression-related sentences out of 3,807,115 sentences.
  - **Second Stage**: An LSTM model further reduced the set to 305,118 sentences, and MentalBERT was then used to compare each remaining sentence with the 21 BDI depression symptoms through cosine similarity.

### Results and Discussion:
- **Performance Evaluation**: The system performed moderately overall and scored lower than the top-performing teams on Average Precision (AP), R-Precision, Precision at 10, and NDCG at 1000.
- **Potential Factors**: The research team believes that performance may have been affected by the training data, the evaluation criteria, model tuning, and the voting methods used.

### Conclusion and Future Work:
- **Improvement Directions**: Fine-tune the MentalBERT model on task-specific data, optimize model parameters, develop a better understanding of the evaluation metrics, and adjust the strategy accordingly to improve performance.
- **Future Research**: Continue exploring how AI techniques can identify depression symptoms in social media text more effectively, supporting research and applications in the field of mental health.
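The two-stage pipeline under Research Methods (filter, then rank the survivors against each BDI symptom by cosine similarity) and the ranking metrics under Results and Discussion can be illustrated together. The Python sketch below is a minimal illustration under stated assumptions, not the team's implementation. It assumes sentence and symptom embeddings are already computed (toy 3-dimensional vectors stand in for MentalBERT outputs), replaces the RoBERTa and LSTM classifiers with hypothetical keyword rules, and treats relevance labels as binary. Every function name, sentence, vector, and label in it is hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Set

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cascade_filter(ids: Iterable[str], stages: List[Callable[[str], bool]]) -> List[str]:
    """Apply filter stages in order; a sentence survives only if every stage keeps it."""
    kept = list(ids)
    for keep in stages:
        kept = [sid for sid in kept if keep(sid)]
    return kept


def rank_by_symptom(vectors: Dict[str, Vector], symptom: Vector, k: int) -> List[str]:
    """Return the ids of the k sentences whose vectors are most similar to the symptom vector."""
    ordered = sorted(vectors, key=lambda sid: cosine_similarity(vectors[sid], symptom), reverse=True)
    return ordered[:k]


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of the top-k positions occupied by relevant sentences.
    return sum(1 for sid in ranking[:k] if sid in relevant) / k


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    # Precision at rank R, where R is the number of relevant sentences.
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    # Sum of precision at each rank holding a relevant sentence, divided by the
    # total number of relevant sentences (so relevant ones never ranked count as 0).
    hits, total = 0, 0.0
    for rank, sid in enumerate(ranking, start=1):
        if sid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    # Binary-relevance NDCG: gain 1 per relevant sentence, discounted by log2(rank + 1).
    dcg = sum(1.0 / math.log2(r + 1) for r, sid in enumerate(ranking[:k], start=1) if sid in relevant)
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus; s6 is relevant but worded so that stage 1 drops it.
    texts = {
        "s1": "i can't sleep at night",
        "s2": "the weather is nice today",
        "s3": "i feel so tired all the time",
        "s4": "i wake up at 4am and can't fall asleep",
        "s5": "i love my new bike",
        "s6": "my nights are restless and short",
    }

    def stage1(sid: str) -> bool:
        # Hypothetical stand-in for the RoBERTa relevance classifier.
        return texts[sid].startswith("i ")

    def stage2(sid: str) -> bool:
        # Hypothetical stand-in for the LSTM filter.
        return any(word in texts[sid] for word in ("sleep", "tired"))

    candidates = cascade_filter(texts, [stage1, stage2])  # ['s1', 's3', 's4']

    # Toy vectors in place of MentalBERT embeddings; the symptom vector is a
    # hypothetical embedding of a sleep-related BDI item.
    vectors = {"s1": [0.9, 0.1, 0.0], "s3": [0.1, 0.9, 0.1], "s4": [0.8, 0.2, 0.1]}
    sleep_symptom = [1.0, 0.0, 0.0]
    ranking = rank_by_symptom({sid: vectors[sid] for sid in candidates}, sleep_symptom, k=1000)
    relevant = {"s1", "s4", "s6"}  # hypothetical gold labels

    print(ranking)                                          # ['s1', 's4', 's3']
    print(round(average_precision(ranking, relevant), 4))  # 0.6667
    print(round(r_precision(ranking, relevant), 4))        # 0.6667
    print(precision_at_k(ranking, relevant, 10))           # 0.2
    print(round(ndcg_at_k(ranking, relevant, 1000), 4))    # 0.7654
```

Because each filter stage only removes sentences, the similarity step handles far fewer items (in the reported run, 305,118 of 3,807,115 sentences, about 8%). The trade-off is that a sentence discarded early can never be ranked: in the toy data the relevant sentence s6 never reaches the ranking, which caps both AP and R-Precision at 2/3.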