Detecting Reddit Users with Depression Using a Hybrid Neural Network SBERT-CNN

Ziyi Chen, Ren Yang, Sunyang Fu, Nansu Zong, Hongfang Liu, Ming Huang
DOI: https://doi.org/10.1109/ICHI57859.2023.00035
2024-01-30
Abstract: Depression is a widespread mental health issue, affecting an estimated 3.8% of the global population. It is also one of the main contributors to disability worldwide. Recently it is becoming popular for individuals to use social media platforms (e.g., Reddit) to express their difficulties and health issues (e.g., depression) and seek support from other users in online communities. It opens great opportunities to automatically identify social media users with depression by parsing millions of posts for potential interventions. Deep learning methods have begun to dominate in the field of machine learning and natural language processing (NLP) because of their ease of use, efficient processing, and state-of-the-art results on many NLP tasks. In this work, we propose a hybrid deep learning model which combines a pretrained sentence BERT (SBERT) and convolutional neural network (CNN) to detect individuals with depression with their Reddit posts. The sentence BERT is used to learn the meaningful representation of semantic information in each post. CNN enables the further transformation of those embeddings and the temporal identification of behavioral patterns of users. We trained and evaluated the model performance to identify Reddit users with depression by utilizing the Self-reported Mental Health Diagnoses (SMHD) data. The hybrid deep learning model achieved an accuracy of 0.86 and an F1 score of 0.86 and outperformed the state-of-the-art documented result (F1 score of 0.79) by other machine learning models in the literature. The results show the feasibility of the hybrid model to identify individuals with depression. Although the hybrid model is validated to detect depression with Reddit posts, it can be easily tuned and applied to other text classification tasks and different clinical applications.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the automatic identification of Reddit users with depression. The authors propose a hybrid deep-learning model (SBERT-CNN) that combines a pre-trained Sentence-BERT (SBERT) with a convolutional neural network (CNN) to detect depression from users' Reddit posts. The research uses the large volume of user-generated content on social media to support early detection of, and intervention in, depression. With this hybrid model, the researchers aim to identify individuals who may be suffering from depression more accurately, and thereby to support more effective delivery of mental health services.

### Main contributions of the paper

- **Proposing a new hybrid model**: The SBERT-CNN model combines the semantic representation ability of SBERT with the behavior-pattern recognition ability of CNN to improve the accuracy of depression detection.
- **Superior performance**: On the Self-reported Mental Health Diagnoses (SMHD) dataset, the SBERT-CNN model achieves an accuracy of 0.86 and an F1 score of 0.86 in identifying users with depression, outperforming the best previously documented result (F1 score of 0.79).
- **Wide applicability**: The model is not limited to depression detection; it can be adjusted and applied to other text classification tasks and clinical applications, such as the identification of anxiety disorders.

### Methods and technical details

- **Data source**: The SMHD dataset, which contains public Reddit posts from January 2006 to December 2017 by users who self-reported mental health conditions (including depression) and by matched control users.
- **Data pre-processing**: Noise removal, conversion to lowercase, expansion of abbreviations, tokenization, and similar steps, to clean the text and improve model accuracy.
- **Model architecture**:
  - **SBERT**: Generates a semantic embedding vector for each post.
  - **CNN**: Performs further feature extraction and behavior-pattern recognition on these embedding vectors.
- **Experimental setup**: Model performance was evaluated at different post-count thresholds (such as the 50th, 75th, and 90th percentiles), using accuracy, precision, recall, and F1 score as evaluation metrics.

### Conclusion

The research shows that the SBERT-CNN model performs well in identifying Reddit users with depression and outperforms previously reported methods. This supports the use of social media data for mental health monitoring and intervention. Future work can explore SBERT models of different sizes and combinations with other pre-trained models to further improve performance.
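To make the SBERT-then-CNN pipeline described under "Methods and technical details" more concrete, here is a minimal pure-Python sketch of the CNN stage: convolution filters slide over a user's sequence of post embeddings, followed by ReLU, global max pooling, and a sigmoid output. It is an illustration under stated assumptions, not the authors' implementation. The embedding dimension, kernel width, number of filters, and random weights are all hypothetical, and the toy vectors stand in for real SBERT embeddings (which in practice might come from a library such as `sentence-transformers`).

```python
import math
import random


def conv_relu_maxpool(post_embeddings, kernels, biases):
    """Slide each kernel over the post sequence, apply ReLU, then global max-pool.

    post_embeddings: list of posts, each a list of floats (one embedding per post).
    kernels: list of kernels; each kernel covers `width` consecutive posts and
             holds one weight row per covered post.
    Returns one pooled feature per kernel.
    """
    features = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)  # number of consecutive posts the kernel spans
        best = 0.0           # ReLU outputs are never below zero
        for start in range(len(post_embeddings) - width + 1):
            window = post_embeddings[start:start + width]
            activation = bias + sum(
                w * x
                for weight_row, post in zip(kernel, window)
                for w, x in zip(weight_row, post)
            )
            best = max(best, max(0.0, activation))
        features.append(best)
    return features


def predict_probability(post_embeddings, kernels, biases, out_weights, out_bias):
    """Map pooled convolution features to a probability with a sigmoid output unit."""
    features = conv_relu_maxpool(post_embeddings, kernels, biases)
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy setup: all sizes and weights below are hypothetical, chosen only for illustration.
random.seed(0)
EMBED_DIM, NUM_POSTS, NUM_KERNELS, KERNEL_WIDTH = 4, 6, 3, 2

# Stand-in for SBERT output: one small random vector per post.
toy_posts = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_POSTS)
]
kernels = [
    [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(KERNEL_WIDTH)]
    for _ in range(NUM_KERNELS)
]
biases = [0.0] * NUM_KERNELS
out_weights = [random.uniform(-1, 1) for _ in range(NUM_KERNELS)]

probability = predict_probability(toy_posts, kernels, biases, out_weights, 0.0)
print(f"toy probability: {probability:.3f}")
```

Because each kernel spans several consecutive posts, its pooled feature responds to patterns across neighboring posts rather than to a single post, which is one plausible reading of the "temporal identification of behavioral patterns" mentioned in the abstract. A trained model would learn the kernel and output weights from labeled SMHD users instead of drawing them at random.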