LonXplain: Lonesomeness as a Consequence of Mental Disturbance in Reddit Posts

Muskan Garg, Chandni Saxena, Debabrata Samanta, Bonnie J. Dorr
2023-05-30
Abstract: Social media is a potential source of information that infers latent mental states through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey their feeling of loneliness or isolated lifestyle, impacting their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, propounding the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem, discovering the users at-risk, suggesting the need of resilience for early control. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in loneliness detection causing mental disturbance. In this work, three experts: a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness, along with the marking of text-spans in original posts as explanation, in 3,521 Reddit posts. We expect the public release of our dataset, LonXplain, and traditional classifiers as baselines via GitHub.
Computation and Language, Social and Information Networks
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper examines how lonesomeness expressed by social media users relates to mental health. Specifically, the authors use natural language processing (NLP) to detect lonesomeness in Reddit posts and treat it as an important indicator of mental disturbance. The main objectives are:

1. **Constructing an interpretable dataset**: Creating an explainable dataset with human-readable, annotated text spans to support research and development in lonesomeness detection.
2. **Early identification of high-risk users**: Detecting lonesomeness to identify at-risk users who may need psychological intervention, with the aim of preventing potential self-harm or suicidal behavior.
3. **Formulating annotation schemes**: Developing detailed annotation guidelines and perplexity guidelines that combine the expertise of a clinical psychologist, a rehabilitation counselor, and a social NLP researcher, so that annotations are consistent and reliable.

### Background and Motivation

According to data from the World Health Organization, 1 in 3 elderly people feels lonely. Loneliness affects not only the physical and mental health of the elderly but also their quality of life and lifespan. In the United States, about 61% of people report feeling lonely, up from 54% in 2018. During the pandemic, the elderly faced higher health risks due to prolonged isolation. Lonesomeness is closely related to mental health issues such as depression, anxiety, and stress, and it affects cognitive function, sleep quality, and overall well-being.

### Methods and Contributions

1. **Data Collection**: 3,521 posts were collected through the Reddit API, primarily from subreddits related to depression and suicide risk.
2. **Annotation Scheme**: The annotation guidelines draw on three clinical questionnaires (the UCLA Loneliness Scale, the De Jong Gierveld Loneliness Scale, and the Loneliness and Social Dissatisfaction Scale) to mark the presence of lonesomeness and the text spans that explain it.
3. **Data Annotation**: Graduate students trained by the three experts annotated the data, and the reliability of the dataset was checked with a Fleiss' kappa inter-annotator agreement study.
4. **Model Evaluation**: Experiments were run with various embedding and classifier combinations (such as Word2Vec, GloVe + LSTM, GloVe + BiLSTM, GloVe + GRU, and GloVe + BiGRU), and the interpretability of the models was evaluated with the LIME method.

### Main Findings

1. **Dataset Statistics**: In the LonXplain dataset, 54.71% of the posts were marked as containing lonesomeness.
2. **Model Performance**: GloVe + BiGRU performed best among the recurrent neural network models, with an F1 score of 0.77 and an accuracy of 0.78.
3. **Interpretability**: The LIME results indicate that the model can capture the key text spans that signal lonesomeness.

### Conclusion and Future Work

The paper constructs a new, interpretable lonesomeness detection dataset, LonXplain, and demonstrates its potential for the early identification of mental health issues. Future work includes increasing the size of the dataset, developing new models specifically for lonesomeness detection, and further improving model interpretability. The authors also emphasize ethical considerations to protect the privacy and anonymity of the data.
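
The Fleiss' kappa agreement study mentioned under Data Annotation can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `fleiss_kappa`, the toy count matrix, and the choice of three annotators per post are assumptions made only for illustration. Each row is one post, and each column counts how many annotators chose "lonesomeness absent" or "lonesomeness present".

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Every item must be rated by the same number of annotators (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two annotators")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_expected == 1.0:
        # All ratings fall in a single category; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 posts, 3 annotators, columns = [absent, present].
toy_counts = [
    [3, 0],  # all three annotators: lonesomeness absent
    [0, 3],  # all three annotators: lonesomeness present
    [2, 1],  # split decision
    [1, 2],  # split decision
]
kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa on toy data: {kappa:.3f}")
```

Values near 1 indicate strong agreement beyond chance, and values near 0 indicate agreement no better than chance. The toy numbers above are invented and say nothing about the agreement reported for LonXplain.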