Abstract: Social media is a potential source of information from which latent mental states can be inferred through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey feelings of loneliness or an isolated lifestyle that affect their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, underscoring the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem that discovers at-risk users and suggests the need for resilience for early control. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in detecting loneliness that causes mental disturbance. In this work, three experts (a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher) define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness, along with text spans in the original posts as explanations, in 3,521 Reddit posts. We expect to publicly release our dataset, LonXplain, and traditional classifiers as baselines via GitHub.
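As described above, each annotated instance pairs a Reddit post with a binary lonesomeness label and text spans taken from the original post as the explanation. The sketch below models such a record in Python. The class name `LonelinessRecord`, its field names, and the rule that a positive post must carry at least one span are hypothetical assumptions for illustration, not the released dataset's schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LonelinessRecord:
    """Hypothetical schema for one annotated post (not the official LonXplain format)."""

    post: str
    label: int  # 1 = lonesomeness present, 0 = absent
    spans: List[str] = field(default_factory=list)  # explanation text copied from `post`

    def validate(self) -> None:
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label}")
        # Explanations are spans of the original post, so each must occur verbatim in it.
        for span in self.spans:
            if span not in self.post:
                raise ValueError(f"span not found in post: {span!r}")
        # Assumption: a positive label must be justified by at least one span.
        if self.label == 1 and not self.spans:
            raise ValueError("positive post has no explanation span")

    def span_offsets(self) -> List[Tuple[int, int]]:
        """Return (start, end) character offsets of the first occurrence of each span."""
        offsets = []
        for span in self.spans:
            start = self.post.index(span)
            offsets.append((start, start + len(span)))
        return offsets


record = LonelinessRecord(
    post="I moved to a new city and I have nobody to talk to anymore.",
    label=1,
    spans=["I have nobody to talk to anymore"],
)
record.validate()
print(record.span_offsets())  # prints [(26, 58)]
```

Storing character offsets alongside the span strings is one way to let a model's highlighted tokens be aligned with the human explanation later.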

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the link between lonesomeness expressed by social media users and mental health problems. Specifically, the authors study how to detect lonesomeness in Reddit posts with natural language processing (NLP) techniques, treating it as an important indicator of mental health problems. The main objectives of the paper are:

1. **Constructing an explainable dataset**: Creating a dataset with human-readable, annotated text spans to facilitate research and development in lonesomeness detection.
2. **Early identification of at-risk users**: Detecting lonesomeness to identify users who may need psychological intervention, with the aim of preventing potential self-harm or suicidal behavior.
3. **Formulating annotation schemes**: Developing detailed annotation schemes and perplexity guidelines that combine the expertise of a clinical psychologist, a rehabilitation counselor, and a social NLP researcher, to keep annotations consistent and reliable.

### Background and Motivation

According to the World Health Organization, 1 in 3 older people feels lonely. Loneliness affects not only the physical and mental health of older adults but also their quality of life and lifespan. In the United States, about 61% of people report feeling lonely, up from 54% in 2018. During the pandemic, older adults faced higher health risks due to prolonged isolation. Lonesomeness is closely associated with mental health issues such as depression, anxiety, and stress, and it affects cognitive function, sleep quality, and overall well-being.

### Methods and Contributions

1. **Data Collection**: 3,521 posts were collected through the Reddit API, primarily from subreddits related to depression and suicide risk.
2. **Annotation Scheme**: Detailed annotation guidelines were developed, drawing on three clinical questionnaires (the UCLA Loneliness Scale, the De Jong Gierveld Loneliness Scale, and the Loneliness and Social Dissatisfaction Scale), to mark the presence of lonesomeness and the text spans that explain it.
3. **Data Annotation**: Graduate students trained by the three experts annotated the data, and the reliability of the dataset was assessed with an inter-annotator agreement study using Fleiss' kappa.
4. **Model Evaluation**: Experiments were conducted with several models (such as Word2Vec-based classifiers, GloVe + LSTM, GloVe + BiLSTM, GloVe + GRU, and GloVe + BiGRU), and the explainability of the models was evaluated with LIME.

### Main Findings

1. **Dataset Statistics**: In the LonXplain dataset, 54.71% of the posts are labeled as expressing lonesomeness.
2. **Model Performance**: GloVe + BiGRU performed best among the recurrent neural network models, with an F1 score of 0.77 and an accuracy of 0.78.
3. **Explainability**: The LIME explanations indicate that the model can effectively capture the key text spans that signal lonesomeness.

### Conclusion and Future Work

The paper constructs a new, explainable lonesomeness detection dataset, LonXplain, and demonstrates its potential for the early identification of mental health issues. Future work includes enlarging the dataset, developing models designed specifically for lonesomeness detection, and further improving model explainability. The authors also emphasize ethical considerations, ensuring the privacy and anonymity of the data.
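The reliability check mentioned in the summary relies on Fleiss' kappa, which compares observed agreement among several annotators with the agreement expected by chance. The following is a minimal standard-library sketch for the binary present/absent label. The function name `fleiss_kappa`, the count-table input format, and the toy ratings are hypothetical; they do not reproduce the paper's annotation data or its reported agreement score.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is how many annotators assigned post i to category j
    (e.g. j=0 absent, j=1 present). Every post needs the same number of ratings.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")
    n_categories = len(counts[0])

    # Per-subject agreement: fraction of rater pairs that agree on that post.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 4 posts, 3 annotators, columns = [absent, present].
toy = [[0, 3], [3, 0], [1, 2], [0, 3]]
print(round(fleiss_kappa(toy), 3))  # prints 0.625
```

Here the observed agreement is 10/12 and the chance agreement is 5/9, so kappa is (10/12 - 5/9) / (1 - 5/9) = 0.625.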
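The reported results pair an F1 score with accuracy, and the two can differ because F1 ignores true negatives while accuracy counts them. The sketch below computes both for binary labels, assuming F1 is taken on the lonesomeness-present class; the paper may instead report a weighted or macro average. The function name `accuracy_and_f1` and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1 for the `positive` class) for binary predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels: 1 = lonesomeness present, 0 = absent.
gold = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
print(accuracy_and_f1(gold, pred))  # prints (0.8, 0.75)
```

In the toy run, the five correctly rejected negatives raise accuracy to 0.8, while F1 stays at 0.75 because it depends only on the positive-class precision and recall.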

LonXplain: Lonesomeness as a Consequence of Mental Disturbance in Reddit Posts

An Annotated Dataset for Explainable Interpersonal Risk Factors of Mental Disturbance in Social Media Posts

LOST: A Mental Health Dataset of Low Self-esteem in Reddit Posts

Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text

WellXplain: Wellness Concept Extraction and Classification in Reddit Posts for Mental Health Analysis

Many Ways to Be Lonely: Fine-Grained Characterization of Loneliness and Its Potential Changes in COVID-19

Mental Health Analysis in Social Media Posts: A Survey

Identifying discernible indications of psychological well-being using ML: explainable AI in reddit social media interactions

Examining the Public Messaging on 'Loneliness' over Social Media: An Unsupervised Machine Learning Analysis of Twitter Posts over the Past Decade

Analyzing Online Conversations on Reddit: A Study of Stress and Anxiety Through Topic Modeling and Sentiment Analysis

Mental Health Diagnosis in the Digital Age: Harnessing Sentiment Analysis on Social Media Platforms upon Ultra-Sparse Feature Content

Exploring Social Media Posts for Depression Identification: A Study on Reddit Dataset

Reddit social media text analysis for depression prediction: using logistic regression with enhanced term frequency-inverse document frequency features

Social and Web Data Framework for Understanding Loneliness

Detection of Depression-Related Posts in Reddit Social Media Forum

Understanding Mental Health Issues in Different Subdomains of Social Networking Services: Computational Analysis of Text-Based Reddit Posts

Head versus heart: social media reveals differential language of loneliness from depression

LonelyText: A Short Messaging Based Classification of Loneliness

MultiWD: Multi-label wellness dimensions in social media posts

Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data

Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study