Abstract:BACKGROUND Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. OBJECTIVE This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. METHODS We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method—Correlation Explanation (CorEx)—to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. RESULTS A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. CONCLUSIONS A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.

Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media

Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)-Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

Approaching what and how people with mental disorders communicate in social media–Introducing a multi-channel representation

RevealED: Uncovering Pro-Eating Disorder Content on Twitter Using Deep Learning

Exploring Eating Disorder Topics on Twitter: Machine Learning Approach.

Investigating machine learning and natural language processing techniques applied for detecting eating disorders: a systematic literature review

Detecting and Characterizing Eating-Disorder Communities on Social Media

Leveraging Medical Knowledge Graphs and Large Language Models for Enhanced Mental Disorder Information Extraction

Novel Transformer Based Contextualized Embedding and Probabilistic Features for Depression Detection From Social Media

Machine learning and natural language processing to assess the emotional impact of influencers’ mental health content on Instagram

Towards Preemptive Detection of Depression and Anxiety in Twitter

Towards Safer Online Spaces: Simulating and Assessing Intervention Strategies for Eating Disorder Discussions

Depression detection in social media posts using transformer-based models and auxiliary features

Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media

A Novel Text Mining Approach for Mental Health Prediction Using Bi-LSTM and BERT Model

Explainable Depression Symptom Detection in Social Media

Detecting mental disorder on social media: a ChatGPT-augmented explainable approach

Quantifying Mental Health from Social Media with Neural User Embeddings

Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation