Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

José Alberto Benítez-Andrades, María Teresa García-Ordás, Mayra Russo, Ahmad Sakor, Luis Daniel Fernandes Rotger, Maria-Esther Vidal
DOI: https://doi.org/10.3233/SW-223269
2024-02-09
Abstract: Social networks are vital for information sharing, especially in the health sector for discussing diseases and treatments. These platforms, however, often feature posts as brief texts, posing challenges for Artificial Intelligence (AI) in understanding context. We introduce a novel hybrid approach combining community-maintained knowledge graphs (like Wikidata) with deep learning to enhance the categorization of social media posts. This method uses advanced entity recognizers and linkers (like Falcon 2.0) to connect short post entities to knowledge graphs. Knowledge graph embeddings (KGEs) and contextualized word embeddings (like BERT) are then employed to create rich, context-based representations of these posts. Our focus is on the health domain, particularly in identifying posts related to eating disorders (e.g., anorexia, bulimia) to aid healthcare providers in early diagnosis. We tested our approach on a dataset of 2,000 tweets about eating disorders, finding that merging word embeddings with knowledge graph information enhances the predictive models' reliability. This methodology aims to assist health experts in spotting patterns indicative of mental disorders, thereby improving early detection and accurate diagnosis for personalized medicine.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of classifying short social media posts related to eating disorders (such as anorexia or bulimia). Its main goal is to build a vector embedding representation that encodes contextual knowledge from two sources: structured data in knowledge graphs (such as Wikidata) and unstructured corpora (such as scientific publications or social media posts). By combining both kinds of contextual knowledge, the authors aim to improve how well machine learning models detect patterns of eating disorders in these posts. To this end, the paper proposes a hybrid framework that combines word embeddings generated by a pre-trained BERT model with knowledge graph embeddings learned from Wikidata. The resulting embeddings, referred to as Context-Based Embeddings (CBEs), are used to train and test predictive models on specific classification tasks. The paper evaluates the method on a dataset of 2,000 tweets related to eating disorders and reports that combining contextual knowledge from structured and unstructured data improves model performance.
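As a rough illustration of the hybrid idea, the following Python sketch builds a combined vector for one post by concatenating a text embedding with the mean of the knowledge graph embeddings of the entities linked in that post. The concatenation and mean-pooling steps, the function names, the toy vectors, and the placeholder entity IDs are all assumptions made for illustration; this summary does not specify how the authors actually merge the two kinds of embeddings.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Average vectors component-wise; return a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def context_based_embedding(
    text_vec: Vector,
    entity_ids: Sequence[str],
    kge: Dict[str, Vector],
    kge_dim: int,
) -> Vector:
    """Hypothetical combination step (an assumption, not the paper's method):
    concatenate the text embedding with the mean-pooled KG embeddings
    of the entities linked in the post."""
    # Entities without a learned embedding are skipped.
    entity_vecs = [kge[e] for e in entity_ids if e in kge]
    return list(text_vec) + mean_vector(entity_vecs, kge_dim)


# Toy inputs. In a real pipeline the text vector would come from a model such
# as BERT and the entity IDs from an entity linker; the values and IDs below
# are placeholders, not real Wikidata identifiers.
text_vec = [0.5, -0.25, 1.0]
kge = {"ENTITY_A": [1.0, 3.0], "ENTITY_B": [3.0, 1.0]}

cbe = context_based_embedding(
    text_vec, ["ENTITY_A", "ENTITY_B", "ENTITY_MISSING"], kge, kge_dim=2
)
print(cbe)  # [0.5, -0.25, 1.0, 2.0, 2.0]
```

The combined vector could then be passed to any standard classifier. A post with no linked entities falls back to a zero block for the knowledge graph part, so the vector length stays fixed across posts.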