Correlating Twitter Language with Community-Level Health Outcomes

Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi
DOI: https://doi.org/10.48550/arXiv.1906.06465
2019-06-25
Abstract: We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially to translate these predictions to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
Computation and Language, Machine Learning, Social and Information Networks
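
The pipeline outlined in the abstract (sentence embeddings, then a regression model, then clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: `embed` is a hypothetical hashed bag-of-words stand-in for a real sentence-embedding model, ridge regression and k-means are swapped in as generic choices for "a regression model" and "clustering", and the tweets and outcome values are made-up toy placeholders, not data from the paper.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a sentence-embedding model (hashed bag of words)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def community_features(tweets_by_community):
    """Average the tweet embeddings of each community into one feature vector."""
    return np.stack([np.mean([embed(t) for t in tweets], axis=0)
                     for tweets in tweets_by_community])

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an intercept obtained by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns final labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Toy placeholder data: four "communities" with a made-up outcome value each.
tweets_by_community = [
    ["went for a run this morning", "fresh salad for lunch"],
    ["fast food again tonight", "too tired to exercise"],
    ["hiking with friends", "cooking vegetables at home"],
    ["smoking break at work", "fried chicken and soda"],
]
outcome = np.array([1.0, 3.0, 1.2, 3.5])

# Step 1 and 2: community-level features, then regression onto the outcome.
X = community_features(tweets_by_community)
w, b = fit_ridge(X, outcome, alpha=0.1)

# Step 3: cluster individual tweets and score each cluster with the regressor.
all_tweets = [t for tweets in tweets_by_community for t in tweets]
E = np.stack([embed(t) for t in all_tweets])
labels, centers = kmeans(E, k=2)
cluster_scores = centers @ w + b
```

Scoring cluster centers with the community-level regressor is one plausible reading of how community-level predictions could be related to groups of similar tweets; the paper itself should be consulted for the actual procedure.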
What problem does this paper attempt to address?