Comparing Themes Extracted via Topic Modeling and Manual Content Analysis: Korean-Language Discussions of Dementia on Twitter

Haeyoung Lee, Sun Joo Jang, Frederick F Sun, Peter Broadwell, Sunmoo Yoon
DOI: https://doi.org/10.3233/SHTI220704
2022-06-29
Abstract: We randomly examined Korean-language Tweets mentioning dementia/Alzheimer's disease (n = 12,413) posted from November 28 to December 9, 2020, without limiting geographical locations. We independently applied Latent Dirichlet Allocation (LDA) topic modeling and qualitative content analysis to the texts of the Tweets. We compared the themes extracted by LDA topic modeling to those identified via manual coding methods. A total of 16 themes were detected from manual coding, with inter-rater reliability (Cohen's kappa) of 0.842. The proportions of the most prominent themes were: burdens of family caregiving (48.50%), reports of wandering/missing family members with dementia (18.12%), stigma (13.64%), prevention strategies (5.07%), risk factors (4.91%), healthcare policy (3.26%), and elder abuse/safety issues (1.75%). Seven themes whose contents were similar to themes derived from manual coding were extracted from the LDA topic modeling results (perplexity: -6.39, coherence score: 0.45). Our findings suggest that applying LDA topic modeling can be fairly effective at extracting themes from Korean Twitter discussions, in a manner analogous to qualitative coding, to gain insights regarding caregiving for family members with dementia, and our approach can be applied to other languages.
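
The inter-rater reliability figure reported above is Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation, not the authors' code; the function name `cohens_kappa` and the toy theme labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # p_o: proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is conventionally 1.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical theme codes assigned by two coders to six Tweets.
coder_1 = ["caregiving", "stigma", "caregiving", "wandering", "stigma", "caregiving"]
coder_2 = ["caregiving", "stigma", "wandering", "wandering", "stigma", "caregiving"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the coders agree on five of six Tweets (p_o = 5/6) and the chance agreement is p_e = 1/3, so kappa is 0.75.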