Mining user-generated content in an online smoking cessation community to identify smoking status: A machine learning approach.

Xi Wang, Kang Zhao, Sarah Cha, Michael S. Amato, Amy M. Cohn, Jennifer Pearson, George D. Papandonatos, Amanda Lenhart
DOI: https://doi.org/10.1016/j.dss.2018.10.005
IF: 6.969
2019-01-01
Decision Support Systems
Abstract: Online smoking cessation communities help hundreds of thousands of smokers quit smoking and stay abstinent each year. Content shared by users of such communities may contain important information that could enable more effective and personally tailored cessation treatment recommendations. This study demonstrates a novel approach to determine individuals' smoking status by applying machine learning techniques to classify user-generated content in an online cessation community. Study data were from BecomeAnEX.org, a large, online smoking cessation community. We extracted three types of novel features from a post: domain-specific features, author-based features, and thread-based features. These features helped to improve the smoking status identification (quit vs. not) performance by 9.7% compared to using only text features of a post's content. In other words, knowledge from domain experts, data regarding the post author's patterns of online engagement, and other community member reactions to the post can help to determine the focal post author's smoking status, over and above the actual content of a focal post. We demonstrated that machine learning methods can be applied to user-generated data from online cessation communities to validly and reliably discern important user characteristics, which could aid decision support on intervention tailoring.
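The feature design described in the abstract (text features plus domain-specific, author-based, and thread-based features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the `Post` fields, the `QUIT_CUES` phrases, and the individual signals are hypothetical, since the abstract does not list the actual features or the classifier used.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical domain-specific cue phrases; the paper's real lexicon
# is not given in the abstract.
QUIT_CUES = ("smoke free", "days quit", "quit date")


@dataclass
class Post:
    text: str
    author_post_count: int  # assumed author-based signal
    replies: list = field(default_factory=list)  # reply texts in the thread


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(post):
    """Build one feature dict combining the four feature families."""
    features = {}

    # 1. Text features: bag-of-words counts over the post's own content.
    for token, count in Counter(tokenize(post.text)).items():
        features["word=" + token] = count

    # 2. Domain-specific features: presence of expert-chosen cue phrases.
    lowered = post.text.lower()
    for cue in QUIT_CUES:
        features["cue=" + cue] = int(cue in lowered)

    # 3. Author-based features: how engaged the author is (assumed signal).
    features["author_post_count"] = post.author_post_count

    # 4. Thread-based features: how other members reacted to the post.
    features["n_replies"] = len(post.replies)
    features["replies_congratulate"] = int(
        any("congrat" in reply.lower() for reply in post.replies)
    )
    return features


example = Post("I am 30 days quit and smoke free!", 12, ["Congratulations!"])
example_features = extract_features(example)
```

Feature dicts of this shape could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to any binary classifier that predicts quit vs. not; which classifier the authors used is not stated in the abstract.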