Abstract:

BACKGROUND: Health care social media used for health information exchange and emotional communication involves different types of users, including patients, caregivers, and health professionals. However, it is difficult to identify different stakeholders because user identification data are lacking due to privacy protection and proprietary interests. Therefore, identifying the concerns of different stakeholders and how they use health care social media when confronted with huge amounts of health-related messages posted by users is a critical problem.

OBJECTIVE: We aimed to develop a new content analysis method using text mining techniques applied in health care social media to (1) identify different health care stakeholders, (2) determine hot topics of concern, and (3) measure sentiment expression by different stakeholders.

METHODS: We collected 138,161 messages posted by 39,606 members in lung cancer, diabetes, and breast cancer forums in the online community MedHelp.org over 10 years (January 2007 to October 2016) as experimental data. We used text mining techniques to process text data to identify different stakeholders and determine health-related hot topics, and then analyzed sentiment expression.

RESULTS: We identified 3 significantly different stakeholder groups using expectation maximization clustering (3 performance metrics: Rand=0.802, Jaccard=0.393, Fowlkes-Mallows=0.537; P<.001), in which patients (24,429/39,606, 61.68%) and caregivers (12,232/39,606, 30.88%) represented the majority of the population, in contrast to specialists (2945/39,606, 7.43%). We identified 5 significantly different health-related topics: symptom, examination, drug, procedure, and complication (Rand=0.783, Jaccard=0.369, Fowlkes-Mallows=0.495; P<.001). Patients were concerned most about symptom topics related to lung cancer (536/1657, 32.34%), drug topics related to diabetes (1883/5904, 31.89%), and examination topics related to breast cancer (8728/23,934, 36.47%). By comparison, caregivers were more concerned about drug topics related to lung cancer (300/2721, 11.03% vs 109/1657, 6.58%), procedure topics related to breast cancer (3952/13,954, 28.32% vs 5822/23,934, 24.33%), and complication topics (4449/25,701, 17.31% vs 4070/31,495, 12.92%). In addition, patients (9040/36,081, 25.05%) were more likely than caregivers (2659/18,470, 14.39%) and specialists (17,943/83,610, 21.46%) to express their emotions. However, patients' sentiment intensity score (2.46) was lower than those of caregivers (4.66) and specialists (5.14). In particular, for caregivers, negative sentiment scores were higher than positive scores (2.56 vs 2.18), with the opposite among specialists (2.62 vs 2.46). Overall, the proportion of negative messages was greater than that of positive messages related to symptom, complication, and examination. The pattern was opposite for drug and procedure topics. A trend analysis showed that patients and caregivers gradually changed their emotional state in a positive direction.

CONCLUSIONS: The hot topics of interest and sentiment expression differed significantly among different stakeholders in different disease forums. These findings could help improve social media services to facilitate diverse stakeholder engagement for health information sharing and social interaction more effectively.
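
The three clustering metrics reported in the results (Rand, Jaccard, Fowlkes-Mallows) are pair-counting indices that compare a clustering against a reference labeling. The abstract does not say how the reference labels were obtained or how the scores were computed, so the following is only a minimal sketch under the standard textbook definitions. The function name `pair_counting_scores` and the toy labels are hypothetical and do not come from the study.

```python
from itertools import combinations
from math import sqrt


def pair_counting_scores(reference, predicted):
    """Compare two labelings by counting agreeing and disagreeing item pairs.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(reference) != len(predicted):
        raise ValueError("labelings must have the same length")
    if len(reference) < 2:
        raise ValueError("need at least two items to form a pair")

    # ss: same group in both labelings; dd: different group in both
    # sd: same in reference only;       ds: same in predicted only
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(reference)), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            ss += 1
        elif same_ref:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1

    total = ss + sd + ds + dd
    rand = (ss + dd) / total
    jaccard = ss / (ss + sd + ds) if (ss + sd + ds) else 0.0
    denom = sqrt((ss + sd) * (ss + ds))
    fowlkes_mallows = ss / denom if denom else 0.0
    return {"rand": rand, "jaccard": jaccard, "fowlkes_mallows": fowlkes_mallows}


# Toy data (assumed, not from the study): six users with reference roles
# and cluster ids assigned by some clustering step.
reference_roles = ["patient", "patient", "caregiver",
                   "caregiver", "specialist", "specialist"]
cluster_ids = [0, 0, 0, 1, 2, 2]

scores = pair_counting_scores(reference_roles, cluster_ids)
print(scores)
```

Because the Rand index also credits pairs that both labelings keep apart, it tends to be noticeably higher than the Jaccard and Fowlkes-Mallows indices when there are several groups, which is consistent with the pattern of values reported above.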

On Mining Latent Topics from Healthcare Chat Logs.

Query Subtopic Mining Via Subtractive Initialization of Non-negative Sparse Latent Semantic Analysis

Mining Twitter to Assess the Determinants of Health Behavior towards Palliative Care in the United States.

Analyzing Patient Experience on Weibo: Machine Learning Approach to Topic Modeling and Sentiment Analysis

On mining latent treatment patterns from electronic medical records

Multiple-Perspective Data-Driven Analysis of Online Health Communities

From Text to Topics in Healthcare Records: An Unsupervised Graph Partitioning Methodology

A latent topic model for mining heterogenous non-randomly missing electronic health records data

Enriching Consumer Health Vocabulary Through Mining a Social Q&A Site: A Similarity-Based Approach

Understanding Health Care Social Media Use From Different Stakeholder Perspectives: A Content Analysis of an Online Health Community

Dynamic Semantic Clustering Approach For Web User Interest

Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning

Mining Patterns of Disease Progression: A Topic-Model-Based Approach.

Finding Users' Voice on Social Media: an Investigation of Online Support Groups for Autism-Affected Users on Facebook.

Mining User Profiles from Query Log

Network-based modeling and intelligent data mining of social media for improving care

Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records

Evaluation of clustering and topic modeling methods over health-related tweets and emails

Social media mining for identification and exploration of health-related information from pregnant women

Utilizing Electronic Medical Records to Discover Changing Trends of Medical Behaviors over Time.

The detection of community health surveillance using distributed semantic assisted non-negative matrix factorization on topic modeling through sentiment analysis