Using Large Language Models for sentiment analysis of health-related social media data: empirical evaluation and practical tips

Lu He, Samaneh Omranian, Susan McRoy, Kai Zheng
DOI: https://doi.org/10.1101/2024.03.19.24304544
2024-03-20
Abstract: Health-related social media data generated by patients and the public provide valuable insights into patient experiences and opinions toward health issues such as vaccination and medical treatments. Using Natural Language Processing (NLP) methods to analyze such data, however, often requires high-quality annotations that are difficult to obtain. The recent emergence of Large Language Models (LLMs) such as the Generative Pre-trained Transformers (GPTs) has shown promising performance on a variety of NLP tasks in the health domain with little to no annotated data. However, their potential in analyzing health-related social media data remains underexplored. In this paper, we report empirical evaluations of LLMs (GPT-3.5-Turbo, FLAN-T5, and BERT-based models) on a common NLP task of health-related social media data: sentiment analysis for identifying opinions toward health issues. We explored how different prompting and fine-tuning strategies affect the performance of LLMs on social media datasets across diverse health topics, including Healthcare Reform, vaccination, mask wearing, and healthcare service quality. We found that LLMs outperformed VADER, a widely used off-the-shelf sentiment analysis tool, but are far from being able to produce accurate sentiment labels. However, their performance can be improved by data-specific prompts with information about the context, task, and targets. The highest performing LLMs are BERT-based models that were fine-tuned on aggregated data. We provided practical tips for researchers to use LLMs on health-related social media data for optimal outcomes. We also discuss future work needed to continue to improve the performance of LLMs for analyzing health-related social media data with minimal annotations.
Health Informatics
What problem does this paper attempt to address?
This paper examines how well large language models (LLMs) perform sentiment analysis on health-related social media data. Such data capture patient experiences and opinions on health issues, but analyzing them with natural language processing (NLP) usually requires high-quality annotations that are difficult to obtain. LLMs have shown promise on many NLP tasks with little or no annotated data, yet their potential for health-related social media data remains underexplored.

The researchers evaluated GPT-3.5-Turbo, FLAN-T5, and BERT-based models on identifying opinions toward health issues, using datasets that cover Healthcare Reform, vaccination, mask wearing, and healthcare service quality. They ran multiple experiments, including zero-shot and few-shot settings, to compare how different prompting and fine-tuning strategies affect performance. LLMs outperformed VADER, a widely used off-the-shelf sentiment analysis tool, on some datasets, but remained far from producing accurate sentiment labels. Data-specific prompts that describe the context, task, and targets improved performance, and the best-performing models were BERT-based models fine-tuned on aggregated data. The paper also discusses the future work needed to improve LLMs for analyzing health-related social media data with minimal annotations.
Overall, this work provides practical recommendations for researchers to optimize the application of LLMs in this field and emphasizes the need for further research to overcome challenges in analyzing such data.
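To make the idea of a data-specific prompt more concrete, the following is a minimal sketch of how context, task, and target information might be assembled into a single prompt for one post. It is an illustration only, not the authors' prompt: the `PromptSpec` class, the `build_prompt` and `parse_label` helpers, the prompt wording, and the three-way label set are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set; the paper's actual labels may differ.
LABELS = ("positive", "negative", "neutral")


@dataclass(frozen=True)
class PromptSpec:
    """Dataset-level information shared by every post in one dataset."""
    context: str  # where the posts come from and what they discuss
    task: str     # what the model is asked to decide
    target: str   # the health issue the opinion is directed at


def build_prompt(spec: PromptSpec, post: str) -> str:
    """Combine context, task, and target with a single post (assumed wording)."""
    return (
        f"Context: {spec.context}\n"
        f"Task: {spec.task}\n"
        f"Target: {spec.target}\n"
        f"Allowed labels: {', '.join(LABELS)}\n"
        f'Post: "{post.strip()}"\n'
        "Label:"
    )


def parse_label(raw: str) -> Optional[str]:
    """Map a free-text model reply to one of LABELS, or None if unrecognized."""
    cleaned = raw.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label):
            return label
    return None


if __name__ == "__main__":
    spec = PromptSpec(
        context="Posts collected from social media about public health measures.",
        task="Classify the author's opinion toward the target.",
        target="mask wearing",
    )
    print(build_prompt(spec, "  Masks work. Wear one.  "))
```

The resulting string would be sent to whichever model is being evaluated, and `parse_label` would normalize the reply before it is compared with gold annotations. Keeping the dataset-level fields separate from the post makes it easy to reuse one specification across a whole dataset and to vary a single field when comparing prompts.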