Abstract: Background: Understanding the multifaceted nature of health outcomes requires a comprehensive examination of the social, economic, and environmental determinants that shape individual well-being. Among these determinants, behavioral factors play a crucial role, particularly the consumption patterns of psychoactive substances, which have important implications for public health. The Global Burden of Disease Study shows a growing impact on disability-adjusted life years due to substance use. Successfully identifying patients' substance use information equips clinical care teams to address substance-related issues more effectively, enabling targeted support and ultimately improving patient outcomes. Objective: Traditional natural language processing methods face limitations in accurately parsing the diverse clinical language associated with substance use. Large language models offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of the generative pretrained transformer (GPT) model, specifically GPT-3.5, for extracting tobacco, alcohol, and substance use information from patient discharge summaries in zero-shot and few-shot learning settings. This study contributes to the evolving landscape of health care informatics by showcasing the potential of advanced language models in extracting nuanced information critical for enhancing patient care. Methods: The main data source for analysis in this paper is the Medical Information Mart for Intensive Care III (MIMIC-III) data set. Among all notes in this data set, we focused on discharge summaries. Prompt engineering was undertaken, involving an iterative exploration of diverse prompts. Leveraging carefully curated examples and refined prompts, we investigated the model's proficiency through zero-shot as well as few-shot prompting strategies.
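The difference between the zero-shot and few-shot settings described in the Methods can be illustrated with a minimal sketch. The instruction wording, the function names, and the example note below are all hypothetical; the abstract does not give the study's actual prompts, and no real patient text is used. The sketch only builds the prompt strings and does not call any model.

```python
from typing import List, Tuple

# Hypothetical instruction text (an assumption, not the study's prompt).
INSTRUCTION = (
    "Extract every mention of tobacco, alcohol, or drug use from the "
    "discharge summary below. For each mention, give the text span, the "
    "substance type, and the use status (current, past, none, or unknown)."
)


def build_zero_shot_prompt(note: str) -> str:
    """Zero-shot: the instruction and the target note, with no worked examples."""
    return f"{INSTRUCTION}\n\nDischarge summary:\n{note}\n\nAnswer:"


def build_few_shot_prompt(note: str, examples: List[Tuple[str, str]]) -> str:
    """Few-shot: the same instruction, preceded by (note, answer) demonstrations."""
    parts = [INSTRUCTION]
    for example_note, example_answer in examples:
        parts.append(
            f"Discharge summary:\n{example_note}\n\nAnswer:\n{example_answer}"
        )
    # The target note comes last and its answer is left open for the model.
    parts.append(f"Discharge summary:\n{note}\n\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Invented demonstration; not taken from any clinical data set.
    demos = [
        (
            "Social history: quit smoking 10 years ago. Denies alcohol.",
            "span='quit smoking 10 years ago', type=tobacco, status=past\n"
            "span='Denies alcohol', type=alcohol, status=none",
        )
    ]
    target = "Social history: drinks 2 beers nightly."
    print(build_zero_shot_prompt(target))
    print("-" * 40)
    print(build_few_shot_prompt(target, demos))
```

In this sketch the two settings share one instruction, so any difference in model output can be attributed to the demonstrations rather than to the wording of the task.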
Results: GPT's effectiveness in identifying mentions of tobacco, alcohol, and substance use varied across learning scenarios. Zero-shot learning showed high accuracy in identifying substance use mentions, whereas few-shot learning reduced that accuracy but improved identification of substance use status, enhancing recall and F1-score at the expense of lower precision. Conclusions: The strength of zero-shot learning in precisely extracting text spans that mention substance use demonstrates its effectiveness in situations in which comprehensive recall is important. Conversely, few-shot learning offers advantages when accurately determining the status of substance use is the primary focus, even if it involves a trade-off in precision. The results can help enhance early detection and intervention strategies, tailor treatment plans with greater precision, and ultimately contribute to a holistic understanding of patient health profiles. By integrating these artificial intelligence-driven methods into electronic health record systems, clinicians can gain immediate, comprehensive insights into substance use, allowing them to shape interventions that are not only timely but also more personalized and effective.
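The trade-off reported above, in which recall and F1-score rise while precision falls, follows directly from how the three metrics are defined. The sketch below computes them from true-positive, false-positive, and false-negative counts. The function name and the counts are invented purely for illustration and are not results from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: a conservative extractor with few false positives but
    # many misses, and a liberal one that finds more at the cost of extra
    # false positives.
    conservative = precision_recall_f1(tp=60, fp=5, fn=40)
    liberal = precision_recall_f1(tp=90, fp=30, fn=10)
    for name, (p, r, f) in [("conservative", conservative), ("liberal", liberal)]:
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With these illustrative counts the liberal extractor has lower precision but higher recall and a higher F1-score, which is the same pattern the abstract describes for few-shot prompting on substance use status.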

Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

Characterizing Anti-Vaping Posts for Effective Communication on Instagram Using Multimodal Deep Learning

Topics and Sentiment Surrounding Vaping on Twitter and Reddit During the 2019 e-Cigarette and Vaping Use–Associated Lung Injury Outbreak: Comparative Study

Exploring Large Language Models for Detecting Online Vaccine Reactions

Identifying Topics for E-Cigarette User-Generated Contents: A Case Study From Multiple Social Media Platforms

How is Vaping Framed on Online Knowledge Dissemination Platforms?

Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions

Using Large Language Models for sentiment analysis of health-related social media data: empirical evaluation and practical tips

A Machine Learning Approach to Identify Predictors of Frequent Vaping and Vulnerable Californian Youth Subgroups

Electronic cigarette usage patterns: a case study combining survey and social media data

Artificial Intelligence Simulation of Adolescents’ Responses to Vaping-Prevention Messages

Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection

Seeing through the smoke : a world-wide comparative study of e-cigarette flavors, brands and markets using data from Reddit and Twitter

Tracking e-cigarette warning label compliance on Instagram with deep learning

Use of large language models as a scalable approach to understanding public health discourse

Identifying e-cigarette content on TikTok: Using a BERTopic Modeling approach

Leveraging Large Language Models and Weak Supervision for Social Media data annotation: an evaluation using COVID-19 self-reported vaccination tweets

US News and Social Media Framing around Vaping

Extraction of Substance Use Information From Clinical Notes: Generative Pretrained Transformer-Based Investigation

Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data