Abstract: Background: Large language models (LLMs) are a form of generative artificial intelligence that has ignited much interest and discussion about its utility in clinical and research settings. Despite this interest, few studies have compared LLM performance in qualitative thematic analysis with human coding and analysis. In addition, there has been no published analysis of their use on real-world protected health information. Objective: Here we fill that gap in the literature by comparing an LLM to standard human thematic analysis of real-world, semi-structured interviews with both patients and clinicians in a psychiatric setting. Methods: Using a 70-billion-parameter open-source LLM running on local hardware and advanced prompt engineering techniques, we produced themes that summarized a full corpus of interviews in minutes. We then used three different evaluation methods to quantify the similarity between themes produced by the LLM and those produced by humans. Results: These evaluations revealed similarities ranging from moderate to substantial (Jaccard similarity coefficients 0.44–0.69), which are promising preliminary results. Conclusion: Our study demonstrates that open-source LLMs can effectively generate robust themes from qualitative data, achieving substantial similarity to human-generated themes. The validation of LLMs in thematic analysis, coupled with evaluation methodologies, highlights their potential to enhance and democratize qualitative research across diverse fields.
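The Jaccard similarity coefficient reported in the results is the size of the intersection of two sets divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. The following is a minimal Python sketch of that formula only. The theme labels are hypothetical, and treating themes as exactly matching strings is an assumption made for illustration; the abstract does not describe how the study matched LLM themes to human themes.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical theme labels, for illustration only (not taken from the study).
human_themes = {"access to care", "stigma", "medication concerns", "family support"}
llm_themes = {"access to care", "stigma", "medication concerns", "cost of treatment"}

score = jaccard_similarity(human_themes, llm_themes)
print(f"{score:.2f}")  # 3 shared themes out of 5 distinct themes -> 0.60
```

In this toy case the two sets share three of five distinct themes, giving 0.60, which falls inside the 0.44–0.69 range quoted above. In practice, deciding whether two differently worded themes count as the same is the hard part, and this sketch does not address it.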

An Examination of the Use of Large Language Models to Aid Analysis of Textual Data

Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis

Large Language Model for Qualitative Research -- A Systematic Mapping Study

Large Language Models in Qualitative Research: Can We Do the Data Justice?

Using Large Language Models for Qualitative Analysis can Introduce Serious Bias

Using Large Language Models to Support Thematic Analysis in Empirical Legal Studies

Apprentices to Research Assistants: Advancing Research with Large Language Models

Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering

Leveraging Large Language Models for Automating Inductive Qualitative Coding: A Comparative Study of Prompt Engineering Techniques

Evaluating Large Language Models in Analysing Classroom Dialogue

Performing an Inductive Thematic Analysis of Semi-Structured Interviews With a Large Language Model: An Exploration and Provocation on the Limits of the Approach

Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis

Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods?