LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis

Shih-Chieh Dai, Aiping Xiong, Lun-Wei Ku
2023-10-24
Abstract: Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields. To ensure reliable analysis, the same piece of data is typically assigned to at least two human coders. Moreover, to produce meaningful and useful analysis, human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming. Recently, the emerging field of large language model (LLM) research has shown that LLMs have the potential to replicate human-like behavior in various tasks: in particular, LLMs outperform crowd workers on text-annotation tasks, suggesting an opportunity to leverage LLMs for TA. We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL). This framework provides the prompt to frame discussions with an LLM (e.g., GPT-3.5) to generate the final codebook for TA. We demonstrate the utility of this framework using survey datasets on aspects of the music listening experience and the usage of a password manager. Results of the two case studies show that the proposed framework yields coding quality similar to that of human coders while reducing TA's labor and time demands.
Computation and Language
What problem does this paper attempt to address?
The paper explores how Large Language Models (LLMs) can improve the efficiency and effectiveness of Thematic Analysis (TA). Specifically, it addresses three key issues:

1. **Improving TA efficiency**: Traditional thematic analysis typically requires at least two human coders with relevant expertise to take part in the entire process, which is time-consuming and labor-intensive. The paper proposes a human-LLM collaboration framework (LLM-in-the-loop) to reduce the human effort and time that TA requires.
2. **Ensuring analysis quality**: To ensure reliability, the same data is traditionally assigned to at least two coders who code it independently. By introducing an LLM as a Machine Coder (MC) that collaborates with a Human Coder (HC), the paper examines whether this collaborative setup can maintain coding quality while requiring fewer human coders.
3. **Addressing LLM input limitations**: Because LLMs accept only a limited input size, a large body of text may not fit into a single prompt. The paper addresses this by generating the codebook from only a portion of the data.

In summary, the study aims to verify whether combining human and LLM capabilities yields a more efficient thematic analysis method that maintains, or even improves, analysis quality. Through two case studies (the Music Shuffle and Password Manager usage surveys), the paper shows that the proposed framework reduces the time and labor costs of TA while achieving coding quality comparable to that of two human coders.
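To make the three issues above more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The word-count budget (a rough stand-in for a model's token limit), the prompt wording, and the helper names `sample_within_budget`, `build_codebook_prompt`, and `cohens_kappa` are all hypothetical. Cohen's kappa is used here as one standard statistic for agreement between two coders, chosen for illustration when comparing HC and MC codes.

```python
import random
from collections import Counter


def sample_within_budget(responses, max_words, seed=0):
    """Hypothetical helper: pick a random subset of responses whose total
    word count stays within max_words. Word count is a crude, assumed
    stand-in for a real tokenizer-based context limit."""
    rng = random.Random(seed)
    shuffled = responses[:]
    rng.shuffle(shuffled)
    subset, used = [], 0
    for text in shuffled:
        n = len(text.split())
        if used + n > max_words:
            continue  # skip responses that would overflow; try later ones
        subset.append(text)
        used += n
    return subset


def build_codebook_prompt(examples, responses):
    """Hypothetical in-context-learning prompt: a few already-coded
    examples, followed by uncoded responses for the LLM to code."""
    lines = ["Propose themes (codes) for the survey responses below.", ""]
    for text, code in examples:
        lines.append(f"Response: {text}\nCode: {code}")
    lines.append("")
    for text in responses:
        lines.append(f"Response: {text}\nCode:")
    return "\n".join(lines)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two coders' labels for the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each coder's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical codes from a human coder (HC) and a machine coder (MC).
    hc = ["mood", "mood", "variety", "control", "variety", "mood"]
    mc = ["mood", "variety", "variety", "control", "variety", "mood"]
    print(round(cohens_kappa(hc, mc), 3))  # 0.739
```

In practice, a tokenizer matching the target model would replace the word count, and the prompt would be sent to an LLM such as GPT-3.5 through its API; both steps are omitted so the sketch stays self-contained.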