Using Large Language Models to Support Thematic Analysis in Empirical Legal Studies

Jakub Drápal, Hannes Westermann, Jaromir Savelka
2023-10-28
Abstract: Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework facilitating effective collaboration of a legal expert with a large language model (LLM) for generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We employed the framework for an analysis of a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts. The goal of the analysis was to discover classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes, and it was capable of improving the quality of the codes based on expert feedback. They also suggest that the model performed well in zero-shot classification of facts descriptions in terms of the themes. Finally, the themes autonomously discovered by the LLM appear to map fairly well to the themes arrived at by legal experts. These findings can be leveraged by legal researchers to guide their decisions in integrating LLMs into their thematic analyses, as well as other inductive coding projects.
Artificial Intelligence, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
This paper investigates how large language models (LLMs) can support thematic analysis in empirical legal studies (ELS). The authors propose a framework in which a legal expert collaborates with an LLM to generate initial codes (phase 2 of thematic analysis), search for themes (phase 3), and classify the data in terms of those themes (the start of phase 4). The aim is to assist legal experts with coding work on large amounts of text data, with the hope of improving the efficiency and accuracy of that work. The core of the paper is an evaluation of how well a state-of-the-art LLM (OpenAI's GPT-4) supports these selected phases, organized around four research questions:

- **RQ1**: Can the LLM successfully perform initial coding of the data?
- **RQ2**: Can subject-matter experts improve the quality of the initial codes through natural-language feedback?
- **RQ3**: Can the LLM successfully predict the themes of the analyzed data points?
- **RQ4**: Can the LLM discover themes independently and associate these themes with the analyzed data?

Through these questions, the paper explores the potential of LLMs in ELS, in particular how machine learning and human expertise can be combined effectively during thematic analysis.
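To make RQ3 more concrete, the following is a minimal Python sketch of zero-shot theme classification followed by a simple agreement check against expert labels. It is not the authors' implementation: the prompt wording, the function names (`build_classification_prompt`, `classify`, `agreement`), the `UNKNOWN` fallback, and the example theme names are all assumptions made for illustration. The `llm` argument stands in for any function that sends a prompt to a model and returns its text reply.

```python
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a reply string.
# In practice this would wrap a call to a hosted model such as GPT-4.
LLM = Callable[[str], str]


def build_classification_prompt(themes: List[str], facts: str) -> str:
    """Compose a zero-shot prompt asking for exactly one theme name."""
    theme_lines = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(themes))
    return (
        "Assign the following facts description to exactly one theme.\n"
        f"Themes:\n{theme_lines}\n\n"
        f"Facts description:\n{facts}\n\n"
        "Answer with the theme name only."
    )


def classify(llm: LLM, themes: List[str], facts: str) -> str:
    """Return the matching theme, or 'UNKNOWN' if the reply matches none."""
    reply = llm(build_classification_prompt(themes, facts)).strip().lower()
    for theme in themes:
        if theme.lower() == reply:
            return theme
    return "UNKNOWN"


def agreement(predicted: List[str], expert: List[str]) -> float:
    """Fraction of data points where the predicted and expert labels match."""
    if len(predicted) != len(expert):
        raise ValueError("label lists must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(predicted)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; always answers with the same theme."""
    return " Shoplifting\n"


# Illustrative theme names only; they are not taken from the paper.
example_themes = ["Shoplifting", "Bicycle theft"]
example_label = classify(stub_llm, example_themes, "The defendant took goods from a store.")
```

Passing the model as a plain callable keeps the sketch independent of any particular client library, so the same functions can be exercised with a stub during testing and with a real model call later. Raw label agreement is only one possible evaluation choice; the paper's own evaluation procedure may differ.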