Abstract: Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for large populations. In this study, we develop RACER, a Large Language Model (LLM) based expert-guided automated pipeline that efficiently converts raw interview transcripts into insightful domain-relevant themes and sub-themes. We used RACER to analyze SSIs conducted with 93 healthcare professionals and trainees to assess the broad personal and professional mental health impacts of the COVID-19 crisis. RACER achieves moderately high agreement with two human evaluators (72%), which approaches the human inter-rater agreement (77%). Interestingly, LLMs and humans struggle with similar content involving nuanced emotional, ambivalent/dialectical, and psychological statements. Our study highlights the opportunities and challenges in using LLMs to improve research efficiency and opens new avenues for scalable analysis of SSIs in healthcare research.

### What problem does this paper attempt to address?

This paper addresses the burden of manually analyzing Semi-Structured Interviews (SSIs) in healthcare research. SSIs provide in-depth qualitative insights, but analyzing the interview data by hand is time-consuming and labor-intensive. Two factors make the process particularly difficult: extracting and categorizing emotional responses is hard, and human evaluation does not scale well to large populations. To address this, the authors developed RACER, an expert-guided automated pipeline based on Large Language Models (LLMs) that efficiently converts raw interview transcripts into meaningful themes and sub-themes. Using RACER, the authors analyzed SSIs with 93 healthcare professionals and trainees to assess the impact of the COVID-19 crisis on their personal and professional mental health.

### Main Contributions

1. **Increased Efficiency**: RACER improves the efficiency of SSI analysis by automating the processing of large amounts of interview data.
2. **High Consistency**: RACER reached 72% agreement with two human evaluators, close to the 77% agreement between the human evaluators themselves.
3. **Handling Complex Emotions**: LLMs and humans struggle with similar content: nuanced emotional, ambivalent/dialectical, and psychological statements. RACER therefore shows both potential and limitations in these areas.
4. **Extended Applications**: RACER opens new avenues for large-scale analysis of SSIs in healthcare research.

### Research Background

- **Semi-Structured Interviews (SSIs)**: Widely used in healthcare research and a source of in-depth qualitative insights, but manual analysis is time-consuming and resource-intensive.
- **Large Language Models (LLMs)**: Models like GPT-4 offer new methods for extracting and interpreting data from text corpora.
- **COVID-19 Crisis**: Brought significant personal and professional challenges to healthcare workers, including fear of infecting family members, grief over patient deaths, and moral dilemmas in resource allocation.

### Method

- **RACER Pipeline**:
  1. **Retrieve**: Use an LLM to extract relevant responses from interview transcripts.
  2. **Aggregate**: Summarize responses from all interviewees.
  3. **Cluster with Expert Guidance**: Cluster responses into themes and sub-themes with expert guidance.
  4. **Re-cluster**: Run the clustering process multiple times and determine the final clustering by majority voting.

### Results

- **Emotional and Psychological Impact**: Most respondents reported negative emotions such as anxiety, stress, sadness, or anger, but some expressed positive emotions like gratitude.
- **Support and Coping Strategies**: Most respondents felt supported by colleagues and family, and family dynamics were also affected.
- **Work Impact**: Most healthcare workers experienced increased working hours and changes in patient management methods.
- **Future Outlook**: Some respondents were optimistic about the future, hoping the crisis would bring new opportunities and growth; others were concerned about long-term personal and professional impacts.

### Discussion

- **Advantages**: RACER improves the efficiency and scalability of SSI analysis.
- **Limitations**: Both RACER and human evaluators face similar challenges in handling complex emotions and psychological states, which highlights the indispensable role of human expertise in reviewing and interpreting LLM outputs.

In conclusion, this paper demonstrates the potential of RACER for analyzing SSIs in healthcare research, while also pointing out current limitations and future research directions.
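The Re-cluster step and the agreement figures reported above can be illustrated with a small sketch. It is not the authors' implementation: the function names `majority_vote` and `percent_agreement`, the dictionary layout (response ID mapped to a theme label), the tie handling, and the use of a simple match rate as the agreement measure are all assumptions made for illustration. The sketch also assumes that theme labels are comparable across runs, which is plausible when the themes are fixed with expert guidance. The example data is hypothetical.

```python
from collections import Counter


def majority_vote(runs):
    """Combine theme assignments from repeated clustering runs.

    runs: list of dicts, one per run, mapping response ID -> theme label.
    Returns a dict mapping response ID -> most frequent label.
    Assumption: a tie between the top labels yields None, flagging the
    response for human review.
    """
    final = {}
    all_ids = set().union(*runs)  # every response ID seen in any run
    for rid in sorted(all_ids):
        votes = Counter(run[rid] for run in runs if rid in run)
        ranked = votes.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            final[rid] = None  # tie: no majority label
        else:
            final[rid] = ranked[0][0]
    return final


def percent_agreement(labels_a, labels_b):
    """Percentage of shared response IDs that received the same label."""
    shared = [rid for rid in labels_a if rid in labels_b]
    if not shared:
        raise ValueError("no shared response IDs to compare")
    matches = sum(labels_a[rid] == labels_b[rid] for rid in shared)
    return 100.0 * matches / len(shared)


# Hypothetical data: three clustering runs over two responses.
runs = [
    {"r1": "anxiety", "r2": "gratitude"},
    {"r1": "anxiety", "r2": "stress"},
    {"r1": "stress", "r2": "gratitude"},
]
final_labels = majority_vote(runs)

# Hypothetical human rater labels for the same responses.
human_labels = {"r1": "anxiety", "r2": "stress"}
agreement = percent_agreement(final_labels, human_labels)
print(final_labels, agreement)
```

Running the clustering several times and voting reduces the effect of run-to-run variation in LLM output. Returning `None` on ties is one possible design choice that keeps ambiguous responses visible to a human reviewer instead of resolving them arbitrarily.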

RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews
