Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

Ankit Aich, Avery Quynh, Pamela Osseyi, Amy Pinkham, Philip Harvey, Brenda Curtis, Colin Depp, Natalie Parde
2024-06-18
Abstract: NLP in mental health has been primarily social media focused. Real world practitioners also have high case loads and often domain specific variables, of which modern LLMs lack context. We take a dataset made by recruiting 644 participants, including individuals diagnosed with Bipolar Disorder (BD), Schizophrenia (SZ), and Healthy Controls (HC). Participants undertook tasks derived from a standardized mental health instrument, and the resulting data were transcribed and annotated by experts across five clinical variables. This paper demonstrates the application of contemporary language models in sequence-to-sequence tasks to enhance mental health research. Specifically, we illustrate how these models can facilitate the deployment of mental health instruments, data collection, and data annotation with high accuracy and scalability. We show that small models are capable of annotation for domain-specific clinical variables, data collection for mental-health instruments, and perform better than commercial large models.
Computation and Language
What problem does this paper attempt to address?
The main problem this paper attempts to address is how to use modern large language models (LLMs) to assist clinical data collection and annotation for Bipolar Disorder (BD) and Schizophrenia (SZ). Specifically, the paper focuses on the following aspects:

1. **Improving the efficiency and accuracy of data collection and annotation**: Existing methods for collecting mental health data typically rely on manual processes, which are time-consuming and prone to errors. The paper proposes a method based on modern language models to automate data collection and annotation, thereby improving efficiency and accuracy.
2. **Addressing the limitations of existing methods**: Many current studies focus on social media data, which has drawbacks such as ethical issues, participant bias, poor generalizability, and reliance on self-disclosure. The paper attempts to overcome these limitations by using clinical data to ensure data quality and reliability.
3. **Validating the effectiveness of small models**: The paper also explores the performance of small language models in annotating domain-specific variables, finding that they can outperform large commercial models like GPT-4 in certain tasks.
4. **Building a complete automated process**: The paper not only demonstrates how to generate high-quality interview dialogues but also shows how to automatically extract and annotate clinical variables from these dialogues, forming an end-to-end automated system.

### Main Contributions

- **Dataset**: The paper provides a real-world dataset annotated by clinical experts, focusing on the language and speech deficits of patients with Bipolar Disorder and Schizophrenia.
- **Interview Generation Model**: A model was developed that can engage in dialogues with participants to collect data.
- **Annotation Generation Model**: A model was developed that can annotate real participant data based on domain-specific variables.
- **Performance Evaluation**: By comparing with large commercial language models like GPT-4, the paper demonstrates the proposed models' advantages in terms of low error rates and high accuracy.

### Method Overview

- **Data Collection and Annotation**: Interview data from 644 participants, including patients with Bipolar Disorder, patients with Schizophrenia, and healthy controls, were used. The data were annotated by two clinical experts.
- **Model Training**: Supervised fine-tuning (SFT) was used to train the models to generate dialogues aligned with real interviews and to predict clinical variable scores.
- **Performance Evaluation**: The generated dialogues were evaluated for syntactic similarity, semantic similarity, and alignment with human dialogues. The predicted clinical variable scores were evaluated using root mean square error (RMSE).

### Conclusion

The paper demonstrates the significant potential of modern language models in assisting clinical data collection and annotation, particularly in improving efficiency and accuracy. Despite some limitations, such as a small sample size and the exclusion of multimodal data, the study's results indicate that, with appropriate data and training, language models can become powerful tools for clinical researchers.
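
As an illustration of the RMSE metric named in the Method Overview, the following is a minimal sketch of how predicted clinical variable scores could be compared against expert annotations. The function name `rmse` and the example scores are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' actual evaluation code may differ.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between predicted and reference scores."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one score pair is required")
    # Square each per-item error, average the squares, then take the root.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores for one clinical variable (not data from the paper).
expert_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [1.0, 2.0, 4.0, 6.0]

print(f"RMSE: {rmse(model_scores, expert_scores):.3f}")
```

A lower RMSE means the model's scores sit closer to the expert annotations. Because errors are squared before averaging, large individual disagreements are penalized more heavily than small ones.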