Towards Interpretable Mental Health Analysis with Large Language Models

Kailai Yang, Shaoxiong Ji, Tianlin Zhang, Qianqian Xie, Ziyan Kuang, Sophia Ananiadou
2023-10-11
Abstract: The latest large language models (LLMs), such as ChatGPT, exhibit strong capabilities in automated mental health analysis. However, existing studies have several limitations, including inadequate evaluations, a lack of prompting strategies, and little exploration of LLMs for explainability. To bridge these gaps, we comprehensively evaluate the mental health analysis and emotional reasoning ability of LLMs on 11 datasets across 5 tasks. We explore the effects of different prompting strategies with unsupervised and distantly supervised emotional information. Based on these prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions. We conduct strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations. We benchmark existing automatic evaluation metrics on this dataset to guide future related work. According to the results, ChatGPT shows strong in-context learning ability but still has a significant gap with advanced task-specific methods. Careful prompt engineering with emotional cues and expert-written few-shot examples can also effectively improve performance on mental health analysis. In addition, ChatGPT generates explanations that approach human performance, showing its great potential in explainable mental health analysis.
Computation and Language
What problem does this paper attempt to address?
The paper addresses three limitations of current work on large language models (LLMs) for mental health analysis: insufficient evaluation, a lack of effective prompting strategies, and little exploration of model interpretability. Specifically:

1. **Performance and Interpretability**: Although recent LLMs such as ChatGPT show strong capabilities in automated mental health analysis, existing research has several main problems:
   - **Insufficient Evaluation**: Most existing studies test only a few binary-classification tasks for detecting mental health conditions, and lack a comprehensive evaluation of more complex tasks such as emotion reasoning and cause detection.
   - **Lack of Prompting Strategies**: Most studies use simple prompts to detect mental health conditions directly, ignoring useful information such as emotional cues.
   - **Lack of Interpretability**: Existing studies rarely explore how to generate interpretable mental health analysis results with LLMs, which limits transparency and credibility.
2. **Research Objectives**:
   - **RQ1**: How capable are LLMs in general mental health analysis and emotion reasoning in zero-shot and few-shot settings?
   - **RQ2**: How do different prompting strategies and emotional cues affect ChatGPT's mental health analysis capabilities?
   - **RQ3**: Can ChatGPT generate reasonable explanations for its mental health analysis decisions?

To answer these research questions, the authors carry out the following work:

- **Preliminary Research**: Evaluate four LLMs of different scales (ChatGPT, InstructGPT-3, LLaMA-13B, and LLaMA-7B) on mental health analysis and emotion reasoning tasks.
- **Prompting Strategies**: Systematically explore different prompting strategies, including zero-shot prompting, chain-of-thought (CoT) prompting, emotion-enhanced prompting, and few-shot emotion-enhanced prompting.
- **Interpretability Exploration**: Instruct two representative models (ChatGPT and InstructGPT-3) to generate natural-language explanations, and evaluate them manually under a strict annotation protocol, creating a new dataset of 163 human-evaluated explanations.
- **Automatic Evaluation**: Benchmark existing automatic evaluation metrics to guide future research on the automatic evaluation of interpretable mental health analysis.

Through these studies, the authors aim to improve the performance and interpretability of LLMs in mental health analysis and to provide guidance for future related research.
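
The prompting strategies named above differ mainly in what is added around the input post. The Python sketch below illustrates that idea only. The function name `build_prompt`, its parameters, and all template wording are hypothetical and are not the prompts used in the paper. It assumes that an emotion-enhanced prompt adds an instruction to consider emotional cues, that a CoT prompt adds a step-by-step trigger phrase, and that few-shot prompting prepends worked examples.

```python
from typing import Sequence, Tuple


def build_prompt(
    post: str,
    question: str,
    *,
    chain_of_thought: bool = False,
    emotion_cue: bool = False,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble an illustrative prompt string (hypothetical templates).

    With all options off, this yields a plain zero-shot prompt.
    """
    parts = []

    # Few-shot: prepend (post, answer) demonstrations before the query.
    for example_post, example_answer in examples:
        parts.append(f"Post: {example_post}\nAnswer: {example_answer}")

    instruction = question
    # Emotion-enhanced: ask the model to attend to emotional cues first.
    if emotion_cue:
        instruction = "Consider the emotions expressed in the post. " + instruction
    # Chain-of-thought: append a step-by-step reasoning trigger.
    if chain_of_thought:
        instruction += " Let's think step by step."

    parts.append(f"Post: {post}\n{instruction}\nAnswer:")
    return "\n\n".join(parts)
```

For example, `build_prompt(post, question)` gives a zero-shot prompt, while passing `emotion_cue=True`, `chain_of_thought=True`, and a list of expert-written `examples` combines all three additions in one prompt. The resulting string would then be sent to whichever model is being evaluated.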