ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification

Sehee Lim, Yejin Kim, Chi-Hyun Choi, Jy-yong Sohn, Byung-Hoon Kim

2024-03-21

Abstract: Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions in an interviewee's utterances can be an essential part of psychotherapy, especially cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification with two additional modules: (1) extracting the parts of the utterance related to cognitive distortion, and (2) having multiple agents debate the reasoning steps. Our experimental results on a public dataset show that ERD improves both the multi-class F1 score and the binary specificity score. Regarding the latter, our method is effective in debiasing the baseline method, which has a high false positive rate, especially when the summary of the multi-agent debate is provided to the LLMs.
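To make the binary specificity and false positive rate mentioned in the abstract concrete, here is a minimal sketch of how the two quantities relate. The label encoding (1 = distortion present, 0 = no distortion) and the toy predictions are assumptions for illustration only; they are not taken from the paper's dataset or results.

```python
def binary_specificity(y_true, y_pred):
    """Specificity = TN / (TN + FP), assuming 1 = distorted, 0 = not distorted."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative examples")
    return tn / (tn + fp)


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (TN + FP) = 1 - specificity."""
    return 1.0 - binary_specificity(y_true, y_pred)


# Toy data (illustrative only): a classifier that over-diagnoses.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 1, 1]

print(binary_specificity(y_true, y_pred))   # 1 TN out of 4 negatives
print(false_positive_rate(y_true, y_pred))  # 3 FP out of 4 negatives
```

A classifier that flags harmless utterances as distorted produces many false positives, so its specificity drops; this is the sense in which a higher specificity indicates less over-diagnosis.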

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses two major weaknesses of existing LLM-based methods for recognizing cognitive distortions:

1. **Over-diagnosis**: Existing methods (such as Diagnosis-of-Thought, DoT) tend to over-diagnose cognitive distortions, inferring distorted thinking patterns even when the user's statements are harmless.
2. **Poor multi-class classification performance**: In the multi-class setting, the classification performance of DoT is close to random guessing, which limits its use in practical applications.

To address these issues, the authors propose a new framework, ERD (Extraction-Reasoning-Debate), which improves the cognitive distortion classification performance of LLMs by adding modules for extracting relevant parts and for multi-agent debate. The framework consists of three steps:

1. **Extraction**: Extract the parts of the user's utterance that are related to cognitive distortions.
2. **Reasoning**: Generate the reasoning process that estimates which cognitive distortion, if any, is present.
3. **Debate**: Multiple LLM agents discuss the reasoning process, and a final decision is made.

Experimental results show that ERD outperforms the baseline in both multi-class F1 score and binary specificity score. In particular, ERD reduces the false positive rate.
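The three steps can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` type, the prompt wording, the function names, the separate judge call, and the canned `stub_llm` are all hypothetical stand-ins so that the example runs without any model API.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


def extract(llm: LLM, utterance: str) -> str:
    """Step 1: pull out the parts of the utterance related to a distortion."""
    return llm(f"[EXTRACT] Quote the possibly distorted parts.\nUtterance: {utterance}")


def reason(llm: LLM, utterance: str, extracted: str) -> str:
    """Step 2: generate reasoning that estimates the distortion type."""
    return llm(
        "[REASON] Explain which distortion, if any, is present.\n"
        f"Utterance: {utterance}\nExtracted: {extracted}"
    )


def debate(agents: List[LLM], judge: LLM, utterance: str,
           reasoning: str, rounds: int = 1) -> str:
    """Step 3: agents discuss the reasoning; a judge gives the final label."""
    transcript: List[str] = []
    for _ in range(rounds):
        for i, agent in enumerate(agents):
            prompt = (
                "[DEBATE] Critique or defend the reasoning.\n"
                f"Utterance: {utterance}\nReasoning: {reasoning}\n"
                "Debate so far:\n" + "\n".join(transcript)
            )
            transcript.append(f"Agent {i}: {agent(prompt)}")
    # Assumption: the judge sees the full transcript and returns one label.
    return judge("[JUDGE] Give the final label.\n" + "\n".join(transcript)).strip()


def erd(utterance: str, llm: LLM, agents: List[LLM], judge: LLM) -> str:
    extracted = extract(llm, utterance)
    reasoning = reason(llm, utterance, extracted)
    return debate(agents, judge, utterance, reasoning)


def stub_llm(prompt: str) -> str:
    """Canned replies keyed on the prompt tag; replace with a real model call."""
    if prompt.startswith("[EXTRACT]"):
        return "I always ruin everything."
    if prompt.startswith("[REASON]"):
        return "The word 'always' suggests overgeneralization."
    if prompt.startswith("[DEBATE]"):
        return "The reasoning is supported by the quoted text."
    return "overgeneralization"


label = erd("I failed one quiz. I always ruin everything.",
            stub_llm, [stub_llm, stub_llm], stub_llm)
print(label)
```

Keeping each step behind a plain function makes it easy to swap in a real model client, change the number of debating agents, or pass the judge a summary of the debate instead of the full transcript.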

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning.

CBEval: A framework for evaluating and interpreting cognitive biases in LLMs

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting

Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization

Cognitive Bias in Decision-Making with LLMs

HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy

Dissociation of Faithful and Unfaithful Reasoning in LLMs

Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?

PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy

Chain of Empathy: Enhancing Empathetic Response of Large Language Models Based on Psychotherapy Models

Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided

A Multi-LLM Debiasing Framework

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

Beyond Labels: Aligning Large Language Models with Human-like Reasoning

Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding

Rethinking Large Language Models in Mental Health Applications

People will agree what I think: Investigating LLM's False Consensus Effect