Abstract: Wider access to therapeutic care is one of the biggest challenges in mental health treatment. Due to institutional barriers, some people seeking mental health support have turned to large language models (LLMs) for personalized therapy, even though these models are largely unsanctioned and untested. We investigate the potential and limitations of using LLMs as providers of evidence-based therapy by using mixed-methods clinical metrics. Using HELPERT, a prompt run on a large language model using the same process and training as a comparative group of peer counselors, we replicated publicly accessible mental health conversations rooted in Cognitive Behavioral Therapy (CBT) to compare session dynamics and counselors' CBT-based behaviors between original peer support sessions and their reconstructed HELPERT sessions. Two licensed, CBT-trained clinical psychologists evaluated the sessions using the Cognitive Therapy Rating Scale (CTRS) and provided qualitative feedback. Our findings show that the peer sessions are characterized by empathy, small talk, therapeutic alliance, and shared experiences but often exhibit therapist drift. Conversely, HELPERT reconstructed sessions exhibit minimal therapist drift and higher adherence to CBT methods but display a lack of collaboration, empathy, and cultural understanding. Through CTRS ratings and psychologists' feedback, we highlight the importance of human-AI collaboration for scalable mental health. Our work outlines the ethical implications of imparting human-like subjective qualities to LLMs in therapeutic settings, particularly the risk of deceptive empathy, which may lead to unrealistic patient expectations and potential harm.

What problem does this paper attempt to address?

This paper evaluates the effectiveness and limitations of large language models (LLMs) as providers of psychotherapy based on Cognitive Behavioral Therapy (CBT). Specifically, by comparing human peer counselors with an LLM-based system (HELPERT) in single-session CBT counseling, the research explores the following aspects:

1. **Comparison of CBT counseling by human peer counselors and LLMs**: Using professional evaluations by clinical psychologists, the research compares how human peer counselors and HELPERT deliver CBT-based counseling, especially in terms of therapeutic alliance, collaboration, adherence to CBT methods, and impact on participants.
2. **Performance of LLMs in sustained interactions**: Current research on LLMs mostly focuses on users' preferences in single interactions and overlooks how these models behave across sustained interactions. This study addresses that gap by applying CBT metrics established in the literature and having clinical psychologists evaluate the sessions.
3. **Ethical and technical challenges**: The research also examines the ethical and technical challenges of using LLMs in psychotherapy settings, especially the risks that arise when LLMs simulate human subjective traits such as empathy, which may lead to unrealistic patient expectations or potential harm.
4. **Possibility of human-AI collaboration**: Finally, the research explores how humans and AI can combine their respective strengths to provide safer and more effective mental health support, rather than one simply replacing the other.

By exploring these issues, the research aims to offer new perspectives and solutions for future models of mental health support and to promote more equitable and effective mental health care.

Therapy as an NLP Task: Psychologists' Comparison of LLMs and Human Peers in CBT
