FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models

Huaiwen Zhang, Yu Chen, Ming Wang, Shi Feng

2024-07-21

Abstract: Emotional Support Conversation (ESC) is a typical form of dialogue that can effectively help users relieve emotional pressure. However, owing to the inherent subjectivity involved in analyzing emotions, current automatic (non-human) evaluation methods struggle to appraise emotional support capability effectively, and their metrics exhibit a low correlation with human judgments. Meanwhile, manual evaluation incurs extremely high costs. To solve these problems, we propose a novel model, FEEL (Framework for Evaluating Emotional Support Capability with Large Language Models), which employs Large Language Models (LLMs) as evaluators to assess emotional support capabilities. The model considers various evaluative aspects of ESC to provide a more comprehensive and accurate evaluation method. Additionally, it employs a probability distribution approach for more stable results and integrates an ensemble learning strategy, leveraging multiple LLMs with assigned weights to enhance evaluation accuracy. To appraise the performance of FEEL, we conduct extensive experiments on existing ESC model dialogues. Experimental results demonstrate that our model aligns substantially better with human evaluations than the baselines. Our source code is available at https://github.com/Ansisy/FEEL.

Computation and Language

What problem does this paper attempt to address?

This paper addresses deficiencies in current methods for evaluating Emotional Support Conversation (ESC). Existing non-human (automatic) evaluation methods correlate poorly with human judgment, while human evaluation, although effective, is costly and time-consuming. Human evaluation also relies on the subjective experience of the evaluators and lacks a systematic framework, which can bias the results. To overcome these problems, the paper proposes FEEL (Framework for Evaluating Emotional Support Capability with Large Language Models), an evaluation framework based on large language models (LLMs). The framework aims to provide a more comprehensive, accurate, and stable ESC evaluation method by combining multiple LLMs through ensemble learning. FEEL takes into account various evaluation aspects of ESC, adopts self-CoT (self-Chain-of-Thought) and a probability distribution approach to improve the stability of the evaluation results, and assigns weights to the different LLMs to improve evaluation accuracy. Overall, the goal of the paper is an ESC evaluation tool that aligns better with human evaluation results, improving both the quality and the efficiency of ESC evaluation.
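
The probability-distribution scoring and the weighted multi-LLM ensemble mentioned above can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository. It assumes each evaluator LLM yields a probability for every score on a 1-5 scale, that the per-model score is the probability-weighted average, and that the ensemble is a normalized weighted sum. The function names, the score scale, the example probabilities, and the weights are all hypothetical.

```python
from typing import Dict, List


def expected_score(score_probs: Dict[int, float]) -> float:
    """Probability-weighted average score for one evaluator (assumed scheme)."""
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize by the total so slightly unnormalized inputs still work.
    return sum(score * p for score, p in score_probs.items()) / total


def ensemble_score(per_model_probs: List[Dict[int, float]],
                   weights: List[float]) -> float:
    """Weighted combination of per-model expected scores (assumed scheme)."""
    if len(per_model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(w * expected_score(p)
                   for p, w in zip(per_model_probs, weights))
    return combined / weight_sum


# Hypothetical score distributions from two evaluator LLMs for one dialogue.
model_a = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.5, 5: 0.2}
model_b = {1: 0.0, 2: 0.0, 3: 0.5, 4: 0.5, 5: 0.0}

# Hypothetical weights, e.g. reflecting how well each model tracks human ratings.
final = ensemble_score([model_a, model_b], weights=[0.6, 0.4])
print(round(final, 2))
```

Averaging over the whole score distribution, instead of taking the single most likely score, is what makes the result less sensitive to sampling noise in this sketch. How FEEL actually derives the probabilities and the weights is not specified in this summary.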

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models

Towards Emotional Support Dialog Systems

Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues

EmoBench: Evaluating the Emotional Intelligence of Large Language Models

AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation

Emotional intelligence of Large Language Models

Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation

Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach

PCDialogEval: Persona and Context Aware Emotional Dialogue Evaluation

Enhancing the Emotional Generation Capability of Large Language Models Via Emotional Chain-of-Thought

APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation

SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent

AdaCLF: an Adaptive Curriculum Learning Framework for Emotional Support Conversation
