FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models

Huaiwen Zhang, Yu Chen, Ming Wang, Shi Feng
2024-07-21
Abstract: Emotional Support Conversation (ESC) is a type of dialogue that can effectively help users relieve emotional pressure. However, owing to the inherent subjectivity involved in analyzing emotions, current automatic evaluation methods struggle to appraise emotional support capability effectively, and their metrics exhibit a low correlation with human judgments. Meanwhile, manual evaluation is extremely costly. To solve these problems, we propose a novel model, FEEL (Framework for Evaluating Emotional Support Capability with Large Language Models), which employs Large Language Models (LLMs) as evaluators to assess emotional support capabilities. The model considers various evaluative aspects of ESC in order to provide a more comprehensive and accurate evaluation method. Additionally, it employs a probability distribution approach for more stable results and integrates an ensemble learning strategy, leveraging multiple LLMs with assigned weights to enhance evaluation accuracy. To appraise the performance of FEEL, we conduct extensive experiments on existing ESC model dialogues. Experimental results demonstrate that our model aligns substantially better with human evaluations than the baselines. Our source code is available at https://github.com/Ansisy/FEEL.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the shortcomings of current methods for evaluating Emotional Support Conversation (ESC). Existing automatic (non-human) evaluation methods correlate poorly with human judgment, while human evaluation, although effective, is costly and time-consuming. Human evaluation also depends on the subjective experience of the evaluators and lacks a systematic evaluation framework, which can bias the results. To overcome these problems, the paper proposes FEEL (Framework for Evaluating Emotional Support Capability with Large Language Models), an evaluation framework for emotional support capability built on large language models (LLMs). The framework aims to provide a more comprehensive, accurate, and stable ESC evaluation method by combining multiple LLMs through ensemble learning. FEEL takes the various evaluation aspects of ESC into account, adopts self-CoT (self-Chain-of-Thought) and a probability distribution method to stabilize the evaluation results, and assigns weights to the different LLMs to improve evaluation accuracy. Overall, the goal of the paper is an ESC evaluation tool that aligns more closely with human evaluation results, improving both the quality and the efficiency of ESC evaluation.
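The two stabilizing ideas mentioned above, scoring from a probability distribution and weighting several LLM evaluators, can be illustrated with a minimal sketch. It is not taken from the FEEL source code: the 1-to-3 score scale, the function names `expected_score` and `ensemble_score`, and the example probabilities and weights are all assumptions made for illustration, and the paper's actual formulation may differ.

```python
from typing import Dict, List


def expected_score(score_probs: Dict[int, float]) -> float:
    """Collapse a distribution over discrete scores into one expected value.

    `score_probs` maps each candidate score to the probability the evaluator
    LLM assigns to it (hypothetical input; how these probabilities are
    obtained is outside this sketch). Probabilities are renormalized so the
    result is well defined even if they do not sum exactly to 1.
    """
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(score * p for score, p in score_probs.items()) / total


def ensemble_score(model_scores: List[float], weights: List[float]) -> float:
    """Weighted average of per-model scores (weights are assumed, not learned here)."""
    if len(model_scores) != len(weights):
        raise ValueError("need exactly one weight per model score")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, model_scores)) / weight_sum


# Hypothetical distributions from two evaluator LLMs for one ESC aspect.
model_a = expected_score({1: 0.1, 2: 0.2, 3: 0.7})
model_b = expected_score({1: 0.0, 2: 0.5, 3: 0.5})

# Hypothetical weights; in practice they would reflect each model's reliability.
final = ensemble_score([model_a, model_b], [0.6, 0.4])
print(round(final, 2))
```

Taking the expectation rather than a single sampled score makes the result less sensitive to sampling noise, and the weighted average lets evaluators that agree better with human ratings contribute more to the final score.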