Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems

Aravind Sesagiri Raamkumar, Siyuan Brandon Loh
2024-07-26
Abstract: Empathetic Conversational Systems (ECS) are built to respond empathetically to the user's emotions and sentiments, regardless of the application domain. Evaluation approaches in current ECS studies are restricted to offline evaluation experiments, primarily for gold-standard comparison and benchmarking, and user evaluation studies for collecting human ratings on specific constructs. These methods are inadequate for measuring the actual quality of empathy in conversations. In this paper, we propose a multidimensional empathy evaluation framework with three new methods for measuring empathy at (i) the structural level using three empathy-related dimensions, (ii) the behavioral level using empathy behavioral types, and (iii) the overall level using an empathy lexicon, thereby fortifying the evaluation process. Experiments were conducted with state-of-the-art ECS models and large language models (LLMs) to show the framework's usefulness.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the shortcomings of current evaluation methods for Empathetic Conversational Systems (ECS). Existing methods rely mainly on offline experiments and user evaluation studies, which cannot comprehensively measure the quality of empathy in conversations. Specifically, the following issues exist:

1. **Dataset Quality**: The quality of empathy in existing datasets is not clearly annotated, so models trained on them cannot distinguish strongly empathetic responses from weakly empathetic ones.
2. **Offline Evaluation Metrics**: Common offline evaluation metrics (such as BLEU and perplexity) mainly measure the similarity between generated responses and gold-standard reference responses, rather than the level of empathy.
3. **User Evaluation**: User evaluation studies usually assess empathy through simple questionnaires and lack in-depth measurement of the multidimensional aspects of empathy.

To overcome these issues, the paper proposes a multidimensional empathy evaluation framework that measures the quality of empathy from structural, behavioral, and holistic perspectives. The framework includes five evaluation components: three novel empathy evaluation methods, plus traditional offline evaluation and user studies. Experimental validation shows that this framework can more accurately assess the empathetic capabilities of ECS models and large language models (LLMs).
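
To make the contrast between similarity-based metrics and a lexicon-based view of empathy concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: `TOY_EMPATHY_LEXICON` and its weights are invented for this example, `unigram_precision` is only the clipped unigram building block of BLEU (no higher-order n-grams, no brevity penalty), and `lexicon_score` is a hypothetical averaging scheme.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> empathy weight). These entries and
# weights are made up for illustration; they are NOT the paper's lexicon.
TOY_EMPATHY_LEXICON = {"sorry": 1.0, "understand": 0.8, "tough": 0.6, "hear": 0.5}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return [t for t in tokens if t]


def unigram_precision(candidate, reference):
    """Clipped unigram precision: share of candidate tokens found in the reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / total


def lexicon_score(response, lexicon=TOY_EMPATHY_LEXICON):
    """Mean lexicon weight per token; words missing from the lexicon count as 0."""
    tokens = tokenize(response)
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


reference = "I am sorry to hear that."
response_a = "I am happy to hear that."          # close wording, wrong sentiment
response_b = "That sounds tough, I understand."  # different wording, supportive

for name, resp in [("A", response_a), ("B", response_b)]:
    print(name, round(unigram_precision(resp, reference), 3), round(lexicon_score(resp), 3))
```

In this toy case, response A overlaps more with the reference yet scores lower on the toy lexicon than response B, which is the kind of gap between surface similarity and empathy that the summary describes. A single word list is itself a crude signal, so the sketch should be read only as motivation for evaluating along more than one dimension.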