Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs

Muhammad Arslan Manzoor,Yuxia Wang,Minghan Wang,Preslav Nakov

2024-10-31

Abstract: Empathy plays a pivotal role in fostering prosocial behavior, often triggered by the sharing of personal experiences through narratives. However, modeling empathy using NLP approaches remains challenging due to its deep interconnection with human interaction dynamics. Previous approaches, which involve fine-tuning language models (LMs) on human-annotated empathic datasets, have had limited success. In our pursuit of improving empathy understanding in LMs, we propose several strategies, including contrastive learning with masked LMs and supervised fine-tuning with large language models. While these methods show improvements over previous methods, the overall results remain unsatisfactory. To better understand this trend, we performed an analysis which reveals a low agreement among annotators. This lack of consensus hinders training and highlights the subjective nature of the task. We also explore the cultural impact on annotations. To study this, we meticulously collected story pairs in the Urdu language and find that subjectivity in interpreting empathy among annotators appears to be independent of cultural background. Our systematic exploration of LMs' understanding of empathy reveals substantial opportunities for further investigation in both task formulation and modeling.

Computation and Language

What problem does this paper attempt to address?

This paper asks how to improve the ability of language models (LMs), including large language models (LLMs), to understand emotion and empathy, specifically when estimating the empathic similarity between two narratives. Earlier work fine-tuned language models on human-annotated empathic datasets, but with limited success. One reason is that empathy is closely tied to the dynamics of human interaction, which makes it hard to model. Another is that annotations in existing datasets are highly subjective and show low agreement among annotators, which further limits how well models can be trained.

Specifically, the paper focuses on the following aspects:

1. **Improving empathic similarity estimation**: Explore strategies such as contrastive learning with masked LMs and supervised fine-tuning of large language models to improve the accuracy of empathic similarity estimates.
2. **Analyzing the upper limit of the correlation between model predictions and manual annotations**: Collect multiple human opinions and measure the agreement among annotators, revealing the high subjectivity of empathy annotations.
3. **Exploring the influence of culture and language on empathy annotations**: Collect a new Urdu dataset and study how cultural and linguistic background affects empathy annotations. The paper finds that subjectivity in interpreting empathy appears to be independent of cultural background.

Through these studies, the paper aims to characterize the current limitations of models in empathic understanding and to identify directions for future work in both task formulation and modeling.
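The second aspect above, comparing model predictions against human labels and checking how much annotators agree with each other, can be illustrated with a minimal sketch. It assumes Pearson correlation as the agreement measure and uses made-up toy ratings; the function names, the data, and the choice of metric are illustrative assumptions and are not taken from the paper, which may use different measures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_agreement(ratings_by_annotator):
    """Average Pearson correlation over all pairs of annotators.

    ratings_by_annotator: list of rating lists, one per annotator,
    all rating the same story pairs in the same order.
    """
    pairs = list(combinations(ratings_by_annotator, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def model_vs_mean_label(model_scores, ratings_by_annotator):
    """Correlation between model scores and the per-item mean human rating."""
    n_annotators = len(ratings_by_annotator)
    mean_labels = [
        sum(r[i] for r in ratings_by_annotator) / n_annotators
        for i in range(len(model_scores))
    ]
    return pearson(model_scores, mean_labels)


if __name__ == "__main__":
    # Hypothetical toy ratings (NOT from the paper): three annotators
    # scoring the empathic similarity of five story pairs on a 1-4 scale.
    humans = [
        [1, 2, 3, 4, 2],
        [2, 2, 4, 3, 1],
        [1, 3, 2, 4, 3],
    ]
    model = [1.2, 2.5, 2.9, 3.8, 1.9]  # hypothetical model predictions
    print("inter-annotator agreement:", mean_pairwise_agreement(humans))
    print("model vs. mean human label:", model_vs_mean_label(model, humans))
```

When annotators correlate only weakly with one another, the averaged label is itself noisy, so a low model-to-label correlation does not necessarily mean the model is at fault. This is the sense in which annotator agreement bounds what a model can be expected to achieve.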

Can Large Language Models Exhibit Cognitive and Affective Empathy as Humans?

Are Large Language Models More Empathetic than Humans?

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

Annotating and modeling empathy in spoken conversations

Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models

Trying to be human: Linguistic traces of stochastic empathy in language models

Multi-dimensional Evaluation of Empathetic Dialog Responses

An Interactional Account of Empathy in Human-Machine Communication

Large Language Models (LLMs) and Empathy - A Systematic Review

Identification and Description of Emotions by Current Large Language Models

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

Empathy Detection from Text, Audiovisual, Audio or Physiological Signals: A Systematic Review of Task Formulations and Machine Learning Methods

Large Language Models Produce Responses Perceived to be Empathic

Artificial Empathy Classification: A Survey of Deep Learning Techniques, Datasets, and Evaluation Scales

Learning Word Ratings for Empathy and Distress from Document-Level User Responses

Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements

EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models

Modeling Empathic Similarity in Personal Narratives

Emotion-Aware Response Generation Using Affect-Enriched Embeddings with LLMs

Empathetic Conversational Systems: A Review of Current Advances, Gaps, and Opportunities