Abstract: In recent years, with the rapid advancement of large language models (LLMs), excellent empathetic response capability has become a crucial prerequisite. Consequently, managing and understanding large-scale empathy datasets has gained increasing importance. However, empathetic data are typically used for training without any quality selection, which leads to inefficient data usage and wasted computational resources. Additionally, training on raw data can result in poor performance in empathetic dialogues. In this work, we present Efficient-Empathy, a data selection algorithm based on sensibility and rationality scores that automatically selects sensibility and rationality data while discarding low-quality data. Using only the sensibility data (59% of the full dataset), our trained sensibility model efficiently achieves state-of-the-art (SoTA) performance. Furthermore, the sensibility model achieves SoTA performance across multiple data selection hyperparameters, showcasing the robustness of our method. By integrating sensibility and rationality data with a mixture-of-experts (MoE) structure, we achieve even higher performance, demonstrating the effectiveness of the Efficient-Empathy algorithm.
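The score-based selection described in the abstract can be sketched as a threshold filter over per-dialogue scores. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Dialogue` type, the 0 to 10 score scale, the threshold values, and the rule that a dialogue clearing both thresholds joins both subsets are all hypothetical. The `sweep` helper mimics the robustness check over multiple selection hyperparameters by reporting how much data each sensibility threshold keeps.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    text: str
    sensibility: float  # hypothetical LLM-assigned affective score, 0-10 scale
    rationality: float  # hypothetical LLM-assigned cognitive score, 0-10 scale


def select_by_scores(dialogues, sens_threshold, rat_threshold):
    """Split dialogues into sensibility, rationality, and discarded subsets.

    Assumption: a dialogue clearing both thresholds goes into both subsets;
    one clearing neither is discarded as low quality.
    """
    sensibility_data, rationality_data, discarded = [], [], []
    for d in dialogues:
        if d.sensibility >= sens_threshold:
            sensibility_data.append(d)
        if d.rationality >= rat_threshold:
            rationality_data.append(d)
        if d.sensibility < sens_threshold and d.rationality < rat_threshold:
            discarded.append(d)
    return sensibility_data, rationality_data, discarded


def sweep(dialogues, thresholds, rat_threshold):
    """Fraction of the dataset kept as sensibility data at each threshold.

    The threshold grid is hypothetical; varying it mirrors a robustness check.
    """
    if not dialogues:
        return {t: 0.0 for t in thresholds}
    fractions = {}
    for t in thresholds:
        sens, _, _ = select_by_scores(dialogues, t, rat_threshold)
        fractions[t] = len(sens) / len(dialogues)
    return fractions


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    toy = [
        Dialogue("a", 9.0, 4.0),
        Dialogue("b", 6.0, 8.5),
        Dialogue("c", 3.0, 2.0),
        Dialogue("d", 8.0, 9.0),
    ]
    sens, rat, dropped = select_by_scores(toy, sens_threshold=8.0, rat_threshold=8.0)
    print([d.text for d in sens], [d.text for d in rat], [d.text for d in dropped])
    print(sweep(toy, thresholds=[6.0, 8.0, 9.0], rat_threshold=8.0))
```

On the toy data, dialogue "d" lands in both subsets and only "c" is discarded; the reported fraction for a given threshold plays the role that the 59% figure plays in the abstract.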


### Problems the Paper Aims to Solve

This paper addresses the problem of selecting empathy data efficiently and effectively for fine-tuning large language models (LLMs). Specifically, it targets three main challenges:

1. **Inefficiency**: Existing model-centric approaches overlook the quality and distribution of the data and typically fine-tune on the entire dataset. Low-quality samples are therefore included, which increases training time and computational cost.
2. **Low robustness**: Prompt-based methods improve a model's empathetic responses through carefully designed prompts, but these prompts are often tailored to specific LLMs, which limits their generality and robustness.
3. **Poor performance**: Previous research has not examined the distribution of empathy data and has neglected the roles of affective and cognitive empathy. Although studies have shown that both affective and cognitive empathy contribute positively to empathetic responses, it remains unclear how to use these attributes to select data that further improves empathetic performance.

To address these issues, the paper proposes Efficient-Empathy, a new empathy data selection method. It automatically scores each conversation for affective (sensibility) and cognitive (rationality) empathy and selects high-quality data based on these scores, aiming for efficient, robust, and effective empathy data management. The specific steps are:

- Use LLMs to automatically evaluate the affective and cognitive scores of the empathy data.
- Select or discard data according to preset score thresholds.
- Fine-tune LLMs on the selected affective data to achieve state-of-the-art performance.
- Test the robustness of the method across multiple data selection thresholds.
- Train an affective expert and a cognitive expert on the selected affective and cognitive data, respectively.
- Further train a mixture-of-experts (MoE) model combining the two experts to achieve higher performance.
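The first step, LLM-based scoring, can be illustrated as prompting a judge model and parsing its reply into two numbers. This is a hedged sketch: the prompt wording, the `Sensibility: x` / `Rationality: y` reply format, and the 0 to 10 range are hypothetical, and the LLM itself is a caller-supplied `llm(prompt) -> reply` function rather than any particular vendor API.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical judge prompt; not the prompt used in the paper.
JUDGE_TEMPLATE = (
    "Rate the listener's replies in the dialogue below.\n"
    "Give a sensibility (affective empathy) score and a rationality "
    "(cognitive empathy) score, each from 0 to 10.\n"
    "Answer exactly as:\nSensibility: <score>\nRationality: <score>\n\n"
    "Dialogue:\n{dialogue}"
)

# Matches lines such as "Sensibility: 8" or "rationality : 6.5".
_SCORE_RE = re.compile(r"(Sensibility|Rationality)\s*:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_scores(reply: str) -> Optional[Tuple[float, float]]:
    """Return (sensibility, rationality), or None if the reply is malformed."""
    found = {name.lower(): float(value) for name, value in _SCORE_RE.findall(reply)}
    if "sensibility" not in found or "rationality" not in found:
        return None
    s, r = found["sensibility"], found["rationality"]
    if not (0 <= s <= 10 and 0 <= r <= 10):
        return None  # reject scores outside the assumed 0-10 scale
    return s, r


def score_dialogue(dialogue: str, llm: Callable[[str], str]) -> Optional[Tuple[float, float]]:
    """Score one dialogue using a caller-supplied judge function."""
    return parse_scores(llm(JUDGE_TEMPLATE.format(dialogue=dialogue)))


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call."""
    return "Sensibility: 8\nRationality: 6.5"


if __name__ == "__main__":
    print(score_dialogue("Speaker: I lost my job.\nListener: That sounds hard.", fake_llm))
```

Returning `None` for malformed or out-of-range replies lets a pipeline drop or re-query those dialogues instead of silently feeding bad scores into the threshold step.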
Through these steps, the Efficient-Empathy method not only improves data usage efficiency but also enhances the robustness and effectiveness of the model's empathetic responses.
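The final step combines the affective and cognitive experts under a mixture-of-experts structure. The sketch below shows one generic way to do this, a softmax gate that mixes the two experts' next-token distributions; the gate form, the context feature vector, and the weights are assumptions for illustration, and the paper's actual MoE design may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features: Sequence[float], gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """One gate logit per expert (dot product of features with that expert's row), then softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_matrix]
    return softmax(logits)


def mix_experts(expert_probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of the experts' next-token distributions; the result is still a distribution."""
    vocab = len(expert_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, expert_probs)) for i in range(vocab)]


if __name__ == "__main__":
    # Hypothetical 3-token vocabulary and 2-dimensional context features.
    affective = [0.7, 0.2, 0.1]  # distribution from the sensibility (affective) expert
    cognitive = [0.1, 0.3, 0.6]  # distribution from the rationality (cognitive) expert
    features = [1.0, 0.0]        # hypothetical "emotional context" signal
    gate = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical learned gate parameters
    w = gate_weights(features, gate)
    print([round(x, 3) for x in w])
    print([round(x, 3) for x in mix_experts([affective, cognitive], w)])
```

A soft gate keeps both experts active for every response, weighting the affective expert more heavily when the context looks emotional; a top-1 router that picks a single expert per input is a common sparse alternative.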

Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data
