EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences

Jocelyn Shen, Yubin Kim, Mohit Hulse, Wazeer Zulfikar, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park
2024-05-25
Abstract: Modeling empathy is a complex endeavor that is rooted in interpersonal and experiential dimensions of human interaction, and remains an open problem within AI. Existing empathy datasets fall short in capturing the richness of empathy responses, often being confined to in-lab or acted scenarios, lacking longitudinal data, and missing self-reported labels. We introduce a new multimodal dataset for empathy during personal experience sharing: the EmpathicStories++ dataset (
Computation and Language
What problem does this paper attempt to address?
The paper addresses the shortcomings of current empathy datasets in capturing the richness of human empathetic responses. Specifically, existing empathy datasets often have the following limitations:

1. **Non-natural scenarios**: Most datasets are collected in laboratory, online, or acted settings, which differ significantly from how empathy is expressed in natural conversation.
2. **Lack of longitudinal data**: Existing datasets often capture only a single interaction, whereas empathy is a complex process shaped by an individual's many past experiences.
3. **Lack of self-reported labels**: Empathy is subjective, so user-centered or personalized modeling needs self-reported labels, which existing datasets often lack.

To address these issues, the paper introduces the **EmpathicStories++** dataset, a multimodal dataset of 41 participants who, over one month, shared personal stories and read others' stories through a social robot. The dataset's key features are:

- **Natural setting**: Data is collected in participants' homes, which is closer to real-world conditions.
- **Longitudinal data**: Collection spans one month, capturing how participants change over time.
- **Self-reported labels**: Participants rate their own empathy toward the stories they read, yielding more authentic empathy labels.

With these features, EmpathicStories++ aims to advance computational empathy research and to provide a resource for developing more empathetic AI systems. The paper also proposes a new task, predicting an individual's empathy toward another person's story based on their own personal experiences, and benchmarks it with state-of-the-art models to encourage future improvements in contextual and longitudinal empathy modeling.
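The prediction task can be pictured as a function that takes a reader's own previously shared stories plus a target story and returns an empathy rating. The sketch below is hypothetical and is not the paper's benchmark, which uses state-of-the-art models. The `Participant` record, the `predict_empathy` helper, the 1 to 5 rating range, and the bag-of-words cosine-similarity baseline are all assumptions made only to show the input/output shape of the task.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Participant:
    # Hypothetical record: the personal stories this reader has shared so far.
    own_stories: list[str] = field(default_factory=list)


def bag_of_words(text: str) -> Counter:
    # Lowercased word counts; a deliberately crude text representation.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two count vectors; 0.0 if either is empty.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_empathy(p: Participant, target_story: str, low: int = 1, high: int = 5) -> float:
    """Hypothetical baseline: map the highest similarity between any of the
    reader's own stories and the target story onto an assumed [low, high] scale."""
    target = bag_of_words(target_story)
    sims = [cosine(bag_of_words(s), target) for s in p.own_stories]
    best = min(1.0, max(sims, default=0.0))  # clamp float rounding above 1.0
    return low + best * (high - low)


reader = Participant(own_stories=["I moved to a new city and felt very lonely"])
print(predict_empathy(reader, "After moving abroad I felt lonely for months"))
```

The design choice here is only illustrative: grounding the prediction in the reader's own stories mirrors the idea that empathy depends on personal experience, while a real model would use richer (and possibly multimodal and longitudinal) signals.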