Large Language Models Produce Responses Perceived to be Empathic

Yoon Kyung Lee, Jina Suh, Hongli Zhan, Junyi Jessy Li, Desmond C. Ong
2024-03-27
Abstract: Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper explores how well large language models (LLMs) generate empathetic responses and evaluates whether humans perceive those responses as empathetic. Specifically, the researchers had several LLMs (GPT4 Turbo, Llama2, and Mistral) generate empathetic messages for common life scenarios (such as workplace situations, parenting, and interpersonal relationships), and then had human raters evaluate the degree of empathy in these responses. The main objectives of the study include:

1. **Comparing empathetic responses generated by LLMs and humans**:
   - The researchers ask whether responses generated by LLMs are perceived as more empathetic than those written by humans.
   - They test this in two experiments. In one of the experiments, the human responses are written by research assistants trained in psychology to ensure a high-quality baseline.
2. **Exploring style differences among different LLMs**:
   - In addition to evaluating the degree of empathy, the researchers use linguistic analysis (for example, with the LIWC tool) to explore whether responses generated by different LLMs have distinctive style characteristics, such as differences in the use of pronouns, punctuation marks, and emotional vocabulary.
3. **Verifying the generalization ability of LLMs in different situations**:
   - The second experiment expands the scope of the study to cover more life scenarios (such as anger management, anxiety, and COVID-19-related support) to test whether LLMs perform consistently across domains.

### Main findings

1. **Responses generated by LLMs are generally considered more empathetic**:
   - In the first experiment, responses generated by LLMs received significantly higher empathy scores than those written by humans.
   - Specifically, responses generated by GPT4, Llama2, and Mistral were all rated as highly empathetic, while human-written responses were rated relatively lower.
2. **Differences in empathy levels among different LLMs exist**:
   - There is no significant difference in empathy scores between responses generated by GPT4 and Llama2, but both score significantly higher than responses generated by Mistral.
   - This indicates that LLMs differ in how empathetic their responses are perceived to be.
3. **Differences in language styles**:
   - Through linguistic analysis, the researchers found that different LLMs show clear style differences in the use of pronouns, punctuation marks, and emotional vocabulary.
   - For example, some models may be more inclined to use first-person pronouns, while others may use second-person pronouns more often.
4. **Generalization ability of LLMs in different situations**:
   - The results of the second experiment further confirm the ability of LLMs to generate empathetic responses across different life scenarios, and these results are consistent with those of the first experiment.

### Conclusion

This paper shows through experiments that LLM-generated responses are rated as highly empathetic, even more so than human-written responses. This suggests that LLMs have considerable potential in providing social and emotional support, especially in situations requiring large-scale support. However, the researchers also emphasize that although LLMs can generate responses that are perceived as empathetic, this does not mean that they possess true empathy. These technologies should be regarded as a supplement to human empathy rather than a substitute.
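
To make the style comparison under "Differences in language styles" more concrete, here is a minimal Python sketch of counting a few surface markers (first-person pronouns, second-person pronouns, and punctuation) in a response. It is an illustration only, not the paper's method: the word lists and the `style_features` function are hypothetical stand-ins chosen for this example, and they are not the LIWC dictionaries or categories.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists for illustration.
# These are NOT the LIWC dictionaries used in the paper.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def style_features(text: str) -> dict:
    """Return a few simple surface style markers for one response."""
    # Tokenize into lowercase words, keeping ASCII-apostrophe contractions
    # such as "i'm" together as one token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Reduce contractions to their base word ("i'm" -> "i", "you're" -> "you").
    bases = Counter(tok.split("'")[0] for tok in tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty text

    first = sum(bases[w] for w in FIRST_PERSON)
    second = sum(bases[w] for w in SECOND_PERSON)
    return {
        "first_person_per_100": 100 * first / n,
        "second_person_per_100": 100 * second / n,
        "exclamations": text.count("!"),
        "question_marks": text.count("?"),
    }


if __name__ == "__main__":
    sample = "I'm so sorry you are going through this! You deserve support."
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.2f}")
```

Computing these rates per response and then averaging them by source (each model, or the human writers) would give a rough picture of the kind of stylistic contrast the summary describes. A real analysis would need validated dictionaries and appropriate statistical tests.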