Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng
2024-06-19
Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the lack of humor-understanding datasets for non-English languages, particularly Chinese. The authors construct Chumor, a dataset sourced from Ruo Zhi Ba (弱智吧, RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. To compare human understanding of these jokes with that of state-of-the-art large language models (GPT-4o and ERNIE Bot), they manually annotated an explanation for each joke and ran A/B tests in which native Chinese speakers judged the human explanations against the model-generated ones. The results indicate that even these advanced models fall well short in understanding and explaining Chinese humor: the human explanations were judged significantly better than the model explanations. The study also analyzes the types of errors the models make on humor rooted in the Chinese cultural context, including cultural insensitivity, mishandling of homophonic humor, and failure to recognize character-based humor. Overall, the findings suggest that culturally specific, non-English humor remains a substantial challenge for current large language models.
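As an illustration of how the outcome of such an A/B comparison might be tallied, the following is a minimal Python sketch. It is not the authors' evaluation code: the function name `preference_rates`, the label names "human", "model", and "tie", and the possibility of a tie vote are all assumptions made for this example, and the votes shown are made-up toy values rather than results from the paper.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of votes for each outcome in an A/B comparison.

    `votes` holds one label per annotator judgment. The label names
    ("human", "model", "tie") are hypothetical, chosen for this sketch.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    total = len(votes)
    # Report every outcome, including ones that received no votes.
    return {label: counts.get(label, 0) / total
            for label in ("human", "model", "tie")}


# Made-up toy votes for illustration only; not data from the paper.
toy_votes = ["human", "human", "model", "tie", "human"]
rates = preference_rates(toy_votes)
print(rates)
```

Under these assumptions, a higher "human" share than "model" share would correspond to annotators preferring the human-written explanations; whether such a difference is significant would need a separate statistical test.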