CC-Riddle: A Question Answering Dataset of Chinese Character Riddles

Fan Xu, Yunxiang Zhang, Xiaojun Wan
DOI: https://doi.org/10.48550/arXiv.2206.13778
2022-06-28
Computation and Language
Abstract: The Chinese character riddle is a challenging riddle game whose solution is a single character. A riddle describes the pronunciation, shape, and meaning of the solution character using rhetorical techniques. In this paper, we propose a Chinese character riddle dataset that covers the majority of common simplified Chinese characters, built by crawling riddles from the Web and generating brand-new ones. In the generation stage, we provide the generation model with the Chinese phonetic alphabet, decomposition, and explanation of the solution character, and obtain multiple riddle descriptions for each tested character. The generated riddles are then manually filtered, and the final dataset, CC-Riddle, is composed of both human-written riddles and filtered generated riddles. Furthermore, we build a character riddle QA system based on our dataset and find that existing models struggle to solve such tricky questions. CC-Riddle is now publicly available.
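The generation stage described in the abstract conditions a model on three pieces of information about the solution character: its Chinese phonetic alphabet (pinyin) reading, its decomposition into components, and its explanation. The abstract does not specify how these are encoded, so the following is only a minimal sketch under stated assumptions: the `CharacterInfo` container, the `build_generation_input` helper, and the labeled `|`-separated template are all hypothetical and are not the authors' format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharacterInfo:
    """Hypothetical container for the three inputs named in the abstract."""
    character: str         # the solution character
    pinyin: str            # Chinese phonetic alphabet reading
    components: List[str]  # decomposition of the character into parts
    explanation: str       # short gloss of the character's meaning


def build_generation_input(info: CharacterInfo) -> str:
    """Join the fields into one labeled string (an assumed template)."""
    parts = [
        f"character: {info.character}",
        f"pinyin: {info.pinyin}",
        f"decomposition: {' + '.join(info.components)}",
        f"explanation: {info.explanation}",
    ]
    return " | ".join(parts)


# Example: the character 好 is composed of 女 and 子 and is read "hǎo".
example = CharacterInfo("好", "hǎo", ["女", "子"], "good; well")
print(build_generation_input(example))
```

A string like this could be passed to any text generation model and sampled several times to obtain multiple candidate riddles per character, which would then go through the manual filtering step the abstract mentions.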