CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu
2024-02-21
Abstract: Metaphor is a prominent linguistic device in human language and literature, as it adds color, imagery, and emphasis that enhance effective communication. This paper introduces a large-scale, high-quality annotated Chinese Metaphor Corpus comprising around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, and song lyrics. To ensure the accuracy and consistency of our annotations, we introduce a comprehensive set of guidelines. These guidelines address the facets of metaphor annotation, ranging from identifying tenors, vehicles, and grounds to handling the complexities of similes, personifications, juxtapositions, and hyperboles. Breaking with tradition, our approach to metaphor generation emphasizes grounds and their distinct features rather than the conventional combination of tenors and vehicles. By integrating the "ground" as a CoT (Chain-of-Thought) input, we are able to generate metaphors that resonate more with real-world intuition. We test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using our annotated corpus. These models generate creative and fluent metaphor sentences more frequently when prompted with selected samples from our dataset, demonstrating the value of our corpus for Chinese metaphor research. The code is available at https://github.com/JasonShao55/Chinese_Metaphor_Explanation.
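The central idea of the abstract, supplying the ground as an intermediate Chain-of-Thought step, can be illustrated with a minimal Python sketch. It assumes a few-shot setup in which each demonstration lists tenor, ground, vehicle, and sentence in that order. The template wording, the `build_ground_cot_prompt` and `format_demo` helpers, and the example metaphors are hypothetical and are not the authors' released prompts.

```python
# Hypothetical sketch: assemble a few-shot prompt in which each demonstration
# states the ground (shared feature) before the vehicle, so the model reasons
# tenor -> ground -> vehicle -> sentence. Template wording and helper names
# are illustrative assumptions, not the authors' released prompts.
from typing import List, Tuple

# (tenor, ground, vehicle, sentence): hypothetical demonstration format
Demo = Tuple[str, str, str, str]


def format_demo(tenor: str, ground: str, vehicle: str, sentence: str) -> str:
    """Render one demonstration with the ground as an explicit reasoning step."""
    return (
        f"Tenor: {tenor}\n"
        f"Ground: {ground}\n"
        f"Vehicle: {vehicle}\n"
        f"Metaphor: {sentence}"
    )


def build_ground_cot_prompt(demos: List[Demo], tenor: str, ground: str) -> str:
    """Demonstrations first, then a query that supplies the tenor and ground
    and leaves the vehicle and the sentence for the model to complete."""
    blocks = [format_demo(*d) for d in demos]
    query = f"Tenor: {tenor}\nGround: {ground}\nVehicle:"
    return "\n\n".join(blocks + [query])


if __name__ == "__main__":
    demos = [("月亮", "圆而明亮", "玉盘", "月亮像一只洁白的玉盘。")]
    print(build_ground_cot_prompt(demos, tenor="时间", ground="一去不返"))
```

Because the query stops at `Vehicle:`, the model has to choose a vehicle only after reading the ground. This mirrors the ground-first ordering the abstract describes, as opposed to prompting with a fixed tenor-vehicle pair.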
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses Chinese metaphor generation and introduces a new high-quality annotated corpus, CMDAG (Chinese Metaphor Dataset with Annotated Grounds). Specifically, its goals are:

1. **Constructing a high-quality Chinese metaphor corpus**: a large-scale corpus of approximately 28,000 sentences drawn from diverse Chinese literary sources such as poetry, prose, and song lyrics.
2. **Proposing a new metaphor annotation framework**: the framework annotates the three key elements of a metaphor, namely TENOR, VEHICLE, and GROUND. Making the GROUND explicit brings the generated metaphors closer to real-world intuition.
3. **Evaluating metaphor generation models**: the paper uses the CMDAG corpus to evaluate several existing generative models (Belle, Baichuan, and Chinese-alpaca-33B) and finds that supplying GROUND information improves the quality of the metaphors these models generate.

In summary, the paper tackles how to improve the quality and naturalness of Chinese metaphor generation through a high-quality annotated corpus and a new annotation framework.
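The TENOR / VEHICLE / GROUND scheme maps naturally onto a small record type. The following sketch is hypothetical: the `MetaphorAnnotation` class, its field names, and the treatment of simile, personification, juxtaposition, and hyperbole as a closed category set are assumptions made for illustration, not the schema of the released corpus.

```python
# Hypothetical sketch of one annotated record under the tenor / vehicle /
# ground scheme. Field names and the category label set are illustrative
# assumptions; consult the released dataset for its actual schema.
from dataclasses import dataclass
from typing import Optional

# Phenomena named in the annotation guidelines; using them as a single
# closed label set is an assumption of this sketch.
CATEGORIES = {"simile", "personification", "juxtaposition", "hyperbole"}


@dataclass(frozen=True)
class MetaphorAnnotation:
    sentence: str
    tenor: str    # the thing being described
    vehicle: str  # the thing it is compared to
    ground: str   # the shared feature linking tenor and vehicle
    category: Optional[str] = None

    def __post_init__(self) -> None:
        # Require all three core elements to be non-empty.
        for name in ("tenor", "vehicle", "ground"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing {name}")
        # Reject labels outside the assumed category set.
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


if __name__ == "__main__":
    record = MetaphorAnnotation(
        sentence="月亮像一只洁白的玉盘。",
        tenor="月亮",
        vehicle="玉盘",
        ground="圆而明亮",
        category="simile",
    )
    print(record)
```

Rejecting records with an empty ground reflects the framework's emphasis on grounds. Whether the corpus itself allows a ground to be left unannotated is not stated here, so that check is also an assumption.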