CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation
Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu
2024-02-21
Abstract: Metaphor is a prominent linguistic device in human language and literature,
as it adds color, imagery, and emphasis to enhance effective communication.
This paper introduces a large-scale, high-quality annotated Chinese Metaphor
Corpus, which comprises around 28K sentences drawn from a diverse range of
Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the
accuracy and consistency of our annotations, we introduce a comprehensive set
of guidelines. These guidelines address the facets of metaphor annotation,
from identifying tenors, vehicles, and grounds to handling the
complexities of similes, personifications, juxtapositions, and hyperboles.
Breaking tradition, our approach to metaphor generation emphasizes grounds and
their distinct features rather than the conventional combination of tenors and
vehicles. By integrating "ground" as a CoT (Chain of Thought) input, we are
able to generate metaphors that resonate more with real-world intuition. We
test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using
our annotated corpus. When prompted with selected samples from our dataset,
these models generate creative and fluent metaphor sentences more frequently,
demonstrating the value of our corpus for Chinese metaphor research.
The code is available at
https://github.com/JasonShao55/Chinese_Metaphor_Explanation.
Computation and Language, Artificial Intelligence
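
The abstract describes feeding the annotated "ground" to a generative model as a chain-of-thought step. The sketch below shows one way such a few-shot prompt might be assembled, so that the model states a ground before it writes the metaphor. It is an illustration under assumptions, not the authors' implementation: the `MetaphorExample` class, its field names, the `build_ground_cot_prompt` function, the prompt layout, and the sample sentence are all hypothetical and are not taken from the CMDAG corpus or its repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaphorExample:
    """One annotated sample (hypothetical schema, not the CMDAG file format)."""
    sentence: str  # the full metaphor sentence
    tenor: str     # the thing being described
    vehicle: str   # the thing it is compared to
    ground: str    # the shared property that justifies the comparison


def build_ground_cot_prompt(examples: List[MetaphorExample], tenor: str) -> str:
    """Build a few-shot prompt where the ground precedes the metaphor.

    Each demonstration lists tenor, ground, vehicle, then the sentence, so the
    ground acts as the intermediate reasoning step. The prompt ends at
    "Ground:" so the model must produce a ground before the metaphor itself.
    """
    parts = []
    for ex in examples:
        parts.append(
            f"Tenor: {ex.tenor}\n"
            f"Ground: {ex.ground}\n"
            f"Vehicle: {ex.vehicle}\n"
            f"Metaphor: {ex.sentence}\n"
        )
    parts.append(f"Tenor: {tenor}\nGround:")
    return "\n".join(parts)


# Invented demonstration sample, written for this sketch only.
demo = [
    MetaphorExample(
        sentence="Time is a river that never flows back.",
        tenor="time",
        vehicle="river",
        ground="flows in one direction and cannot be reversed",
    )
]

prompt = build_ground_cot_prompt(demo, "memory")
print(prompt)
```

The resulting string can be passed to any text-generation model. Placing the ground before the vehicle in each demonstration is a design choice of this sketch, made so that the shared property is generated first and the vehicle is chosen to fit it.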