PoetryBERT: Pre-training with Sememe Knowledge for Classical Chinese Poetry.

Jiaqi Zhao, Ting Bai, Yuting Wei, Bin Wu
DOI: https://doi.org/10.1007/978-981-19-8991-9_26
2022-01-01
Abstract: Classical Chinese poetry has a history of thousands of years and is a precious cultural heritage of humankind. Compared with the modern Chinese corpus, it is irrecoverable and specially organized, making it difficult for existing pre-trained language models to learn. Moreover, over thousands of years of development, many words in classical Chinese poetry have changed their meanings or fallen out of use, which further limits the capability of existing pre-trained models to learn the semantics of classical Chinese poetry. To address these challenges, we construct a large-scale sememe knowledge graph of classical Chinese poetry (SKG-Poetry), which connects the vocabularies of classical Chinese poetry and modern Chinese. By extracting the sememe knowledge from classical Chinese poetry, our model PoetryBERT not only enlarges the irrecoverable pre-training corpus but also enriches the semantics of the vocabularies in classical Chinese poetry, which enables PoetryBERT to be successfully used in downstream tasks. Specifically, we evaluate our model on two tasks in the field of classical Chinese poetry: poetry theme classification and poetry-modern Chinese translation. Extensive experiments on the two tasks show the effectiveness of the sememe-knowledge-based pre-trained model.