A cross-temporal contrastive disentangled model for ancient Chinese understanding

Yuting Wei, Yangfu Zhu, Ting Bai, Bin Wu
DOI: https://doi.org/10.1016/j.neunet.2024.106559
2024-07-22
Abstract: Ancient Chinese is a crucial bridge for understanding Chinese history and culture. Most existing works use high-resource modern Chinese to understand low-resource ancient Chinese, but they do not fully account for the semantic and syntactic gaps that have opened between the two as the language changed over time, which leads to misinterpretation of ancient Chinese. Hence, we propose a novel language pre-training framework for ancient Chinese understanding based on the Cross-temporal Contrastive Disentanglement Model (CCDM), which bridges the gap between modern and ancient Chinese using a parallel corpus. Specifically, we first explore a cross-temporal data augmentation method that disentangles and reconstructs the parallel ancient-modern corpus. Notably, the proposed decoupling strategy takes full account of the cross-temporal characteristics of ancient and modern Chinese. Then, cross-temporal contrastive learning is used to train the model by fully leveraging the cross-temporal information. Finally, the trained language model is applied to downstream tasks. We conduct extensive experiments on six ancient Chinese understanding tasks. Results demonstrate that our model outperforms the state-of-the-art baselines. Our framework is also potentially applicable to other languages that have undergone evolutionary changes leading to shifts in syntax and semantics.
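The abstract does not specify the contrastive objective, so the following is only a minimal sketch of how contrastive learning over a parallel corpus is commonly set up, not the authors' method. It assumes an InfoNCE-style loss in which each ancient sentence embedding is pulled toward the embedding of its modern translation and pushed away from the other modern sentences in the batch. The function names, the temperature value, and the toy vectors are all hypothetical, and the disentanglement-based augmentation step is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(ancient, modern, temperature=0.1):
    """Mean InfoNCE loss over a batch of parallel sentence embeddings.

    ancient[i] and modern[i] are treated as a positive pair; every
    modern[j] with j != i acts as an in-batch negative.
    (Assumed objective; the paper's actual loss may differ.)
    """
    losses = []
    for i, anchor in enumerate(ancient):
        logits = [cosine(anchor, m) / temperature for m in modern]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical values, not taken from the paper).
ancient_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # modern[i] matches ancient[i]
swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_aligned = info_nce(ancient_vecs, aligned)
loss_swapped = info_nce(ancient_vecs, swapped)
```

With these toy vectors the loss is lower when each ancient embedding sits close to its own modern counterpart than when the pairs are mismatched, which is the behavior a contrastive objective of this kind is meant to reward.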