People name recognition from ancient Chinese literature using distant supervision and deep learning

Hailin Zhang, Hai Zhu, Junsong Ruan, Ruoyao Ding
DOI: https://doi.org/10.1145/3469213.3470270
2021-05-28
Abstract: Ancient Chinese literature records many great historical figures and is therefore significant for understanding Chinese history and culture. An important step for researchers studying historical figures is finding the names of interest in this literature. In this paper, we propose a neural network method for automatic person name recognition in ancient Chinese literature, based on distant supervision and deep learning. To address the shortage of annotated corpora, we propose a distant-supervision-based method that matches the original text of ancient Chinese literature against a dictionary of historical figure names to generate a training corpus. For the deep learning model, we adopt a state-of-the-art BiLSTM+CRF model and introduce an attention mechanism that determines the correlation between the target Chinese character and the other characters, which helps identify the boundaries of people's names. Experimental results indicate that our model achieves an F1-score of 66.9, outperforming the most commonly used LSTM+CRF model and Baidu's deep learning Chinese lexical analysis tool LAC (the only publicly available tool for comparison). Our model can therefore effectively extract people's names from ancient Chinese literature.
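The dictionary-matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching strategy or the tag scheme, so the greedy longest-match lookup, the B-PER/I-PER/O labels, and the function name `label_with_dictionary` below are all assumptions made for illustration, not the authors' implementation.

```python
def label_with_dictionary(text, name_dict):
    """Tag each character as B-PER, I-PER, or O by dictionary lookup.

    Assumption: greedy longest-match from left to right; the paper's
    actual matching rules may differ.
    """
    # Longest name in the dictionary bounds the lookup window.
    max_len = max((len(name) for name in name_dict), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate substring first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in name_dict:
                matched = length
                break
        if matched:
            tags[i] = "B-PER"
            for j in range(i + 1, i + matched):
                tags[j] = "I-PER"
            i += matched
        else:
            i += 1
    return tags


# Hypothetical example sentence and name dictionary.
sentence = "劉備與關羽結義"
names = {"劉備", "關羽"}
for char, tag in zip(sentence, label_with_dictionary(sentence, names)):
    print(char, tag)
```

Labels produced this way are noisy: a dictionary entry may match a string that is not used as a name in context, and names missing from the dictionary are tagged O. This is the usual trade-off of distant supervision, which exchanges annotation cost for label noise.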