Word Segmentation for Chinese Judicial Documents

Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang
DOI: https://doi.org/10.1007/978-981-15-0118-0_36
2019-01-01
Abstract: Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods run into problems when applied to Chinese judicial documents: (1) they rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. In this paper, a word segmentation method is proposed for Chinese judicial documents. The proposed method consists of two steps: (1) automatically generating labeled data to serve as legal dictionaries, and (2) applying a hybrid multi-layer neural network that incorporates the legal dictionaries to perform word segmentation. Experiments on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
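To make the idea of incorporating a legal dictionary into segmentation more concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a tiny hand-written dictionary (`LEGAL_DICT`), uses forward maximum matching as a stand-in for dictionary lookup, and converts the result into B/M/E/S character tags, a common labeling scheme for Chinese word segmentation. The abstract does not say how the dictionary is built or how it is fed to the neural network, so every name and choice here is hypothetical.

```python
# Illustrative sketch only: dictionary lookup via forward maximum matching.
# LEGAL_DICT, max_len, and the B/M/E/S scheme are assumptions, not details from the paper.

def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_bmes(words):
    """Map each word to per-character tags: B(egin), M(iddle), E(nd), S(ingle)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Hypothetical legal dictionary entries.
LEGAL_DICT = {"被告人", "盗窃罪", "有期徒刑"}

sentence = "被告人犯盗窃罪"
words = forward_max_match(sentence, LEGAL_DICT)
tags = to_bmes(words)
print(words)
print(tags)
```

Per-character tags of this kind could, for instance, be supplied to a neural tagger as extra features alongside character embeddings; whether the paper does so is not stated in the abstract.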