ETIP: a lengthy nested NER problem for Chinese insurance policy analysis

Lin Sun,Kai Zhang,Yuxuan Sun,Fangsheng Weng,Jianwei Zhang
DOI: https://doi.org/10.1007/s10044-020-00885-6
IF: 2.307
2020-01-01
Pattern Analysis and Applications
Abstract:Contract analysis can significantly ease the work for humans using AI techniques. This paper shows a lengthy nested NER problem of element tagging on insurance policy (ETIP). Compared to NER, ETIP deals with not only different types of entities which vary from a short phrase to a long sentence, but also phrase or clause entities that could be nested. We present a novel hybrid framework of deep learning and heuristic filtering method to recognize the lengthy nested elements. First, a convolutional neural network is constructed to obtain good initial candidates of sliding windows with high softmax probability. Then, the concatenation operator on adjacent candidate segments is introduced to create phrase, clause, or sentence candidates. We design an effective voting strategy to resolve the classification conflict of the concatenated candidates and present a theoretical proof of F 1-score optimization. In experiments, we have collected a large Chinese insurance contract dataset to test the performance of the proposed method. An extensive set of experiments is performed to investigate how sliding window candidates can work effectively in our filtering and voting strategy. The optimal parameters are determined by statistical analysis of the experimental data. The results show the promising performance of our method in the ETIP problem.
What problem does this paper attempt to address?