Document Structure Identification Method Based on Conditional Random Field

Lei Yang, Tian Yingai, Li Ning, Gao Xiaolong
DOI: https://doi.org/10.2991/icmcm-16.2016.71
2016-01-01
Abstract: Based on an in-depth analysis of the structural features and heading features of documents, this paper examines template-based classification, statistics-based classification, and sequence labeling based on CRF (Conditional Random Field). It then proposes treating document structure identification as a sequential data labeling task, builds a CRF training model with feature templates, and finally performs document structure identification with a model trained by standard supervised learning. Experimental results show that identifying paragraph roles from the sequential structure of a document yields higher accuracy and provides a degree of fault tolerance. In addition, applying CRF multiple times was observed to further improve identification accuracy.
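The abstract's central idea, treating document structure identification as sequence labeling with a CRF built from feature templates, can be illustrated with a minimal Python sketch of linear-chain CRF decoding. Everything in it is an assumption for illustration rather than the paper's method: the two labels (`heading`, `body`), the template features (`numbered`, `short`, `ends_punct` over a window of one paragraph on each side), and the weights in `EMIT` and `TRANS`, which are hand-picked stand-ins for weights that supervised training would learn. Only Viterbi decoding is shown; training is omitted.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical label set; the abstract does not list the paper's paragraph roles.
LABELS: List[str] = ["heading", "body"]

# Hand-picked (NOT learned) weights, standing in for what supervised training would produce.
# EMIT maps (feature, label) -> weight; TRANS maps (previous label, label) -> weight.
EMIT: Dict[Tuple[str, str], float] = {
    ("cur:numbered", "heading"): 2.0,
    ("cur:short", "heading"): 1.0,
    ("next:ends_punct", "heading"): 0.5,
    ("cur:ends_punct", "body"): 1.5,
    ("bias", "body"): 0.5,
}
TRANS: Dict[Tuple[str, str], float] = {
    ("heading", "heading"): -1.0,  # two headings in a row are discouraged
    ("heading", "body"): 0.5,
    ("body", "body"): 0.5,
    ("body", "heading"): 0.0,
}


def observations(paragraph: str) -> List[str]:
    """Simple surface cues for one paragraph (hypothetical, for illustration)."""
    obs = []
    if re.match(r"^\d+(\.\d+)*\s", paragraph):  # e.g. "2 Method", "2.1 Data"
        obs.append("numbered")
    if len(paragraph.split()) <= 8:
        obs.append("short")
    if paragraph.rstrip().endswith((".", "!", "?")):
        obs.append("ends_punct")
    return obs


def extract_features(paragraphs: List[str], t: int) -> List[str]:
    """Feature template over a +/-1 window: cues of the current paragraph
    plus cues of its previous and next neighbours."""
    feats = ["bias"] + [f"cur:{o}" for o in observations(paragraphs[t])]
    if t > 0:
        feats += [f"prev:{o}" for o in observations(paragraphs[t - 1])]
    else:
        feats.append("prev:BOS")
    if t + 1 < len(paragraphs):
        feats += [f"next:{o}" for o in observations(paragraphs[t + 1])]
    else:
        feats.append("next:EOS")
    return feats


def viterbi(paragraphs: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain score
    sum_t [emission(y_t, t) + transition(y_{t-1}, y_t)]."""
    n = len(paragraphs)
    if n == 0:
        return []

    def emission(t: int, y: str) -> float:
        return sum(EMIT.get((f, y), 0.0) for f in extract_features(paragraphs, t))

    score = [{y: emission(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        score.append({})
        back.append({})
        for y in LABELS:
            # Best predecessor label for ending in label y at position t.
            prev = max(LABELS, key=lambda p: score[t - 1][p] + TRANS.get((p, y), 0.0))
            score[t][y] = score[t - 1][prev] + TRANS.get((prev, y), 0.0) + emission(t, y)
            back[t][y] = prev

    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[n - 1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


SAMPLE = [
    "1 Introduction",
    "Document structure identification assigns a role to each paragraph.",
    "We treat it as sequence labeling.",
    "2 Method",
    "A linear-chain CRF scores whole label sequences.",
]

if __name__ == "__main__":
    print(viterbi(SAMPLE))  # ['heading', 'body', 'body', 'heading', 'body']
```

Because `TRANS` discourages consecutive headings and the `prev:`/`next:` template features let neighbouring paragraphs contribute, each paragraph's label is chosen jointly with the rest of the sequence rather than in isolation. This is the kind of sequence-level context the abstract credits for higher accuracy and fault tolerance; the sketch does not attempt the paper's repeated application of CRF.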