A Sequence Labeling Based Approach for Character Segmentation of Historical Documents

Liangcai Gao, Xiaode Zhang, Zhi Tang, Yaoxiong Huang, Lianwen Jin
DOI: https://doi.org/10.1109/das.2018.16
2018-01-01
Abstract: As an important prerequisite step of historical document image analysis, character segmentation is fundamental but challenging. In this paper, we propose a novel approach to handwritten character segmentation of historical documents that treats it as a sequence labeling problem. More specifically, the proposed model first segments the document image into lines; then each column of a line image is assigned a label indicating whether or not it is a segmentation position. The labeling is performed by a neural model that combines a CNN for feature extraction, an LSTM for sequence modeling, and a CRF for sequence labeling. The performance of our methods has been evaluated on a 300-page dataset containing 96,479 characters. The experimental results demonstrate that the proposed methods achieve superior or highly competitive performance compared with other methods.
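To make the "label every column, then cut" idea concrete, here is a minimal sketch of only the final decoding stage, not the authors' implementation. It assumes a hypothetical two-label scheme (0 = not a cut, 1 = cut) and assumes that some upstream CNN+LSTM has already produced one score per label for each column (the "emissions"). A linear-chain CRF then picks the jointly best label sequence with Viterbi decoding, and the chosen cut columns are turned into character spans. The function names, label ids, and toy scores are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical label ids for each column of a line image (not from the paper).
NOT_CUT, CUT = 0, 1


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]   : score of label y at column t (assumed CNN+LSTM output)
    transitions[a][b] : score of moving from label a to label b
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    # best[y] = best score of any label path that ends in label y so far
    best = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for y in range(n_labels):
            # previous label that maximises the score of arriving in label y
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # trace back from the best final label to recover the full path
    path = [max(range(n_labels), key=lambda y: best[y])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


def labels_to_segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert per-column labels into half-open [start, end) character spans.

    A cut at column t is treated as the first column of a new character.
    """
    cuts = [t for t, y in enumerate(labels) if y == CUT]
    bounds = [0] + cuts + [len(labels)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]
```

For example, with six columns whose emissions strongly favour a cut only at column 2, `emissions = [[2, 0], [2, 0], [0, 3], [2, 0], [2, 0], [2, 0]]`, and a transition matrix `[[0, 0], [0, -1]]` that mildly penalises two adjacent cuts, `viterbi_decode` returns `[0, 0, 1, 0, 0, 0]` and `labels_to_segments` yields `[(0, 2), (2, 6)]`. The transition matrix is where a CRF adds value over labeling each column independently: it lets the model discourage implausible patterns such as back-to-back cuts.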