Deep Learning Based Semantic Page Segmentation of Document Images in Chinese and English

Yajun Zou, Jinwen Ma
DOI: https://doi.org/10.1007/978-3-030-84522-3_40
2021-01-01
Abstract: Semantic page segmentation of document images is a basic task in document layout analysis, which is key to document reconstruction and digitization. Previous work usually considers only a few semantic types in a page (e.g., text and non-text) and is evaluated mainly on English document images, so finer semantic segmentation of Chinese and English document pages remains challenging. In this paper, we propose a deep learning based method for semantic page segmentation of Chinese and English documents that decomposes a document page into regions of four semantic types: text, table, figure, and formula. Specifically, a deep semantic segmentation neural network is designed to perform pixel-wise segmentation, in which each pixel of an input document page image is labeled as background or as one of the four categories above. We then obtain accurate locations of the regions of each type by applying the Connected Component Analysis algorithm to the prediction mask. Moreover, a Non-Intersecting Region Segmentation Algorithm is designed to generate a series of regions that do not overlap each other, which improves the segmentation results and avoids possible location conflicts in page reconstruction. To train the neural network, we manually annotate a dataset whose documents come from Chinese and English language sources and contain various layouts. The experimental results on our collected dataset demonstrate the superiority of our proposed method over other existing methods. In addition, we apply transfer learning on the public POD dataset and obtain promising results in comparison with state-of-the-art methods.
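As an illustration of the post-processing step named in the abstract, the following is a minimal sketch of connected component analysis on a per-pixel prediction mask. It is not the authors' implementation: the label ids (0 for background, 1 to 4 for text, table, figure, and formula), the function name `regions_from_mask`, and the choice of 4-connectivity are all assumptions made for the example. The Non-Intersecting Region Segmentation Algorithm is not sketched, since the abstract does not describe how it works.

```python
from collections import deque

# Hypothetical label ids; the abstract does not specify the actual encoding.
BACKGROUND, TEXT, TABLE, FIGURE, FORMULA = 0, 1, 2, 3, 4


def regions_from_mask(mask):
    """Return one (class_id, top, left, bottom, right) tuple per component.

    A component is a 4-connected group of pixels that share the same
    non-background class. Box coordinates are inclusive pixel indices.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []

    for r in range(height):
        for c in range(width):
            cls = mask[r][c]
            if cls == BACKGROUND or seen[r][c]:
                continue

            # Breadth-first flood fill from the first unseen pixel,
            # growing the bounding box as new pixels are visited.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            regions.append((cls, top, left, bottom, right))
    return regions


# Toy 4x5 "prediction mask" with one text, one figure, and one table region.
example_mask = [
    [TEXT, TEXT, 0, 0, 0],
    [TEXT, TEXT, 0, FIGURE, FIGURE],
    [0, 0, 0, FIGURE, FIGURE],
    [TABLE, TABLE, TABLE, 0, 0],
]
example_regions = regions_from_mask(example_mask)
```

In practice a library routine such as a labeled-image function from an image processing package would replace the explicit flood fill; the pure-Python version is used here only to keep the sketch self-contained.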