Graphic Composite Segmentation for PDF Documents with Complex Layouts

Canhui Xu, Zhi Tang, Xin Tao, Cao Shi
DOI: https://doi.org/10.1117/12.2003705
2013-01-01
Abstract: Converting PDF books to a re-flowable format has recently attracted interest in the area of e-book reading. Robust graphic segmentation is highly desirable for making PDF converters more practical. To cope with varied layouts, a multi-layer concept is introduced to segment graphic composites, including photographic images and drawings that contain text insets or are surrounded by text elements. The multi-layer layout analysis method exploits both image-based analysis and the inherent advantages of born-digital documents. By combining clustering of low-level page elements in the PDF document with connected component analysis on a synthetically generated PNG image of the document, graphic composites can be segmented in PDF documents with complex layouts. Experimental results on graphic composite segmentation of PDF document pages show satisfactory performance.
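The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`connected_components`, `attach_text_insets`), the 4-connectivity choice, and the simple overlap rule for merging text boxes are all assumptions made for illustration. The page image is assumed to be already binarized into a grid of 0s and 1s, and the text boxes are assumed to come from the PDF's own page elements.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes of 4-connected foreground regions (value 1).

    Each box is (x0, y0, x1, y1) in inclusive pixel coordinates.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def overlaps(a, b):
    """True if two inclusive boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def attach_text_insets(graphic_boxes, text_boxes):
    """Grow each graphic box to cover any text box that overlaps it.

    Hypothetical merge rule: the paper's actual clustering criteria
    are not given in the abstract.
    """
    composites = []
    for merged in graphic_boxes:
        for t in text_boxes:
            if overlaps(merged, t):
                merged = (
                    min(merged[0], t[0]),
                    min(merged[1], t[1]),
                    max(merged[2], t[2]),
                    max(merged[3], t[3]),
                )
        composites.append(merged)
    return composites


# Toy page: one 2x2 graphic blob and one 1x2 blob; one text box from the PDF layer.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
graphics = connected_components(page)
composites = attach_text_insets(graphics, [(2, 2, 3, 3)])
```

In this toy setup the image layer supplies candidate graphic regions and the PDF layer supplies text element boxes, so a text inset touching a drawing ends up inside the same composite region.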