Abstract: In this paper, we study the task of document layout recognition for digital documents. The task requires the model to detect the exact physical region of each object without missing any text inside it or including any redundant text outside it. This is a vital step in supporting high-quality information extraction, table understanding and knowledge base construction over documents from various vertical domains (e.g., the financial, legal and government fields). Here, we consider digital documents. Unlike image documents, these give the exact text and position of every character and graphic element on the page. Towards document layout recognition with pinpoint accuracy, we formulate this problem as a document panoptic segmentation task, in which each token in a document page must be assigned a class label and an instance id. Traditional visual panoptic segmentation methods, such as Mask R-CNN, may predict objects that intersect one another. However, document objects never intersect, because most document pages follow a Manhattan layout. Therefore, we propose a novel framework named the document panoptic segmentation (DPS) model. It first splits the document page into column regions and groups tokens into line regions, then extracts textual and visual features, and finally assigns a class label and an instance id to each line region. Additionally, we propose a novel metric based on the intersection over union (IoU) between the sets of tokens contained in the predicted and ground-truth objects. This metric is more suitable than one based on the area IoU between the predicted and ground-truth bounding boxes. Finally, empirical experiments on the PubLayNet, ArXiv and Financial datasets show that the proposed DPS model obtains mAP scores of 0.8833, 0.9205 and 0.8530, respectively, a large improvement in mAP over the Faster R-CNN and Mask R-CNN models.
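The token-level IoU metric described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code. The `Token` record and the helper names are hypothetical. So is the rule that a box contains a token when the token's centre falls inside it. The example shows a prediction whose area IoU with the ground truth is 0.94 even though it misses half of the object's tokens. This is the kind of missing-text error that the token-based metric is meant to expose.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class Token:
    # Hypothetical token record: an id plus the token's bounding box on the page.
    tid: int
    box: Box


def box_area(b: Box) -> float:
    # Degenerate or inverted boxes (e.g. an empty intersection) have area 0.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def area_iou(a: Box, b: Box) -> float:
    # Conventional area-based IoU between two bounding boxes.
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def tokens_in(box: Box, tokens: Iterable[Token]) -> Set[int]:
    # Assumption: a token belongs to a region if its centre lies inside the box.
    out: Set[int] = set()
    for t in tokens:
        cx = (t.box[0] + t.box[2]) / 2
        cy = (t.box[1] + t.box[3]) / 2
        if box[0] <= cx <= box[2] and box[1] <= cy <= box[3]:
            out.add(t.tid)
    return out


def token_iou(pred: Set[int], gt: Set[int]) -> float:
    # IoU between the token sets of a predicted and a ground-truth object.
    # Convention chosen here: two empty sets score 0.0.
    union = pred | gt
    return len(pred & gt) / len(union) if union else 0.0


# Toy page: a ground-truth object holding two text lines.
page = [Token(0, (0, 0, 100, 10)), Token(1, (0, 90, 100, 100))]
gt_box = (0, 0, 100, 100)
pred_box = (0, 0, 100, 94)  # slightly too short: cuts off the last line

print(round(area_iou(pred_box, gt_box), 2))  # 0.94
print(token_iou(tokens_in(pred_box, page), tokens_in(gt_box, page)))  # 0.5
```

For the DPS model itself, each predicted object is already a set of tokens, since every token carries a class label and an instance id. `token_iou` could therefore be applied to those sets directly. The box-to-token mapping in `tokens_in` would only be needed to score box-based detectors such as Faster R-CNN.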

Towards Document Panoptic Segmentation with Pinpoint Accuracy: Method and Evaluation