Abstract: In this paper we study the task of document layout recognition for digital documents, which requires the model to detect the exact physical region of each object, without missing any of its text or including redundant text from outside it. This is a vital step in supporting high-quality information extraction, table understanding and knowledge base construction over documents from various vertical domains (e.g. financial, legal, and government fields). We consider digital documents, in which, unlike image documents, characters and graphic elements are given with their exact text and their positions inside the document pages. To achieve document layout recognition with pinpoint accuracy, we treat the problem as a document panoptic segmentation task, in which each token in the document page must be assigned a class label and an instance id. Traditional visual panoptic segmentation methods, such as Mask R-CNN, may predict two objects that intersect; document objects, however, never intersect, because most document pages follow a Manhattan layout. Therefore, we propose a novel framework, named the document panoptic segmentation (DPS) model. It first splits the document page into column regions and groups tokens into line regions, then extracts textual and visual features, and finally assigns a class label and an instance id to each line region. Additionally, we propose a novel metric based on the intersection over union (IoU) between the tokens contained in the predicted object and those in the ground-truth object, which is more suitable than a metric based on the area IoU between the predicted and ground-truth bounding boxes. Finally, experiments on the PubLayNet, ArXiv and Financial datasets show that the proposed DPS model obtains mAP scores of 0.8833, 0.9205 and 0.8530, respectively. The proposed model substantially improves the mAP score compared with the Faster R-CNN and Mask R-CNN models.
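To make the difference between the two metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an object can be represented either by a bounding box `(x0, y0, x1, y1)` or by the set of integer ids of the tokens it contains; the function names `area_iou` and `token_iou` and the example values are hypothetical.

```python
from typing import Set, Tuple

# Assumed box representation: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Box = Tuple[float, float, float, float]


def area_iou(a: Box, b: Box) -> float:
    """Area-based IoU between two axis-aligned bounding boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def token_iou(pred_tokens: Set[int], gt_tokens: Set[int]) -> float:
    """Token-based IoU: shared tokens divided by all tokens in either object."""
    union = pred_tokens | gt_tokens
    return len(pred_tokens & gt_tokens) / len(union) if union else 0.0


# Hypothetical example: the predicted box is only slightly taller than the
# ground truth, but that extra strip swallows one token from a neighbouring
# object (token id 4).
gt_box, pred_box = (0, 0, 100, 40), (0, 0, 100, 42)
gt_tokens, pred_tokens = {0, 1, 2, 3}, {0, 1, 2, 3, 4}

print("area IoU :", area_iou(pred_box, gt_box))
print("token IoU:", token_iou(pred_tokens, gt_tokens))
```

In this illustrative case the boxes overlap almost completely, so the area IoU stays high, while the token IoU drops because the prediction contains a token that does not belong to the object. This is the kind of error that a token-based metric is meant to expose.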

CNN Based Page Object Detection in Document Images

A Deep Learning-Based Formula Detection Method for PDF Documents

A Page Object Detection Method Based on Mask R-CNN

ICDAR2017 Competition on Page Object Detection

A Lightweight Object Detection Method for Bank Operation and Maintenance Scenarios

Deep Learning Based Semantic Page Segmentation of Document Images in Chinese and English

A Table Detection Method for PDF Documents Based on Convolutional Neural Networks

A New Method Based on Deep Convolutional Neural Networks for Object Detection and Classification

Détection d'Objets dans les documents numérisés par réseaux de neurones profonds

Towards Document Panoptic Segmentation with Pinpoint Accuracy: Method and Evaluation

Efficient Document Image Classification Using Region-Based Graph Neural Network

Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

A YOLO-Based Table Detection Method

Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

Beyond document object detection: instance-level segmentation of complex layouts

HSCA-Net: A Hybrid Spatial-Channel Attention Network in Multi-Scale Feature Pyramid for Document Layout Analysis

End-to-End Semi-Supervised approach with Modulated Object Queries for Table Detection in Documents

A Hybrid Approach for Document Layout Analysis in Document Images

Effective Document Image Rectification via a Deep Learning Framework

Cross-Domain Document Object Detection: Benchmark Suite and Method