Abstract: This paper takes an important step toward bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection and have achieved remarkable progress. More recently, Transformer-based detectors have considerably boosted generic object detection performance; by using object queries, they eliminate the need for hand-crafted features and post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these enhanced transformer-based detection algorithms has yet to be verified for graphical object detection. Inspired by the latest advancements in DETR, we employ the existing detection transformer with a few modifications for graphical object detection. We modify the object queries in different ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow for better handling of objects with varying sizes and aspect ratios, more robustness to small variations in object positions and sizes, and improved discrimination between objects and non-objects. We evaluate our approach on four graphical datasets: PubTables, TableBank, NTable and PubLayNet. Upon integrating the query modifications into DETR, we outperform prior works and achieve new state-of-the-art results, with mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet and PubTables, respectively. The results from extensive ablations show that transformer-based methods are more effective for document analysis, as they are in other applications. We hope this study draws more attention to research on detection transformers in document image analysis.

Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images
