Abstract: Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images. Although deep learning has shown remarkable progress in this realm, it typically requires a large amount of labeled data for proficient training. Current CNN-based semi-supervised table detection approaches rely on anchor generation and Non-Maximum Suppression (NMS) in their detection process, limiting training efficiency. Meanwhile, transformer-based semi-supervised techniques adopt a one-to-one matching strategy that provides noisy pseudo-labels, limiting overall efficiency. This study presents an innovative transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy combining one-to-one and one-to-many assignment techniques. This approach significantly enhances training efficiency during the early stages, ensuring superior pseudo-labels for further training. Our semi-supervised approach is comprehensively evaluated on benchmark datasets, including PubLayNet, ICDAR-19, and TableBank. It achieves new state-of-the-art results, with mAP of 95.7% and 97.9% on TableBank (word) and PubLayNet, respectively, using 30% labeled data, a 7.4- and 7.6-point improvement over the previous semi-supervised table detection approach. The results clearly show the superiority of our semi-supervised approach, surpassing all existing state-of-the-art methods by substantial margins. This research represents a significant advancement in semi-supervised table detection methods, offering a more efficient and accurate solution for practical document analysis tasks.

What problem does this paper attempt to address?

The paper aims to address the problem of table detection in document images, particularly improving detection performance in scenarios with limited annotated data. Specifically, the research focuses on the following key points:

1. **Problem Background**: Table detection is a core task in document analysis, aiming to accurately identify and locate tables in document images. Although deep learning has made significant progress in this field, it typically requires a large amount of annotated data to train models. Current semi-supervised table detection methods based on Convolutional Neural Networks (CNNs) use anchor generation processes and Non-Maximum Suppression (NMS), which limit training efficiency. On the other hand, transformer-based semi-supervised techniques employ a one-to-one matching strategy, resulting in low-quality pseudo-labels and affecting overall efficiency.

2. **Research Contributions**:
   - Proposes an innovative transformer-based semi-supervised table detection method that combines one-to-many and one-to-one matching strategies to improve the quality of pseudo-labels.
   - Designs a query filtering module for the one-to-many matching strategy to provide high-quality pseudo-labels.
   - Conducts comprehensive evaluations on multiple benchmark datasets, including PubLayNet, ICDAR-19, and TableBank, demonstrating that the method surpasses existing CNN-based and transformer-based semi-supervised methods, achieving significant improvements in accuracy.

3. **Method Overview**:
   - The research proposes an end-to-end semi-supervised framework based on a teacher-student model. This framework includes two training stages, utilizing both one-to-one and one-to-many assignment strategies.
   - The teacher module uses a one-to-many matching strategy to generate high-quality pseudo-labels, while the student module uses these pseudo-labels for training and removes duplicate predictions through a one-to-one matching strategy.
   - The teacher module is updated using Exponential Moving Average (EMA) and filters pseudo-labels to ensure only high-quality pseudo-labels are selected.

4. **Experimental Setup**:
   - Evaluations are conducted using the TableBank, PubLayNet, and ICDAR-19 datasets.
   - Experimental evaluation metrics include Mean Average Precision (mAP), Average Precision at IoU threshold of 0.5 (AP50), Average Precision at IoU threshold of 0.75 (AP75), and Average Recall (AR).

In summary, this paper addresses the problem of improving table detection performance with limited annotated data by introducing a semi-supervised framework that combines one-to-one and one-to-many matching strategies. This approach not only enhances training efficiency but also improves the quality of pseudo-labels, resulting in higher detection accuracy.
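The teacher update and pseudo-label filtering mentioned in the method overview can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `ema_update`, `filter_pseudo_labels`, and `Detection`, the decay of 0.999, and the score threshold of 0.7 are all assumptions chosen for illustration, and the paper's query filtering module may work differently from a plain confidence cutoff.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """A hypothetical teacher prediction: a box plus a confidence score."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Move each teacher weight toward the student weight.

    Standard EMA rule: t_new = decay * t_old + (1 - decay) * s.
    Weights are plain floats here to keep the sketch dependency-free.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def filter_pseudo_labels(detections: List[Detection],
                         threshold: float = 0.7) -> List[Detection]:
    """Keep only teacher predictions whose score reaches the threshold.

    Assumption: a simple confidence cutoff stands in for the paper's
    filtering step, whose exact criterion is not given in this summary.
    """
    return [d for d in detections if d.score >= threshold]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    teacher = ema_update(teacher, student, decay=0.9)
    print(teacher)

    preds = [Detection((0, 0, 10, 10), 0.95), Detection((5, 5, 20, 20), 0.30)]
    print(filter_pseudo_labels(preds))
```

A high decay keeps the teacher stable while the student changes quickly, which is the usual motivation for EMA teachers in semi-supervised detection; the filtered detections would then serve as pseudo-labels for the student on unlabeled images.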
