TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

Avinash Anand, Raj Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md. Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh
DOI: https://doi.org/10.1145/3606040.3617444
2024-04-19
Abstract: The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various systems such as search engines and Knowledge Graphs. Addressing the two main problems, namely table detection (TD) and table structure recognition (TSR), has traditionally been approached independently. In this research, we propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition. This integrated approach effectively handles diverse table styles, complex structures, and image distortions, resulting in improved accuracy and efficiency compared to existing methods like Table Transformers. Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR), preserving table structures and accurately extracting tabular data from document images. The integration of multiple models addresses the intricacies of table recognition, making our approach a promising solution for image-based table understanding, data extraction, and information retrieval applications. Our proposed approach achieves an IOU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.
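For readers unfamiliar with the IOU figure quoted in the abstract, intersection over union is the standard overlap score between a predicted box and a ground-truth box. The snippet below is a minimal sketch of that general definition, not the authors' evaluation code; the `(x_min, y_min, x_max, y_max)` box layout and the name `box_iou` are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Return intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0
```

A score of 1.0 means the predicted table region matches the reference exactly, and 0.0 means the two boxes do not overlap at all.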
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the automatic recognition of tabular data in document images, which is difficult because table styles vary widely and table structures can be complex. Tables are a valuable content representation for systems such as search engines and knowledge graphs, so accurate table detection and structure recognition are crucial for improving the predictive capabilities of those systems. Prior work has usually treated table detection (TD) and table structure recognition (TSR) as independent problems, which limits overall efficiency and accuracy.

To overcome these limitations, the authors propose an end-to-end pipeline that integrates deep learning models (including DETR, CascadeTabNet, and PP-OCR v2) to achieve comprehensive image-based table recognition. The integrated method handles diverse table styles, complex structures, and the image distortions commonly found in document images, thereby improving accuracy and efficiency. Specifically, the system performs table detection (TD), table structure recognition (TSR), and table content recognition (TCR) together, preserving the table structure and accurately extracting tabular data from document images.

The main contributions of the paper are:
1. A novel integrated pipeline that combines three state-of-the-art models to achieve end-to-end table recognition from image-based data.
2. Experiments and evaluations indicating that the integrated pipeline outperforms existing methods in the accuracy and efficiency of table recognition, especially in handling complex table structures and accurately extracting tabular data.

In conclusion, the paper aims to overcome current challenges in table analysis and recognition, and to improve the extraction and understanding of tabular data, through this integrated approach.
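The three-stage flow described above (TD, then TSR, then TCR) can be sketched as a simple composition of stages. This is an illustrative outline only, not the authors' implementation: the `Cell` type, `recognize_tables`, `cells_to_grid`, and the three stage callables are hypothetical names, and this summary does not specify which of the named models serves which stage, so the stages are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


@dataclass
class Cell:
    """One table cell: grid position, image region, recognized text."""
    row: int
    col: int
    box: Box
    text: str = ""


def cells_to_grid(cells: List[Cell]) -> List[List[str]]:
    """Arrange cells into a row-major grid so the table structure is kept."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


def recognize_tables(
    image: Any,
    detect_tables: Callable[[Any], List[Box]],             # TD stage (hypothetical)
    recognize_structure: Callable[[Any, Box], List[Cell]],  # TSR stage (hypothetical)
    read_text: Callable[[Any, Box], str],                   # TCR stage (hypothetical)
) -> List[List[List[str]]]:
    """Run TD -> TSR -> TCR and return one text grid per detected table."""
    tables = []
    for table_box in detect_tables(image):
        # Structure recognition yields cell regions with row/column indices.
        cells = recognize_structure(image, table_box)
        # Content recognition fills in the text for each cell region.
        for cell in cells:
            cell.text = read_text(image, cell.box)
        tables.append(cells_to_grid(cells))
    return tables
```

Passing the stages as callables keeps the sketch independent of any particular model library: a detector, a structure recognizer, and an OCR engine can each be wrapped in a small function and swapped without changing the pipeline logic.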