Complicated Table Structure Recognition

Zewen Chi,Heyan Huang,Heng-Da Xu,Houjin Yu,Wanxuan Yin,Xian-Ling Mao
DOI: https://doi.org/10.48550/arXiv.1908.04729
2019-08-29
Abstract:The task of table structure recognition aims to recognize the internal structure of a table, which is a key step to make machines understand tables. Currently, there are lots of studies on this task for different file formats such as ASCII text and HTML. It also attracts lots of attention to recognize the table structures in PDF files. However, it is hard for the existing methods to accurately recognize the structure of complicated tables in PDF files. The complicated tables contain spanning cells which occupy at least two columns or rows. To address the issue, we propose a novel graph neural network for recognizing the table structure in PDF files, named GraphTSR. Specifically, it takes table cells as input, and then recognizes the table structures by predicting relations among cells. Moreover, to evaluate the task better, we construct a large-scale table structure recognition dataset from scientific papers, named SciTSR, which contains 15,000 tables from PDF files and their corresponding structure labels. Extensive experiments demonstrate that our proposed model is highly effective for complicated tables and outperforms state-of-the-art baselines over a benchmark dataset and our new constructed dataset.
Information Retrieval,Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **the accurate recognition of complex table structures in PDF files**. Specifically, existing methods have difficulty in accurately recognizing complex table structures that contain cells spanning multiple rows or columns (i.e., cells that span multiple rows or columns). Although such complex tables account for a relatively small proportion in the overall tables, they usually contain more important semantic information, such as table titles, etc., which is crucial for understanding the table content. ### Problem Background 1. **Table Structure Recognition Task**: Table structure recognition aims to recognize the internal structure of a table, which is a key step in enabling machines to understand tables. 2. **Limitations of Existing Methods**: - **Simple Tables**: Existing methods perform well when dealing with simple grid - like tables. - **Complex Tables**: However, for complex tables containing cells spanning multiple rows or columns, the performance of existing methods is poor. The cells spanning multiple rows or columns in these complex tables make structure recognition more difficult. 3. **Application Scenarios**: Accurate table structure recognition is very important for many applications, such as question - answering systems, dialogue systems, and table - to - text conversion, etc. ### Main Contributions of the Paper 1. **Proposed a new graph neural network model GraphTSR**: This model redefines the table structure recognition task as an edge prediction problem on a graph, and recognizes the table structure by predicting the relationships between cells. 2. **Constructed a large - scale table structure recognition dataset SciTSR**: This dataset contains 15,000 tables extracted from scientific papers and their corresponding structure labels, which are used to train and evaluate the model. ### Solutions - **GraphTSR Model**: This model realizes table structure recognition through the following steps: 1. **Pre - processing**: Obtain cell contents and their corresponding bounding boxes from PDF files. 2. **Graph Construction**: Construct an undirected graph based on cells. 3. **Relationship Prediction**: Use the GraphTSR model to predict adjacent relationships. 4. **Post - processing**: Restore the table structure from the marked graph. - **Dataset SciTSR**: In order to better evaluate the performance of the model, the author constructed a large - scale dataset containing 15,000 tables and divided it into a training set and a test set. Among them, there are 716 complex tables in the test set (accounting for about 24%), which are specifically used to evaluate the model's ability to recognize complex tables. ### Experimental Results - **Experiments show**: The GraphTSR model is significantly superior to existing baseline methods in complex table structure recognition, and performs well on both the benchmark dataset and the newly constructed dataset. - **Generalization Ability**: The GraphTSR model not only performs well on the training dataset, but also can be well generalized to tables in different fields and styles. Through these improvements, the paper effectively solves the problem of complex table structure recognition, providing new ideas and tools for follow - up research.