Abstract: Table structure recognition is an essential part of making machines understand tables. Its main task is to recognize the internal structure of a table. However, because tables are complex and diverse in structure and style, it is very difficult to parse tabular data into a structured format that machines can easily understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. Our model takes table images as input and can correctly recognize the structure of tables, whether they are simple or complex. SEM is mainly composed of three parts: splitter, embedder and merger. In the first stage, we apply the splitter to predict the potential regions of the table row (column) separators and obtain the fine grid structure of the table. In the second stage, taking full account of the textual information in the table, we fuse the output features for each table grid from both vision and language modalities. Moreover, we achieve higher precision in our experiments by adding semantic features. Finally, we merge these basic table grids in a self-regression manner. The corresponding merging results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset, which outperforms other methods by a large margin. We also won first place in the complex table category and third place in the all tables category in the ICDAR 2021 Competition on Scientific Literature Parsing, Task-B. Extensive experiments on other publicly available datasets demonstrate that our model achieves state-of-the-art performance.

What problem does this paper attempt to address?

This paper addresses the complexity and diversity of Table Structure Recognition (TSR), especially the parsing of complex tables. Specifically:

1. **Difficulty in parsing complex table structures**: Traditional table structure recognition methods mainly focus on simple tables. For complex tables containing spanning cells, these methods struggle to parse the internal structure accurately. Spanning cells usually carry important semantic information, such as table headers, which is crucial for understanding the table content.
2. **Insufficient multi-modal information fusion**: Most existing table structure recognition methods rely only on visual features and ignore the rich text information in the tables. This leads to low recognition accuracy on tables that are visually ambiguous.
3. **Poor adaptability to scanned documents**: Many existing methods rely on PDF metadata or OCR models to extract low-level layout features, so they perform poorly on scanned documents, especially when facing diverse table layouts and text organizations.

To solve these problems, the paper proposes the Split, Embed and Merge (SEM) model, which aims to improve the accuracy of table structure recognition in the following ways:

- **Split**: Use a fully convolutional network (FCN) to predict the potential areas of table row/column separators, thereby obtaining the fine-grained grid structure of the table.
- **Embed**: Design a Vision Module and a Text Module to extract the visual and text features of each table grid respectively, and fuse the two through a Blender Module to make full use of multi-modal information.
- **Merge**: Adopt a gated recurrent unit (GRU) decoder with an attention mechanism to predict, step by step, which basic table grids should be merged to restore table cells and finally obtain the complete table structure.

Through these innovations, SEM can handle simple tables and also effectively parse complex tables, and it can operate directly on table images without relying on metadata or OCR. Experimental results show that SEM outperforms other methods on multiple public datasets, with particularly large gains in the recognition of complex tables.
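The data flow of the Split and Merge stages can be illustrated with a small Python sketch. It is not the paper's implementation: the FCN and the attention-based GRU decoder are replaced by hand-written inputs, and the function names (`separators_to_boundaries`, `build_grid`, `merge_grid`) are hypothetical. It assumes the splitter output has already been binarized into one 0/1 flag per pixel row and per pixel column, that only interior separators are marked, and that the merger output is a list of groups of grid indices belonging to the same cell.

```python
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]    # (top, bottom, left, right) in pixels
Span = Tuple[int, int, int, int]   # (row_start, row_end, col_start, col_end)


def separators_to_boundaries(mask: List[int]) -> List[int]:
    """Collapse each run of 1s (a separator region) into its midpoint index."""
    mids: List[int] = []
    start = None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i                      # a separator run begins
        elif not v and start is not None:
            mids.append((start + i - 1) // 2)
            start = None                   # the run ended at i - 1
    if start is not None:                  # run reaching the end of the mask
        mids.append((start + len(mask) - 1) // 2)
    return mids


def build_grid(row_mask: List[int], col_mask: List[int]) -> List[List[Box]]:
    """Cut the image into basic grid boxes between consecutive separators."""
    ys = [0] + separators_to_boundaries(row_mask) + [len(row_mask)]
    xs = [0] + separators_to_boundaries(col_mask) + [len(col_mask)]
    return [
        [(ys[r], ys[r + 1], xs[c], xs[c + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def merge_grid(n_rows: int, n_cols: int,
               groups: List[Set[Tuple[int, int]]]) -> List[Span]:
    """Turn merge groups into cell spans; ungrouped grids stay single cells."""
    cells: List[Span] = []
    covered: Set[Tuple[int, int]] = set()
    for group in groups:
        rows = [r for r, _ in group]
        cols = [c for _, c in group]
        cells.append((min(rows), max(rows), min(cols), max(cols)))
        covered.update(group)
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) not in covered:
                cells.append((r, r, c, c))
    return sorted(cells)


# Toy input: a 6 x 10 pixel table with one row separator and two column
# separators, whose first two header grids form one spanning cell.
row_mask = [0, 0, 1, 1, 0, 0]
col_mask = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
grid = build_grid(row_mask, col_mask)
cells = merge_grid(len(grid), len(grid[0]), [{(0, 0), (0, 1)}])
```

Here `grid` holds the 2 x 3 basic grids produced by splitting, and `cells` holds five cells, the first of which spans two columns. In SEM the merge groups would come from the decoder's attention over fused visual and text features rather than being supplied by hand.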

Split, embed and merge: An accurate table structure recognizer

SEMv2: Table separation line detection based on instance segmentation

SEMv3: A Fast and Robust Approach to Table Separation Line Detection

Rethinking Table Structure Recognition Using Sequence Labeling Methods

A Deep Semantic Segmentation Model for Image-based Table Structure Recognition

Flexible Hybrid Table Recognition and Semantic Interpretation System

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition

TSRDet: A Table Structure Recognition Method Based on Row-Column Detection

TSRFormer: Table Structure Recognition with Transformers

TableFormer: Table Structure Understanding with Transformers

Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling

Image-based table recognition: data, model, and evaluation

Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Complicated Table Structure Recognition

End-to-End Compound Table Understanding with Multi-Modal Modeling

UTTSR: A Novel Non-Structured Text Table Recognition Model Powered by Deep Learning Technology

Table Structure Recognition with Conditional Attention