A Novel Encoder-Decoder Architecture for Table Border Segmentation of Scanned Documents.

Kaihong Yan, Haoran Tang, Jian Zhang, Peng Peng, Hongwei Wang
DOI: https://doi.org/10.1109/CSCWD57460.2023.10152650
2023-01-01
Abstract: Robotic Process Automation (RPA) is widely used in business and enterprise settings to automate the processing of digital documents, collect information, and acquire knowledge. Table structure reconstruction in scanned documents has been extensively studied as an essential application of RPA. However, existing table-border detection methods often ignore broken borders, which makes them unsuitable for natural scenes. To address this, we rely heavily on data augmentation to synthesize scanned documents for training our table-border semantic segmentation model. We propose a novel semantic segmentation model for table borders and compare it against traditional morphology-based line detection algorithms and existing semantic segmentation-based approaches. The results indicate that the proposed algorithm detects frame lines effectively, even in low-quality scanned images. Real-world cases show that the table’s structure can be reconstructed and the knowledge in the table extracted.
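The broken-border problem mentioned in the abstract can be illustrated with a small sketch of a morphology-style baseline. This is not the paper's code: the function name `horizontal_opening`, the kernel length `k`, and the toy pixel rows are all assumptions made for illustration. The sketch applies a morphological opening with a 1 x k horizontal element to a binary image, so only horizontal runs of at least k foreground pixels survive.

```python
def horizontal_opening(image, k):
    """Morphological opening of a binary image with a 1 x k horizontal element.

    `image` is a list of rows of 0/1 values. A foreground pixel is kept only
    if it lies in a horizontal run of at least k foreground pixels.
    (Hypothetical helper, written for illustration only.)
    """
    result = []
    for row in image:
        out = [0] * len(row)
        run_start = None
        # A trailing 0 acts as a sentinel so the last run is always closed.
        for x, value in enumerate(list(row) + [0]):
            if value and run_start is None:
                run_start = x
            elif not value and run_start is not None:
                if x - run_start >= k:
                    for i in range(run_start, x):
                        out[i] = 1
                run_start = None
        result.append(out)
    return result


# Toy example: one intact border row and one broken border row.
intact = [1] * 12
broken = [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
opened = horizontal_opening([intact, broken], k=5)
print(opened[0])  # the intact border is kept
print(opened[1])  # every fragment is shorter than k, so the border is lost
```

With k = 5 the intact row is preserved, while the broken row is erased entirely because each fragment is only three pixels long. Lowering k recovers the fragments but also admits short strokes such as parts of characters, which is one plausible reason a learned segmentation model is preferred for low-quality scans.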