A Table Detection Method for PDF Documents Based on Convolutional Neural Networks

Leipeng Hao, Liangcai Gao, Xiaohan Yi, Zhi Tang
DOI: https://doi.org/10.1109/das.2016.23
2016-01-01
Abstract: Because deep learning performs better on many computer vision tasks, researchers in document analysis and recognition have begun to adopt it in their work. In this paper, we propose a novel method for table detection in PDF documents based on convolutional neural networks, one of the most popular deep learning models. In the proposed method, table-like areas are first selected by loose rules, and convolutional networks are then built and refined to determine whether the selected areas are tables. The visual features of table areas are extracted directly and utilized through the convolutional networks, while the non-visual information (e.g., characters, rendering instructions) contained in the original PDF documents is also taken into account to achieve better recognition results. Preliminary experimental results show that the approach is effective for table detection.
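The two-stage structure described in the abstract (loose-rule candidate selection followed by a classifier that accepts or rejects each candidate) might be sketched as follows. This is only an illustrative outline, not the authors' implementation: the `Line` and `Region` types, the "wide gaps per line" rule, and the `classify` callable standing in for the trained convolutional network are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Line:
    y: float   # vertical position of a text line on the page
    gaps: int  # count of wide horizontal gaps in the line (hypothetical feature)


@dataclass
class Region:
    top: float
    bottom: float
    lines: int


def propose_regions(lines: Sequence[Line], min_gaps: int = 2,
                    min_lines: int = 2) -> List[Region]:
    """Assumed loose rule: group consecutive lines with >= min_gaps wide gaps."""
    regions: List[Region] = []
    run: List[Line] = []
    for line in lines:
        if line.gaps >= min_gaps:
            run.append(line)
            continue
        # A non-table-like line ends the current run; keep it if long enough.
        if len(run) >= min_lines:
            regions.append(Region(run[0].y, run[-1].y, len(run)))
        run = []
    if len(run) >= min_lines:
        regions.append(Region(run[0].y, run[-1].y, len(run)))
    return regions


def detect_tables(lines: Sequence[Line],
                  classify: Callable[[Region], float],
                  threshold: float = 0.5) -> List[Region]:
    """Keep candidates whose classifier score reaches the threshold.

    `classify` is a placeholder for a trained network that scores a
    candidate region; any callable returning a score in [0, 1] works here.
    """
    return [r for r in propose_regions(lines) if classify(r) >= threshold]
```

Keeping the first stage deliberately permissive means true tables are unlikely to be dropped early, and the classifier is left to reject the false candidates.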