Dual-branch dilated context convolutional for table detection transformer in the document images

Ying Ni, Xiaoli Wang, Hanghang Peng, Yonzhi Li, Jinyang Wang, Haoxuan Li, Jin Huang
DOI: https://doi.org/10.1007/s00371-024-03561-6
IF: 2.835
2024-07-18
The Visual Computer
Abstract: With the increasing automation of document image processing for materials such as financial reports, table detection has become a critical component of document automation. It requires models to extract the positions of tables in document images without losing information. However, existing techniques still fall short in detecting certain small or irregularly shaped tables. To address this issue, we propose a Transformer-based table detection model. To improve both training efficiency and prediction performance, we fine-tune a pretrained Transformer framework so that it effectively captures underlying features. Additionally, we integrate a dual-branch dilated context convolutional module that processes high-dimensional features to further improve detection accuracy and robustness for tables of various sizes and shapes. Furthermore, we integrate multiple residual convolutional layers that capture and fuse features at different scales, strengthening the network's multi-scale feature representation and thus its detection performance. We use feature maps and heatmaps for visualization to verify the reliability of our method. We evaluate our method on publicly available document datasets, and the results show that our approach achieves better performance on evaluation metrics such as Precision.
Code: https://github.com/GT-HZ/TD
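The abstract does not spell out the module's internals, so the following is only a minimal, single-channel sketch of the general idea behind a dual-branch dilated convolution with a residual connection. It assumes two 3×3 branches with dilation rates 1 and 3, fusion by element-wise sum, a ReLU, and an identity shortcut. The function names, dilation rates, and fusion rule are hypothetical illustration choices, not the authors' implementation (the linked repository is the authoritative source). The point it shows: dilation enlarges the receptive field (a 3×3 kernel with dilation d covers d·(k−1)+1 pixels per side) without adding weights, so two branches see local and wider context at once.

```python
from typing import List

Matrix = List[List[float]]


def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length of the region a k x k kernel covers at a given dilation."""
    return dilation * (k - 1) + 1


def dilated_conv2d(x: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """Single-channel 2D dilated cross-correlation with implicit zero padding,
    so the output keeps the input's height and width (odd square kernel assumed)."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    r = i + (u - half) * dilation
                    c = j + (v - half) * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def dual_branch_dilated_block(x: Matrix, k_small: Matrix, k_large: Matrix,
                              d_small: int = 1, d_large: int = 3) -> Matrix:
    """Hypothetical dual-branch block: a local branch and a wide-context branch,
    fused by summation, passed through ReLU, then added back to the input."""
    a = dilated_conv2d(x, k_small, d_small)   # local context
    b = dilated_conv2d(x, k_large, d_large)   # wider context, same weight count
    fused = relu([[a[i][j] + b[i][j] for j in range(len(x[0]))]
                  for i in range(len(x))])
    # Residual (identity) shortcut keeps the original features.
    return [[x[i][j] + fused[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    for row in dual_branch_dilated_block(impulse, ones, ones):
        print(row)
```

Feeding a single impulse through the block makes the two receptive fields visible: the dilation-1 branch lights up the 3×3 neighbourhood of the centre, while the dilation-3 branch lights up a sparse 7×7 grid, which is the intuition behind mixing dilation rates for tables of very different sizes.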
computer science, software engineering
What problem does this paper attempt to address?