An Efficient Method to Extract Data from Bank Statements Based on Image-Based Table Detection

Phu Nguyen, Tam-An Vo-Nguyen, Hoanh-Su Le
DOI: https://doi.org/10.1109/acomp53746.2021.00033
2021-11-01
Abstract: A quick survey conducted at a university in Vietnam revealed outdated paperwork handling in its financial department. Specifically, the bank statement, a document that banking partners send to customers monthly to track financial transactions, is currently entered into financial software entirely by hand. This study therefore aims to automate the data extraction stage by analyzing the table structure of the document, which may reduce the accountants' workload. The output of the study is extracted text, which can then be loaded into the software by robotic process automation or other technologies. The methodology is purely image-based: the soft copy of the bank statement is converted into an image and then processed through table detection, cell recognition, and text extraction. The extracted text is presented in a spreadsheet as the output of the process. Measurements on the experimental dataset showed an accuracy of over 93% in most cases. These results suggest that the image-based method is applicable to extracting data from the university's bank statements without resorting to more complicated technologies. However, users must review the output to eliminate financial errors.
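The cell recognition and spreadsheet stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything here is an assumption made for illustration: the page is already binarized into a list of 0/1 rows, the table is fully ruled, ruling lines are found by a simple projection profile, and `read_cell` is a hypothetical stand-in for whatever text extraction (OCR) step is used. The function names are likewise invented for this example.

```python
import csv
import io


def find_rule_lines(binary, axis, min_fill=0.9):
    """Return indices of rows (axis=0) or columns (axis=1) that are mostly ink.

    Assumption: a ruling line spans (almost) the full table, so its row or
    column has an ink ratio of at least min_fill.
    """
    height, width = len(binary), len(binary[0])
    if axis == 0:
        fills = [sum(row) / width for row in binary]
    else:
        fills = [sum(binary[y][x] for y in range(height)) / height
                 for x in range(width)]
    return [i for i, fill in enumerate(fills) if fill >= min_fill]


def merge_runs(indices):
    """Collapse consecutive indices (one thick line) into (start, end) pairs."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


def cell_boxes(binary):
    """Return a grid of inclusive (top, left, bottom, right) cell boxes.

    Each cell is the region between two neighbouring horizontal lines and
    two neighbouring vertical lines.
    """
    h_runs = merge_runs(find_rule_lines(binary, axis=0))
    v_runs = merge_runs(find_rule_lines(binary, axis=1))
    grid = []
    for (_, top_end), (bottom_start, _) in zip(h_runs, h_runs[1:]):
        row = []
        for (_, left_end), (right_start, _) in zip(v_runs, v_runs[1:]):
            row.append((top_end + 1, left_end + 1,
                        bottom_start - 1, right_start - 1))
        grid.append(row)
    return grid


def table_to_csv(binary, read_cell):
    """Write one CSV row per table row, calling read_cell(box) for each cell.

    read_cell is a hypothetical callback standing in for the OCR step.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in cell_boxes(binary):
        writer.writerow([read_cell(box) for box in row])
    return buffer.getvalue()
```

In practice `read_cell` would crop the box from the page image and pass it to an OCR engine. The projection-profile approach only works for tables with complete ruling lines and would need a different detector for borderless layouts.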
Computer Science