GenCheck: A LoRA-Adapted Multimodal Large Language Model for Check Analysis
Zixi Nan, Gary Warner, Yuanfei Ma, Bin Huang, Jiawen Chen, Rushi Chen, Fei Zhao, Chengcui Zhang, Shaorou Tang
DOI: https://doi.org/10.1109/MIPR62202.2024.00021
2024-08-07
Abstract: The rising incidence of paper check fraud, particularly involving checks illicitly sold on platforms such as Telegram, poses significant challenges for financial security. Although investigators can gain access to these platforms, manually pinpointing checks in images and extracting the details needed to alert banks is inefficient and does not scale. Traditional optical character recognition (OCR) systems for extracting textual details from checks struggle with handwritten content in particular and depend on predefined check layouts, which limits their effectiveness across varied and evolving check designs. To address these challenges, we introduce GenCheck, a generative AI-based framework that automates both check detection and the accurate extraction of check information, ensuring robust performance across various check layouts and styles. GenCheck operates through a two-stage pipeline: the preliminary stage encompasses multiple sub-tasks, including check image classification, single check segmentation, image rectification, and check element detection, while the main stage focuses on the key task of check information extraction. Central to our pipeline is the fine-tuning of a state-of-the-art (SOTA) multimodal large language model (LLaVA-NeXT) using Low-Rank Adaptation (LoRA). This fine-tuning leverages the model's pre-trained knowledge through a targeted, parameter-efficient approach that significantly enhances its ability to accurately extract key details such as dates, amounts, and payee information from paper check images. Our framework extracts date information with accuracies of 92.07% for year, 85.16% for month, and 82.72% for day. It also achieves an accuracy of 80.61% in extracting monetary amounts and a normalized edit distance of 0.2583 for payee information, demonstrating substantial improvements over pure OCR-based methods.
As the first framework of its kind, GenCheck establishes a methodological base that supports continuous innovation and enhancement, allowing for independent updates of each component model. This also sets a new standard in automated check analysis, reducing the need for labor-intensive, rule-based processes and significantly advancing fraud prevention initiatives.
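The payee result above is reported as a normalized edit distance, where lower is better and 0 means an exact match. The abstract does not state how the distance is normalized, so the following is only a minimal sketch under one common assumption: the Levenshtein distance divided by the length of the longer string. The function names `levenshtein` and `normalized_edit_distance` are illustrative and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row is small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance scaled to [0, 1].

    Assumption: normalization divides by the longer string's length.
    The paper may use a different convention.
    """
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(predicted, reference) / longest
```

Under this convention, a predicted payee of "Jon Smith" against a reference of "John Smith" differs by one inserted character out of ten, giving a normalized distance of 0.1.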
Business, Computer Science