Abstract: Text detection is a fundamental task in computer vision, particularly for Optical Character Recognition (OCR) applications. An OCR application typically comprises text detection, text recognition, and information extraction; this study focuses on the text detection stage. Character-Region Awareness for Text Detection (CRAFT), Pyramid Mask Text Detector (PMTD), and Scene Text Detection with Supervised Pyramid Context Network (SPCNET) have demonstrated promising results in bounding-box detection. However, these methods face challenges related to post-processing and multiline text detection. The post-processing problem arises because the model must be reconfigured whenever a new document type is introduced, which adds inefficiency and complexity. In addition, CRAFT in particular tends to merge bounding boxes from consecutive lines, which introduces multiline errors. To address these challenges, this study proposes an adapted approach based on Mask R-CNN, an instance segmentation model that treats each text element as an individual object. With the Mask R-CNN approach, the post-processing issues are eliminated and the multiline problem is resolved. Comparative experiments demonstrate that the proposed model is competitive with these baselines and surpasses them in accuracy and versatility. The proposed model is evaluated on a range of document types, including bankbooks, Thai ID cards (both front and back sides), invoices, car registrations, mobile banking slips, passports, Indonesian ID cards, driver licenses, and receipts. The results indicate the model's high performance and its potential for real-world applications. Eliminating the post-processing and multiline problems makes the model adaptable to a wide range of document structures and reduces both inference time and resource utilization.
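
The idea of treating each text element as its own instance can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that an instance segmentation model has already produced one binary mask per text element, represented here as a small grid of 0s and 1s, and the names mask_to_box and masks_to_boxes are hypothetical. Because every mask yields its own bounding box, two adjacent lines stay separate without any document-specific post-processing.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # binary grid: 1 = pixel belongs to the text instance
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight bounding box of one instance mask, or None if it is empty."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def masks_to_boxes(masks: List[Mask]) -> List[Box]:
    """Convert per-instance masks to boxes, one box per non-empty mask."""
    boxes = []
    for mask in masks:
        box = mask_to_box(mask)
        if box is not None:
            boxes.append(box)
    return boxes


# Two consecutive text lines, each given as a separate (hypothetical) instance mask.
line_1 = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
line_2 = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
]

boxes = masks_to_boxes([line_1, line_2])
print(boxes)
```

The two lines touch vertically, yet each mask produces its own box, so nothing has to be split apart afterwards.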

Detection Masking for Improved OCR on Noisy Documents

Towards Mask-robust Face Recognition

OCR accuracy improvement on document images through a novel pre-processing approach

An Evaluation of OCR Systems Against Adversarial Machine Learning

Text Detection Forgot About Document OCR

EraseNet: A Recurrent Residual Network for Supervised Document Cleaning

A new method for detection and prediction of occluded text in natural scene images

Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection

Manipulation Mask Generator: High-Quality Image Manipulation Mask Generation Method Based on Modified Total Variation Noise Reduction

BusiNet -- a Light and Fast Text Detection Network for Business Documents

A Masked-Face Detection Algorithm Based on M-EIOU Loss and Improved ConvNeXt

Mask wearing object detection algorithm based on improved YOLOv5

Efficient Text Bounding Box Identification Using Mask R-CNN: Case of Thai Documents

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

RetinaFaceMask: A Single Stage Face Mask Detector for Assisting Control of the COVID-19 Pandemic

A Page Object Detection Method Based on Mask R-CNN

Quality of OCR for Degraded Text Images

PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

Ensemble Model of Attention Mechanism-Based DCGAN and Autoencoder for Noised OCR Classification

Deep Learning-Based Multifunctional End-to-End Model for Optical Character Classification and Denoising

Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition