Ensemble Model of Attention Mechanism-Based DCGAN and Autoencoder for Noised OCR Classification

Shuguang Xiong, Huitao Zhang, Meng Wang
DOI: https://doi.org/10.30564/jeis.v4i1.6725
2022-03-31
Journal of Electronic & Information Systems
Abstract: Optical Character Recognition (OCR) is a technology that converts images of text into machine-readable formats, essential for digitizing printed texts and enabling digital searches. Traditional OCR methods often struggle with variations in font styles and noise. This paper proposes an innovative approach to enhance OCR classification under challenging conditions by leveraging an ensemble model that combines an Attention Mechanism-Based Generative Adversarial Network (GAN) and an Autoencoder. The GAN generates synthetic data to mitigate the limitations of small datasets, while the autoencoder extracts robust features from noisy images. The model undergoes a two-phase training process, initially learning from the augmented dataset and then fine-tuning on a smaller, labeled dataset. Grad-CAM is used to demonstrate interpretability, highlighting the attention regions during predictions. Experimental results show significant improvements in OCR accuracy and robustness, validating the effectiveness of the proposed method in handling noise and limited training data.
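The two-phase schedule described in the abstract (train on an augmented dataset first, then fine-tune on the smaller labeled set) can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the classifier is a plain logistic regression rather than the GAN and autoencoder ensemble, the `synthesize` function is a hypothetical stand-in that jitters real samples instead of sampling from a trained GAN, and the 4-pixel centered "images", learning rates, and epoch counts are all assumptions chosen only to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_samples(rng, n_per_class, n_pixels=4):
    """Toy 'images': centered pixel intensities, dark for class 0, bright for class 1."""
    samples = []
    for label, base in ((0, -0.3), (1, 0.3)):
        for _ in range(n_per_class):
            pixels = [base + rng.uniform(-0.1, 0.1) for _ in range(n_pixels)]
            samples.append((pixels, label))
    return samples


def synthesize(samples, rng, copies=3):
    """Hypothetical stand-in for GAN sampling: jittered copies of real samples."""
    synthetic = []
    for pixels, label in samples:
        for _ in range(copies):
            jittered = [p + rng.uniform(-0.05, 0.05) for p in pixels]
            synthetic.append((jittered, label))
    return synthetic


def train(weights, bias, data, lr, epochs, rng):
    """Plain SGD on the logistic loss; returns the updated (weights, bias)."""
    weights = list(weights)
    for _ in range(epochs):
        order = data[:]          # shuffle a copy so the caller's list is untouched
        rng.shuffle(order)
        for pixels, label in order:
            pred = sigmoid(sum(w * p for w, p in zip(weights, pixels)) + bias)
            err = pred - label
            weights = [w - lr * err * p for w, p in zip(weights, pixels)]
            bias -= lr * err
    return weights, bias


def two_phase_fit(labeled, rng):
    """Phase 1: augmented data (real + synthetic). Phase 2: fine-tune on labeled data."""
    n_pixels = len(labeled[0][0])
    augmented = labeled + synthesize(labeled, rng)
    weights, bias = train([0.0] * n_pixels, 0.0, augmented, lr=0.5, epochs=30, rng=rng)
    # Assumption: a smaller learning rate for fine-tuning, so phase 2 only nudges phase 1.
    return train(weights, bias, labeled, lr=0.05, epochs=10, rng=rng)


def accuracy(weights, bias, data):
    correct = 0
    for pixels, label in data:
        logit = sum(w * p for w, p in zip(weights, pixels)) + bias
        correct += int((logit > 0) == (label == 1))
    return correct / len(data)


rng = random.Random(0)
labeled = make_toy_samples(rng, n_per_class=4)
weights, bias = two_phase_fit(labeled, rng)
print("accuracy on the labeled set:", accuracy(weights, bias, labeled))
```

In a real pipeline, the synthetic samples would come from the trained generator and the classifier would consume autoencoder features; the sketch only shows how the two training phases are ordered and how the second phase reuses the weights from the first.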