A Deep Learning-Based Pre-Trained VGG19 Model for Optical Character Recognition

Shagun Sharma, Gurpreet Singh, Kalpna Guleria
DOI: https://doi.org/10.1109/ICoICI62503.2024.10696044
2024-08-28
Abstract: Optical Character Recognition (OCR) is a pivotal technology for digitizing and processing textual information from images. It converts printed or handwritten text into digital format, facilitating document digitization and information retrieval. In this study, we propose a novel approach to OCR that leverages the VGG19 convolutional neural network (CNN) architecture. VGG19, known for its depth and its performance in image classification, is repurposed here to address the challenges of character recognition. The model is trained end-to-end, with its convolutional layers serving as feature extractors, so that the hierarchical features learned by VGG19 are used to extract textual information from images. We use diverse, publicly available datasets comprising printed and handwritten text samples, and augment them with various techniques to improve generalization. Through extensive experimentation and evaluation, this study demonstrates the efficacy of the proposed approach in achieving state-of-the-art accuracy and robustness. Both training and test accuracy improve substantially across epochs: after ten epochs of training, the model reaches a training accuracy of 94.34% and a test accuracy of 94.96%. Furthermore, both training and test loss decrease consistently throughout training, indicating effective convergence and refinement of the model parameters.
These results demonstrate the efficacy of the VGG19-based OCR model in accurately recognizing characters from diverse input images, highlighting its potential for various real-world applications such as document digitization, augmented reality, and accessibility tools.
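The abstract describes VGG19's convolutional layers acting as the feature extractor, with the network repurposed to output character classes. As a rough illustration of what that layout implies, the stdlib-only Python sketch below does the layer arithmetic for the standard VGG19 configuration (sixteen 3x3 convolutional layers in five max-pooled blocks, followed by three fully connected layers). It is not the authors' code: the abstract does not give the input resolution, the number of character classes, or the head design, so the 224x224 input and the 62-class head (digits plus upper- and lower-case Latin letters) are hypothetical choices made only for illustration, as are the helper names `conv_params`, `feature_map_shape`, and `head_params`.

```python
# Illustrative sketch (not the paper's implementation): parameter and shape
# arithmetic for a standard VGG19 backbone with a replaceable classifier head.

# Standard VGG19 layout ("configuration E" in the original VGG paper):
# integers are 3x3 conv output channels, "M" is a 2x2 max-pool with stride 2.
VGG19_CFG = [64, 64, "M",
             128, 128, "M",
             256, 256, 256, 256, "M",
             512, 512, 512, 512, "M",
             512, 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3):
    """Count weights and biases of the 3x3 conv layers (the feature extractor)."""
    total = 0
    channels = in_channels
    for layer in cfg:
        if layer == "M":
            continue  # pooling layers have no learnable parameters
        total += 3 * 3 * channels * layer + layer  # kernel weights + one bias per filter
        channels = layer
    return total


def feature_map_shape(cfg, height, width):
    """Shape (channels, height, width) after the backbone.

    Assumes 3x3 convs with padding 1 (size-preserving) and that each
    max-pool halves height and width with floor division.
    """
    channels = None
    for layer in cfg:
        if layer == "M":
            height, width = height // 2, width // 2
        else:
            channels = layer
    return channels, height, width


def head_params(in_features, hidden=(4096, 4096), num_classes=1000):
    """Count parameters of a fully connected head: in_features -> hidden... -> num_classes."""
    total = 0
    prev = in_features
    for units in (*hidden, num_classes):
        total += prev * units + units  # weight matrix + bias vector
        prev = units
    return total


if __name__ == "__main__":
    c, h, w = feature_map_shape(VGG19_CFG, 224, 224)
    backbone = conv_params(VGG19_CFG)
    print("feature map:", (c, h, w))  # (512, 7, 7)
    print("backbone params:", backbone)  # 20024384
    print("ImageNet VGG19 total:", backbone + head_params(c * h * w))  # 143667240
    # Hypothetical OCR head with 62 character classes.
    print("OCR VGG19 total:", backbone + head_params(c * h * w, num_classes=62))
```

Under these assumptions, the convolutional feature extractor holds about 20M parameters, while the fully connected head holds roughly 120M, so most of the model's size comes from the classifier layers rather than from the convolutional features that the approach relies on.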
Engineering, Computer Science