Semantic Segmentation of Printed Text from Marathi Document Images using Deep Learning Methods

Shaheera Saba Mohd Naseem Akhter, Priti P. Rege
DOI: https://doi.org/10.1109/indicon47234.2019.9030360
2019-12-01
Abstract: Segmenting text from documents is an important step prior to recognition. This paper proposes semantic text segmentation of Marathi document images using deep learning methods. Deep-learning-based semantic segmentation has given good results in various applications. Here, the U-Net and Residual U-Net (ResU-Net) architectures, both of which have achieved state-of-the-art performance in medical image segmentation, are applied to Marathi documents. The models are tested on a dataset of scanned images taken from various Marathi books and articles. The experimental results show that ResU-Net performs better than U-Net because of the residual skip connections in the model. These connections avoid the vanishing gradient problem, and the resulting feature accumulation helps the model generalize well on the segmentation task. The U-Net and ResU-Net models achieve 95% and 98% accuracy, respectively, on the dataset.
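Two technical points in the abstract can be made concrete with a small sketch: how a segmentation accuracy figure such as 95% or 98% might be computed, and why an identity (residual) shortcut keeps gradients from vanishing. The abstract does not state how accuracy is measured, so the code below assumes per-pixel accuracy on binary text/background masks. The gradient part is a toy chain of scalar linear layers, not the ResU-Net architecture itself. The names `pixel_accuracy` and `input_gradient` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

# Assumption: a mask is a 2-D grid of labels, 1 = text pixel, 0 = background.
Mask = List[List[int]]


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth.

    Assumed metric: the paper only reports "accuracy" without defining it.
    """
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("prediction and target masks must have the same shape")
    total = sum(len(row) for row in target)
    correct = sum(
        int(p == t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return correct / total


def input_gradient(weights: Sequence[float], residual: bool) -> float:
    """d(output)/d(input) through a toy chain of scalar linear layers.

    Plain layer:    h_next = w * h       -> local derivative w
    Residual layer: h_next = h + w * h   -> local derivative 1 + w
    By the chain rule the end-to-end derivative is the product of the
    local derivatives, so the identity term "1" keeps it from shrinking
    toward zero when the weights are small.
    """
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w) if residual else w
    return grad


if __name__ == "__main__":
    target = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
    pred = [[0, 1, 0, 0],
            [0, 1, 1, 0]]
    print(pixel_accuracy(pred, target))  # 7 of 8 pixels correct -> 0.875

    weights = [0.1, -0.1] * 10  # 20 layers with small weights
    print(input_gradient(weights, residual=False))  # about 1e-20: vanished
    print(input_gradient(weights, residual=True))   # about 0.904: preserved
```

In the plain chain, twenty small local derivatives multiply to roughly 1e-20, while the residual chain multiplies factors close to 1 and stays near 0.9. This is the intuition behind the abstract's vanishing-gradient argument; in a real ResU-Net the residual branch is a stack of convolutions rather than a single scalar weight.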