A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification

Palaiahnakote Shivakumara, Tanmay Jain, Nitish Surana, Umapada Pal, Tong Lu, Michael Blumenstein, Sukalpa Chanda
DOI: https://doi.org/10.1007/978-3-030-86159-9_11
2021-01-01
Abstract: Struck-out handwritten words in document images degrade the performance of methods for several important applications, such as handwriting recognition, writer identification, gender identification, fraudulent document identification, document age estimation, writer age estimation, analysis of normal/abnormal behavior of a person, and descriptive answer evaluation. This work proposes a new method that combines connected component analysis for text component detection with deep learning for classifying struck-out and non-struck-out words. For text component detection, the proposed method finds the stroke width to detect the edges of text in images, and then performs smoothing operations to remove noise. Morphological operations are then performed on the smoothed images to label connected components as text by fixing bounding boxes. Inspired by the great success of deep learning models, we explore DenseNet for classifying struck-out and non-struck-out handwritten components, taking the text components as input. Experimental results on our dataset demonstrate that the proposed method outperforms existing methods in terms of classification rate.
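The component-labeling step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the image has already been binarized into a small grid of 0s and 1s, it skips the stroke-width, smoothing, and morphological stages entirely, and the function name `component_boxes`, the sample grid `page`, and the choice of 8-connectivity are all hypothetical. It only shows how connected foreground pixels can be grouped and enclosed in bounding boxes, which is the form in which text components would be handed to a classifier.

```python
from collections import deque


def component_boxes(grid):
    """Return (top, left, bottom, right) boxes of 8-connected regions of 1s.

    Illustrative stand-in only: the input is assumed to be a binarized
    image given as a list of equal-length rows of 0/1 values.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Start a breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                # Grow the bounding box to cover the current pixel.
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x5 binarized patch containing two separate components.
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = component_boxes(page)
```

In a fuller pipeline, each box would be used to crop a component from the original image before classification; the cropping and the DenseNet classifier are outside the scope of this sketch.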