DSANet: dilated spatial attention network for the detection of text, non-text and touching components in unconstrained handwritten documents

Showmik Bhowmik, Shaikh Risat, Bhaskar Sarkar
DOI: https://doi.org/10.1007/s00521-024-10013-8
2024-06-06
Neural Computing and Applications
Abstract: Handwritten documents generated in day-to-day office work, classrooms and other sectors of society carry vital information. Automatic processing of these documents is a pipeline of many challenging steps. The first and crucial step is to separate text from non-text, since an OCR (optical character recognition) engine can only process textual content. Separating text from non-text in unconstrained handwritten documents is a very complex task. Among other challenges, touching components are a major issue for text/non-text separation in such documents. Detecting text and non-text together with touching components in such documents is an unexplored area of research. To address this issue, we develop a dilated spatial attention-based network for text, non-text and touching component detection. We also prepare a realistic dataset for this task. On the proposed dataset, the model obtains an overall accuracy of 87.85%. Its performance is compared with seven feature-engineering-based methods and six deep learning-based methods. In most cases, the proposed model outperforms the compared methods on the proposed dataset. The code for our method is available at https://github.com/Showmik-Bhowmik/DSANet-Dilated-Spatial-Attention-.git.
computer science, artificial intelligence