A two-stage method for text line detection in historical documents

Tobias Grüning,Gundram Leifert,Tobias Strauß,Johannes Michael,Roger Labahn
DOI: https://doi.org/10.1007/s10032-019-00332-1
2019-07-23
International Journal on Document Analysis and Recognition (IJDAR)
Abstract:This work presents a two-stage text line detection method for historical documents. Each detected text line is represented by its baseline. In a first stage, a deep neural network called ARU-Net labels pixels to belong to one of the three classes: baseline, separator and other. The separator class marks beginning and end of each text line. The ARU-Net is trainable from scratch with manageably few manually annotated example images ((<,50)). This is achieved by utilizing data augmentation strategies. The network predictions are used as input for the second stage which performs a bottom-up clustering to build baselines. The developed method is capable of handling complex layouts as well as curved and arbitrarily oriented text lines. It substantially outperforms current state-of-the-art approaches. For example, for the complex track of the cBAD: ICDAR2017 Competition on Baseline Detection the F value is increased from 0.859 to 0.922. The framework to train and run the ARU-Net is open source.
computer science, artificial intelligence
What problem does this paper attempt to address?