HisDoc R-CNN: Robust Chinese Historical Document Text Line Detection with Dynamic Rotational Proposal Network and Iterative Attention Head.

Cheng Jian, Lianwen Jin, Lingyu Liang, Chongyu Liu
DOI: https://doi.org/10.1007/978-3-031-41676-7_25
2023-01-01
Abstract: Text line detection is an essential task in a historical document analysis system. Although many existing text detection methods have achieved remarkable performance on various scene text datasets, they perform poorly on complex historical documents because of the high density, multiple scales, and multiple orientations of the text lines. Thus, it is crucial and challenging to investigate effective text line detection methods for historical documents. In this paper, we propose a Dynamic Rotational Proposal Network (DRPN) and an Iterative Attention Head (IAH), which are incorporated into Mask R-CNN to detect text lines in historical documents. The DRPN dynamically generates horizontal or rotational proposals, which makes the model more robust to multi-oriented text lines and alleviates the multi-scale problem in historical documents. The IAH integrates a multi-dimensional attention mechanism that better learns the features of dense historical document text lines, while its iterative mechanism improves detection accuracy and reduces the number of model parameters. Our HisDoc R-CNN achieves state-of-the-art performance on various historical document benchmarks, including CHDAC (the IACC competition dataset, http://iacc.pazhoulab-huangpu.com/shows/108/1.html), MTHv2, and ICDAR 2019 HDRC CHINESE, thereby demonstrating the robustness of our method. Furthermore, we present special tricks for historical document scenarios, which may provide useful insights for practical applications.