RDLNet: A Novel and Accurate Real-world Document Localization Method

Yaqiang Wu,Zhen Xu,Yong Duan,Yanlai Wu,Qinghua Zheng,Hui Li,Xiaochen Hu,Lianwen Jin
DOI: https://doi.org/10.1145/3664647.3681655
2024-01-01
Abstract:The increasing use of smartphones for capturing documents in various real-world conditions has underscored the need for robust document localization technologies. Current challenges in this domain include handling diverse document types, complex backgrounds, and varying photographic conditions such as low contrast and occlusion. However, there currently are no publicly available datasets containing these complex scenarios and few methods demonstrate their capabilities on these complex scenes. To address these issues, we create a new comprehensive real-world document localization benchmark dataset which contains the complex scenarios mentioned above and propose a novel Real-world Document Localization Network (RDLNet) for locating targeted documents in the wild. The RDLNet consists of an innovative light-SAM encoder and a masked attention decoder. Utilizing light-SAM encoder, the RDLNet transfers the mighty generalization capability of SAM to the document localization task. In the decoding stage, the RDLNet exploits the masked attention and object query method to efficiently output the triple-branch predictions consisting of corner point coordinates, instance-level segmentation area and categories of different documents without extra post-processing. We compare the performance of RDLNet with other state-of-the-art approaches for real-world document localization on multiple benchmarks, the results of which reveal that the RDLNet remarkably outperforms contemporary methods, demonstrating its superiority in terms of both accuracy and practicability.
What problem does this paper attempt to address?