An Iterative Graph Learning Convolution Network for Key Information Extraction Based on the Document Inductive Bias.

Jiyao Deng, Yi Zhang, Xinpeng Zhang, Zhi Tang, Liangcai Gao
DOI: https://doi.org/10.1007/978-3-031-41682-8_6
2023-01-01
Abstract: Recently, there has been growing interest in automating the extraction of key information from document images. Previous methods mainly focus on modelling the complex interactions between the multimodal features (text, vision, and layout) of documents to comprehend their content. However, considering only these interactions may not work well when dealing with unseen document templates. To address this issue, we propose a novel approach that incorporates the concept of document inductive bias into the graph convolution framework. Our approach recognizes that the content of a text segment in a document is often determined by the context provided by its surrounding segments, and it uses an adjacency matrix hybrid strategy to integrate this bias into the model. As a result, the model is better able to understand the relationships between text segments even when faced with unseen templates. In addition, we perform graph convolution iteratively, making full use of the textual, visual, and spatial information contained within documents. Extensive experimental results on two publicly available datasets demonstrate the effectiveness of our method.
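The abstract does not specify how the adjacency matrix hybrid strategy or the iterative graph convolution is formulated. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the document inductive bias can be approximated by a k-nearest-neighbour prior over segment centres, that the learned adjacency is a softmax over feature similarity, and that the two are mixed with a fixed weight `alpha`. The names `spatial_prior`, `learned_adjacency`, `iterative_gcn`, `alpha`, and `k` are all hypothetical.

```python
import numpy as np

def spatial_prior(centers, k=2):
    """Assumed form of the bias: row-normalized k-nearest-neighbour adjacency."""
    n = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a segment is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.arange(n)[:, None], nearest] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

def learned_adjacency(h):
    """Assumed learned graph: row-wise softmax over scaled dot-product similarity."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def iterative_gcn(x, centers, weights, alpha=0.5, k=2):
    """Re-estimate the hybrid adjacency from the current features at each layer."""
    prior = spatial_prior(centers, k)
    h = x
    for w in weights:
        # Hybrid adjacency: mix of the learned graph and the fixed spatial prior.
        adj = alpha * learned_adjacency(h) + (1.0 - alpha) * prior
        h = np.maximum(adj @ h @ w, 0.0)     # graph convolution followed by ReLU
    return h, adj

# Toy usage: four text segments with 8-dimensional fused features.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
weights = [rng.normal(size=(8, 8)) * 0.1 for _ in range(2)]
h, adj = iterative_gcn(x, centers, weights)
prior = spatial_prior(centers)
```

In this sketch the spatial prior stays fixed while the learned component is recomputed from the updated node features at every iteration, so neighbouring segments keep a guaranteed share of each node's attention regardless of template.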