CALM: Common-Sense Knowledge Augmentation for Document Image Understanding

Qinyi Du, Qingqing Wang, Keqian Li, Jidong Tian, Liqiang Xiao, Yaohui Jin
DOI: https://doi.org/10.1145/3503161.3548321
2022-01-01
Abstract: The performance of document image understanding has been significantly fueled by encoding multi-modal information in recent years. However, existing works rely heavily on the superficial appearance of the observed data, resulting in counter-intuitive model behavior in many critical cases. To overcome this issue, this paper proposes CALM, a common-sense knowledge augmented model for document image understanding tasks. It first produces purified representations of document contents to extract key information and learn common-sense augmented representations of the inputs. Then, relevant common-sense knowledge is extracted from the external ConceptNet knowledge base, and a derived knowledge graph is built to jointly enhance the common-sense reasoning capability of CALM. To further highlight the importance of common-sense knowledge in document image understanding, we propose CS-DVQA, the first question-answering dataset focused on common-sense reasoning for document images, in which questions are answered by taking both document contents and common-sense knowledge into consideration. In extensive evaluation, the proposed CALM approach outperforms the state-of-the-art models on three document image understanding tasks: key information extraction (from 85.37 to 86.52), document image classification (from 96.08 to 96.17), and document visual question answering (from 86.72 to 88.03).