DeMatch: Towards Understanding the Panel of Chart Documents

Hesuo Zhang, Weihong Ma, Lianwen Jin, Yichao Huang, Kai Ding, Yaqiang Wu
DOI: https://doi.org/10.1007/978-3-030-86334-0_45
2021-01-01
Abstract: Chart document understanding is a challenging task in document analysis because of the complex formats and specific semantics that charts contain. In this paper, we first define the chart panel analysis problem and propose a complete framework that can be applied to various types of charts. Generally, a chart document contains multiple types of elements, such as the title, axes, legend, plot elements and comments. To solve the problem of chart panel analysis, we developed our system focusing on two aspects: axis analysis and legend analysis. For axis analysis, we first design a fully convolutional network to detect tick marks and then use a clustering method to distinguish the X-axis from the Y-axis. A rectangle-growing matching rule is proposed to associate each predicted tick mark with its corresponding text. For legend analysis, we use a cascaded head detector to determine the accurate locations of the legend marks and mark-text pairs, and then we design the highest IoU matching rule to associate each legend label text with its corresponding legend mark. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed method. Specifically, we obtained the best result on the test set of the ICDAR2019 competition on harvesting raw tables from infographics, and state-of-the-art performance on the UB PMC2020 data set of the ICPR2020 competition. Code will be available at https://github.com/iiiHunter/CHART-DeMatch.
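The highest IoU matching rule mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so the following is an assumption rather than the authors' implementation: each detected mark-text pair box selects the legend mark box and the text box with which it has the highest intersection-over-union (IoU). The names `iou`, `best_match` and `match_legend` and the `(x1, y1, x2, y2)` box format are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_match(box: Box, candidates: List[Box]) -> Optional[int]:
    """Index of the candidate with the highest non-zero IoU, or None."""
    best_idx, best_score = None, 0.0
    for i, cand in enumerate(candidates):
        score = iou(box, cand)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def match_legend(marks: List[Box], texts: List[Box],
                 pairs: List[Box]) -> List[Tuple[int, int]]:
    """Assumed rule: every mark-text pair box links the mark and the text
    that overlap it most. Returns (mark_index, text_index) tuples."""
    matches = []
    for pair in pairs:
        m = best_match(pair, marks)
        t = best_match(pair, texts)
        if m is not None and t is not None:
            matches.append((m, t))
    return matches


# Toy example: two legend rows, with the texts listed in reverse order.
marks = [(0, 0, 10, 10), (0, 20, 10, 30)]
texts = [(12, 20, 50, 30), (12, 0, 50, 10)]
pairs = [(0, 0, 50, 10), (0, 20, 50, 30)]
print(match_legend(marks, texts, pairs))  # [(0, 1), (1, 0)]
```

In this toy layout each pair box overlaps exactly one mark and one text, so the match is unambiguous. A real detector would also have to handle duplicate or missing detections, which this sketch ignores.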