ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

Zhi-Qi Cheng, Qi Dai, Siyao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann
2023-04-05
Abstract: Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods suffer from either heuristic rules or an over-reliance on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. By learning the rules of charts automatically from annotated datasets, our approach eliminates the need for manual rule-making, reducing effort and enhancing accuracy. We also introduce a data variable replacement technique and extend the input and position embeddings of the pre-trained model for cross-task training. We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model. Moreover, our approach offers opportunities for plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks. The code is available at https://github.com/zhiqic/ChartReader.
Computer Vision and Pattern Recognition, Artificial Intelligence, Multimedia
What problem does this paper attempt to address?
The paper addresses the limitations of existing chart-understanding methods when they are applied to diverse chart types. Existing methods face two main problems:

1. **Over-reliance on heuristic rules**: Many chart-to-table methods rely on predefined heuristic rules to identify and extract chart components. Formulating these rules takes considerable domain knowledge and effort, and the rules are difficult to generalize to new chart types. For example, ChartOCR must first classify the chart and then apply type-specific heuristic rules to detect its components. This approach is complex and hard to extend to unseen chart categories.
2. **Over-reliance on OCR systems**: Current chart-understanding methods, such as chart-to-text and chart question answering (ChartQA), usually rely on off-the-shelf OCR systems or on tables extracted from ground-truth data. This ignores the visual and structural information in the chart, with two consequences:
   - **Chart-to-text and ChartQA degenerate into text-only problems**: These methods cannot draw on the visual semantics recovered by chart derendering, so they perform poorly at understanding charts and answering questions about them.
   - **Chart-to-table tasks cannot benefit from chart-understanding tasks**: Without an understanding of the chart's visual semantics, existing systems convert charts into tables with low accuracy.

To overcome these problems, the paper proposes ChartReader, a unified framework that integrates chart derendering and comprehension tasks. Its main features are:

- **Rule-free chart component detection module**: A transformer-based detector learns to locate and classify chart components automatically, without manually formulated rules.
- **Extended pre-trained vision-language model**: The input and position embeddings of the pre-trained model are extended, and a data variable replacement technique is introduced, to improve cross-task training.
- **Standardized task-processing approach**: Chart-to-table, chart-to-text, and ChartQA tasks are all cast as question-answering problems, so one model can handle multiple chart-understanding tasks.

Through these innovations, ChartReader aims to reduce the manual effort involved in chart analysis, improve the accuracy and efficiency of chart understanding, and take a step towards a general-purpose chart-understanding model.
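
The idea of casting all three tasks as question answering, together with data variable replacement, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the prompt strings in `TASK_PROMPTS`, the `QAExample` type, the `to_qa` helper, and the `<cell_r_c>` placeholder format are all hypothetical. In particular, "data variable replacement" is read here as swapping table values that appear in the target text for positional placeholders; the summary above does not spell out the exact mechanism.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical fixed prompts for the two tasks that carry no user question.
TASK_PROMPTS: Dict[str, str] = {
    "chart_to_table": "What is the underlying data table of this chart?",
    "chart_to_text": "What does this chart show?",
}


@dataclass
class QAExample:
    task: str
    question: str
    answer: str


def to_qa(task: str, target: str, question: Optional[str] = None) -> QAExample:
    """Cast a chart task into a (question, answer) pair."""
    if task == "chartqa":
        if question is None:
            raise ValueError("ChartQA examples need a question")
        return QAExample(task, question, target)
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return QAExample(task, TASK_PROMPTS[task], target)


def replace_data_variables(text: str, table: List[List[str]]) -> str:
    """Replace table values found in `text` with positional placeholders.

    Assumed placeholder format: <cell_ROW_COL>. The first occurrence of a
    value in the table decides which cell it maps to.
    """
    mapping: Dict[str, str] = {}
    for r, row in enumerate(table):
        for c, value in enumerate(row):
            if value and value not in mapping:
                mapping[value] = f"<cell_{r}_{c}>"
    if not mapping:
        return text
    # Longest values first so "2020" is matched before "20"; a single
    # regex pass ensures inserted placeholders are never rewritten again.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    table = [["Year", "Sales"], ["2020", "15"], ["2021", "20"]]
    summary = "Revenue rose from 15 in 2020 to 20 in 2021."
    example = to_qa("chart_to_text", replace_data_variables(summary, table))
    print(example.question)
    print(example.answer)
```

Under this reading, the model would learn to point at table cells instead of copying raw numbers, and the same question-answer interface would serve all three tasks. A real system would also need to handle values that appear in the text by coincidence, which this sketch does not attempt.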