Abstract: Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods suffer from either a reliance on heuristic rules or an over-reliance on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. By learning the rules of charts automatically from annotated datasets, our approach eliminates the need for manual rule-making, reducing effort and enhancing accuracy. We also introduce a data variable replacement technique and extend the input and position embeddings of the pre-trained model for cross-task training. We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model. Moreover, our approach offers opportunities for plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks. The code is available at https://github.com/zhiqic/ChartReader.
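The abstract names a data variable replacement technique without detailing it. The sketch below is a minimal illustration under one assumption: the idea is to replace literal numbers in a target text (for example, a chart summary) with placeholder tokens that point at cells of the chart's data table, so a model can predict cell references instead of copying numeric strings. The `<cell_r_c>` token format and both function names are hypothetical and are not taken from the ChartReader code.

```python
import re

# Hypothetical sketch of data variable replacement (not the ChartReader code).
# Matches non-negative integers and decimals that stand alone as words.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
PLACEHOLDER = re.compile(r"<cell_(\d+)_(\d+)>")


def replace_data_variables(text, table):
    """Replace numbers in `text` that occur in `table` with <cell_r_c> tokens.

    `table` is a list of rows of strings, header row included.
    Returns the templated text and a token -> original-string mapping.
    """
    # Map each numeric cell value to the first (row, col) where it appears.
    index = {}
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            if NUMBER.fullmatch(cell.strip()):
                index.setdefault(float(cell), (r, c))

    mapping = {}

    def substitute(match):
        value = float(match.group())
        if value not in index:
            return match.group()  # numbers absent from the table stay literal
        r, c = index[value]
        token = f"<cell_{r}_{c}>"
        mapping[token] = match.group()
        return token

    return NUMBER.sub(substitute, text), mapping


def restore_data_variables(templated, table):
    """Inverse step: fill each <cell_r_c> token from a (possibly predicted) table."""
    return PLACEHOLDER.sub(lambda m: table[int(m.group(1))][int(m.group(2))], templated)


if __name__ == "__main__":
    table = [["Year", "Sales"], ["2019", "30"], ["2020", "42.5"]]
    summary = "Sales rose from 30 in 2019 to 42.5 in 2020."
    templated, mapping = replace_data_variables(summary, table)
    print(templated)  # -> Sales rose from <cell_1_1> in <cell_1_0> to <cell_2_1> in <cell_2_0>.
    print(restore_data_variables(templated, table) == summary)  # -> True
```

Negative numbers, thousands separators, and units are deliberately not handled; a real pipeline would need to normalize them before matching text numbers against table cells.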

What problem does this paper attempt to address?

The paper addresses the limitations of existing chart-understanding methods when dealing with different types of charts. Existing methods face two main problems:

1. **Over-reliance on heuristic rules**: Many chart-to-table methods rely on predefined heuristic rules to identify and extract chart components. Formulating these rules requires substantial domain knowledge and effort, and the rules generalize poorly to new chart types. For example, ChartOCR first classifies the chart and then applies type-specific heuristic rules to detect its components, a pipeline that is complex and hard to extend to unseen chart categories.
2. **Over-reliance on OCR systems**: Current chart-comprehension methods, such as chart-to-text and chart question answering (ChartQA), usually depend on off-the-shelf OCR systems or on ground-truth data tables. This ignores the visual and structural information in the chart, which leads to two further problems:
   - **Chart-to-text and ChartQA degenerate into text-only problems**: Because these methods do not exploit the visual semantics that chart derendering could provide, they perform poorly at understanding charts and answering questions about them.
   - **Chart-to-table conversion cannot benefit from comprehension tasks**: Without an understanding of the chart's visual semantics, existing systems convert charts into tables with low accuracy.

To overcome these problems, the paper proposes ChartReader, a unified framework that integrates chart derendering (chart-to-table) and comprehension tasks. Its main features are:

- **Rule-free chart component detection**: A Transformer-based module automatically detects the location and type of chart components, with no manually written rules.
- **Extended pre-trained vision-language model**: Extending the model's input and position embeddings and introducing a data variable replacement technique improve cross-task training.
- **Standardized task formulation**: Chart-to-table, chart-to-text, and chart question answering are all cast as question-answering problems, so one model can solve multiple chart-comprehension tasks.

Through these innovations, ChartReader aims to reduce the manual effort in chart analysis, improve the accuracy and efficiency of chart understanding, and take an important step towards a general-purpose chart-understanding model.
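The summary states that ChartReader extends the input and position embeddings of its pre-trained model, but not how the new rows are initialized. The sketch below shows two common initialization choices, assuming a learned absolute position-embedding table and a token-embedding table, each stored as a list of rows: tiling the pretrained position rows (the copy initialization Longformer used when extending RoBERTa) and initializing new token rows to the mean of the existing ones. Whether ChartReader uses either choice is an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch (not the ChartReader code): two common ways to extend
# the embedding tables of a pretrained model, using plain lists of floats.


def extend_position_embeddings(weights, new_len):
    """Resize a learned absolute position-embedding table to `new_len` rows.

    Row i is copied from pretrained row (i mod old_len), so the pretrained
    table is tiled when growing and simply truncated when shrinking.
    """
    old_len = len(weights)
    return [weights[i % old_len][:] for i in range(new_len)]


def add_token_embeddings(weights, num_new):
    """Append `num_new` rows for new input tokens (e.g. placeholder tokens),
    each initialized to the mean of the existing token embeddings."""
    dim = len(weights[0])
    mean = [sum(row[j] for row in weights) / len(weights) for j in range(dim)]
    return [row[:] for row in weights] + [mean[:] for _ in range(num_new)]


if __name__ == "__main__":
    positions = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # toy 3 x 2 table
    print(extend_position_embeddings(positions, 5))
    # -> [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [0.0, 1.0], [2.0, 3.0]]
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    print(add_token_embeddings(tokens, 1))
    # -> [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
```

Both choices place the new rows in the same region of embedding space as the pretrained ones rather than at random values; in a deep-learning framework the same logic would operate on the model's embedding weight tensors.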

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

Advancing Chart Question Answering with Robust Chart Component Recognition

ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM

Improving Machine Understanding of Human Intent in Charts

Chart Understanding with Large Language Model

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

ChartBench: A Benchmark for Complex Visual Reasoning in Charts

mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Chartem: Reviving Chart Images with Data Embedding

ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs

ChartEye: A Deep Learning Framework for Chart Information Extraction

An Intelligent Approach to Automatically Discovering Visual Insights

DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding