EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Muye Huang, Lai Han, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

2024-09-03

Abstract: Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a novel benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces a high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the difficulties that current Visual Language Models (VLMs) face in understanding real-world charts. It focuses on two issues:

1. **Lack of high-quality training data**: Although existing VLMs have made progress in chart understanding, their actual performance remains poor because high-quality training data and comprehensive evaluation benchmarks are lacking.
2. **Limitations of existing datasets**: The widely used ChartQA dataset suffers from limited source diversity and focuses too heavily on advanced chart reasoning, which leads to an overestimation of model performance and fails to reflect true chart understanding capabilities.

To tackle these issues, the authors propose EvoChart, a novel self-training data synthesis approach that generates high-quality chart data with real-world characteristics. They also introduce the EvoChart-QA benchmark to evaluate chart understanding in real-world scenarios. Experimental results show that the EvoChart method significantly improves the performance of open-source VLMs on real-world chart understanding tasks, achieving an accuracy of 54.2% on EvoChart-QA.

ChartBench: A Benchmark for Complex Visual Reasoning in Charts

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Chart Understanding with Large Language Model

ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

Enhancing Question Answering on Charts Through Effective Pre-training Tasks

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

On Pre-training of Multimodal Language Models Customized for Chart Understanding

SynChart: Synthesizing Charts from Language Models

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback

CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models