EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Muye Huang, Lai Han, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu
2024-09-03
Abstract: Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a novel benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces a high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulties that current Visual Language Models (VLMs) face in understanding real-world charts. It focuses on two issues:
1. **Lack of high-quality training data**: Although existing VLMs have made progress in chart understanding, their performance on real-world charts remains poor because high-quality training data and comprehensive evaluation benchmarks are lacking.
2. **Limitations of existing datasets**: The widely used ChartQA dataset suffers from limited source diversity and focuses excessively on advanced chart reasoning, which leads to overestimating model performance and fails to reflect a model's true chart understanding capabilities.
To tackle these issues, the authors propose EvoChart, a novel self-training data synthesis approach that generates high-quality chart datasets with real-world characteristics. They also introduce the EvoChart-QA benchmark to evaluate chart understanding in real-world scenarios. Experimental results show that the EvoChart method significantly improves the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.