TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging

Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang
DOI: https://doi.org/10.18653/v1/2024.emnlp-main.112
2024-01-01
Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reducing the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, which trains the model to generate Python programs for numerical calculations, and (2) reducing the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module, which gradually merges the most similar vision tokens. Extensive experiments demonstrate that our 3B TinyChart achieves SOTA performance on a variety of chart understanding benchmarks including ChartQA, Chart-to-Text, Chart-to-Table, OpenCQA, and ChartX. It outperforms several chart understanding MLLMs with up to 13B parameters, such as ChartLlama and ChartAst, and the closed-source general-purpose MLLM GPT-4V on ChartQA. It also demonstrates superior efficiency, with higher throughput during inference due to a smaller model scale and more efficient vision encoding. Our code and model are available at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/TinyChart.
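To make the idea of "gradually merging the most similar vision tokens" concrete, the following is a minimal sketch and not the paper's Vision Token Merging module. It assumes tokens are plain lists of floats, uses cosine similarity as the similarity measure, and greedily merges one pair at a time by size-weighted averaging. The function names `cosine` and `merge_most_similar` and the parameter `r` (number of merges) are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_most_similar(tokens, r):
    """Greedily merge the most similar pair of tokens, r times.

    Illustrative assumption: each merge replaces the pair with its
    size-weighted mean, so a token that already represents several
    originals counts proportionally more.
    """
    # Track (vector, number of original tokens it represents).
    items = [(list(t), 1) for t in tokens]
    for _ in range(r):
        if len(items) < 2:
            break
        # Exhaustive search for the most similar pair (quadratic, fine for a sketch).
        best = None
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                sim = cosine(items[i][0], items[j][0])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        (vec_a, size_a), (vec_b, size_b) = items[i], items[j]
        total = size_a + size_b
        merged = [(x * size_a + y * size_b) / total for x, y in zip(vec_a, vec_b)]
        items[i] = (merged, total)
        del items[j]  # j > i, so index i is unaffected
    return [vec for vec, _ in items]


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # made-up values
    print(merge_most_similar(toy_tokens, 1))
```

Each call shortens the sequence by at most `r` tokens, which is the property that reduces the cost of later layers. A practical implementation would operate on batched tensors and avoid the exhaustive pairwise search used here.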
What problem does this paper attempt to address?