StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan
DOI: https://doi.org/10.48550/arxiv.2309.11268
2023-01-01
Abstract: Charts are common in literature across different scientific fields, conveying rich information easily accessible to readers. Current chart-related tasks focus on either chart perception, which refers to extracting information from the visual charts, or performing reasoning given the extracted data, e.g., in a tabular form. In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks, which can be generally applicable to different downstream tasks, beyond the question-answering task as specifically studied in peer works. Specifically, StructChart first reformulates the chart information from the popular tabular form (specifically linearized CSV) to the proposed Structured Triplet Representations (STR), which is more friendly for reducing the task gap between chart perception and reasoning due to the employed structured information extraction for charts. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate the performance for the chart perception task. To enrich the dataset for training, we further explore the possibility of leveraging the Large Language Model (LLM), enhancing the chart diversity in terms of both chart visual style and its statistical information. Extensive experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm to push the frontier of chart understanding.
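To make the reformulation from linearized CSV to triplets concrete, here is a minimal Python sketch. The abstract does not specify the exact STR format, so this assumes one triplet per table cell of the form (row label, column label, value); the function name `csv_to_triplets` and the sample table are hypothetical and are not taken from the paper.

```python
import csv
import io


def csv_to_triplets(csv_text):
    """Turn a CSV table into (row_label, column_label, value) triplets.

    Assumption (not from the paper): the first row holds column labels
    and the first column holds row labels.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, body = rows[0], rows[1:]
    column_labels = header[1:]  # skip the corner cell above the row labels

    triplets = []
    for row in body:
        row_label, cells = row[0], row[1:]
        for column_label, cell in zip(column_labels, cells):
            triplets.append((row_label, column_label, cell.strip()))
    return triplets


# Hypothetical data that a chart perception model might extract.
example = """Year,Sales,Profit
2021,10,3
2022,12,4
"""

triplets = csv_to_triplets(example)
for triplet in triplets:
    print(triplet)
```

Under this assumption each triplet carries its own row and column labels, so a triplet can be read without knowing its position in the table, whereas a cell in linearized CSV is only meaningful relative to the header row.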