FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding

Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki
2024-07-10
Abstract: Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts - a crucial element of scientific communication - has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, in tasks involving simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude recorded the highest accuracy (83%) in OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limited ability of Large Vision-Language Models (LVLMs) to understand flowcharts. It introduces the FlowLearn dataset, a resource designed to improve flowchart comprehension. FlowLearn includes complex scientific flowcharts and simulated flowcharts, and its test set evaluates how well LVLMs decode these important scientific communication tools. Although LVLMs perform well on various visual understanding tasks, their understanding of flowcharts has not been thoroughly investigated. Using detailed annotations and diverse evaluation tasks (such as node counting and OCR), the paper reveals the limitations of existing models in handling complex flowcharts and provides benchmarks and directions for future research. The main objectives of the paper are:
1. Addressing the lack of flowchart annotations in current datasets.
2. Evaluating the performance of different LVLMs in flowchart understanding and identifying their strengths and weaknesses.
3. Providing foundational resources for the future development of more powerful visual understanding models.
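The node-counting task mentioned above can be illustrated with a small Python sketch. It is not the paper's evaluation code: the Mermaid snippet, the function names `count_nodes` and `accuracy`, and the exact-match scoring are all assumptions made for illustration, and the parser handles only a simplified subset of Mermaid flowchart syntax (one edge per line, `-->` or `---` arrows, optional `|label|` on an edge).

```python
import re

# Hypothetical flowchart in Mermaid syntax; not taken from the FlowLearn dataset.
MERMAID = """flowchart TD
    A[Start] --> B{Valid input?}
    B -->|yes| C[Process]
    B -->|no| D[Reject]
    C --> E[End]
    D --> E
"""

# An arrow, optionally followed by an edge label such as |yes|.
ARROW = re.compile(r"\s*(?:-->|---)\s*(?:\|[^|]*\|\s*)?")
# A node identifier at the start of an edge endpoint, e.g. "A" in "A[Start]".
NODE_ID = re.compile(r"\s*([A-Za-z0-9_]+)")


def count_nodes(mermaid_code):
    """Count unique node identifiers in a simplified Mermaid flowchart."""
    nodes = set()
    # Skip the first line, which holds the "flowchart TD" header.
    for line in mermaid_code.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Split the line at each arrow and read the identifier of every endpoint.
        for part in ARROW.split(line):
            match = NODE_ID.match(part)
            if match:
                nodes.add(match.group(1))
    return len(nodes)


def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    reference = count_nodes(MERMAID)
    print("reference node count:", reference)
    # Two hypothetical model answers for the same chart: one right, one wrong.
    print("accuracy:", accuracy([reference, reference - 1], [reference, reference]))
```

Under these assumptions, the reference answer for a node-counting question is derived from the Mermaid annotation, and a model's answer counts as correct only when it matches that number exactly.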