FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding

Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

2024-07-10

Abstract: Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts - a crucial element of scientific communication - has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, in tasks involving simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude recorded the highest accuracy (83%) in OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the shortcomings of Large Vision-Language Models (LVLMs) in understanding flowcharts. It introduces the FlowLearn dataset, which combines complex scientific flowcharts with simulated ones, to evaluate how well LVLMs decode these important scientific communication tools. Although LVLMs perform well on many visual understanding tasks, their understanding of flowcharts has not been thoroughly explored. Using detailed annotations and diverse evaluation tasks (such as node counting and OCR), the paper reveals the limitations of existing models on complex flowcharts and provides benchmarks and directions for future research. The main objectives of the paper are:

1. Addressing the lack of flowchart annotations in current datasets.
2. Evaluating the performance of different LVLMs in flowchart understanding and identifying their strengths and weaknesses.
3. Providing foundational resources for the future development of more powerful visual understanding models.
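To make the node-counting task concrete, the sketch below shows one way a ground-truth node count could be derived from a Mermaid annotation and how a model's answers could be scored. It is an illustration only, not the authors' evaluation code: the sample flowchart, the helper names (`parse_edges`, `count_nodes`, `accuracy`), the restriction to simple `-->` edges, and the exact-match scoring rule are all assumptions.

```python
import re

# Hypothetical Mermaid snippet for illustration; not taken from FlowLearn.
SAMPLE_MERMAID = """flowchart TD
    A[Load image] --> B{Is flowchart?}
    B -->|yes| C[Run OCR]
    B -->|no| D[Discard]
    C --> E[Answer question]
"""

NODE_ID = re.compile(r"\s*(\w+)")          # leading node identifier
EDGE_LABEL = re.compile(r"^\s*\|[^|]*\|")  # optional |label| after the arrow


def parse_edges(mermaid: str) -> list[tuple[str, str]]:
    """Extract (source, target) node ids from lines with one '-->' arrow.

    Assumes one edge per line and no '-->' inside node labels.
    """
    edges = []
    for line in mermaid.splitlines():
        if "-->" not in line:
            continue  # skip the header and any non-edge lines
        left, right = line.split("-->", 1)
        right = EDGE_LABEL.sub("", right)  # drop an edge label such as |yes|
        src = NODE_ID.match(left)
        dst = NODE_ID.match(right)
        if src and dst:
            edges.append((src.group(1), dst.group(1)))
    return edges


def count_nodes(mermaid: str) -> int:
    """Count distinct node ids that appear as an endpoint of any edge."""
    nodes = set()
    for src, dst in parse_edges(mermaid):
        nodes.update((src, dst))
    return len(nodes)


def accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    print("nodes:", count_nodes(SAMPLE_MERMAID))
    print("edges:", len(parse_edges(SAMPLE_MERMAID)))
    # Hypothetical model answers for three flowcharts versus gold counts.
    print("accuracy:", accuracy([5, 3, 7], [5, 4, 7]))
```

Under these assumptions, a model's answer to "How many nodes are in this flowchart?" counts as correct only when it equals the count derived from the annotation. A real benchmark may handle isolated nodes, subgraphs, or other Mermaid arrow styles differently.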

