Abstract: We introduce the task of text-to-diagram generation, which focuses on creating structured visual representations directly from textual descriptions. Existing approaches in text-to-image and text-to-code generation lack the logical organization and flexibility needed to produce accurate, editable diagrams, often resulting in outputs that are either unstructured or difficult to modify. To address this gap, we introduce DiagramGenBenchmark, a comprehensive evaluation framework encompassing eight distinct diagram categories, including flowcharts, model architecture diagrams, and mind maps. Additionally, we present DiagramAgent, an innovative framework with four core modules (Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent) designed to facilitate both the generation and refinement of complex diagrams. Our extensive experiments, which combine objective metrics with human evaluations, demonstrate that DiagramAgent significantly outperforms existing baseline models in terms of accuracy, structural coherence, and modifiability. This work not only establishes a foundational benchmark for the text-to-diagram generation task but also introduces a powerful toolset to advance research and applications in this emerging area.

What problem does this paper attempt to address?

This paper addresses the challenge of automatically generating structured diagrams from text descriptions (text-to-diagram generation). When applied to structured diagrams, existing text-to-image and text-to-code methods organize content weakly and produce outputs that are hard to modify. Specifically:

1. **Text-to-image generation**: These methods can produce visually appealing images, but they lack the precise relationships and hierarchical organization that structured diagrams require, so the resulting images are logically incoherent and difficult to understand or modify.
2. **Text-to-code generation**: These methods can produce basic visual diagrams such as bar charts and line charts, but for more complex diagrams such as flowcharts, model architecture diagrams, and mind maps they lack flexibility and structural logic, and they cannot meet the fine-grained organization and interactive editing requirements of such diagrams.

To fill this gap, the paper introduces **DiagramGenBenchmark**, a comprehensive evaluation framework that covers eight diagram categories, including flowcharts, model architecture diagrams, and mind maps. The paper also proposes **DiagramAgent**, a framework with four core modules: Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent. These modules are designed to support both the generation and the refinement of complex diagrams.

Through extensive experiments that combine objective metrics with human evaluation, the paper reports that **DiagramAgent** significantly outperforms existing baseline models in accuracy, structural coherence, and modifiability. The work establishes a foundational benchmark for the text-to-diagram generation task and provides a toolset for research and applications in this emerging field.
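As a rough illustration of how four modules with these names might fit together, the sketch below wires them into a single generate-or-edit loop. This is not the paper's implementation: the summary names the modules but does not describe how they interact, so the control flow, the `DiagramRequest` type, the `run_pipeline` function, the `max_rounds` limit, the Graphviz DOT output, and all `demo_*` stand-ins are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DiagramRequest:
    """Hypothetical request type: an instruction plus an optional diagram to edit."""
    instruction: str
    existing_diagram: Optional[str] = None


def run_pipeline(
    request: DiagramRequest,
    plan: Callable[[str], str],
    diagram_to_code: Callable[[str], str],
    write_code: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Return diagram code for the request (assumed control flow)."""
    # Assumption: the plan step turns the raw instruction into a plan string.
    steps = plan(request.instruction)
    # Assumption: editing starts from code recovered from the existing diagram.
    base = None
    if request.existing_diagram is not None:
        base = diagram_to_code(request.existing_diagram)
    code = write_code(steps, base)
    # Assumption: the check step returns None when it finds no problems,
    # otherwise a feedback string that is passed back to the code step.
    for _ in range(max_rounds):
        feedback = check(code)
        if feedback is None:
            break
        code = write_code(f"{steps}\nFix: {feedback}", code)
    return code


# Deterministic stand-ins for the four modules; a real system would call models.
def demo_plan(instruction: str) -> str:
    return f"plan: {instruction}"


def demo_diagram_to_code(diagram: str) -> str:
    return "digraph { X -> Y }"


def demo_write_code(plan_text: str, base: Optional[str]) -> str:
    code = base if base is not None else "digraph { A -> B }"
    if "Fix:" in plan_text and "label" not in code:
        code = code.replace(" }", ' [label="next"] }')
    return code


def demo_check(code: str) -> Optional[str]:
    return None if "label" in code else "edge has no label"


if __name__ == "__main__":
    request = DiagramRequest("draw A to B")
    print(run_pipeline(request, demo_plan, demo_diagram_to_code,
                       demo_write_code, demo_check))
```

Passing the modules in as plain callables keeps the sketch independent of any particular model or diagram language; setting `existing_diagram` switches the same loop from generation to editing.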

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement

DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

What Makes a Scene? Scene Graph-based Evaluation and Feedback for Controllable Generation

Sketch-based cooperative diagramming tool for conceptual design

Efficient Visual Metaphor Image Generation Based on Metaphor Understanding

A Diagram Is Worth A Dozen Images

Computer Science Diagram Understanding with Topology Parsing

Structure Diagram Recognition in Financial Announcements

Structsum Generation for Faster Text Comprehension

BubbleFormer: Bubble Diagram Generation via Dual Transformer Models

A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

From Dialogue to Diagram: Task and Relationship Extraction from Natural Language for Accelerated Business Process Prototyping

Automated generation of geometric theorems from images of diagrams

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

RL-CSDia: Representation Learning of Computer Science Diagrams

A Bidirectional-Transformation-based Framework for Software Visualization and Visual Editing