From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo
2024-11-18
Abstract: We introduce the task of text-to-diagram generation, which focuses on creating structured visual representations directly from textual descriptions. Existing approaches in text-to-image and text-to-code generation lack the logical organization and flexibility needed to produce accurate, editable diagrams, often resulting in outputs that are either unstructured or difficult to modify. To address this gap, we introduce DiagramGenBenchmark, a comprehensive evaluation framework encompassing eight distinct diagram categories, including flowcharts, model architecture diagrams, and mind maps. Additionally, we present DiagramAgent, an innovative framework with four core modules (Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent) designed to facilitate both the generation and refinement of complex diagrams. Our extensive experiments, which combine objective metrics with human evaluations, demonstrate that DiagramAgent significantly outperforms existing baseline models in terms of accuracy, structural coherence, and modifiability. This work not only establishes a foundational benchmark for the text-to-diagram generation task but also introduces a powerful toolset to advance research and applications in this emerging area.
What problem does this paper attempt to address?
The paper addresses the challenge of automatically generating structured diagrams from text descriptions (text-to-diagram generation). Existing text-to-image and text-to-code methods organize content weakly and produce output that is hard to modify. Specifically:

1. **Text-to-image generation**: These methods can produce visually appealing images, but they lack the precise relationships and hierarchical organization that structured diagrams require. The resulting images are logically incoherent and difficult to understand or modify.
2. **Text-to-code generation**: These methods can produce basic visual diagrams, such as bar charts and line charts. For more complex diagrams, such as flowcharts, model architecture diagrams, and mind maps, they lack the flexibility and structural logic needed for fine-grained organization and interactive editing.

To fill this gap, the paper introduces **DiagramGenBenchmark**, a comprehensive evaluation framework that covers eight diagram categories, including flowcharts, model architecture diagrams, and mind maps. In addition, it proposes **DiagramAgent**, a framework with four core modules: Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent. These modules are designed to support both the generation and the refinement of complex diagrams.

Through extensive experiments that combine objective metrics with human evaluation, the paper reports that **DiagramAgent** significantly outperforms existing baseline models in accuracy, structural coherence, and modifiability. The work establishes a foundational benchmark for the text-to-diagram generation task and provides a toolset for research and applications in this emerging area.
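As a rough illustration of how the four modules might fit together, the sketch below wires them into a generation path and an editing path. The summary only names the modules and says they support generation and refinement, so the control flow shown here (plan, write code, check, retry) is an assumption. Every class, function, and parameter name is hypothetical, the agents are stubbed with toy string functions in place of real models, and Graphviz DOT is used only as an example of diagram code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DiagramAgentSketch:
    """Hypothetical wiring of the four modules; not the authors' implementation."""
    plan: Callable[[str], str]                        # Plan Agent: instruction -> plan
    write_code: Callable[[str, Optional[str]], str]   # Code Agent: plan (+ old code) -> code
    check: Callable[[str], List[str]]                 # Check Agent: code -> list of problems
    diagram_to_code: Callable[[bytes], str]           # Diagram-to-Code Agent: diagram -> code
    max_rounds: int = 3                               # assumed cap on repair attempts

    def _refine(self, plan: str, code: str) -> str:
        # Ask the Code Agent to repair the code until the Check Agent is satisfied.
        for _ in range(self.max_rounds):
            problems = self.check(code)
            if not problems:
                break
            code = self.write_code(plan + "\nFix: " + "; ".join(problems), code)
        return code

    def generate(self, instruction: str) -> str:
        # Generation path: text -> plan -> code -> checked code.
        plan = self.plan(instruction)
        return self._refine(plan, self.write_code(plan, None))

    def edit(self, diagram: bytes, instruction: str) -> str:
        # Editing path: recover code from the diagram, then apply the change.
        existing = self.diagram_to_code(diagram)
        plan = self.plan(instruction)
        return self._refine(plan, self.write_code(plan, existing))


# Toy stand-ins for the agents (plain string manipulation, no models involved).
def toy_plan(instruction: str) -> str:
    return instruction.strip()


def toy_write_code(plan: str, existing: Optional[str]) -> str:
    if existing is None:
        return "digraph G { " + plan       # first draft deliberately lacks the closing brace
    if "Fix:" in plan:
        return existing + " }"             # repair request: close the graph
    # Edit request: append the new edge before the closing brace.
    return existing.rstrip("} ") + "; " + plan + " }"


def toy_check(code: str) -> List[str]:
    return [] if code.count("{") == code.count("}") else ["unbalanced braces"]


def toy_diagram_to_code(diagram: bytes) -> str:
    return diagram.decode("utf-8")         # pretend the "image" already is its source


agent = DiagramAgentSketch(toy_plan, toy_write_code, toy_check, toy_diagram_to_code)
generated = agent.generate("A -> B")
edited = agent.edit(generated.encode("utf-8"), "B -> C")
```

In this toy run the first draft fails the check, is repaired once, and the edit path then adds an edge to the recovered code. Keeping the check in a shared `_refine` loop is a design choice of the sketch: it lets both paths reuse the same verification step.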