Leveraging Generative AI for Extracting Process Models from Multimodal Documents

Marvin Voelter, Raheleh Hadian, Timotheus Kampik, Marius Breitmayer, Manfred Reichert

2024-06-07

Abstract: This paper presents an investigation of the capabilities of Generative Pre-trained Transformers (GPTs) to auto-generate graphical process models from multi-modal (i.e., text- and image-based) inputs. More precisely, we first introduce a small dataset as well as a set of evaluation metrics that allow for a ground truth-based evaluation of multi-modal process model generation capabilities. We then conduct an initial evaluation of commercial GPT capabilities using zero-, one-, and few-shot prompting strategies. Our results indicate that GPTs can be useful tools for semi-automated process modeling based on multi-modal inputs. More importantly, the dataset and evaluation metrics as well as the open-source evaluation code provide a structured framework for continued systematic evaluations moving forward.

Software Engineering

What problem does this paper attempt to address?

This paper asks **how generative AI, in particular multi-modal generative models, can automatically generate graphical business process models from multi-modal documents that contain both text and images**. To that end, the authors introduce a multi-modal dataset and a set of evaluation metrics, and use them to systematically evaluate how well multi-modal generative models (such as GPTs) turn text and image inputs into business process models.

### Main problems

1. **Processing of multi-modal inputs**: Traditional approaches to business process model generation usually handle a single type of input (such as pure text or pure image), whereas this paper explores processing text and images together.
2. **Establishment of an evaluation framework**: Comparing the performance of different models requires an evaluation framework based on ground-truth data. The authors therefore created a small dataset containing 123 models and defined a set of evaluation metrics.
3. **Verification of model performance**: Using zero-shot, one-shot, and few-shot prompting strategies, the paper evaluates commercial GPT models (such as GPT-4V) on multi-modal inputs to assess their feasibility in practical applications.

### Solutions

- **Dataset construction**: The authors created a multi-modal document dataset containing text and images, based on the SAP-SAM dataset, and provided the corresponding ground truth in JSON format.
- **Evaluation framework**: The authors propose an evaluation framework based on element decomposition and an adjusted Sørensen–Dice coefficient to quantify the similarity between a generated model and the ground truth.
- **Experimental verification**: Experiments were carried out on GPT-4V using zero-shot, one-shot, and few-shot prompting strategies. The results show that the multi-modal GPT performs well on some aspects (such as task names and types) but still struggles with gateway labels and flows.

### Conclusions

The research shows that multi-modal generative models can, to a certain extent, automatically generate business process models from multi-modal documents, but that they still need improvement, especially in handling complex relationships and detailed information. The authors also emphasize that this work provides a structured evaluation framework for future research, which should help advance the field.
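The evaluation idea summarized above (decompose each model into element categories, then compare the generated and ground-truth elements with a Dice-style overlap score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: it uses the standard Sørensen–Dice coefficient rather than the paper's adjusted variant, whose details are not given here, and the JSON-like schema (`tasks`, `flows`, `name`, `source`, `target`) and the function names are hypothetical choices made only for illustration.

```python
from collections import Counter


def dice_coefficient(predicted, reference):
    """Standard Sørensen–Dice coefficient between two collections of hashable items.

    Note: this is the plain coefficient, not the paper's adjusted variant.
    """
    pred, ref = Counter(predicted), Counter(reference)
    if not pred and not ref:
        return 1.0  # assumption: two empty collections count as identical
    overlap = sum((pred & ref).values())  # multiset intersection size
    return 2 * overlap / (sum(pred.values()) + sum(ref.values()))


def decompose(model):
    """Split a process model dict into element categories (hypothetical schema)."""
    return {
        "task_names": [t["name"] for t in model.get("tasks", [])],
        "flows": [(f["source"], f["target"]) for f in model.get("flows", [])],
    }


def score(generated, ground_truth):
    """Return one Dice score per element category."""
    gen, ref = decompose(generated), decompose(ground_truth)
    return {key: dice_coefficient(gen[key], ref[key]) for key in ref}


# Illustrative toy data, not taken from the paper's dataset.
ground_truth = {
    "tasks": [{"name": "Receive order"}, {"name": "Check stock"}, {"name": "Ship goods"}],
    "flows": [
        {"source": "Receive order", "target": "Check stock"},
        {"source": "Check stock", "target": "Ship goods"},
    ],
}
generated = {
    "tasks": [{"name": "Receive order"}, {"name": "Check stock"}],
    "flows": [{"source": "Receive order", "target": "Check stock"}],
}

print(score(generated, ground_truth))
```

In this toy case the generated model recovers two of three task names and one of two flows, so the task-name score is 2·2/(2+3) = 0.8 and the flow score is 2·1/(1+2) ≈ 0.67. Scoring each category separately is what makes it possible to report that some element types (such as task names) are recovered better than others (such as flows).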
