Abstract: The process engineering domain widely uses Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) to represent process flows and equipment configurations. However, P&IDs and PFDs, hereafter called flowsheets, can contain errors that cause safety hazards, inefficient operation, and unnecessary expenses. Correcting and verifying flowsheets is a tedious, manual process. We propose a novel generative AI methodology for automatically identifying errors in flowsheets and suggesting corrections to the user, i.e., autocorrecting flowsheets. Inspired by the breakthrough of Large Language Models (LLMs) for grammatical autocorrection of human language, we investigate LLMs for the autocorrection of flowsheets. The input to the model is a potentially erroneous flowsheet, and the output is a set of suggestions for a corrected flowsheet. We train our autocorrection model on a synthetic dataset in a supervised manner. The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets. The results suggest that the model can learn to autocorrect the synthetic flowsheets. We envision that flowsheet autocorrection will become a useful tool for chemical engineers.

What problem does this paper attempt to address?

This paper addresses errors in chemical process flowsheets. Specifically, Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) may contain the following types of errors:

- Missing or misplaced components
- Incorrect signal or flow connections
- Missing or misplaced subsystems

These errors may lead to serious safety hazards, development delays, low operational efficiency, and unnecessary expenses. Currently, correcting and validating flowsheets is a cumbersome, manual process. To solve this problem, the authors propose a new method based on large language models (LLMs) for automatically identifying and correcting errors in flowsheets. The method takes a potentially erroneous flowsheet as input and outputs suggestions for a corrected flowsheet. In this way, the authors aim to autocorrect flowsheets in much the same way that grammar tools autocorrect text.

### Core ideas of the solution

1. **Problem representation**: Represent the flowsheet as a string using SFILES 2.0 notation.
2. **Model training**: Train a sequence-to-sequence Transformer model on a synthetic dataset to learn the transformation from an incorrect flowsheet to a correct one.
3. **Model application**: Generate correction suggestions by comparing the model's input and output.

### Advantages of the method

- **Learning of complex error patterns**: The model can learn complex error patterns from data, not just errors of individual components.
- **Global optimization**: It can detect and correct errors involving multiple components, not just individual-component errors.
- **Efficient processing**: Compared with the traditional component-by-component analysis method, this method can handle the entire flowsheet more efficiently.

### Experimental results

On an independent test dataset, the model achieves a top-1 accuracy of 80.1% and a top-5 accuracy of 83.6%. Experiments show that the model can successfully add missing components and connections, remove incorrect components and connections, and even rearrange components.

### Future work

Although the preliminary results show good performance, this research still faces some challenges and directions for improvement:

- Add more information (such as physical and engineering knowledge) to the model.
- Train on actual industrial data to improve the model's practicality and relevance.
- Explore other model architectures, such as diffusion models and variational autoencoders.
- Integrate rule-based methods with data-driven methods to improve the model's accuracy.

In conclusion, this research aims to enable chemical engineers to design and validate chemical process flowsheets more efficiently and accurately by applying machine learning techniques.
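
The third core idea, turning the model's output into suggestions by comparing it with the input, and the top-k accuracy metric can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each unit operation in the flowsheet string is a parenthesised token such as `(hex)`, which is a simplification of SFILES 2.0, and the function names `tokenize`, `suggest_corrections`, and `top_k_accuracy` are hypothetical. The diff uses Python's standard `difflib` module.

```python
import difflib
import re

# Assumption: a unit operation is a parenthesised token such as "(hex)";
# every other character (branch brackets, recycle digits, ...) is its own token.
TOKEN_RE = re.compile(r"\([^()]*\)|.")


def tokenize(flowsheet: str) -> list[str]:
    """Split a flowsheet string into unit tokens and single-character tokens."""
    return TOKEN_RE.findall(flowsheet)


def suggest_corrections(original: str, corrected: str) -> list[str]:
    """Describe the token-level edits that turn `original` into `corrected`."""
    src, tgt = tokenize(original), tokenize(corrected)
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    suggestions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old, new = "".join(src[i1:i2]), "".join(tgt[j1:j2])
        if tag == "insert":
            suggestions.append(f"add {new} at token position {i1}")
        elif tag == "delete":
            suggestions.append(f"remove {old} at token position {i1}")
        elif tag == "replace":
            suggestions.append(f"replace {old} with {new} at token position {i1}")
    return suggestions


def top_k_accuracy(candidates: list[list[str]], references: list[str], k: int) -> float:
    """Fraction of samples whose reference appears among the first k candidates."""
    hits = sum(ref in cands[:k] for cands, ref in zip(candidates, references))
    return hits / len(references)
```

For example, `suggest_corrections("(raw)(r)(prod)", "(raw)(hex)(r)(prod)")` reports a single insertion of the hypothetical unit `(hex)`. Here `candidates` stands for the ranked model outputs per test sample (for instance from beam search, which is an assumption here), so top-1 accuracy checks only the highest-ranked candidate.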

Toward autocorrection of chemical process flowsheets using large language models
