Toward autocorrection of chemical process flowsheets using large language models

Lukas Schulze Balhorn, Marc Caballero, Artur M. Schweidtmann
2023-12-06
Abstract: The process engineering domain widely uses Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) to represent process flows and equipment configurations. However, the P&IDs and PFDs, hereafter called flowsheets, can contain errors causing safety hazards, inefficient operation, and unnecessary expenses. Correcting and verifying flowsheets is a tedious, manual process. We propose a novel generative AI methodology for automatically identifying errors in flowsheets and suggesting corrections to the user, i.e., autocorrecting flowsheets. Inspired by the breakthrough of Large Language Models (LLMs) for grammatical autocorrection of human language, we investigate LLMs for the autocorrection of flowsheets. The input to the model is a potentially erroneous flowsheet, and the outputs of the model are suggestions for a corrected flowsheet. We train our autocorrection model on a synthetic dataset in a supervised manner. The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets. The results suggest that the model can learn to autocorrect the synthetic flowsheets. We envision that flowsheet autocorrection will become a useful tool for chemical engineers.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses errors in chemical process flowsheets. Specifically, Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) may contain the following types of errors:

- Missing or misplaced components
- Incorrect signal or flow connections
- Missing or misplaced subsystems

These errors can lead to serious safety hazards, development delays, inefficient operation, and unnecessary expenses. Currently, correcting and validating flowsheets is a cumbersome, manual process.

To address this, the authors propose a method based on large language models (LLMs) for automatically identifying and correcting errors in flowsheets. The method takes a potentially erroneous flowsheet as input and outputs suggestions for a corrected flowsheet. The goal is autocorrection for flowsheets, analogous to grammar autocorrection tools for text.

### Core ideas of the solution

1. **Problem representation**: Represent the flowsheet as a string using the SFILES 2.0 notation.
2. **Model training**: Train a sequence-to-sequence Transformer model on a synthetic dataset to learn the mapping from an erroneous flowsheet to a correct one.
3. **Model application**: Generate correction suggestions by comparing the model's input and output.

### Advantages of the method

- **Learning of complex error patterns**: The model can learn complex error patterns from data, not just errors of individual components.
- **Global optimization**: It can detect and correct errors that involve multiple components, not just errors in a single component.
- **Efficient processing**: Compared with traditional component-by-component analysis, the method can handle the entire flowsheet more efficiently.

### Experimental results

The model achieves a top-1 accuracy of 80.1% and a top-5 accuracy of 83.6% on an independent test dataset of synthetically generated flowsheets.
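
To make the top-1 and top-5 figures concrete, the following is a minimal sketch of how a top-k accuracy could be computed from ranked model suggestions. It assumes that a suggestion counts as correct only if it matches the target string exactly; the summary does not state the paper's exact matching criterion. The function name `top_k_accuracy` and the toy strings are hypothetical placeholders, not real SFILES 2.0 flowsheets or model outputs.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of samples whose target is among the first k predictions.

    Assumption: a prediction counts as correct only on an exact string match.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty dataset")
    hits = sum(
        target in ranked[:k]
        for ranked, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Toy data: placeholder strings, not real flowsheets or model outputs.
predictions = [
    ["(raw)(hex)(r)(prod)", "(raw)(r)(prod)"],  # target is ranked first
    ["(raw)(r)(prod)", "(raw)(pp)(r)(prod)"],   # target is ranked second
    ["(raw)(prod)", "(raw)(hex)(prod)"],        # target is not suggested
]
targets = ["(raw)(hex)(r)(prod)", "(raw)(pp)(r)(prod)", "(raw)(mix)(prod)"]

top1 = top_k_accuracy(predictions, targets, k=1)
top2 = top_k_accuracy(predictions, targets, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

In this toy case one of three targets is ranked first and two of three appear within the first two suggestions, so top-k accuracy can only stay equal or grow as k increases.
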
Experiments show that the model can add missing components and connections, remove incorrect ones, and even rearrange components.

### Future work

Although the preliminary results are promising, the research identifies several challenges and directions for improvement:

- Add more information (such as physical and engineering knowledge) to the model.
- Train on real industrial data to improve the model's practicality and relevance.
- Explore other model architectures, such as diffusion models and variational autoencoders.
- Integrate rule-based methods with data-driven methods to improve the model's accuracy.

In conclusion, this research aims to help chemical engineers design and validate chemical process flowsheets more efficiently and accurately using machine learning.
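
The "model application" step listed under the core ideas, which derives suggestions by comparing the model's input and output, can be illustrated with a token-level diff. This is a sketch under stated assumptions, not the authors' implementation: the tokenizer is a simplification that treats each parenthesized group as one unit and does not cover the full SFILES 2.0 grammar, the strings are placeholders rather than real flowsheets, and the names `tokenize`, `Suggestion`, and `suggest_corrections` are hypothetical. The comparison itself uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Suggestion(NamedTuple):
    action: str         # "insert", "delete", or "replace"
    position: int       # token index in the original flowsheet string
    removed: List[str]  # tokens the suggestion takes out
    added: List[str]    # tokens the suggestion puts in


def tokenize(flowsheet: str) -> List[str]:
    """Split a string into parenthesized units and single other characters.

    Assumption: a simplified tokenizer, not the full SFILES 2.0 grammar.
    """
    return re.findall(r"\([^()]*\)|[^()\s]", flowsheet)


def suggest_corrections(original: str, corrected: str) -> List[Suggestion]:
    """Turn the differences between input and model output into suggestions."""
    a, b = tokenize(original), tokenize(corrected)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    suggestions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            suggestions.append(Suggestion(tag, i1, a[i1:i2], b[j1:j2]))
    return suggestions


# Placeholder strings; `model_output` stands in for a real model prediction.
user_input = "(raw)(hex)(prod)"
model_output = "(raw)(hex)(r)(prod)"

for s in suggest_corrections(user_input, model_output):
    print(f"{s.action} at token {s.position}: remove {s.removed}, add {s.added}")
```

Reporting differences rather than silently replacing the flowsheet keeps the engineer in the loop: each suggestion can be accepted or rejected individually, which matches the autocorrection analogy used in the summary.
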