Data cleaning method for the process of acid production with flue gas based on improved random forest

Xiaoli Li, Minghua Liu, Kang Wang, Zhiqiang Liu, Guihai Li
DOI: https://doi.org/10.1016/j.cjche.2022.12.013
IF: 3.8
2023-01-01
Chinese Journal of Chemical Engineering
Abstract: Acid production with flue gas is a complex nonlinear process with many strongly coupled variables. Its operating data are an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment is complex and involves a large amount of equipment, so the data collected by the detection instruments are heavily contaminated and prone to anomalies such as missing values and outliers. To address abnormal data in this process, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm optimizes the hyperparameters of the random forest regression model: the optimal parameter combination is found in the search space, and the tuned model is then used to predict the trend of the data. Finally, the improved random forest is used to compensate for the data that are missing after the abnormal data have been eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for abnormal data in the process of acid production with flue gas, and that it improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for subsequent temperature control. The conversion rate of SO2 can then be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
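The genetic-algorithm step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the search space, the genetic operators, or the fitness function, so everything below is an assumption made for illustration: the hyperparameter names and bounds in SEARCH_SPACE, the uniform crossover, the resampling mutation, the tournament selection, and the elitism are all hypothetical choices, and toy_error is only a stand-in for what would in practice be something like the cross-validated prediction error of a random forest regressor trained with the candidate hyperparameters.

```python
import random

# Hypothetical search space: inclusive integer bounds for three common
# random forest hyperparameters (not taken from the paper).
SEARCH_SPACE = {
    "n_estimators": (10, 300),
    "max_depth": (2, 30),
    "min_samples_leaf": (1, 20),
}


def random_individual(rng):
    """Draw one candidate hyperparameter set uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene within its bounds with probability `rate`."""
    out = dict(ind)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def tournament(scored, rng, k=3):
    """Pick k scored candidates at random and return the one with lowest error."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]


def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    """Minimise `fitness` (an error measure) over SEARCH_SPACE.

    Returns (best individual, its error, best error seen in each generation).
    """
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda s: s[0])
        history.append(scored[0][0])
        # Elitism: the best candidate survives unchanged, so the best error
        # never gets worse from one generation to the next.
        children = [scored[0][1]]
        while len(children) < pop_size:
            child = crossover(tournament(scored, rng),
                              tournament(scored, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = min(population, key=fitness)
    return best, fitness(best), history


def toy_error(p):
    """Stand-in fitness with a known minimum; replace with a real model error."""
    return ((p["n_estimators"] - 120) ** 2 / 1e4
            + (p["max_depth"] - 12) ** 2 / 1e2
            + (p["min_samples_leaf"] - 3) ** 2 / 1e2)


if __name__ == "__main__":
    best, err, history = genetic_search(toy_error)
    print("best hyperparameters:", best)
    print("error:", err)
```

In a full pipeline of the kind the abstract outlines, the outliers would first be removed (for example with an isolation forest), the fitness passed to genetic_search would train and validate a random forest regressor on the remaining data, and the tuned regressor would then predict the values that are missing.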
engineering, chemical