Uncovering the Impact of Spectroscopic Data Reduction Techniques on the Process Control Mode Pattern Recognition: The Case of Industrial Penicillin Production

Vishakha Vishakha, Josef Jancar
DOI: https://doi.org/10.26434/chemrxiv-2024-c1b8w
2024-02-09
Abstract: Process Analytical Technologies (PAT) often rely on real-time spectroscopy, which enables fast process control and monitoring. However, for long-running processes, real-time spectroscopy can generate excessively large databases that are challenging to manage and do not necessarily lead to better process control. It is therefore crucial to reduce the amount of data generated while retaining the information essential for process control. This work explores various data reduction techniques to address this issue. IndPenSim, a simulated spectroscopic probing dataset, was used as an oracle model to study the impact of data reduction techniques on the resulting process control identification. The analysis pipeline uses principal component analysis (PCA) for visualization, followed by truncation and pre-processing (e.g. baseline correction). We also discuss the impact of data size reduction techniques (e.g. spectral data column selection, data binning, and region of interest (ROI) selection) on different chemometric models (e.g. PCA, PLS-DA, SIMCA, and KNN). Finally, the study examines the impact of data reduction on the control strategy for a realistic industrial fed-batch penicillin simulator. The multi-class classification performance was analyzed, and the results were interpreted to determine the best approach for controlling the process. Overall, the study provides insights into data reduction techniques for real-time spectroscopy in PAT, which can improve the efficiency and accuracy of process control and monitoring.
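The three data size reduction techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the synthetic random spectra, the wavenumber axis, and all parameter values (step, bin width, ROI bounds) are assumptions chosen only for illustration, and no IndPenSim data is used.

```python
import numpy as np


def select_columns(spectra, step):
    """Column selection: keep every `step`-th spectral channel."""
    return spectra[:, ::step]


def bin_channels(spectra, width):
    """Binning: average adjacent channels in non-overlapping bins of `width`."""
    n_samples, n_channels = spectra.shape
    usable = (n_channels // width) * width  # drop an incomplete trailing bin
    return spectra[:, :usable].reshape(n_samples, -1, width).mean(axis=2)


def select_roi(spectra, axis_values, low, high):
    """ROI: keep only channels whose axis value lies in [low, high]."""
    mask = (axis_values >= low) & (axis_values <= high)
    return spectra[:, mask]


# Synthetic stand-in: 50 spectra x 1000 channels (hypothetical, not IndPenSim).
rng = np.random.default_rng(0)
axis_values = np.linspace(200.0, 2200.0, 1000)  # hypothetical wavenumber axis
spectra = rng.normal(size=(50, 1000))

thinned = select_columns(spectra, step=4)
binned = bin_channels(spectra, width=8)
roi = select_roi(spectra, axis_values, low=800.0, high=1200.0)
print(thinned.shape, binned.shape, roi.shape)
```

Each function reduces only the number of spectral channels (columns) and leaves the number of samples unchanged, so the reduced matrices can be passed to the same downstream chemometric model for comparison.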
Chemistry