Abstract: Biopharmaceutical products, particularly monoclonal antibodies (mAbs), have gained prominence in the pharmaceutical market due to their high specificity and efficacy. As these products are projected to constitute a substantial portion of global pharmaceutical sales, the application of machine learning models in mAb development and manufacturing is gaining momentum. This paper addresses the critical need for uncertainty quantification in machine learning predictions, particularly in scenarios with limited training data. Leveraging ensemble learning and Monte Carlo simulations, our proposed method generates additional input samples to enhance the robustness of the model in small training datasets. We evaluate the efficacy of our approach through two case studies: predicting antibody concentrations in advance and real-time monitoring of glucose concentrations during bioreactor runs using Raman spectra data. Our findings demonstrate the effectiveness of the proposed method in estimating the uncertainty levels associated with process performance predictions and facilitating real-time decision-making in biopharmaceutical manufacturing. This contribution not only introduces a novel approach for uncertainty quantification but also provides insights into overcoming challenges posed by small training datasets in bioprocess development. The evaluation demonstrates the effectiveness of our method in addressing key challenges related to uncertainty estimation within upstream cell cultivation, illustrating its potential impact on enhancing process control and product quality in the dynamic field of biopharmaceuticals.

What problem does this paper attempt to address?

The problem this paper addresses is how to quantify the uncertainty of machine learning predictions in cell culture processes, especially when training data are limited. Specifically, it examines how ensemble learning combined with Monte Carlo sampling can improve prediction performance and real-time monitoring of process parameters in biopharmaceutical production, in particular the development and manufacturing of monoclonal antibodies (mAbs). The proposed method generates additional input samples to make models more robust on small training datasets, and its effectiveness is verified through two case studies:

1. **Predicting antibody concentration one day in advance**: The current offline measurements are used as input features to predict the antibody concentration one day ahead.
2. **Real-time monitoring of glucose concentration**: Raman spectroscopy data are used as input features to monitor the glucose concentration in real time during the bioreactor run.

### Main contributions

1. **A general framework**: Ensemble learning and Monte Carlo sampling are combined to estimate the uncertainty level of each predicted value, with a focus on small training datasets.
2. **Applied case studies**: The method is validated on two specific challenges (predicting antibody concentration in advance and real-time monitoring of glucose concentration).

### Method overview

The proposed method consists of the following steps:

1. **Generate synthetic training sets**:
   - Use Monte Carlo sampling to draw random values for each input feature and the target variable, based on the actual measured values and the coefficient of variation.
   - Generate \( N \) synthetic training sets, each of which is used to train one base regressor.
2. **Construct an ensemble model**:
   - Train \( N \) base regressors, one per synthetic training set.
   - For a new test sample \( X_T \), compute the mean \( \hat{y}(X_T) \) and the standard deviation \( \sigma(X_T) \) of the predictions of the \( N \) base regressors.
3. **Evaluate prediction uncertainty**:
   - Use the mean absolute error (MAE) as the performance metric for comparing models.
   - For models that return a standard deviation with each prediction, also compute the MAE of the upper bound \( \hat{y} + 2\sigma \) and of the lower bound \( \hat{y} - 2\sigma \).

### Experimental results

The experimental results show that the proposed ensemble SVR model outperforms the single SVR model in prediction performance, whereas the ensemble PLSR model performs comparably to the single PLSR model. The Gaussian process (GP) model achieves the best prediction performance of the models compared, but it also reports higher uncertainty levels. In practice, therefore, prediction performance and uncertainty level need to be considered together.

### Conclusion

The proposed framework provides an effective way to quantify the uncertainty of machine learning predictions and offers a new approach to the challenges posed by small training datasets. This matters for improving process control and product quality in biopharmaceutical production.
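As a rough illustration of the method overview above, the following Python sketch walks through the three steps on toy data. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Monte Carlo draws are taken from a normal distribution centred on each measured value with standard deviation `cv * |value|`, a single coefficient of variation `cv` is applied to every feature and to the target, and a closed-form ridge regressor stands in for the SVR and PLSR base learners named in the summary. All function names (`make_synthetic_sets`, `fit_ridge`, `predict_ridge`, `ensemble_predict`, `mae`) and the toy dataset are hypothetical.

```python
import numpy as np


def make_synthetic_sets(X, y, cv, n_sets, rng):
    """Step 1: draw n_sets noisy copies of (X, y).

    Assumption: each value v is resampled from Normal(v, (cv * |v|)^2).
    """
    X_syn = rng.normal(loc=X, scale=cv * np.abs(X), size=(n_sets,) + X.shape)
    y_syn = rng.normal(loc=y, scale=cv * np.abs(y), size=(n_sets,) + y.shape)
    return X_syn, y_syn


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an intercept column.

    Stand-in base regressor (assumption); the paper uses SVR and PLSR.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(w, X):
    """Apply the fitted ridge weights to new samples."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def ensemble_predict(X_train, y_train, X_test, cv=0.05, n_sets=100, seed=0):
    """Step 2: train one base regressor per synthetic set, then return the
    mean and standard deviation of their predictions for each test sample."""
    rng = np.random.default_rng(seed)
    X_syn, y_syn = make_synthetic_sets(X_train, y_train, cv, n_sets, rng)
    preds = np.stack(
        [predict_ridge(fit_ridge(Xs, ys), X_test) for Xs, ys in zip(X_syn, y_syn)]
    )
    return preds.mean(axis=0), preds.std(axis=0)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy data (hypothetical): 20 samples, 3 positive-valued features.
data_rng = np.random.default_rng(42)
X = data_rng.uniform(1.0, 10.0, size=(20, 3))
y = X @ np.array([0.5, -0.2, 1.0]) + 2.0
X_tr, y_tr, X_te, y_te = X[:15], y[:15], X[15:], y[15:]

y_hat, sigma = ensemble_predict(X_tr, y_tr, X_te)

# Step 3: MAE of the mean prediction and of the +/- 2 sigma bounds.
print("MAE mean :", mae(y_te, y_hat))
print("MAE upper:", mae(y_te, y_hat + 2 * sigma))
print("MAE lower:", mae(y_te, y_hat - 2 * sigma))
```

In this sketch the spread `sigma` reflects only the variation induced by resampling the training data, so it depends directly on the assumed `cv` and on the number of synthetic sets `n_sets`. Swapping in another base regressor only requires replacing `fit_ridge` and `predict_ridge`.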

Uncertainty Quantification Using Ensemble Learning and Monte Carlo Sampling for Performance Prediction and Monitoring in Cell Culture Processes
