Hyperbox Mixture Regression for Process Performance Prediction in Antibody Production

Ali Nik-Khorasani, Thanh Tung Khuat, Bogdan Gabrys
2024-11-03
Abstract: This paper addresses the challenges of predicting bioprocess performance, particularly in monoclonal antibody (mAb) production, where conventional statistical methods often fall short due to the complexity and high dimensionality of time-series data. We propose a novel Hyperbox Mixture Regression (HMR) model which employs hyperbox-based input space partitioning to enhance predictive accuracy while managing uncertainty inherent in bioprocess data. The HMR model is designed to dynamically generate hyperboxes for input samples in a single-pass process, thereby improving learning speed and reducing computational complexity. Our experimental study utilizes a dataset that contains 106 bioreactors. This study evaluates the model's performance in predicting critical quality attributes in monoclonal antibody manufacturing over a 15-day cultivation period. The results demonstrate that the HMR model outperforms comparable approximators in accuracy and learning speed and maintains interpretability and robustness under uncertain conditions. These findings underscore the potential of HMR as a powerful tool for enhancing predictive analytics in bioprocessing applications.
Machine Learning; Computational Engineering, Finance, and Science; Quantitative Methods
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to address the challenges of performance prediction in monoclonal antibody (mAb) production. Specifically:

1. **Complex time-series data**: Traditional statistical methods often perform poorly on high-dimensional time-series data because the data have complex internal correlations, which leads to inaccurate predictions.
2. **High-dimensional data space**: The mAb production process involves a large number of input parameters, and this high dimensionality is difficult for traditional methods to handle effectively.
3. **Model transparency and interpretability**: In industrial applications, especially in the biopharmaceutical field, models need not only high accuracy but also the ability to explain their predictions so that they can support practical operation and decision-making.
4. **Uncertainty management**: Bioprocess data usually contain measurement noise and other sources of uncertainty, so the model needs to be able to reason and remain interpretable under uncertain conditions.

To meet these challenges, the authors propose a new model, Hyperbox Mixture Regression (HMR). The HMR model partitions the input space by dynamically generating hyperboxes, which improves prediction accuracy and learning speed while maintaining model transparency and robustness. The model is particularly suited to predicting Critical Quality Attributes (CQAs) and Key Performance Indicators (KPIs) in the mAb production process.

### Main contributions

1. **Introduction of a new neuro-fuzzy model structure**: The HMR model combines hyperbox fuzzy sets with local linear regressors and can work effectively in high-dimensional data spaces while maintaining model transparency.
2. **Efficient one-pass learning process**: The HMR model partitions the feature space in a single pass over the input data, significantly improving the learning speed.
3. **Dynamically weighted combination of local linear regressors**: Dynamically weighting and combining the local linear regressors associated with each hyperbox improves the accuracy of the model and reduces the network complexity.
4. **Normalization layer**: A normalization layer is introduced to reduce the risk of numerical instability in subsequent layers.
5. **Performance prediction for the next two days**: The HMR model can predict key performance indicators in the mAb production process, such as Viable Cell Density (VCD) and mAb concentration, for the next two days.

### Experimental results

The experimental results show that the HMR model outperforms the existing ANFIS Hybrid Learning (HL) and Fuzzy Neural Network (FNN) Back-Propagation (BP) algorithms in both high-dimensional and low-dimensional scenarios. Specifically:

- **High-dimensional scenario**: When using all 23 input features, the HMR model significantly outperforms the ANFIS HL and FNN BP algorithms, most notably with a lower Root Mean Square Error (RMSE) on the test set.
- **Feature selection**: With a feature selection algorithm, the HMR model further improves prediction performance while reducing the number of input features and the model complexity.

In conclusion, the HMR model proposed in this paper provides an efficient, transparent, and robust solution for performance prediction in the mAb production process.
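
The structure described above (single-pass hyperbox generation, normalized hyperbox memberships, and a weighted combination of local linear regressors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The membership function, the maximum box side length `theta`, the decay rate `gamma`, the joint least-squares fit, and all function names are assumptions chosen for illustration, since this summary does not specify them.

```python
import numpy as np

def one_pass_hyperboxes(X, theta=0.3):
    """Single pass over X: grow the first box that can absorb a sample
    without any side exceeding theta, otherwise start a new point-box.
    (The expansion rule is an assumption, not taken from the paper.)"""
    V, W = [], []  # lower and upper corners of each hyperbox
    for x in X:
        for j in range(len(V)):
            new_v = np.minimum(V[j], x)
            new_w = np.maximum(W[j], x)
            if np.all(new_w - new_v <= theta):
                V[j], W[j] = new_v, new_w
                break
        else:
            V.append(x.copy())
            W.append(x.copy())
    return np.array(V), np.array(W)

def normalized_memberships(X, V, W, gamma=1.0, eps=1e-12):
    """Assumed membership: 1 inside a box, decaying linearly with the
    largest per-dimension distance outside it. Rows are then normalized
    to sum to 1 (one possible reading of the 'normalization layer')."""
    below = np.maximum(0.0, V[None, :, :] - X[:, None, :])
    above = np.maximum(0.0, X[:, None, :] - W[None, :, :])
    dist = np.maximum(below, above).max(axis=2)       # shape (n, k)
    M = np.maximum(0.0, 1.0 - gamma * dist) + eps     # eps avoids 0/0
    return M / M.sum(axis=1, keepdims=True)

def fit_local_regressors(X, y, V, W, gamma=1.0):
    """Fit one linear regressor per hyperbox by joint least squares on
    the membership-weighted design matrix (an assumed training rule)."""
    Wn = normalized_memberships(X, V, W, gamma)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])     # add bias column
    Phi = (Wn[:, :, None] * Xa[:, None, :]).reshape(X.shape[0], -1)
    sol, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return sol.reshape(V.shape[0], Xa.shape[1])       # (k, d + 1)

def predict(X, V, W, betas, gamma=1.0):
    """Membership-weighted sum of the local linear predictions."""
    Wn = normalized_memberships(X, V, W, gamma)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.einsum("nk,kd,nd->n", Wn, betas, Xa)

# Toy usage on synthetic data (not the bioreactor dataset from the paper).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.5
V_demo, W_demo = one_pass_hyperboxes(X_demo, theta=0.3)
betas_demo = fit_local_regressors(X_demo, y_demo, V_demo, W_demo)
pred_demo = predict(X_demo, V_demo, W_demo, betas_demo)
print("boxes:", len(V_demo),
      "train RMSE:", np.sqrt(np.mean((pred_demo - y_demo) ** 2)))
```

Because each prediction is a weighted sum of per-box linear models, the contribution of every hyperbox to a given output can be inspected directly, which is one way to read the transparency claim in the summary. In this sketch, a smaller `theta` yields more, smaller boxes and therefore more local regressors.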