DeepSP: Deep Learning-Based Spatial Properties to Predict Monoclonal Antibody Stability

Lateefat Kalejaye, I-En Wu, Taylor Terry, Pin-Kuang Lai
DOI: https://doi.org/10.1101/2024.02.28.582582
2024-03-03
Abstract: Therapeutic antibody development, manufacturing, and administration face challenges due to the high viscosities and aggregation tendencies often observed in highly concentrated antibody solutions. This poses a particular problem for subcutaneous administration, which requires low-volume and high-concentration formulations. The spatial charge map (SCM (mAbs, 8 (1) (2015), pp. 43-48)) and spatial aggregation propensity (SAP (PNAS. 2009; 106:11937–42)) are two computational techniques proposed in previous studies to aid in predicting viscosity and aggregation, respectively. These methods rely on structural data derived from molecular dynamics (MD) simulations, which are known to be time-consuming and computationally demanding. DeepSCM (CSBJ. 2022, 20:2143-2152), a deep learning surrogate model that predicts SCM scores in the entire variable region, was used to screen high-concentration antibody viscosity. DeepSCM is based solely on sequence information, which facilitates high-throughput screening. This study further utilized a dataset of 20,530 antibody sequences to train a convolutional neural network deep learning surrogate model called Deep Spatial Properties (DeepSP). DeepSP directly predicts SAP and SCM scores in different domains of antibody variable regions based solely on their sequences, without performing MD simulations. The linear correlation coefficient (R) between DeepSP scores and MD-derived scores for 30 properties achieved values between 0.76 and 0.96, with an average of 0.87, on the test set (N=2053). DeepSP scores were employed as features to build machine learning models to predict the aggregation rate of 21 antibodies. We observed remarkable results with R = 0.97 and a mean squared error (MSE) of 0.03 between the experimental and predicted aggregation rates. Leave-one-out cross-validation (LOOCV) yielded R = 0.75 and MSE = 0.18, which is similar to the results obtained in the previous study using MD simulations.
This result demonstrates that the DeepSP approach significantly reduces the computational time required compared to MD simulations. The DeepSP model enables the rapid generation of 30 structural properties that can also be used as features in other research to train machine learning models for predicting various antibody properties, such as viscosity, aggregation, or other properties that can influence their stability, using sequences only. The code and parameters are freely available at
Biophysics
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses the challenges that high-concentration monoclonal antibody solutions pose in drug development, production, and administration, especially those caused by high viscosity and aggregation tendency. These properties are particularly unfavorable for subcutaneous administration, which requires low-volume, high-concentration formulations.

Specifically, the paper proposes a deep-learning-based method, DeepSP (Deep Spatial Properties), for rapidly predicting 30 spatial properties of monoclonal antibodies. These properties are usually calculated by molecular dynamics (MD) simulations, which are time-consuming and computationally expensive. DeepSP predicts them using only antibody sequence information, which significantly reduces the computation time.

### Main contributions

1. **Rapid prediction of spatial properties**: DeepSP predicts 30 spatial properties from antibody sequence information alone; these properties usually have to be calculated by MD simulations.
2. **High accuracy**: On the test set, the linear correlation coefficient (R) between DeepSP scores and MD-derived scores ranges from 0.76 to 0.96, with an average of 0.87.
3. **Prediction of antibody aggregation rate**: Machine-learning models built on DeepSP features predicted the aggregation rates of 21 antibodies. The correlation coefficient between the experimental and predicted values was 0.97, with a mean squared error (MSE) of 0.03; leave-one-out cross-validation (LOOCV) yielded R = 0.75 and MSE = 0.18.
4. **Accelerating drug development**: DeepSP generates antibody-specific features that can be used to train other machine-learning models for predicting stability-related properties such as viscosity and aggregation, thus accelerating the drug development process.
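
The aggregation-rate results are reported with two kinds of numbers: R and MSE on the fitted data, and the same metrics under leave-one-out cross-validation (LOOCV). The sketch below shows how these quantities are computed. It is not the authors' code: it assumes a single-feature least-squares line instead of the paper's machine-learning models, and the `feature` and `rate` values are synthetic placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Synthetic placeholder data (hypothetical, for illustration only).
feature = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.4]
rate = [0.2, 0.5, 0.4, 1.0, 1.2, 1.9, 1.8, 2.5]

slope, intercept = fit_line(feature, rate)
fit_pred = [slope * v + intercept for v in feature]
loo_pred = loocv_predictions(feature, rate)

print("fit   R, MSE:", pearson_r(rate, fit_pred), mse(rate, fit_pred))
print("LOOCV R, MSE:", pearson_r(rate, loo_pred), mse(rate, loo_pred))
```

For a least-squares model the LOOCV error is never smaller than the error on the fitted data, because each held-out sample is predicted without having influenced the fit. With only 21 antibodies, a gap between the two sets of metrics is therefore expected.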
### Method overview

1. **Dataset and pre-processing**: Obtain 20,530 antibody sequences from the Observed Antibody Space (OAS) database and pre-process them to ensure the consistency of the input data.
2. **Molecular dynamics simulation**: Perform MD simulations on the variable regions of the 20,530 antibodies to calculate 30 spatial properties.
3. **Deep-learning model**: Develop the DeepSP model using a convolutional neural network (CNN), with the pre-processed antibody sequence as input and the 30 spatial properties as output.
4. **Model validation**: Use DeepSP features to predict the aggregation rates of 21 antibodies; the results are comparable to those obtained from MD simulations.

### Conclusion

The DeepSP model predicts 30 spatial properties using only antibody sequence information and performs well in predicting antibody aggregation rates. The method significantly reduces the computation time and supports the development of high-concentration monoclonal antibody drugs.
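
The method overview describes a pre-processed antibody sequence being fed to a CNN. A minimal sketch of what such an input could look like follows. Everything here is an assumption for illustration: the 20-letter amino-acid alphabet plus a gap symbol, right-padding to a fixed length, the example fragment, and the single hand-written filter. The actual DeepSP alignment scheme, input length, and network architecture are not described in this summary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
GAP = "-"                              # assumed padding/gap symbol
ALPHABET = AMINO_ACIDS + GAP
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_encode(sequence, length):
    """Encode a sequence as a (length x 21) one-hot matrix, right-padded with gaps."""
    if len(sequence) > length:
        raise ValueError("sequence is longer than the fixed input length")
    padded = sequence.upper().ljust(length, GAP)
    matrix = []
    for ch in padded:
        if ch not in INDEX:
            raise ValueError(f"unexpected residue symbol: {ch!r}")
        row = [0] * len(ALPHABET)
        row[INDEX[ch]] = 1
        matrix.append(row)
    return matrix


def conv1d_valid(matrix, kernel):
    """Slide one filter (width x channels) along the sequence axis, without padding."""
    width = len(kernel)
    outputs = []
    for start in range(len(matrix) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(m * w for m, w in zip(matrix[start + k], kernel[k]))
        outputs.append(total)
    return outputs


# Hypothetical short fragment, padded to a fixed length of 8 positions.
encoded = one_hot_encode("EVQLV", 8)

# A toy filter of width 3 that responds only to valine (V) at each position.
v_filter = [[1.0 if ch == "V" else 0.0 for ch in ALPHABET] for _ in range(3)]
response = conv1d_valid(encoded, v_filter)
print(len(encoded), len(encoded[0]), response)
```

In a trained network the filter weights are learned rather than written by hand, and many filters are stacked before the final layer that outputs the 30 property scores.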