Leveraging protein language and structural models for early prediction of antibodies with fast clearance

Parisa Mazrooei, Daniel O'Neil, Saeed Izadi, Bingyuan Chen, Saroja Ramanujan
DOI: https://doi.org/10.1101/2024.06.08.597997
2024-06-10
Abstract: Monoclonal antibodies (mAbs) with long systemic persistence are widely used as therapeutics. However, antibodies with atypically fast clearance require more dosing, limiting their clinical usefulness. Deep learning can facilitate using sequence-based modeling to predict potential pharmacokinetic (PK) liabilities before antibody generation. Assembling a dataset of 103 mAbs with measured nonspecific clearance in cynomolgus monkeys (cyno), and using transfer learning from large protein language models, we developed multiple machine learning models to predict mAb clearance as fast/slow clearing. Focusing on minimizing misclassification of potentially promising molecules as fast clearing, our results show that using physicochemical properties yielded up to 73.1 ± 1.1% classification accuracy on hold-out test data (precision 65.2 ± 2.3%). Using only sequence-based features from deep learning protein language models yielded a comparable performance of 71 ± 1.4% (precision 65.5 ± 2.5%). Combining structural and deep learning derived features yielded a similar accuracy of 73.9 ± 1.1%, and slightly improved precision (68.3 ± 2.4%). Features important for classifying fast/slow clearance point to charge, moment, and surface area properties at pH 7.4 as well as deep learning derived features. These results suggest that the protein language models provide comparable information and predictive performance of clearance as physicochemical features. This work provides a foundation for in silico prediction of protein pharmacokinetics to inform antibody candidate generation and early deprioritization of designs with high risk of fast clearance. More generally, it illustrates the value of transfer learning-based application of protein language models to address characteristics of importance for protein therapeutics.
Pharmacology and Toxicology
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to use deep learning and protein language models to predict the clearance of monoclonal antibodies (mAbs) before antibody generation, in particular to identify fast-clearing antibodies early**. Specifically, the authors aim to develop a machine-learning model that distinguishes fast-clearing from slow-clearing antibodies based on antibody sequence information, reducing wasted time and resources in preclinical research and development and making the selection of potential therapeutic antibodies more efficient.

### Problem Background

1. **Clinical Applications of Monoclonal Antibodies**
   - Monoclonal antibodies (mAbs) are widely used as therapeutics because of their long half-life and low clearance.
   - However, some antibodies show atypically fast clearance, which requires more frequent dosing and limits their clinical usefulness.
2. **Limitations of Existing Methods**
   - Current methods rely on in vitro assays and in vivo animal experiments to evaluate antibody clearance. These methods are time-consuming and costly.
   - Traditional methods can only be applied after an antibody has been generated, so they cannot screen candidate molecules at an early stage.

### Solution

The authors propose a method based on deep learning and protein language models that predicts antibody clearance from antibody sequence information. The specific steps are:

1. **Data Collection and Processing**
   - A dataset of 103 monoclonal antibodies was assembled, with nonspecific clearance measured in cynomolgus monkeys.
   - Antibodies were divided into two classes by clearance: fast clearance (>8 mL/day/kg) and slow clearance (≤8 mL/day/kg).
2. **Feature Extraction**
   - Physicochemical properties (such as charge, moment, and surface area) were used as features.
   - A pre-trained protein language model (such as the Deep Manifold Sampler, DMS) was used to generate embedded representations of the antibody sequences.
3. **Model Development and Evaluation**
   - Multiple machine-learning models were developed, including models based on physicochemical features and models based on protein-language-model embeddings.
   - Models with different feature combinations were compared on accuracy and precision.
4. **Result Analysis**
   - Models based on physicochemical features and on protein-language-model embeddings showed comparable classification accuracy (about 73% and 71%, respectively); combining the two slightly improved precision (about 68% versus about 65%).
   - Feature-importance analysis points to charge, moment, and surface area properties at pH 7.4, together with deep-learning-derived features, as important for classifying clearance.

### Conclusion

This study shows how deep learning and protein language models can be used to predict antibody clearance from antibody sequence information. The method allows screening before antibody generation, avoiding unnecessary experiments and improving research and development efficiency. Future work could further improve predictive performance by increasing the amount of data and refining the model architecture.
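
To make the labeling and evaluation steps described above concrete, here is a minimal Python sketch. It assumes the 8 mL/day/kg cutoff stated in the summary and treats "fast clearing" as the positive class when computing precision, which is an interpretation rather than something the source spells out. The clearance values, predictions, and function names are invented for illustration and are not taken from the paper.

```python
FAST_THRESHOLD = 8.0  # mL/day/kg, the fast/slow cutoff quoted in the summary


def label_clearance(clearance):
    """Return 1 (fast) if clearance exceeds the cutoff, else 0 (slow)."""
    return 1 if clearance > FAST_THRESHOLD else 0


def accuracy(y_true, y_pred):
    """Fraction of antibodies whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the antibodies predicted as `positive`, the fraction that truly are.

    Assumption: the positive class is "fast clearing", so low precision means
    many promising (slow-clearing) molecules are wrongly flagged as fast.
    """
    truths_for_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_predicted_pos:
        return 0.0  # no positive predictions; defined as 0 for this sketch
    hits = sum(t == positive for t in truths_for_predicted_pos)
    return hits / len(truths_for_predicted_pos)


# Hypothetical measured clearances (mL/day/kg) and hypothetical model predictions.
clearances = [3.2, 8.0, 12.5, 5.1, 9.7, 20.3]
y_true = [label_clearance(c) for c in clearances]
y_pred = [0, 1, 1, 0, 0, 1]

print(f"accuracy:  {accuracy(y_true, y_pred):.3f}")
print(f"precision: {precision(y_true, y_pred):.3f}")
```

A value of exactly 8 mL/day/kg is labeled slow here, matching the "≤8" boundary in the summary. In practice these metrics would be averaged over repeated train/test splits to obtain the mean ± spread figures reported in the abstract.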