Emily K. Makowski, Hsin-Ting Chen, Tiexin Wang, Lina Wu, Jie Huang, Marissa Mock, Patrick Underhill, Emma Pelegri-O'Day, Erick Maglalang, Dwight Winters, Peter M. Tessier

a Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI, USA
b Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA
c Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
d Therapeutic Discovery, Research, Amgen Inc, Thousand Oaks, CA, USA
e Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
f Drug Product Technologies, Amgen Inc, Thousand Oaks, CA, USA
g Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA

Abstract: Early identification of antibody candidates with drug-like properties is essential for simplifying the development of safe and effective antibody therapeutics. For subcutaneous administration, it is important to identify candidates with low self-association to enable their formulation at high concentration while maintaining low viscosity, opalescence, and aggregation. Here, we report an interpretable machine learning model for predicting antibody (IgG1) variants with low viscosity using only the sequences of their variable (Fv) regions. Our model was trained on antibody viscosity data (>100 mg/mL mAb concentration) obtained at a common formulation pH (pH 5.2), and it identifies three key Fv features of antibodies linked to viscosity, namely their isoelectric points, hydrophobic patch sizes, and numbers of negatively charged patches. Of the three features, most predicted antibodies at risk for high viscosity, including antibodies with diverse antibody germlines in our study (79 mAbs) as well as clinical-stage IgG1s (94 mAbs), are those with low Fv isoelectric points (Fv pIs < 6.3). Our model identifies viscous antibodies with relatively high accuracy not only in our training and test sets, but also for previously reported data. Importantly, we show that the interpretable nature of the model enables the design of mutations that significantly reduce antibody viscosity, which we confirmed experimentally. We expect that this approach can be readily integrated into the drug development process to reduce the need for experimental viscosity screening and improve the identification of antibody candidates with drug-like properties.
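The Fv pI screen mentioned in the abstract can be illustrated with a minimal Python sketch. It estimates an isoelectric point from sequence with the Henderson-Hasselbalch equation and flags values below the 6.3 cutoff quoted above. The pKa values, the treatment of VH and VL as two free chains with their own termini, and the function names are all assumptions made for illustration; the abstract does not say how the authors computed Fv pI, so absolute values from this sketch may not match theirs.

```python
# Illustrative pKa values (an assumption; published scales differ slightly).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0
FV_PI_CUTOFF = 6.3  # threshold quoted in the abstract


def net_charge(chains, ph):
    """Henderson-Hasselbalch net charge of one or more chains at a given pH."""
    charge = 0.0
    for seq in chains:
        seq = seq.upper()
        # Approximation: each chain is treated as having free termini.
        charge += 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
        charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
        for res, pka in POSITIVE_PKA.items():
            charge += seq.count(res) / (1.0 + 10 ** (ph - pka))
        for res, pka in NEGATIVE_PKA.items():
            charge -= seq.count(res) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(chains, iterations=60):
    """Find the pH of zero net charge by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # Net charge decreases monotonically with pH.
        if net_charge(chains, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def low_pi_risk(vh, vl, cutoff=FV_PI_CUTOFF):
    """True if the estimated Fv pI falls below the cutoff."""
    return isoelectric_point([vh, vl]) < cutoff
```

Calling `low_pi_risk(vh_sequence, vl_sequence)` with the two variable-domain sequences returns a boolean flag. Because pI estimates depend on the pKa scale, a cutoff tuned with one tool should be re-checked before being applied to values from another.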

What problem does this paper attempt to address?

The problem this paper addresses is: **how to identify, at an early stage, monoclonal antibodies with drug-like properties, especially those suitable for subcutaneous administration that maintain low viscosity, opalescence, and aggregation at high concentrations**. Specifically, the authors developed an interpretable machine-learning model that predicts the viscosity of monoclonal antibodies (IgG1) using only their variable region (Fv) sequences, and used it to design mutations that significantly reduce antibody viscosity.

### Problem Background

1. **The Need for High-Concentration Antibody Formulations**: Subcutaneous administration of therapeutic antibodies requires high-concentration antibody solutions. However, many antibodies exhibit high viscosity, opalescence, and aggregation at high concentrations, which affects drug stability and administration.
2. **Limitations of Existing Methods**:
   - **Insufficient Data**: High-concentration antibody viscosity data are scarce, which makes reliable model training and testing difficult.
   - **Complexity and Inaccessibility**: Some existing models require complex calculations or licensed software and are difficult to use widely.
   - **Difficult to Interpret**: Most machine-learning models are "black boxes" whose predictions are hard to interpret, so they offer little guidance for the rational design of antibodies.
   - **Insufficient Validation**: Most models have been tested only on unseen antibodies and have not been validated by designing new mutations.

### Core Contributions of the Paper

1. **Large-Scale Data Set**: The largest reported set of high-concentration antibody viscosity measurements (> 100 mg/mL) was used, with 62 antibodies for model training and 17 antibodies for testing.
2. **Simple and Interpretable Model**: A decision-tree-based classification model was developed that predicts the viscosity level of antibodies from Fv amino acid sequences and homology modeling alone. The model is based on three key Fv features:
   - **Isoelectric Point (pI)**: Negatively correlated with viscosity.
   - **Hydrophobic Patch Size**: Positively correlated with viscosity.
   - **Number of Negative-Charge Patches**: Negatively correlated with viscosity.
3. **Experimental Verification**: Experiments confirmed that new mutations predicted by the model significantly reduce antibody viscosity.
4. **Potential for Clinical Application**: The model can be integrated into the drug development process, reducing the need for experimental viscosity screening and improving the efficiency of identifying antibody candidates with drug-like properties.

### Key Findings

- **Influence of Isoelectric Point**: Antibodies with Fv isoelectric points lower than 6.3 are more likely to exhibit high viscosity.
- **Influence of Hydrophobic Patch**: Larger hydrophobic patches lead to higher viscosity.
- **Influence of Negative-Charge Patch**: More negative-charge patches help to reduce viscosity.
- **Four Types of Antibody Behaviors**: Based on Fv isoelectric point, hydrophobic patch size, and number of negative-charge patches, antibodies are divided into four types (Type I, II, III, IV), each with different viscosity behavior.

### Summary

This paper addresses the problem of predicting the viscosity of high-concentration monoclonal antibodies by developing an interpretable machine-learning model, and it provides new tools and methods for antibody engineering. The result is expected to accelerate the development of antibody drugs and improve the safety and effectiveness of subcutaneous administration.
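To show why a small rule-based classifier is easy to interpret, here is a hedged Python sketch of an ordered decision list in which each prediction comes with the rule that triggered it. It is not the authors' trained model. Only the Fv pI cutoff of 6.3 comes from the text above; the `Rule` type, the feature names, and the hydrophobic patch threshold of 100.0 are hypothetical placeholders. Thresholds for the other two features are not given in this text, so they are left as caller-supplied rules.

```python
import operator
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    feature: str                             # key into the feature dict
    compare: Callable[[float, float], bool]  # e.g. operator.lt / operator.gt
    threshold: float
    reason: str                              # human-readable explanation


# The only threshold taken from the text: Fv pI below 6.3 signals risk.
PI_RULE = Rule("fv_pi", operator.lt, 6.3, "Fv pI below 6.3")


def classify(features, rules):
    """Return ('high', reason) for the first rule that fires, else ('low', None)."""
    for rule in rules:
        if rule.compare(features[rule.feature], rule.threshold):
            return "high", rule.reason
    return "low", None


# Placeholder rule: the threshold 100.0 is illustrative only, not from the paper.
patch_rule = Rule("hydrophobic_patch_size", operator.gt, 100.0,
                  "large hydrophobic patch")
rules = [PI_RULE, patch_rule]

print(classify({"fv_pi": 5.8, "hydrophobic_patch_size": 40.0}, rules))
```

Because every "high" label carries its reason, the output points directly at the feature a mutation would need to change, which is the property the summary credits for enabling rational design.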

Reduction of monoclonal antibody viscosity using interpretable machine learning
