Accurate prediction of antibody deamidations by combining high-throughput automated peptide mapping and protein language model-based deep learning

Ben Niu, Benjamin Lee, Lili Wang, Wen Chen, Jeffrey Johnson
DOI: https://doi.org/10.26434/chemrxiv-2024-bf6pw
2024-05-31
Abstract: Therapeutic antibodies such as monoclonal antibodies (mAbs), bispecific and multispecific antibodies are pivotal in therapeutic protein development and have transformed disease treatments across various therapeutic areas. The integrity of therapeutic antibodies, however, is compromised by sequence liabilities, notably deamidation, in which asparagine (N) and glutamine (Q) residues undergo chemical degradation. Deamidation negatively impacts the efficacy, stability, and safety of diverse classes of antibodies, making early and accurate identification of vulnerable sites critical. In this article, a comprehensive antibody deamidation-specific dataset (n = 2285) of varied modalities was created by using high-throughput automated peptide mapping, followed by supervised machine learning to predict both the propensity and the extent of deamidation throughout entire antibody sequences. We propose a novel chimeric deep-learning model that integrates protein language model (pLM)-derived embeddings with local sequence information for enhanced deamidation prediction. Remarkably, this model requires only sequence inputs, eliminating the need for laborious feature engineering. Our approach demonstrates state-of-the-art performance, offering a streamlined workflow for high-throughput automated peptide mapping and deamidation prediction, with potential for broader applicability to other antibody sequence liabilities.
Chemistry
What problem does this paper attempt to address?
This paper addresses antibody deamidation, a chemical degradation of asparagine (N) and glutamine (Q) residues that compromises the efficacy, stability, and safety of therapeutic antibodies. The researchers built a deamidation-specific antibody dataset (n = 2285) spanning varied modalities by high-throughput automated peptide mapping, then used supervised machine learning to predict both the propensity and the extent of deamidation across entire antibody sequences. They propose a chimeric deep-learning model that combines embeddings derived from protein language models (pLMs) with local sequence information to improve deamidation prediction. Because the model requires only sequence input, it avoids laborious feature engineering and streamlines the workflow, and it may extend to other antibody sequence liabilities. The stated goal is early and accurate identification of deamidation-prone sites during antibody development.
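To make the idea of combining pLM embeddings with local sequence information concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed flanking window around each N or Q residue, a one-hot encoding of that window, and a per-residue pLM embedding supplied by the caller as a plain list of floats. The function names (`candidate_windows`, `one_hot`, `site_features`), the window size, and the padding symbol are all hypothetical; the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for windows that run past the sequence ends


def candidate_windows(sequence: str, flank: int = 3) -> List[Tuple[int, str, str]]:
    """Return (0-based position, residue, local window) for every N or Q."""
    padded = PAD * flank + sequence + PAD * flank
    sites = []
    for i, residue in enumerate(sequence):
        if residue in "NQ":
            # In padded coordinates residue i sits at i + flank, so a window
            # centred on it spans padded[i : i + 2 * flank + 1].
            sites.append((i, residue, padded[i : i + 2 * flank + 1]))
    return sites


def one_hot(window: str) -> List[float]:
    """One-hot encode a window; padding (or unknown) residues become all zeros."""
    vector: List[float] = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def site_features(window: str, plm_embedding: Sequence[float]) -> List[float]:
    """Concatenate a (placeholder) pLM embedding with the local window encoding."""
    return list(plm_embedding) + one_hot(window)


if __name__ == "__main__":
    # Toy sequence and a made-up two-dimensional "embedding" for illustration.
    for position, residue, window in candidate_windows("SNGQ", flank=2):
        features = site_features(window, [0.1, 0.2])
        print(position, residue, window, len(features))
```

In a real pipeline the placeholder embedding would come from a pretrained protein language model, and the concatenated vector would feed a supervised classifier or regressor trained on the peptide-mapping labels. Those components are outside the scope of this sketch.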