Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model

Hongtai Jing, Zhengtao Gao, Sheng Xu, Tao Shen, Zhangzhi Peng, Shwai He, Tao You, Shuang Ye, Wei Lin, Siqi Sun
2023-08-31
Abstract: In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advances in deep learning have enabled precise prediction of protein structure and function by leveraging co-evolutionary information from homologous proteins. Despite these advances, predicting antibody conformations remains challenging because of the unique evolution of antibodies and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset of 336 million unlabeled, 40% non-redundant antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM shows exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM that rapidly predicts full-atom antibody structures from individual sequences. Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative antibody engineering and streamline therapeutic antibody development by reducing unnecessary trials.
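The abstract describes the training set as "40% non-redundant" but does not say which tool or identity definition was used. Redundancy reduction is usually done with clustering tools such as CD-HIT or MMseqs2. The sketch below only illustrates the general idea, assuming "40% non-redundant" means that no two kept sequences share 40% or more identity. It uses a greedy, longest-first filter and a toy Needleman-Wunsch identity; the function names, the scoring values (match +1, mismatch -1, gap -2), and the toy sequences are hypothetical choices, not BALM's pipeline.

```python
from typing import List


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Identity from a simple Needleman-Wunsch global alignment (linear gaps).

    Returns identical aligned positions divided by the shorter sequence length.
    Scoring values are illustrative assumptions, not CD-HIT or MMseqs2 settings.
    """
    n, m = len(a), len(b)
    # Dynamic-programming score matrix; first row/column hold cumulative gap costs.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical aligned residues.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    shorter = min(n, m)
    return identical / shorter if shorter else 0.0


def greedy_nonredundant(seqs: List[str], threshold: float = 0.4) -> List[str]:
    """Keep a sequence only if it is below `threshold` identity to every kept one.

    Sequences are visited longest-first, similar to CD-HIT-style greedy clustering.
    """
    representatives: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(global_identity(s, r) < threshold for r in representatives):
            representatives.append(s)
    return representatives


if __name__ == "__main__":
    # Toy inputs: two near-identical fragments and one unrelated sequence.
    kept = greedy_nonredundant(["EVQLVESGGG", "EVQLVESGGA", "WWWWWWWWWW"])
    print(kept)  # ['EVQLVESGGG', 'WWWWWWWWWW']
```

Each new sequence is aligned against every representative, so this filter is quadratic in the number of sequences. At the scale of hundreds of millions of sequences, dedicated tools that prefilter candidate pairs (for example by shared k-mers) would be needed instead.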
Biomolecules
What problem does this paper attempt to address?
The paper addresses challenges in predicting antibody structure and function. Specifically:

1. **Challenges in antibody structure prediction**: Although protein structure prediction has advanced considerably in recent years, the unique evolutionary characteristics of antibodies and the high flexibility of their antigen-binding regions still make accurate antibody structure prediction difficult.
2. **Reducing experimental cost and time**: Current wet-lab methods for antibody development are time-consuming and expensive. Predicting antibody structure and function computationally from sequence can substantially reduce trial and error during screening and characterization, making therapeutic antibody development more efficient.

To address these issues, the authors propose the Bio-inspired Antibody Language Model (BALM), which is trained on a large corpus of unannotated antibody sequences and performs well on four antigen-binding prediction tasks. They also introduce BALMFold, an end-to-end method built on BALM that quickly and accurately predicts full-atom antibody structures from a single sequence and outperforms existing methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on an antibody benchmark. With these methods, the authors aim to advance antibody engineering and therapeutic antibody development by reducing unnecessary experimental trial and error and accelerating new drug development.
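The summary says BALM learns from unannotated antibody sequences but does not state the pretraining objective. Protein language models such as ESM commonly use masked-token prediction, so the sketch below assumes that objective and shows only the input-corruption step. The 15% selection rate and the 80/10/10 split are the original BERT defaults, not settings documented for BALM; `mask_for_mlm` and the `<mask>` token string are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"                        # hypothetical mask token


def mask_for_mlm(seq: str, rng: random.Random, mask_rate: float = 0.15
                 ) -> Tuple[List[str], Dict[int, str]]:
    """BERT-style corruption of one antibody sequence (assumed objective).

    Each residue is selected with probability `mask_rate`. A selected residue is
    replaced by MASK 80% of the time, by a random amino acid 10% of the time,
    and left unchanged 10% of the time. Returns the corrupted tokens and a map
    from selected positions to the original residues the model must predict.
    """
    tokens = list(seq)
    targets: Dict[int, str] = {}
    for i, residue in enumerate(seq):
        if rng.random() >= mask_rate:
            continue  # position not selected; no loss is computed here
        targets[i] = residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)
        # otherwise keep the original residue
    return tokens, targets


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, targets = mask_for_mlm("EVQLVESGGGLVQPGGSLRLSCAAS", rng)
    print(tokens)
    print(targets)
```

Under this assumption, a model would read `tokens` and be trained with a cross-entropy loss only at the positions in `targets`, which is how unlabeled sequences alone can provide a training signal.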