AntiBARTy Diffusion for Property Guided Antibody Design

Jordan Venderley
2023-09-23
Abstract: Over the past decade, antibodies have steadily grown in therapeutic importance thanks to their high specificity and low risk of adverse effects compared to other drug modalities. While traditional antibody discovery is primarily wet lab driven, the rapid improvement of ML-based generative modeling has made in-silico approaches an increasingly viable route for discovery and engineering. To this end, we train an antibody-specific language model, AntiBARTy, based on BART (Bidirectional and Auto-Regressive Transformer) and use its latent space to train a property-conditional diffusion model for guided IgG de novo design. As a test case, we show that we can effectively generate novel antibodies with improved in-silico solubility while maintaining antibody validity and controlling sequence diversity.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of designing antibodies with improved physicochemical properties, using solubility as its test case. Specifically:

1. **Limitations of Traditional Methods**: Traditional antibody discovery relies mainly on phage display libraries or transgenic mouse hybridoma technology. These methods are effective, but they are limited in sequence diversity and in high-throughput screening.
2. **Using Machine Learning for Antibody Design**: Language models trained on large-scale sequence data can partly compensate for these shortcomings. Such a model provides a strong prior distribution over sequences, and conditional sampling from that prior can generate new antibodies with specified biophysical properties.
3. **Specific Goals**: The paper proposes a method called AntiBARTy Diffusion. It first trains an antibody-specific language model, AntiBARTy, based on the BART architecture, and then trains a property-conditional diffusion model in that model's latent space for guided de novo design of IgG. The reported results show that the method can generate novel antibodies with improved in-silico solubility while maintaining antibody validity and controlling sequence diversity.
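
To make the idea of a property-conditional diffusion model in a latent space more concrete, the following is a minimal sketch of how one training batch could be assembled. It assumes a standard DDPM-style forward noising process with a linear noise schedule, which the summary above does not confirm as the paper's actual choice. The latent dimension, the binary solubility labels, and all function names are hypothetical, and random vectors stand in for latents from a trained encoder.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def noise_latents(z0, t, alpha_bar, rng):
    """Sample z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I) for each row of z0."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t][:, None]  # one cumulative alpha per example, shape (batch, 1)
    z_t = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return z_t, eps


def make_training_batch(z0, labels, alpha_bar, rng):
    """Build inputs for a denoiser that sees (z_t, t, label) and predicts eps."""
    t = rng.integers(0, len(alpha_bar), size=z0.shape[0])
    z_t, eps = noise_latents(z0, t, alpha_bar, rng)
    return {"z_t": z_t, "t": t, "label": labels, "target_eps": eps}


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)

# Stand-ins: random vectors instead of real encoder latents (hypothetical size 16),
# and hypothetical class labels (0 = low solubility, 1 = high solubility).
z0 = rng.standard_normal((4, 16))
labels = np.array([0, 1, 1, 0])

batch = make_training_batch(z0, labels, alpha_bar, rng)
```

In a setup like this, a denoising network would be trained to predict `target_eps` from `z_t`, `t`, and `label`. At sampling time, fixing `label` to the desired property class would steer generation, and the resulting latent would be passed to the language model's decoder to obtain a sequence.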