Energy-based generative models for monoclonal antibodies

Paul Pereira, Hervé Minoux, Aleksandra M. Walczak, Thierry Mora
2024-11-20
Abstract: Since the approval of the first antibody drug in 1986, a total of 162 antibodies have been approved for a wide range of therapeutic areas, including cancer, autoimmune, infectious, or cardiovascular diseases. Despite advances in biotechnology that accelerated the development of antibody drugs, the drug discovery process for this modality remains lengthy and costly, requiring multiple rounds of optimizations before a drug candidate can progress to preclinical and clinical trials. This multi-optimization problem involves increasing the affinity of the antibody to the target antigen while refining additional biophysical properties that are essential to drug development such as solubility, thermostability or aggregation propensity. Additionally, antibodies that resemble natural human antibodies are particularly desirable, as they are likely to offer improved profiles in terms of safety, efficacy, and reduced immunogenicity, further supporting their therapeutic potential. In this article, we explore the use of energy-based generative models to optimize a candidate monoclonal antibody. We identify tradeoffs when optimizing for multiple properties, concentrating on solubility, humanness and affinity, and use the generative model we develop to generate candidate antibodies that lie on an optimal Pareto front that satisfies these constraints.
Biomolecules
What problem does this paper attempt to address?
This paper addresses the multi-objective optimization challenge that arises in the development of monoclonal antibody drugs. Specifically, the authors focus on how to optimize a candidate monoclonal antibody with generative models so as to balance several properties (solubility, humanness and binding affinity).

### Problem Background

Since the approval of the first antibody drug in 1986, 162 antibody drugs have been approved for the treatment of various diseases, including cancer, autoimmune diseases, infectious diseases and cardiovascular diseases. Although progress in biotechnology has accelerated the development of antibody drugs, the discovery process remains long and costly, and requires multiple rounds of optimization before a candidate can enter preclinical and clinical trials. During this process, researchers need to increase the affinity of the antibody for its target antigen while improving other key biophysical properties, such as solubility, thermostability or aggregation propensity. In addition, antibodies that resemble natural human antibodies are highly desirable, because they are likely to offer better safety and efficacy and lower immunogenicity, which strengthens their therapeutic potential.

### Core Problems of the Paper

The paper explores the use of energy-based generative models to optimize candidate monoclonal antibodies and so address the multi-objective optimization problem described above. Specifically:

- **Multi-objective Optimization**: The paper identifies the trade-offs that arise when optimizing multiple properties, concentrating on solubility, humanness and affinity.
- **Pareto Optimal Solutions**: The authors look for candidate antibodies that lie on the Pareto front, where no property can be improved without degrading another.
- **Application of Generative Models**: The generative model is used to generate candidate antibodies that satisfy these constraints, and the summary asks whether such candidates can meet the requirements of actual drug development.

### Mathematical Formula Representation

Some of the key formulas involved in the paper are as follows:

1. **Probability Distribution of the Generative Model**:
   \[ p(x)=\frac{1}{Z}\,p_{\text{HUM}}(x)\,e^{-E(x)/T} \]
   where
   - \(p_{\text{HUM}}(x)\) is the distribution of natural human antibodies,
   - \(E(x)=-w\hat{f}_{\text{aff}}(x)-(1-w)\hat{f}_{\text{sol}}(x)\) is the energy function,
   - \(Z\) is the normalization factor,
   - \(T\) is the temperature parameter,
   - \(w\) is the weight that sets the balance between affinity and solubility.

2. **Distance to the Pareto Front**:
   \[ d_P(x)=\min_{x'\in \text{PO}}\left(\frac{\left(f_{\text{aff}}(x)-f_{\text{aff}}(x')\right)^2}{\sigma_{\text{aff}}^2}+\frac{\left(f_{\text{sol}}(x)-f_{\text{sol}}(x')\right)^2}{\sigma_{\text{sol}}^2}\right) \]
   where
   - \(\text{PO}\) is the set of Pareto-optimal sequences,
   - \(\sigma_{\text{aff}}^2\) and \(\sigma_{\text{sol}}^2\) are the variances of affinity and solubility, respectively.

3. **Diversity Index**:
   \[ \text{Diversity}(D_{\text{gen}})=\frac{1}{N_{\text{gen}}(N_{\text{gen}}-1)}\sum_{(x,x')\in D_{\text{gen}}}d_H(x,x') \]
   where
   - \(d_H(x,x')\) is the Hamming distance,
   - \(N_{\text{gen}}\) is the number of generated sequences.

4. **Novelty Index**: \[ \text{N}