Decoupled Sequence and Structure Generation for Realistic Antibody Design

Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park
2024-05-27
Abstract: Antibody design plays a pivotal role in advancing therapeutics. Although deep learning has made rapid progress in this field, existing methods jointly generate antibody sequences and structures, which limits task-specific optimization. In response, we propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction. Despite its simplicity, this decoupling strategy has been overlooked in previous work. We also find that the widely used non-autoregressive generators promote sequences with overly repeating tokens. Such sequences are both out-of-distribution and prone to undesirable developability properties that can trigger harmful immune responses in patients. To resolve this, we introduce a composition-based objective that allows an efficient trade-off between high performance and low token repetition. Our results demonstrate that ASSD consistently outperforms existing antibody design models, while the composition-based objective successfully mitigates token repetition in non-autoregressive models. Our code is available at this https URL.
Quantitative Methods, Machine Learning
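The abstract does not spell out the form of the composition-based objective. As a minimal sketch, assuming the objective compares the amino acid composition implied by a non-autoregressive model's per-position output distributions with the composition of the target sequence, one could add a squared-error composition penalty to the usual per-position cross-entropy. The weight `lam` that sets the trade-off, the 20-letter vocabulary, and all function names are hypothetical illustrations, not the authors' code.

```python
import math

# Assumption: the token vocabulary is the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_composition(seq):
    """Fraction of each amino acid in the target sequence."""
    counts = [0.0] * len(AMINO_ACIDS)
    for aa in seq:
        counts[INDEX[aa]] += 1.0
    return [c / len(seq) for c in counts]


def composition_loss(probs, seq):
    """Hypothetical penalty: squared error between the expected composition
    of the prediction (mean of per-position distributions) and the target's."""
    n = len(probs)
    predicted = [sum(p[a] for p in probs) / n for a in range(len(AMINO_ACIDS))]
    target = target_composition(seq)
    return sum((x - y) ** 2 for x, y in zip(predicted, target))


def cross_entropy(probs, seq):
    """Mean per-position negative log-likelihood of the target residues."""
    return -sum(math.log(p[INDEX[aa]]) for p, aa in zip(probs, seq)) / len(seq)


def total_loss(logits, seq, lam=1.0):
    """Cross-entropy plus a lam-weighted composition penalty (lam is hypothetical)."""
    probs = [softmax(row) for row in logits]
    return cross_entropy(probs, seq) + lam * composition_loss(probs, seq)


if __name__ == "__main__":
    target = "ARMGSDYDVWFDY"
    # Logits that strongly favour Y at every position mimic a repetitive prediction.
    y_heavy = [[5.0 if aa == "Y" else 0.0 for aa in AMINO_ACIDS] for _ in target]
    for lam in (0.0, 1.0, 10.0):
        print(lam, round(total_loss(y_heavy, target, lam), 3))
```

Setting `lam=0` recovers plain cross-entropy; larger values increasingly penalize predictions whose overall composition drifts away from the target's, such as a prediction dominated by Y when the target is not. In a real model the same loss would be written in an autodiff framework so the penalty can be backpropagated.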
What problem does this paper attempt to address?
The paper addresses two key issues in existing antibody design methods:

1. **Limitations of joint sequence-structure generation**: Existing antibody design methods usually generate an antibody's sequence and structure simultaneously. This joint generation restricts task-specific optimization of the model architecture and may prevent better sequence and structure modeling performance.
2. **Repeated tokens in non-autoregressive models**: Recent studies have shown that non-autoregressive sequence generation offers faster inference and better performance. However, these models tend to generate overly repeated amino acid types (for example, the natural sequence ARMGSDYDVWFDY versus the non-autoregressive prediction TRYYYYYYYYYDY). Such sequences are out of distribution and may also have undesirable developability properties, such as aggregation, which can trigger harmful immune responses in patients.

To solve these problems, the authors propose the Antibody Sequence-Structure Decoupling (ASSD) framework, which separates sequence generation and structure prediction into two independent steps. They also introduce a composition-based training objective to reduce token repetition in non-autoregressive models. According to the authors, the ASSD framework improves the performance of antibody design models while effectively reducing repeated tokens in the generated sequences.
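To make the repetition problem concrete, the two example sequences can be scored with simple statistics. The sketch below uses two illustrative measures, the longest run of a single residue and the share taken by the most frequent residue; these metrics and the helper names `longest_run` and `max_residue_fraction` are hypothetical choices for illustration, not necessarily what the paper reports.

```python
from collections import Counter
from itertools import groupby


def longest_run(seq):
    """Length of the longest stretch of one residue repeated back to back."""
    return max(len(list(group)) for _, group in groupby(seq))


def max_residue_fraction(seq):
    """Share of the sequence taken up by its single most frequent residue."""
    return max(Counter(seq).values()) / len(seq)


natural = "ARMGSDYDVWFDY"    # natural sequence quoted in the summary
predicted = "TRYYYYYYYYYDY"  # non-autoregressive prediction quoted in the summary

for name, seq in (("natural", natural), ("non-autoregressive", predicted)):
    print(f"{name:>18}: longest run = {longest_run(seq)}, "
          f"max residue fraction = {max_residue_fraction(seq):.2f}")
```

On these inputs the natural sequence has no residue repeated back to back, while the non-autoregressive prediction contains a long run of Y, which is the kind of pattern the composition-based objective is meant to discourage. Both helpers assume a non-empty sequence.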