ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

Liye Chen, Biaoshun Li, Yihao Chen, Mujie Lin, Shipeng Zhang, Chenxin Li, Yu Pang, Ling Wang
2024-01-17
Abstract: Antibody-drug conjugates (ADCs) have revolutionized cancer treatment in the era of precision medicine because they can precisely target cancer cells and release highly effective drugs. Nevertheless, the rational design of ADCs is very difficult because the relationship between their structures and activities is poorly understood. In the present study, we introduce a unified deep learning framework called ADCNet to help design potential ADCs. ADCNet integrates the protein representation learning language model ESM-2 and the small-molecule representation learning language model FG-BERT to predict activity by learning meaningful features from the antigen and antibody protein sequences of an ADC, the SMILES strings of its linker and payload, and its drug-antibody ratio (DAR) value. Based on a carefully designed and manually tailored ADC data set, extensive evaluation results show that ADCNet outperforms the baseline machine learning models on the test set across all evaluation metrics. For example, it achieves an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under the receiver operating characteristic curve of 0.9293 on the test set. In addition, cross-validation, ablation experiments, and external independent testing results further demonstrate the stability, advancement, and robustness of the ADCNet architecture. For the convenience of the community, we develop the first online platform (
Machine Learning
What problem does this paper attempt to address?
The paper addresses the challenge of rationally designing antibody-drug conjugates (ADCs). ADCs have revolutionized cancer treatment in the era of precision medicine because they can precisely target cancer cells and release potent drugs, but the complex relationship between their structure and activity makes rational design extremely difficult. To tackle this issue, the paper introduces ADCNet, a unified deep learning framework that predicts ADC activity to assist in design. ADCNet integrates the protein representation learning language model ESM-2 and the small-molecule representation learning language model FG-BERT, and predicts activity by learning meaningful features from five inputs: the antigen and antibody protein sequences of an ADC, the SMILES strings of its linker and payload, and its drug-to-antibody ratio (DAR). On a carefully designed and manually curated ADC dataset, ADCNet outperforms the baseline machine learning models on the test set across all metrics, with an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under the receiver operating characteristic curve (AUC) of 0.9293. Cross-validation, ablation experiments, and external independent testing further demonstrate the stability, advancement, and robustness of the ADCNet architecture. To facilitate community use, the research team also developed the first online platform (https://ADCNet.idruglab.cn), based on the optimal ADCNet model, to predict the activity of ADCs, and made the source code public (https://github.com/idrugLab/ADCNet).
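To make the input side of this description concrete, the sketch below shows one way the five inputs could be combined into a single feature vector and scored. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how ADCNet fuses its features, so plain concatenation followed by a single logistic unit is assumed here, and the function names `fuse_features` and `predict_active_probability` are hypothetical. The short lists stand in for embeddings that would in practice come from ESM-2 (antibody, antigen) and FG-BERT (linker, payload).

```python
import math
from typing import List, Sequence


def fuse_features(
    antibody: Sequence[float],
    antigen: Sequence[float],
    linker: Sequence[float],
    payload: Sequence[float],
    dar: float,
) -> List[float]:
    """Concatenate the four embeddings and the DAR value into one vector.

    Assumption: concatenation is used only for illustration; the source
    does not specify the fusion strategy.
    """
    return [*antibody, *antigen, *linker, *payload, float(dar)]


def predict_active_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Score a fused vector with one logistic unit (a stand-in classifier head)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder embeddings; real ones would be much longer vectors.
fused = fuse_features([0.1, 0.2], [0.3], [0.4], [0.5], dar=4)
probability = predict_active_probability(fused, [0.0] * len(fused), bias=0.0)
```

With all weights set to zero the score is uninformative by construction; in a trained model the weights (or a deeper network in their place) would be learned from labeled ADC data.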
The paper details the methods for dataset preparation, label setting, model architecture and training strategy, model performance evaluation, and web server implementation, showcasing the superior performance of ADCNet in predicting ADC activity and providing a powerful tool for the rational design of ADCs.
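The three metrics quoted for the evaluation (accuracy, balanced accuracy, and AUC) can be computed for any binary classifier as in the following minimal sketch. The labels and scores are made-up toy values, not data from the paper, the 0.5 decision threshold is an assumption, and the function names are illustrative. AUC is computed here as the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; O(n_pos * n_neg), fine for small sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = active, 0 = inactive (hypothetical values).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Balanced accuracy is worth reporting alongside plain accuracy when active and inactive classes are unevenly represented, since it weights the two classes equally.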