Pretraining Transformers for TCR-pMHC Binding Prediction.

Jinsheng Shang, Qihong Jiao, Cheng Chen, Daming Zhu, Xuefeng Cui
DOI: https://doi.org/10.1109/BIBM55620.2022.9994875
2022-01-01
Abstract: Knowledge of how the major histocompatibility complex (MHC) presents antigens to the T-cell receptor (TCR), and of TCR binding specificity, can facilitate the application of T-cell immunity in modern medicine, for example in tumor immunotherapy and in drug and vaccine design. With the development of high-throughput sequencing technology and artificial intelligence, data-driven approaches can help uncover the rules of TCR-pMHC binding. Simulating the biological binding process of TCRs and peptide-MHC complexes (pMHCs), we propose a novel pipeline, pMTattn, which uses transfer learning based on an attention mechanism for TCR-pMHC binding prediction. During the pretraining stage, partner-specific training strategies capture useful local binding features. In the fine-tuning stage, an attention block aggregates the TCR encoding and the pMHC encoding, forming a better global TCR-pMHC representation. Visualization experiments indicate that the pMTattn model focuses more on the voxels near the binding sites of pMHCs and TCRs. This observation supports our hypothesis that attention is critical for TCR-pMHC binding prediction. In addition, on an independent test set, pMTattn improves the area under the precision-recall curve (AUPR) from 0.533 to 0.583 and the area under the receiver operating characteristic curve (AUC) from 0.830 to 0.866 relative to the state-of-the-art model. We also explore how sequence length and dataset differences affect model performance, and pMTattn is more robust than the other models. These results suggest that pMTattn can serve as an adjunct tool for screening and discovering neoantigens.
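
The fine-tuning step described in the abstract, in which an attention block aggregates the TCR and pMHC encodings into one pair representation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The single-head cross-attention, the mean pooling, the random inputs and weights, and every name used (cross_attention_pool, tcr_enc, pmhc_enc, w_q, w_k, w_v, w_out) are assumptions chosen only to show the general idea.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_pool(tcr_enc, pmhc_enc, w_q, w_k, w_v):
    """Fuse two per-position encodings with single-head cross-attention.

    tcr_enc:  (L_tcr, d) array, a hypothetical TCR encoder output.
    pmhc_enc: (L_pmhc, d) array, a hypothetical pMHC encoder output.
    Returns a pooled (d_k,) pair vector and the (L_tcr, L_pmhc) weights.
    """
    q = tcr_enc @ w_q    # queries come from TCR positions
    k = pmhc_enc @ w_k   # keys come from pMHC positions
    v = pmhc_enc @ w_v   # values come from pMHC positions
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each TCR position attends over pMHC
    attended = weights @ v              # (L_tcr, d_k)
    pooled = attended.mean(axis=0)      # mean-pool into one pair vector
    return pooled, weights


# Random stand-ins for the encoder outputs and learned parameters (assumed).
rng = np.random.default_rng(0)
d, d_k = 16, 8
tcr_enc = rng.normal(size=(12, d))
pmhc_enc = rng.normal(size=(9, d))
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) / np.sqrt(d) for _ in range(3))
w_out = rng.normal(size=d_k)

pooled, weights = cross_attention_pool(tcr_enc, pmhc_enc, w_q, w_k, w_v)
prob = 1.0 / (1.0 + np.exp(-(pooled @ w_out)))  # sigmoid binding score
```

In such a sketch, the rows of weights show which pMHC positions each TCR position attends to, which is the kind of quantity a visualization of attention near binding sites would inspect.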