HeteroTCR: A heterogeneous graph neural network-based method for predicting peptide-TCR interaction

Zilan Yu, Mengnan Jiang, Xun Lan
DOI: https://doi.org/10.1038/s42003-024-06380-6
IF: 6.548
2024-06-05
Communications Biology
Abstract: Identifying interactions between T-cell receptors (TCRs) and immunogenic peptides holds profound implications across diverse research domains and clinical scenarios. Unsupervised clustering models (UCMs) cannot predict peptide-TCR binding directly, while supervised predictive models (SPMs) often face challenges in identifying antigens previously unencountered by the immune system or possessing limited TCR binding repertoires. Therefore, we propose HeteroTCR, an SPM based on Heterogeneous Graph Neural Network (GNN), to accurately predict peptide-TCR binding probabilities. HeteroTCR captures within-type (TCR-TCR or peptide-peptide) similarity information and between-type (peptide-TCR) interaction insights for predictions on unseen peptides and TCRs, surpassing limitations of existing SPMs. Our evaluation shows HeteroTCR outperforms state-of-the-art models on independent datasets. Ablation studies and visual interpretation underscore the Heterogeneous GNN module's critical role in enhancing HeteroTCR's performance by capturing pivotal binding process features. We further demonstrate the robustness and reliability of HeteroTCR through validation using single-cell datasets, aligning with the expectation that pMHC-TCR complexes with higher predicted binding probabilities correspond to increased binding fractions.
biology
What problem does this paper attempt to address?
The paper addresses the prediction of interactions between T-cell receptors (TCRs) and immunogenic peptides, a problem relevant to cancer immunotherapy, vaccine development, and personalized medicine. Existing methods fall into two categories: unsupervised clustering models (UCMs) and supervised predictive models (SPMs). UCMs can group similar TCR sequences but cannot predict peptide-TCR binding directly. SPMs can make such predictions but often struggle with antigens the immune system has not previously encountered, or with antigens that have limited TCR binding repertoires.

To address these limitations, the authors propose HeteroTCR, an SPM based on a Heterogeneous Graph Neural Network (HGNN). HeteroTCR uses within-type similarity (TCR-TCR or peptide-peptide) and between-type interaction information (peptide-TCR), both derived from TCR and peptide sequences, to improve binding-probability predictions for unseen peptides and TCRs. The model first extracts numerical embeddings of TCRs and peptides with a pre-trained Convolutional Neural Network (CNN) module, then uses the HGNN module to pass information between the two node types, and finally uses a Multi-Layer Perceptron (MLP) to predict whether a given TCR and peptide interact.

Evaluations on independent datasets show that HeteroTCR outperforms state-of-the-art models. Ablation studies confirm the critical role of the HGNN module in the model's performance. The model also performs well under different data partitioning strategies, including those involving unseen antigens and TCRs, which indicates good generalization and practical potential for peptide-TCR interaction prediction.
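
The embed, propagate, and score pipeline described above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation. Every concrete detail is an assumption made for illustration: the 2-dimensional vectors stand in for the pre-trained CNN embeddings, the edges are hypothetical known peptide-TCR pairs, the mean-aggregation rule stands in for the HGNN layer, and a logistic scorer with untrained weights stands in for the MLP.

```python
import math

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def message_pass(src_feats, dst_feats, edges):
    """One mean-aggregation step from source nodes to destination nodes.

    edges is a list of (src_id, dst_id) pairs. Each destination's new
    feature is the average of its own feature and the mean of its
    neighbours' features; destinations without neighbours are unchanged.
    """
    incoming = {d: [] for d in dst_feats}
    for s, d in edges:
        incoming[d].append(src_feats[s])
    updated = {}
    for d, own in dst_feats.items():
        if incoming[d]:
            updated[d] = mean([own, mean(incoming[d])])
        else:
            updated[d] = list(own)
    return updated

def score_pair(pep_vec, tcr_vec, weights, bias):
    """Stand-in for the MLP head: logistic score on the concatenation."""
    x = pep_vec + tcr_vec
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical embeddings (assumption: stand-ins for CNN module output).
peptides = {"pepA": [1.0, 0.0], "pepB": [0.0, 1.0]}
tcrs = {"tcr1": [0.5, 0.5], "tcr2": [0.0, 2.0], "tcr3": [1.0, 1.0]}

# Hypothetical peptide-TCR edges of the heterogeneous graph.
edges = [("pepA", "tcr1"), ("pepB", "tcr1"), ("pepB", "tcr2")]

# Between-type propagation in both directions, from the original features.
tcr_h = message_pass(peptides, tcrs, edges)
pep_h = message_pass(tcrs, peptides, [(t, p) for p, t in edges])

# Untrained illustrative weights; a real model would learn these.
weights = [0.5, -0.5, 0.5, -0.5]
prob = score_pair(pep_h["pepA"], tcr_h["tcr3"], weights, bias=0.0)
print(f"toy binding score for (pepA, tcr3): {prob:.3f}")
```

In this toy graph, pepA and pepB are both linked to tcr1, so a second propagation round would let each peptide's features reach the other through that shared TCR. That two-hop path is one plausible way a graph model can pick up within-type similarity from between-type edges.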