Abstract: T cell receptors (TCRs) define the specificity of T cells and are responsible for their interaction with peptide antigen targets presented in complex with major histocompatibility complex (MHC) molecules. Understanding the rules underlying this interaction hence forms the foundation for our understanding of basic adaptive immunology. Over the last decade, considerable effort has been dedicated to developing assays for high-throughput identification of peptide-specific TCRs. Based on such data, several computational methods have been proposed for predicting the TCR-pMHC (peptide-MHC) interaction. The general conclusion from these studies is that the prediction of TCR interactions with pMHC complexes remains highly challenging. Several factors contribute to this, including the scarcity and quality of the data and ill-defined modeling objectives imposed by the high redundancy of the available data. In this work, we propose a framework for dealing with this redundancy, allowing us to address essential questions related to the modeling of TCR specificity, including the use of peptide-specific versus pan-specific models, how best to define negative data, and the performance impact of integrating the CDR1 and CDR2 loops. Further, we illustrate how and why it is strongly recommended to include simple similarity-based modeling approaches as baselines when validating a claimed improvement in the predictive power of machine learning models, and that such validation should include a performance evaluation as a function of "distance" to the training data, to quantify the potential for generalization of the proposed model. The conclusion of the work is that, given current data, TCR specificity is best modeled using peptide-specific approaches, integrating information from all six CDR loops, and with negative data constructed from a combination of true and mislabeled negatives.
Comparing such machine learning models to similarity-based approaches showed that the performance gain of the former increased with the "distance" to the training data, demonstrating an improved generalization ability of the machine learning-based approaches. We believe these results demonstrate that the outlined modeling framework and proposed evaluation strategy form a solid basis for investigating the modeling of TCR specificities and that adhering to such a framework will allow for faster progress within the field. The final developed model, NetTCR-2.1, is available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.1.
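
The evaluation strategy described in the abstract (a similarity-based baseline, scored as a function of "distance" to the training data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from NetTCR-2.1: plain Levenshtein distance on a single CDR3 string stands in for the similarity measure, the baseline scores a test TCR by its closeness to the nearest training positive, and the function names (`edit_distance`, `min_distance`, `auc_by_distance`) and the toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance; an assumed stand-in for a TCR similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def min_distance(seq, references):
    """Distance from seq to its closest sequence in references."""
    return min(edit_distance(seq, r) for r in references)


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly.

    Ties count as 0.5. Returns None if either class is missing.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_distance(test, train_pos, train_all, bins, score_fn=None):
    """AUC per bin of distance-to-training-data.

    test: list of (sequence, label) pairs; bins: inclusive (low, high) ranges.
    score_fn defaults to the similarity baseline; pass a trained model's
    scoring function instead to compare it against the baseline bin by bin.
    """
    if score_fn is None:
        # Baseline: the closer to a known binder, the higher the score.
        score_fn = lambda seq: -min_distance(seq, train_pos)
    result = {}
    for low, high in bins:
        subset = [(s, y) for s, y in test
                  if low <= min_distance(s, train_all) <= high]
        result[(low, high)] = auc([score_fn(s) for s, _ in subset],
                                  [y for _, y in subset])
    return result


if __name__ == "__main__":
    # Toy, made-up sequences purely to exercise the functions.
    train_pos = ["CASSIRSSYEQYF", "CASSLGQAYEQYF"]
    train_neg = ["CASSPDRGGYTF"]
    test = [("CASSIRSAYEQYF", 1), ("CASSPDRGAYTF", 0),
            ("CASRWTGELFF", 1), ("CAWSVGNTEAFF", 0)]
    print(auc_by_distance(test, train_pos, train_pos + train_neg,
                          bins=[(0, 2), (3, 20)]))
```

Running the same binning with a machine learning model's scores passed as `score_fn` gives two AUC curves over distance; a gap that widens in the high-distance bins is the kind of evidence for generalization that the abstract refers to.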

Assessing the Generalization Capabilities of TCR Binding Predictors via Peptide Distance Analysis

Enhancing TCR specificity predictions by combined pan- and peptide-specific training, loss-scaling, and sequence similarity integration

NetTCR-2.1: Lessons and guidance on how to develop models for TCR specificity predictions

T cell receptor binding prediction: A machine learning revolution

Improving generalizability for MHC-I binding peptide predictions through geometric deep learning

Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency

epiTCR-KDA: Knowledge Distillation model on Dihedral Angles for TCR-peptide prediction

Predicting TCR-Epitope Binding Specificity Using Deep Metric Learning and Multimodal Learning

Predicting Antigen Specificity of Single T Cells Based on TCR CDR3 Regions

TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets

T-Cell Receptor Cognate Target Prediction Based on Paired α and β Chain Sequence and Structural CDR Loop Similarities

DapPep: Domain Adaptive Peptide-agnostic Learning for Universal T-cell Receptor-antigen Binding Affinity Prediction

Revealing the hidden sequence distribution of epitope-specific TCR repertoires and its influence on machine learning model performance

tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity

HeteroTCR: A heterogeneous graph neural network-based method for predicting peptide-TCR interaction

Feature Selection Enhances Peptide Binding Predictions for TCR-Specific Interactions

Attentive Variational Information Bottleneck for TCR-peptide interaction prediction

Predicting Antigen‐Specificities of Orphan T Cell Receptors from Cancer Patients with TCRpcDist

Trans-Allelic Model for Prediction of Peptide:MHC-II Interactions

Pretraining Transformers for TCR-pMHC Binding Prediction

NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks