HLAIIPred: Cross-Attention Mechanism for Modeling the Interaction of HLA Class II Molecules with Peptides

Mojtaba Haghighatlari, Nicholas Marze, Robert Joseph Seward, Andrew Ciarla, Santosh Dhule, Rachel Hindin, Jennifer Calderini, Benjamin Keenan, Sarah Hall-Swan, Timothy P Hickling, Eric Bennett, Brajesh Rai, Sophie Tourdot
DOI: https://doi.org/10.1101/2024.10.01.616078
2024-10-03
Abstract: We introduce HLAIIPred, a deep learning model to predict peptides presented by class II human leukocyte antigens (HLAII) on the surface of antigen-presenting cells. HLAIIPred is trained using a Transformer-based neural network and a dataset of HLAII-presented peptides identified by mass spectrometry. In addition to predicting peptide presentation, the model can provide important insights into peptide-HLAII interactions by identifying the core peptide residues that form these interactions. We evaluate the performance of HLAIIPred on three different tasks: peptide presentation in monoallelic samples, immunogenicity prediction of therapeutic antibodies, and neoantigen prioritization for cancer immunotherapy. Additionally, we created a new dataset of HLAII peptides derived from biotherapeutics and presented by human dendritic cells. These data are used to develop screening strategies that use HLAII presentation models to predict the unwanted immunogenic segments of therapeutic antibodies. HLAIIPred demonstrates superior or equivalent performance compared to the latest models across all evaluated benchmark datasets. We achieve a 16% increase in prediction of presented peptides compared to the second-best model on a set of unseen peptides presented by less frequent alleles. The model also improves the area under the precision-recall curve by 3% for distinguishing between immunogenic and non-immunogenic antibodies. We show that HLAIIPred can identify epitopes in therapeutic antibodies and prioritize neoantigens with high accuracy.
Bioinformatics
What problem does this paper attempt to address?
The problem this paper attempts to address is the prediction of peptides presented by human leukocyte antigen class II (HLAII) molecules on the surface of antigen-presenting cells. Specifically, the paper proposes a deep learning model, HLAIIPred, that aims to overcome the shortcomings of existing models in predicting HLAII-presented peptides. The main issues are:

1. **Lack of negative samples**: Experimental data usually contain only peptides presented by HLAII, with no data on peptides that are not presented, which makes it difficult to train classification models.
2. **Multi-allelic nature of the data**: Experimental samples often express several HLAII alleles, any of which may be responsible for presenting a given peptide, so the presenting allele of each peptide is ambiguous.
3. **Peptide binding core prediction**: The peptide binding core is the contiguous 9-residue segment that binds to the HLAII molecule. Accurately predicting these residues is challenging because most peptides in the training and evaluation datasets lack experimental binding core annotations.

To address these issues, HLAIIPred employs a Transformer-based neural network architecture with a cross-attention mechanism that flexibly learns the interactions between peptides and HLAII molecules. The model can also identify the core peptide residues that form these interactions, providing biological insight. The paper evaluates HLAIIPred on three tasks: peptide presentation in monoallelic samples, immunogenicity prediction of therapeutic antibodies, and neoantigen prioritization for cancer immunotherapy. The results show that HLAIIPred outperforms or matches state-of-the-art models on all evaluated benchmark datasets, with a 16% improvement over the second-best model in predicting unseen peptides presented by less frequent alleles and a 3% increase in the area under the precision-recall curve for distinguishing immunogenic from non-immunogenic antibodies.
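The summary does not describe HLAIIPred's layers in detail, so the following is only a rough, hypothetical sketch of how the ideas listed above can fit together; it is not the authors' implementation. Peptide residues act as queries in a scaled dot-product cross-attention over HLA residues, a linear readout gives one score per peptide residue, the binding core is taken as the contiguous 9-residue window with the highest total score, and a multi-allelic sample is scored by its best allele. Everything beyond the 9-residue core is an assumption: the random toy embeddings, the identity query/key/value projections, the linear readout and sigmoid, the max-over-alleles rule (a common multiple-instance heuristic), the made-up HLA "pocket" strings, and the helper names `score_peptide` and `score_multiallelic`.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE_LEN = 9  # length of the HLAII binding core described in the text


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention with identity projections.

    Each query (a peptide residue) attends over every key (an HLA residue)
    and returns the attention-weighted sum of the corresponding values.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs


def score_peptide(peptide, hla_seq, table, readout):
    """Hypothetical helper: (presentation probability, core start, core) for one allele."""
    if len(peptide) < CORE_LEN:
        raise ValueError("peptide is shorter than the binding core")
    pep = [table[aa] for aa in peptide]  # toy residue embeddings (assumption)
    hla = [table[aa] for aa in hla_seq]
    attended = cross_attention(pep, hla, hla)  # peptide residues attend to HLA residues
    # Assumed linear readout: one scalar score per peptide residue.
    residue_scores = [sum(r * x for r, x in zip(readout, h)) for h in attended]
    # Binding core: the contiguous 9-residue window with the highest total score.
    best = max(range(len(peptide) - CORE_LEN + 1),
               key=lambda s: sum(residue_scores[s:s + CORE_LEN]))
    core_mean = sum(residue_scores[best:best + CORE_LEN]) / CORE_LEN
    prob = 1.0 / (1.0 + math.exp(-core_mean))  # sigmoid
    return prob, best, peptide[best:best + CORE_LEN]


def score_multiallelic(peptide, alleles, table, readout):
    """Assumed multiple-instance rule: a sample's score is its best allele's score."""
    results = {name: score_peptide(peptide, seq, table, readout)
               for name, seq in alleles.items()}
    best_allele = max(results, key=lambda name: results[name][0])
    return best_allele, results[best_allele]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    table = {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
    readout = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Made-up stand-ins for HLA pocket residues; not real allele sequences.
    alleles = {"allele_A": "YEFHGQRSTW", "allele_B": "KLMNPADCVI"}
    allele, (prob, start, core) = score_multiallelic("PKYVKQNTLKLAT", alleles, table, readout)
    print(f"best allele={allele} p={prob:.3f} core={core} (start {start})")
```

Taking the maximum over alleles attributes each sample-level prediction to a single allele, which is one simple way to handle the multi-allelic ambiguity described in item 2; the paper may resolve it differently. In a trained model the embeddings, projections, and readout would be learned rather than random.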