Abstract:

Background: Computational prediction of major histocompatibility complex class II (MHC-II) binding peptides can help researchers understand the mechanisms of the immune system and develop peptide-based vaccines. Although many computational methods have been proposed, their performance is far from satisfactory. The difficulty of MHC-II peptide binding prediction comes mainly from the large variation in the length of binding peptides.

Methods: We develop a novel multiple instance learning based method, MHC2MIL, to predict MHC-II binding peptides. MHC2MIL treats each peptide as a bag and some substrings of the peptide as the instances in the bag. Unlike previous multiple instance learning based methods, which consider only instances of a fixed length of 9 amino acids, MHC2MIL handles instances of 9 and 11 amino acids simultaneously. As such, MHC2MIL incorporates important information from the peptide flanking region. Furthermore, when measuring the distances between instances, MHC2MIL explicitly highlights the amino acids at certain important positions.

Results: Experimental results on a benchmark dataset show that the performance of MHC2MIL improves significantly when instances of both 9 and 11 amino acids are considered and when amino acids at key positions in the instance are emphasized. The results are consistent with those reported in the literature on MHC-II peptide binding. In addition to the five important positions (1, 4, 6, 7 and 9) for HLA (human leukocyte antigen, the name of MHC in humans) DR peptide binding, we also find that position 2 may play a role in the binding process. In 5-fold cross validation on the benchmark dataset, MHC2MIL outperforms two state-of-the-art methods, MHC2SK and NN-align, with statistical significance on 12 HLA DP and DQ molecules. In addition, it achieves performance comparable to MHC2SK and NN-align on 14 HLA DR molecules.
MHC2MIL is freely available at http://datamining-iip.fudan.edu.cn/service/MHC2MIL/index.html .
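
The bag-and-instance representation described in the Methods can be illustrated with a minimal sketch. It is not the MHC2MIL implementation: the function names (`make_bag`, `core`, `weighted_mismatch`), the choice to take every contiguous 9-mer and 11-mer, the reading of an 11-mer as a 9-residue core with one flanking residue on each side, and the simple weighted mismatch count are all assumptions made for illustration. The abstract states only that "some substrings" are used and that key positions are emphasized in the distance.

```python
from typing import List, Sequence

# 1-based positions the abstract names as important for HLA-DR binding.
KEY_POSITIONS = (1, 4, 6, 7, 9)


def make_bag(peptide: str, lengths: Sequence[int] = (9, 11)) -> List[str]:
    """Treat a peptide as a bag: collect every contiguous substring
    of each requested length (assumption: all substrings are kept)."""
    bag = []
    for k in lengths:
        for start in range(len(peptide) - k + 1):
            bag.append(peptide[start:start + k])
    return bag


def core(instance: str) -> str:
    """Return the 9-residue core of an instance.
    Assumption: an 11-mer is a core with one flanking residue per side."""
    if len(instance) == 11:
        return instance[1:-1]
    if len(instance) == 9:
        return instance
    raise ValueError("instances must have length 9 or 11")


def weighted_mismatch(a: str, b: str, key_weight: float = 2.0) -> float:
    """Hypothetical distance: count mismatches between two cores,
    weighting mismatches at KEY_POSITIONS by key_weight."""
    total = 0.0
    for pos, (x, y) in enumerate(zip(core(a), core(b)), start=1):
        if x != y:
            total += key_weight if pos in KEY_POSITIONS else 1.0
    return total


bag = make_bag("PKYVKQNTLKLAT")  # a 13-residue peptide
print(len(bag))  # 5 nine-mers + 3 eleven-mers = 8
```

A peptide shorter than 9 residues yields an empty bag under this sketch, so real use would need a policy for such inputs.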

Trans-Allelic Model for Prediction of Peptide:MHC-II Interactions

Peptide binding predictions for HLA DR, DP and DQ molecules

Ranking-based Convolutional Neural Network Models for Peptide-MHC Binding Prediction

MHC2MIL: a Novel Multiple Instance Learning Based Method for MHC-II Peptide Binding Prediction by Considering Peptide Flanking Region and Residue Positions

A Novel Peptide Binding Prediction Approach for HLA-DR Molecule Based on Sequence and Structural Information.

Immunogenicity Prediction of The Peptides Presented by MHC I Molecules Based on The TAP Binding Affinity Model

Improving MHC Binding Peptide Prediction by Incorporating Binding Data of Auxiliary MHC Molecules

MHCII-peptide presentation: an assessment of the state-of-the-art prediction methods

Peptide-binding specificity prediction using fine-tuned protein structure prediction networks

Improving Prediction of MHC Class I Binding Peptides with Additional Binding Data

Quantitative prediction of MHC-II peptide binding affinity using relevance vector machine

Evaluating Cross-linking-driven integrative modeling in peptide-HLAII complexes prediction with insights for refining predictive accuracy

Machine learning application to predict binding affinity between peptide containing non-canonical amino acids and HLA0201

Quantitative Prediction of MHC-II Peptide Binding Affinity Using Global Description of Peptide Sequences

Toward prediction of binding affinities between the MHC protein and its peptide ligands using quantitative structure-affinity relationship approach.

A Bayesian Regression Approach to the Prediction of MHC-II Binding Affinity

Prediction of MHC-binding Peptides of Flexible Lengths from Sequence-Derived Structural and Physicochemical Properties

MetaMHC: a meta approach to predict peptides binding to MHC molecules.

Limitations of Ab Initio Predictions of Peptide Binding to MHC Class II Molecules.

Predicting MHC-I ligands across alleles and species: How far can we go?

Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling