SETE: Sequence-based Ensemble learning approach for TCR Epitope binding prediction

Yao Tong, Jiayin Wang, Tian Zheng, Xuanping Zhang, Xiao Xiao, Xiaoyan Zhu, Xin Lai, Xiang Liu
DOI: https://doi.org/10.1016/j.compbiolchem.2020.107281
IF: 3.737
2020-08-01
Computational Biology and Chemistry
Abstract:<p>Predicting the binding of T cell receptors (TCRs) to epitopes plays a vital role in immunotherapy, and knowledge of this specific binding helps guide the development of therapeutic vaccines and cancer treatments. Many prediction methods have attempted to explain the relationship between TCR repertoires from different aspects, such as the V(D)J gene locus and the biophysical features of amino acid molecules, but extracting these features is time-consuming and the performance of these models is limited. Few studies have investigated how k-mers formed by adjacent amino acids in TCR sequences direct epitope recognition, and the specific mechanism of TCR-epitope binding is still unclear. Motivated by these gaps, we present <em>SETE</em> (Sequence-based Ensemble learning approach for TCR Epitope binding prediction), a novel model for accurately predicting the epitopes that TCRs bind. The model decomposes the CDR3β sequence into short amino acid chains (k-mers) used as features and learns their patterns across different TCR repertoires with a gradient boosting decision tree algorithm. Experiments demonstrate that <em>SETE</em> can help predict the epitopes corresponding to TCRs, and that it outperforms other state-of-the-art methods in predicting the epitope specificity of TCRs on the VDJdb data set. The source code has been uploaded to <a href="https://github.com/wonanut/SETE">https://github.com/wonanut/SETE</a> for academic use only.</p>
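The feature step described in the abstract, splitting a CDR3β sequence into overlapping k-mers and counting them, might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give a value for k, so `k=3` is a guess, the helper names (`kmers`, `build_vocabulary`, `featurize`) are hypothetical, and the two sequences are toy strings rather than VDJdb records.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocabulary(seqs, k=3):
    """Map every k-mer seen in seqs to a fixed column index (sorted for determinism)."""
    vocab = sorted({km for s in seqs for km in kmers(s, k)})
    return {km: idx for idx, km in enumerate(vocab)}


def featurize(seq, vocab, k=3):
    """Count-vector of k-mers for one sequence; k-mers outside vocab are ignored."""
    counts = Counter(kmers(seq, k))
    vec = [0] * len(vocab)
    for km, n in counts.items():
        if km in vocab:
            vec[vocab[km]] = n
    return vec


# Toy CDR3β-like strings, for illustration only (assumption, not real data).
toy_cdr3b = ["CASSLGQAYEQYF", "CASSIRSSYEQYF"]
vocab = build_vocabulary(toy_cdr3b, k=3)   # k=3 is an assumed value
X = [featurize(s, vocab, k=3) for s in toy_cdr3b]
```

Each row of `X` is a fixed-length count vector, which is the kind of input a gradient boosting decision tree classifier (for example scikit-learn's `GradientBoostingClassifier`, used here only as a plausible stand-in) could be trained on, with epitope labels as targets.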
biology, computer science, interdisciplinary applications
What problem does this paper attempt to address?