Abstract:

Background: An emerging type of cancer treatment, known as cell immunotherapy, is gaining popularity over chemotherapy and radiation therapy, which cause widespread damage to the body. One favourable approach in cell immunotherapy is the use of neoantigens as targets that help the body's immune system distinguish cancer cells from healthy cells. Neoantigens, which are non-autologous proteins with individual specificity, are generated by non-synonymous mutations in the tumour cell genome. Owing to their strong immunogenicity and lack of expression in normal tissues, they are now an important target for tumour immunotherapy. Neoantigens are protein fragments that appear on the surface of cancer cells as a by-product of DNA mutations in the tumour. In cancer immunotherapies, certain neoantigens that exist only on cancer cells elicit responses from white blood cells (the body's defenders, the anti-cancer T cells) that fight the cancer cells while leaving healthy cells alone. Personalized cancer vaccines can therefore be designed de novo for each individual patient once the specific neoantigens relevant to his or her tumour are found. The vaccine, usually encoded as synthetic long peptides, RNA or DNA representing the neoantigens, triggers an immune response in the body that destroys the cancer cells (tumour). The specific neoantigens can be found by a complex process of biopsy and genome sequencing. Alternatively, modern approaches use AI algorithms to predict suitable neoantigen candidates. However, determining whether neoantigens bind to T-cell receptors (TCR) is a challenging computational task because of the very large search space.

Objective: To enhance the efficiency and accuracy of traditional deep learning tools used to assess potential responsiveness to immunotherapy through correctly predicted neoantigens.
Deep learning is known to be able to explore which novel neoantigens bind to T-cell receptors and which do not. This exploration can be technically expensive and time-consuming, since deep learning is an inherently computation-heavy method. One can use putative neoantigen peptide sequences to guide personalized cancer vaccine design.

Methods: Such models all proceed through complex feature engineering, including feature extraction and dimension reduction. In this study, we derived 4 features to facilitate the prediction and classification of 4 HLA-peptide binding types, namely AAC and DC from the global sequence information, and LAAC and LDC from the local sequence information. Based on the patterns of sequence formation, a nested structure of bidirectional long short-term memory (BiLSTM) neural networks, called the local information module, is used to extract context-based features around every residue. Another BiLSTM layer, called the global information module, is placed above the local information module to integrate the context-based features of all residues in the same HLA-peptide binding chain, thereby involving inter-residue relationships in the training process.

Results: Finally, a more effective model is obtained by fusing the above two modules and the 4-feature matrix. The method performs significantly better than previous prediction schemes: its overall R-squared increased to 0.0125 and 0.1064 on the training datasets and to 0.0782 and 0.2926 on the test datasets. The RMSE of our proposed models decreased to approximately 0.0745 and 1.1034, respectively, on the training datasets, and to 0.6712 and 1.6506 on the test datasets.

Conclusion: Our work refines a machine-learning model to improve neoantigen identification and prediction using the determinants of neoantigen identification.
The final experimental results show that our method is more effective than existing methods for predicting peptide types, which can help laboratory researchers identify the type of novel HLA-peptide binding.

### Competing Interest Statement

The authors have declared no competing interest.
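
As an illustration of the four sequence features named in the Methods, the following is a minimal Python sketch, not the authors' implementation. It assumes that AAC and DC denote amino acid composition and dipeptide composition, as is common usage, and that LAAC and LDC are the same quantities computed on a local window of residues. The abstract does not define the local region, so `local_window`, `half_width`, `features` and the example peptide are hypothetical choices made only for this sketch.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues in seq."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    n = len(pairs)
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def local_window(seq, center, half_width=2):
    """ASSUMPTION: 'local' is taken to mean a window around position `center`."""
    start = max(0, center - half_width)
    return seq[start:center + half_width + 1]


def features(seq, center, half_width=2):
    """Concatenate global AAC and DC with window-based LAAC and LDC (hypothetical layout)."""
    local = local_window(seq, center, half_width)
    return aac(seq) + dc(seq) + aac(local) + dc(local)


peptide = "SIINFEKL"  # arbitrary example peptide, for illustration only
vec = features(peptide, center=4)
print(len(vec))  # 20 + 400 + 20 + 400 = 840
```

Under these assumptions each peptide maps to a fixed-length vector regardless of its length, which is what allows such features to be fused with the outputs of sequence modules like the BiLSTM layers described above.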

Genesis: A Modular Protein Language Modelling Approach to Immunogenicity Prediction

IMPROVE: a feature model to predict neoepitope immunogenicity through broad-scale validation of T-cell recognition

Abstract 4422: Ineo-Pred: A Hybrid-Model Predictor for MHC Class I Neoantigens

Multi-strategies embedded framework for neoantigen vaccine maturation

PGNneo: A Proteogenomics-Based Neoantigen Prediction Pipeline in Noncoding Regions.

Guiding a language-model based protein design method towards MHC Class-I immune-visibility profiles for vaccines and therapeutics

A comprehensive proteogenomic pipeline for neoantigen discovery to advance personalized cancer immunotherapy

ProGeo-neo: a Customized Proteogenomic Workflow for Neoantigen Prediction and Selection

NeoaPred: a deep-learning framework for predicting immunogenic neoantigen based on surface and structural features of peptide-human leukocyte antigen complexes

A Deep Learning Approach for NeoAG-Specific Prediction Considering Both HLA-Peptide Binding and Immunogenicity: Finding Neoantigens to Making T-Cell Products More Personal

Proteogenomics guided identification of functional neoantigens in non-small cell lung cancer

GGNpTCR: A Generative Graph Structure Neural Network for Predicting Immunogenic Peptides for T-cell Immune Response

APE-Gen2.0: Expanding Rapid Class I Peptide–Major Histocompatibility Complex Modeling to Post-Translational Modifications and Noncanonical Peptide Geometries

Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling

ImmunoStruct: Integration of protein sequence, structure, and biochemical properties for immunogenicity prediction and interpretation

ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model

Improved proteasomal cleavage prediction with positive-unlabeled learning

Enhancing TCR specificity predictions by combined pan- and peptide-specific training, loss-scaling, and sequence similarity integration

OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

Antigenic: An improved prediction model of protective antigens

IEPAPI: a Method for Immune Epitope Prediction by Incorporating Antigen Presentation and Immunogenicity