mtx-COBRA: Subcellular localization prediction for bacterial proteins
Isha Arora, Arkadij Kummer, Hao Zhou, Mihaela Gadjeva, Eric Ma, Gwo-Yu Chuang, Edison Ong
DOI: https://doi.org/10.1016/j.compbiomed.2024.108114
IF: 7.7
2024-02-24
Computers in Biology and Medicine
Abstract: Background: Bacteria can have beneficial effects on our health and environment; however, many are responsible for serious infectious diseases, which creates a need for vaccines against such pathogens. Bioinformatic and experimental technologies are crucial for the development of vaccines. The vaccine design pipeline requires identifying bacteria-specific antigens that the immune system can recognize and respond to upon infection. Immune system recognition is influenced by the location of a protein. Methods have been developed to determine the subcellular localization (SCL) of proteins in prokaryotes and eukaryotes. Bioinformatic tools such as PSORTb can be employed to determine the SCL of proteins, which would be tedious to determine experimentally. Unfortunately, PSORTb often predicts many proteins as having an "Unknown" SCL, reducing the number of antigens that can be evaluated as potential vaccine targets. Method: We present a new pipeline called subCellular lOcalization prediction for BacteRiAl Proteins (mtx-COBRA). mtx-COBRA uses Meta's protein language model, Evolutionary Scale Modeling, combined with an Extreme Gradient Boosting machine learning model to identify the SCL of bacterial proteins based on amino acid sequence. This pipeline is trained on a curated dataset that combines data from UniProt and the publicly available ePSORTdb dataset. Results: Using benchmarking analyses, nested 5-fold cross-validation, and leave-one-pathogen-out methods, followed by testing on the held-out dataset, we show that our pipeline predicts the SCL of bacterial proteins more accurately than PSORTb. Conclusions: mtx-COBRA provides an accessible pipeline that can more efficiently classify bacterial proteins with currently "Unknown" SCLs than existing bioinformatic and experimental methods.
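The abstract names leave-one-pathogen-out evaluation but does not describe how the splits are built. The following is a minimal sketch of one way such splits could be generated, assuming each protein record carries a pathogen label. The function name `leave_one_pathogen_out`, the record fields (`pathogen`, `sequence`, `scl`), and the toy sequences and labels are all hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict


def leave_one_pathogen_out(records):
    """Yield (held_out_pathogen, train_indices, test_indices) per pathogen.

    Hypothetical helper: each record is assumed to be a dict with a
    "pathogen" key. All proteins from one pathogen form the test set,
    and proteins from every other pathogen form the training set.
    """
    by_pathogen = defaultdict(list)
    for index, record in enumerate(records):
        by_pathogen[record["pathogen"]].append(index)

    for pathogen in sorted(by_pathogen):
        test_indices = by_pathogen[pathogen]
        held_out = set(test_indices)
        train_indices = [i for i in range(len(records)) if i not in held_out]
        yield pathogen, train_indices, test_indices


# Toy records: sequences and localization labels are invented for illustration.
records = [
    {"pathogen": "Escherichia coli", "sequence": "MKTAYIAK", "scl": "Cytoplasmic"},
    {"pathogen": "Escherichia coli", "sequence": "MKKLLPTA", "scl": "OuterMembrane"},
    {"pathogen": "Staphylococcus aureus", "sequence": "MNKQQKEF", "scl": "Extracellular"},
    {"pathogen": "Pseudomonas aeruginosa", "sequence": "MSDLPRTG", "scl": "Periplasmic"},
    {"pathogen": "Pseudomonas aeruginosa", "sequence": "MRVLAGSA", "scl": "Cytoplasmic"},
]

splits = list(leave_one_pathogen_out(records))
for pathogen, train_indices, test_indices in splits:
    print(pathogen, "train:", train_indices, "test:", test_indices)
```

In a full evaluation, a classifier would be fitted on the training indices of each split and scored on the held-out pathogen, so that no protein from the test organism is seen during training.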
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology