Prediction of Protein Half-lives from Amino Acid Sequences by Protein Language Models

Tatsuya Sagawa, Eisuke Kanao, Kosuke Ogata, Koshi Imami, Yasushi Ishihama
DOI: https://doi.org/10.1101/2024.09.10.612367
2024-09-14
Abstract: We developed a protein half-life prediction model, PLTNUM, based on a protein language model, using an extensive dataset of protein sequences and protein half-lives from the NIH3T3 mouse embryo fibroblast cell line as the training set. PLTNUM achieved an accuracy of 71% on validation data and showed robust performance, with an area under the ROC curve of 0.73, when applied to a human cell line dataset. By incorporating Shapley Additive Explanations (SHAP) into PLTNUM, we identified key factors contributing to shorter protein half-lives, such as cysteine-containing domains and intrinsically disordered regions. Using SHAP values, PLTNUM can also predict potential degron sequences that shorten protein half-lives. This model provides a platform for elucidating the sequence dependency of protein half-lives, while the uncertainty in its predictions underscores the importance of biological context in influencing protein half-lives.
Bioinformatics
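
The abstract reports two summary metrics, accuracy and ROC. As a minimal sketch of how such numbers are typically computed, the snippet below assumes the task is framed as binary classification (for example, short versus long half-life), which the abstract does not state explicitly. The labels, scores, threshold, and function names are hypothetical toy values for illustration only; they are not taken from PLTNUM or its datasets.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = short half-life, 0 = long half-life (assumed encoding).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"ROC AUC  = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is one reason both are commonly reported together.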