Discovering nuclear localization signal universe through a novel deep language learning network with attention as the neurons

Hongbin Shen, Yifan Li, Xiaoyong Pan
DOI: https://doi.org/10.21203/rs.3.rs-4204156/v1
2024-01-01
Abstract: Nuclear localization signals (NLSs) are pivotal peptide fragments within proteins that play a decisive role in guiding proteins into the cell nucleus. Determining the existence and precise locations of NLSs through experimental methods is time-consuming, resulting in a scarcity of experimentally validated NLS fragments. Consequently, annotated NLS datasets for training deep learning models in this domain are relatively small. In this study, we propose a novel interpretable approach, NLSExplorer, which leverages large-scale protein language models to capture crucial spatial information with a novel deep attention network for NLS identification. NLSExplorer achieves superior predictive performance to existing methods on two NLS benchmark datasets. Furthermore, we leverage NLSExplorer to perform a comprehensive analysis of all proteins located in the nucleus within the SwissProt database, revealing a multi-level universe of potential NLSs with variations in NLS preferences across different species. Moreover, the potential evolutionary trends of nuclear localization proteins across species are explored with NLSExplorer, showing the dynamic nature of NLSs across diverse organisms. This study not only enhances our understanding of cross-species evolutionary trends for nuclear localization proteins, but also showcases NLSExplorer as a powerful tool for predicting and exploring the NLS space.