GP-HTNLoc: A Graph Prototype Head-Tail Network-based Model for Multi-label Subcellular Localization Prediction of ncRNAs

Shuangkai Han, Lin Liu
DOI: https://doi.org/10.1016/j.csbj.2024.04.052
IF: 6.155
2024-05-05
Computational and Structural Biotechnology Journal
Highlights:
• The head-tail network helps address class imbalance.
• The prototype module improves model performance in small-sample, multi-label classification.
• Case studies have demonstrated the usability and reliability of GP-HTNLoc.
• The SHAP method was employed to explain the prediction process of GP-HTNLoc.
• A user-friendly online web application has been developed.
Abstract: Numerous studies have shown that understanding the subcellular localization of non-coding RNAs (ncRNAs) is pivotal to elucidating their roles and regulatory mechanisms in cells. Although more than ten computational models exist for predicting the subcellular localization of ncRNAs, most are designed solely for single-label prediction. In reality, ncRNAs often localize to multiple subcellular compartments. Furthermore, the existing multi-label localization prediction models do not adequately address the scarcity of training samples and the class imbalance in ncRNA datasets. To address these limitations, this study proposes a novel multi-label localization prediction model for ncRNAs, named GP-HTNLoc. To mitigate class imbalance, GP-HTNLoc trains head and tail location labels separately. Additionally, GP-HTNLoc introduces a pioneering graph prototype module to enhance its performance in small-sample, multi-label scenarios. Experimental results based on 10-fold cross-validation on benchmark datasets demonstrate that GP-HTNLoc achieves competitive predictive performance. Averaged over 10 rounds of testing on an independent dataset, GP-HTNLoc outperforms the best existing models on the human lncRNA, human snoRNA, and human miRNA subsets, with average precision improvements of 31.5%, 14.2%, and 5.6%, respectively, reaching 0.685, 0.632, and 0.704.
A user-friendly online GP-HTNLoc server is accessible at https://56s8y85390.goho.co.
biochemistry & molecular biology
What problem does this paper attempt to address?