Voting-ac4C: Pre-trained large RNA language model enhances RNA N4-acetylcytidine site prediction
Yanna Jia, Zilong Zhang, Shankai Yan, Qingchen Zhang, Leyi Wei, Feifei Cui
DOI: https://doi.org/10.1016/j.ijbiomac.2024.136940
IF: 8.2
2024-11-02
International Journal of Biological Macromolecules
Abstract: RNA N4-acetylcytidine (ac4C) modification plays a crucial role in gene expression regulation. However, existing prediction methods are limited in how well they capture RNA sequence features, particularly sequence complexity and long-range dependencies. To improve the accuracy of RNA-ac4C modification site prediction, this study introduces, for the first time, the transformer-based pre-trained model RNAErnie, which extracts deep semantic information from RNA sequences. The RNAErnie features are combined with those from six traditional feature extraction methods (e.g., One-hot and ENAC) to form a multidimensional feature set. On this basis, we propose the Voting-ac4C model, which uses a deep neural network for feature selection. The selected features are then fed into a soft voting ensemble that integrates the strengths of several machine learning algorithms to predict RNA-ac4C modification sites. Experimental results show that, compared with state-of-the-art methods, Voting-ac4C achieves significant improvements across multiple metrics, including AUC (area under the ROC curve), SN (sensitivity), SP (specificity), ACC (accuracy) and MCC (Matthews correlation coefficient). This study provides a novel approach for RNA modification site prediction and highlights the potential of pre-trained models in biological sequence analysis.
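The abstract names One-hot and ENAC encodings and a soft voting ensemble but gives no parameters. The following is a minimal sketch, not the authors' code. It assumes an ACGU alphabet, an ENAC sliding window of 5 (a common default for this encoding, not stated in the source) and equal model weights unless given otherwise. The function names one_hot, enac and soft_vote are hypothetical, and the RNAErnie embedding and DNN feature-selection steps are omitted.

```python
# Hypothetical sketch of two encodings and soft voting; not the Voting-ac4C implementation.
from collections import Counter

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet


def one_hot(seq):
    """Encode each nucleotide as a 4-dim indicator vector, flattened into one list."""
    vec = []
    for base in seq:
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def enac(seq, window=5):
    """ENAC: nucleotide frequencies inside each sliding window (window size is an assumption)."""
    features = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        features.extend(counts[n] / window for n in NUCLEOTIDES)
    return features


def soft_vote(prob_lists, weights=None):
    """Weighted average of per-class probabilities from several classifiers.

    Returns (averaged probabilities, index of the class with the highest average).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    pred = max(range(n_classes), key=lambda c: avg[c])
    return avg, pred


if __name__ == "__main__":
    print(one_hot("AC"))    # [1, 0, 0, 0, 0, 1, 0, 0]
    print(enac("ACGUA"))    # [0.4, 0.2, 0.2, 0.2]
    # Three classifiers' [P(non-ac4C), P(ac4C)] outputs for one site.
    _, label = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
    print(label)            # 0
```

In the usage example, soft voting picks class 0 because its averaged probability (about 0.53) is higher, even though two of the three models individually favor class 1; a hard majority vote would have chosen class 1. This sensitivity to each model's confidence is what distinguishes soft voting from hard voting.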
Polymer Science; Biochemistry & Molecular Biology; Chemistry, Applied