Fuzzy kernel evidence Random Forest for identifying pseudouridine sites

Mingshuai Chen, Mingai Sun, Xi Su, Prayag Tiwari, Yijie Ding
DOI: https://doi.org/10.1093/bib/bbae169
IF: 9.5
2024-04-16
Briefings in Bioinformatics
Abstract: Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. This method is called PseU-FKeERF, which demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the problem of reliably and quickly identifying pseudouridine sites in RNA sequences through computational methods. Specifically:

1. **Background and Importance**:
   - Pseudouridine is a type of RNA modification widely present in both prokaryotes and eukaryotes, playing a crucial role in various biological processes.
   - Despite its significant importance, experimental methods face major challenges in accurately identifying pseudouridine sites, requiring substantial time and resources.
2. **Proposed Method**:
   - The researchers proposed a method called Fuzzy Kernel Evidence Random Forest (FKeERF) and named it PseU-FKeERF.
   - This method combines fuzzy logic to extend the original feature space and an easily interpretable kernel method for category prediction.
3. **Main Contributions**:
   - The PseU-FKeERF model selects four well-performing RNA feature encoding schemes from RNA sequencing data, combines them, and inputs them into the newly proposed FKeERF method for category prediction.
   - Experimental results show that PseU-FKeERF performs excellently in cross-validation tests and independent tests on benchmark datasets, outperforming several existing advanced methods.
   - The new method not only improves the accuracy of pseudouridine site identification but also provides a reference for disease control and related drug development.

Through this research, the paper aims to develop a more efficient and accurate computational method to identify pseudouridine sites, thereby advancing research in related fields.
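
The feature-combination and fuzzy-expansion steps described above can be illustrated with a minimal sketch. The abstract does not name the four encoding schemes or the membership functions used, so everything here is an assumption: one-hot and 2-mer frequency encodings stand in for the selected schemes, Gaussian membership functions stand in for the fuzzy expansion, and the function names (`one_hot`, `kmer_frequencies`, `combine_features`, `fuzzy_expand`) are hypothetical. The kernel and evidence Random Forest stages are not shown.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Assumed encoding 1: four binary values per sequence position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]


def kmer_frequencies(seq, k=2):
    """Assumed encoding 2: normalized counts of every possible k-mer."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing non-ACGU symbols
            counts[sub] += 1
    return [counts[m] / total for m in kmers]


def combine_features(seq):
    """Concatenate the individual encodings into one feature vector."""
    return one_hot(seq) + kmer_frequencies(seq, k=2)


def fuzzy_expand(features, centers=(0.0, 0.5, 1.0), sigma=0.25):
    """Replace each feature with its Gaussian membership degree for
    every fuzzy set (illustrative centers and width, not the paper's)."""
    return [
        math.exp(-((v - c) ** 2) / (2 * sigma ** 2))
        for v in features
        for c in centers
    ]


if __name__ == "__main__":
    seq = "GAUCUAGCU"  # made-up RNA fragment, for illustration only
    combined = combine_features(seq)
    expanded = fuzzy_expand(combined)
    print(len(combined), len(expanded))
```

With three fuzzy sets per feature, the expanded vector is three times as long as the combined one, which is the sense in which fuzzy logic "expands the original feature space" in this sketch. In a full pipeline, a vector of this kind would be passed to the classifier.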