Exploring functional conservation in silico: a new machine learning approach to RNA-editing

Michał Zawisza-Álvarez,Jesús Peñuela-Melero,Esteban Vegas,Ferran Reverter,Jordi Garcia-Fernàndez,Carlos Herrera-Úbeda
DOI: https://doi.org/10.1101/2023.11.21.568001
2024-04-15
Abstract: Around 50 years ago, molecular biology opened the path to understanding changes in form, adaptation, complexity, and the basis of human diseases, through myriad reports on gene birth, gene duplication, gene expression regulation, and splicing regulation, among other relevant mechanisms behind gene function. Here, with the advent of big data and artificial intelligence (AI), we focus on an elusive and intriguing mechanism of gene function regulation, RNA editing, in which a single nucleotide of an RNA molecule is changed, with a remarkable impact on the complexity of the transcriptome and proteome. We present a new-generation approach to assess the functional conservation of the RNA-editing targeting mechanism using two AI learning algorithms: random forest (RF) and bidirectional long short-term memory (biLSTM) neural networks with an attention layer. These algorithms, combined with RNA-editing data from databases and variant calling from same-individual RNA-seq and DNA-seq experiments in different species, allowed us to predict RNA-editing events using both primary sequence and secondary structure. Then, we devised a method for assessing conservation or divergence in the molecular mechanisms of editing completely in silico: the cross-training analysis. This novel method not only helps to understand the conservation of the editing mechanism through evolution but could also set the basis for understanding how it is involved in several human diseases.
Genetics
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Functional Conservation of RNA Editing Mechanisms**: The paper proposes a new method to evaluate the functional conservation of RNA editing targeting mechanisms, independent of the conservation of the editing sites themselves. By using two machine learning algorithms, Random Forest (RF) and bidirectional Long Short-Term Memory (biLSTM) neural networks with attention layers, combined with RNA editing data and variant detection results from different species, it predicts RNA editing events and analyzes their conservation and differences in the evolutionary process.
2. **Prediction Accuracy of RNA Editing**: The paper compares the performance of different machine learning methods in RNA editing prediction and finds that the biLSTM algorithm outperforms other existing methods on human data. Additionally, the study explores the differences in prediction accuracy between datasets of different species, such as human, mouse, and mackerel datasets.
3. **Cross-Species Training Analysis**: Through cross-species training analysis, researchers can assess the degree of conservation of RNA editing mechanisms between different species. For example, when a model is trained with data from one species and tested on another, the similarities and differences in mechanisms can be inferred. This method helps to understand the changes in RNA editing mechanisms during evolution and their relationship with human diseases.

In summary, by introducing new machine learning methods, this paper aims to better understand the conservation and evolutionary characteristics of RNA editing mechanisms across different species, laying the foundation for further research into their biological functions and potential medical applications.
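The cross-training idea described above (train on one species, test on another, and compare the scores) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: a small position-specific naive Bayes classifier stands in for the RF and biLSTM models, and the species names, three-nucleotide windows, and labels are invented toy data chosen only to make the example run. All function names (`train`, `predict`, `accuracy`, `cross_training_matrix`) are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGU"  # windows are assumed to contain only these characters


def train(examples, pseudocount=1.0):
    """Fit a position-specific naive Bayes model.

    This is a stand-in for the RF / biLSTM models named in the paper,
    used here only to keep the sketch dependency-free.
    `examples` is a list of (window, label) pairs with equal-length windows.
    """
    by_label = defaultdict(list)
    for window, label in examples:
        by_label[label].append(window)

    model = {}
    for label, windows in by_label.items():
        denom = len(windows) + pseudocount * len(ALPHABET)
        log_probs = []
        for pos in range(len(windows[0])):
            # Laplace-smoothed nucleotide counts at this window position.
            counts = {nt: pseudocount for nt in ALPHABET}
            for w in windows:
                counts[w[pos]] += 1
            log_probs.append({nt: math.log(c / denom) for nt, c in counts.items()})
        log_prior = math.log(len(windows) / len(examples))
        model[label] = (log_prior, log_probs)
    return model


def predict(model, window):
    """Return the label with the highest log-posterior for one window."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(lp[nt] for lp, nt in zip(log_probs, window))
    return max(model, key=score)


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(model, w) == y for w, y in examples)
    return hits / len(examples)


def cross_training_matrix(datasets):
    """Train one model per species and score it on every species.

    Returns {(train_species, test_species): accuracy}.
    """
    models = {name: train(data) for name, data in datasets.items()}
    return {
        (tr, te): accuracy(models[tr], datasets[te])
        for tr in datasets
        for te in datasets
    }


# Invented toy data (NOT real editing sites): label 1 = edited, 0 = unedited.
# species_a and species_b follow the same toy rule; species_c follows the opposite one.
toy = {
    "species_a": [("UAG", 1), ("UAG", 1), ("CAG", 1), ("GAC", 0), ("GAU", 0), ("GAC", 0)],
    "species_b": [("UAG", 1), ("AAG", 1), ("GAC", 0), ("GAU", 0)],
    "species_c": [("GAC", 1), ("GAU", 1), ("UAG", 0), ("CAG", 0)],
}

matrix = cross_training_matrix(toy)
for (tr, te), acc in sorted(matrix.items()):
    print(f"train={tr} test={te} accuracy={acc:.2f}")
```

On this toy input, a model trained on `species_a` classifies every `species_b` window correctly (accuracy 1.0) and every `species_c` window incorrectly (accuracy 0.0). Read in the spirit of the paper, a high off-diagonal score would suggest that the two species share targeting rules, while a low one would suggest divergence. With real data, a metric that is robust to class imbalance and a held-out split for the same-species diagonal would likely be needed.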