siRNADesign: A Graph Neural Network for siRNA Efficacy Prediction via Deep RNA Sequence Analysis

Rongzhuo Long, Ziyu Guo, Da Han, Xudong Yuan, Guangyong Chen, Pheng Ann Heng, Liang Zhang
DOI: https://doi.org/10.1101/2024.04.28.591509
2024-05-28
Abstract: The clinical adoption of small interfering RNAs (siRNAs) has prompted the development of various computational strategies for siRNA design, from traditional data analysis to advanced machine learning techniques. However, previous studies have inadequately considered the full complexity of the siRNA silencing mechanism, neglecting critical elements such as siRNA positioning on mRNA, RNA base-pairing probabilities, and RNA-AGO2 interactions, thereby limiting the insight and accuracy of existing models. Here, we introduce siRNADesign, a Graph Neural Network (GNN) framework that leverages both non-empirical and empirical rules-based features of siRNA and mRNA to effectively capture the complex dynamics of gene silencing. In multiple internal datasets, siRNADesign achieves state-of-the-art performance. Significantly, siRNADesign also outperforms existing methodologies in in vitro wet lab experiments and an externally validated dataset. Additionally, we develop a new data-splitting methodology that addresses the data leakage issue, a frequently overlooked issue in previous studies, ensuring the robustness and stability of our model under various experimental settings. Through rigorous testing, siRNADesign has demonstrated remarkable predictive accuracy and robustness, making significant contributions to the field of gene silencing. Furthermore, our approach in redefining data-splitting standards aims to set new benchmarks for future research in the domain of predictive biological modeling for siRNA.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses small interfering RNA (siRNA) design, specifically the accuracy and stability of siRNA efficacy prediction. The authors introduce siRNADesign, a Graph Neural Network (GNN) framework that predicts siRNA efficacy through deep RNA sequence analysis. Its main objectives are:

1. **Overcoming the limitations of existing methods**: Previous studies often overlooked parts of the siRNA silencing mechanism, such as the positioning of siRNA on mRNA, RNA base-pairing probabilities, and the interaction between RNA and the AGO2 protein. These factors are crucial for accurately predicting siRNA efficacy.
2. **Introducing new feature representations**: siRNADesign combines non-empirical rule features (such as sequence embedding, position encoding, and base-pairing probability) with empirical rule features (such as thermodynamic stability profiles, nucleotide frequency, GC percentage, and siRNA rule encoding). This combination lets the model capture the dynamics of gene silencing more comprehensively.
3. **Improving evaluation strategies**: To ensure the robustness of the model, the paper proposes a new data-splitting method for building the training, validation, and test sets. The method is designed to address data leakage, an issue the authors describe as frequently overlooked in previous studies.
4. **Achieving state-of-the-art performance**: siRNADesign achieves state-of-the-art performance on multiple internal datasets and also outperforms existing methods in in vitro wet-lab experiments and on an externally validated dataset.

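As a rough illustration of two of the empirical rule features named in point 2, GC percentage and nucleotide frequency, the following Python sketch computes both for an RNA sequence. The function names and the example sequence are hypothetical and are not taken from the paper; the paper's own feature definitions (for example, which k-mer sizes it uses) may differ.

```python
from collections import Counter


def gc_percentage(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def nucleotide_frequency(seq: str, k: int = 1) -> dict[str, float]:
    """Return the relative frequency of each overlapping k-mer in the sequence.

    k=1 gives single-nucleotide frequencies; k=2 gives dinucleotide
    frequencies, and so on. Only k-mers that occur are included.
    """
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if k < 1 or total <= 0:
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    toy_sirna = "GCAUGCAUAAUU"  # made-up sequence, for illustration only
    print(gc_percentage(toy_sirna))
    print(nucleotide_frequency(toy_sirna, k=1))
```

Features like these are typically concatenated into a fixed-length vector per siRNA; how siRNADesign combines them with its learned representations is not specified in this summary.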
In summary, the study reports improved accuracy and reliability of siRNA efficacy prediction, obtained by combining a broader set of siRNA and mRNA sequence features with a Graph Neural Network and by evaluating the model under a leakage-aware data split. The resulting framework is intended as a practical tool for siRNA design and gene silencing research.
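To make the data leakage concern from point 3 concrete, here is a minimal Python sketch of one common way to reduce leakage: keeping all records that share a group key (here, a hypothetical target mRNA identifier) in the same partition. This is an assumption chosen for illustration and is not confirmed to be the splitting method proposed in the paper; the record layout, the `group_split` name, and the 80/10/10 fractions are likewise hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into (train, val, test) so that no group spans two sets.

    records   : iterable of arbitrary items
    key       : function mapping a record to its group id (e.g. target mRNA)
    fractions : share of *groups* (not records) assigned to train and val;
                all remaining groups go to test
    """
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    # Shuffle group ids reproducibly, then cut the shuffled list into parts.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_train = int(fractions[0] * len(group_ids))
    n_val = int(fractions[1] * len(group_ids))
    parts = (
        group_ids[:n_train],
        group_ids[n_train:n_train + n_val],
        group_ids[n_train + n_val:],
    )
    return tuple([rec for g in part for rec in groups[g]] for part in parts)


if __name__ == "__main__":
    # Toy records: (mrna_id, sirna_index). Values are made up.
    data = [(f"mRNA_{m}", s) for m in range(10) for s in range(3)]
    train, val, test = group_split(data, key=lambda rec: rec[0])
    print(len(train), len(val), len(test))
```

With a plain per-record random split, siRNAs targeting the same mRNA can land in both the training and test sets, which can inflate test scores; grouping before splitting is one way to rule that out.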