Advancing microRNA Target Site Prediction with Transformer and Base-Pairing Patterns

Yue Bi, Fuyi Li, Cong Wang, Tong Pan, Chen Davidovich, Geoffrey I Webb, Jiangning Song
DOI: https://doi.org/10.1101/2024.05.05.592612
2024-05-14
Abstract: MicroRNAs (miRNAs) are short non-coding RNAs involved in various cellular processes, playing a crucial role in gene regulation. Identifying miRNA targets remains a central challenge and is pivotal for elucidating the complex gene regulatory networks. Traditional computational approaches have predominantly focused on identifying miRNA targets through perfect Watson-Crick base pairings within the seed region, referred to as canonical sites. However, emerging evidence suggests that perfect seed matches are not a prerequisite for miRNA-mediated regulation, underscoring the importance of also recognizing imperfect, or non-canonical, sites. To address this challenge, we propose Mimosa, a new computational approach that employs the Transformer framework to enhance the prediction of miRNA targets. Mimosa distinguishes itself by integrating contextual, positional, and base-pairing information to capture in-depth attributes, thereby improving its predictive capabilities. Its unique ability to identify non-canonical base-pairing patterns makes Mimosa a standout model, reducing the reliance on pre-selecting candidate targets. Mimosa achieves superior performance in gene-level predictions and also shows impressive performance in site-level predictions across various non-human species through extensive benchmarking tests. To facilitate research efforts in miRNA targeting, we have developed an easy-to-use web server for comprehensive end-to-end predictions, which is publicly available at http://monash.bioweb.cloud.edu.au/Mimosa/.
Bioinformatics
What problem does this paper attempt to address?
This paper aims to improve the accuracy of microRNA (miRNA) target site prediction, especially the identification of non-canonical base-pairing patterns. Traditional methods rely mainly on perfect Watson-Crick (WC) base pairing within the seed region to identify miRNA targets. However, growing evidence shows that perfect seed matching is not required for miRNA-mediated regulation, and that non-canonical base pairing can also decrease mRNA levels. The paper therefore proposes Mimosa, a computational method that uses the Transformer framework to improve miRNA target prediction. By integrating contextual, positional, and base-pairing information, Mimosa can identify various types of non-canonical base-pairing patterns on its own, which reduces the dependence on pre-selected candidate target sites and improves the accuracy and scope of prediction.

Specifically, the paper addresses the following key issues:

1. **Improving the accuracy of miRNA target prediction**: By combining the Transformer framework with a dynamic programming algorithm that finds the optimal local alignment and produces base-pairing embeddings, Mimosa can automatically identify complex interactions between miRNA and mRNA without relying on manual feature engineering.
2. **Identifying non-canonical base-pairing patterns**: Traditional prediction methods focus mainly on canonical site types such as 8mer, 7mer-m8, 7mer-A1, and 6mer, while Mimosa can also identify non-canonical base-pairing patterns that include mismatches, bulges, and wobble pairs, expanding the scope of prediction.
3. **Reducing the dependence on pre-selected candidate target sites**: By integrating base-pairing patterns directly into model training, Mimosa reduces the dependence on pre-selected candidate target sites and avoids missing biologically important non-canonical sites because of overly strict pre-selection criteria.
4. **Generalization ability across species**: Mimosa performs well in human gene-level prediction and also shows impressive performance in site-level prediction for multiple non-human species; its generalization ability has been verified through extensive benchmark tests.

In conclusion, by proposing the Mimosa model, this paper aims to overcome the limitations of existing miRNA target prediction methods, especially in identifying non-canonical base-pairing patterns, so as to reveal the complexity of the miRNA regulatory network more comprehensively.
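To make the local-alignment idea from the first key issue more concrete, here is a minimal Python sketch of a Smith-Waterman-style alignment between a miRNA and a candidate site that labels each aligned position as a Watson-Crick pair, a wobble pair, a mismatch, or a bulge. It is an illustration only and not Mimosa's actual implementation: the function names (`pair_score`, `pair_label`, `local_pairing`), the scoring values (+2 for WC, +1 for G:U wobble, -1 for a mismatch, -2 for a gap), and the toy sequences are all assumptions chosen for this example.

```python
# Hypothetical scoring scheme; the values are assumptions for illustration,
# not parameters taken from the paper.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE_PAIRS = {("G", "U"), ("U", "G")}
GAP = -2


def pair_score(a: str, b: str) -> int:
    """Score one miRNA/target nucleotide pair."""
    if (a, b) in WC_PAIRS:
        return 2
    if (a, b) in WOBBLE_PAIRS:
        return 1
    return -1


def pair_label(a: str, b: str) -> str:
    """Name the pairing type of one aligned column."""
    if a == "-" or b == "-":
        return "bulge"
    if (a, b) in WC_PAIRS:
        return "WC"
    if (a, b) in WOBBLE_PAIRS:
        return "wobble"
    return "mismatch"


def local_pairing(mirna: str, site: str):
    """Best local pairing between a miRNA (5'->3') and a site (5'->3').

    Returns (score, labels), with labels ordered along the miRNA 5'->3'.
    """
    mirna = mirna.upper().replace("T", "U")
    # Duplexes are antiparallel, so read the site 3'->5'.
    target = site.upper().replace("T", "U")[::-1]
    n, m = len(mirna), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + pair_score(mirna[i - 1], target[j - 1]),
                H[i - 1][j] + GAP,  # unpaired miRNA nucleotide
                H[i][j - 1] + GAP,  # unpaired target nucleotide
            )
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the local score drops to zero.
    labels = []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        a, b = mirna[i - 1], target[j - 1]
        if H[i][j] == H[i - 1][j - 1] + pair_score(a, b):
            labels.append(pair_label(a, b))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            labels.append(pair_label(a, "-"))
            i -= 1
        else:
            labels.append(pair_label("-", b))
            j -= 1
    labels.reverse()
    return best, labels


# Toy 8-nt miRNA fragment and a site with one G:U wobble at miRNA position 2.
score, labels = local_pairing("UGAGGUAG", "CUACCUUA")
print(score, labels)
```

Under this toy scoring the example aligns all eight positions, with seven WC pairs and one wobble pair. A label sequence like this could then be mapped to integer tokens and fed to an embedding layer, although how Mimosa encodes its base-pairing embeddings is not specified here.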