Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. In animals, this regulation is achieved via base-pairing with partially complementary sequences located mainly in the 3' UTR of messenger RNAs (mRNAs). Computational approaches that predict miRNA–target interactions (MTIs) help narrow down potential targets for experimental validation. The availability of new high-throughput datasets of direct MTIs has led to the development of machine learning (ML) based methods for MTI prediction. To train an ML algorithm, it is beneficial to provide examples of all class labels (i.e., positive and negative). Currently, no high-throughput assays exist for capturing negative examples. Therefore, current ML approaches must rely on negative examples that are either artificially generated or inferred from experimentally identified positive miRNA–target datasets. Moreover, the lack of uniform standards for generating such data leads to biased results and hampers comparisons between studies. In this comprehensive study, we collected methods for generating negative data for animal miRNA–target interactions and investigated their impact on the classification of true human MTIs. Our study relies on training ML models on a fixed positive dataset in combination with different negative datasets and evaluating their intra- and cross-dataset performance. As a result, we were able to examine each method independently and evaluate the sensitivity of ML models to the methodology used for negative data generation. To better understand the performance results, we analyzed unique features that distinguish between datasets. In addition, we examined whether one-class classification models, which are trained solely on positive interactions, are suitable for the task of MTI classification.
We demonstrate the importance of negative data in MTI classification, analyze specific methodological characteristics that differentiate negative datasets, and highlight the challenge ML models face in generalizing interaction rules from training sets to testing sets derived from different approaches. This study provides valuable insights into the computational prediction of MTIs that can be further used to establish standards in the field.

Gene expression regulation is fundamental for the development, homeostasis, and environmental adaptation of all organisms. MicroRNAs (miRNAs) play a central role in post-transcriptional gene regulation by binding to target mRNAs and repressing their translation or mediating their degradation. Technical challenges in experimental miRNA target identification have led to growing interest in computational target prediction. While machine learning (ML) models have shown success in this area, they rely heavily on artificially generated negative examples due to limited experimental data. The diversity of methods for generating negative interactions and the lack of a standardized approach introduce bias and hinder the comparison of results across studies. In this study, we collected methods for generating negative data for animal miRNA–target interactions and analyzed their impact on classifying true interactions in humans. Using an ML approach, we evaluated miRNA–target prediction performance within and across negative datasets. We also analyzed unique features that distinguish between the negative datasets to better understand the performance results. Our study shows that negative data is essential for accurately classifying miRNA–target interactions and that ML models struggle to apply what they have learned from the training set when faced with new data derived from different approaches. This study particularly appeals to researchers interested in miRNA–target classification, and it emphasizes the need for standardized methods to enhance comparability between studies.
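The intra- and cross-dataset evaluation described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the two-dimensional synthetic features, the nearest-centroid classifier, the 70/30 split, and every name (`make_points`, `fit`, `cross_dataset_matrix`, `method_A`, `method_B`) are assumptions chosen only to keep the example self-contained. The toy negative sets are deliberately placed in different regions of feature space, so the drop in cross-dataset accuracy is built in and is not evidence of anything.

```python
import random
from statistics import mean


def make_points(center, n, rng):
    """Synthetic 2-D feature vectors around a center (illustration only)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


def centroid(points):
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit(pos, neg):
    """Hypothetical stand-in classifier: store one centroid per class."""
    return centroid(pos), centroid(neg)


def predict(model, x):
    pos_c, neg_c = model
    return 1 if sq_dist(x, pos_c) <= sq_dist(x, neg_c) else 0


def accuracy(model, pos, neg):
    correct = sum(predict(model, x) == 1 for x in pos)
    correct += sum(predict(model, x) == 0 for x in neg)
    return correct / (len(pos) + len(neg))


def split(points, frac=0.7):
    """Simple ordered train/test split (assumed 70/30)."""
    k = int(len(points) * frac)
    return points[:k], points[k:]


def cross_dataset_matrix(positives, negatives_by_method):
    """Train on positives + each negative set, test against every negative set."""
    pos_train, pos_test = split(positives)  # the positive set stays fixed
    splits = {name: split(neg) for name, neg in negatives_by_method.items()}
    results = {}
    for train_name, (neg_train, _) in splits.items():
        model = fit(pos_train, neg_train)
        for test_name, (_, neg_test) in splits.items():
            results[(train_name, test_name)] = accuracy(model, pos_test, neg_test)
    return results


rng = random.Random(0)
positives = make_points((0, 0), 200, rng)
negatives_by_method = {
    "method_A": make_points((4, 0), 200, rng),  # hypothetical generation method A
    "method_B": make_points((0, 4), 200, rng),  # hypothetical generation method B
}
matrix = cross_dataset_matrix(positives, negatives_by_method)
for (train_name, test_name), acc in sorted(matrix.items()):
    print(f"train={train_name} test={test_name} accuracy={acc:.2f}")
```

Entries where the training and testing negatives come from the same method correspond to intra-dataset performance, and the remaining entries correspond to cross-dataset performance. Holding the positive set fixed means that any difference between the two can be attributed to the choice of negatives.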

Benchmarking the negatives: Effect of negative data generation on the classification of miRNA-mRNA interactions
