Abstract: The involvement of non-coding RNAs in biological processes and diseases has made the exploration of their functions crucial. Most non-coding RNAs have yet to be studied, creating the need for methods that can rapidly classify large sets of non-coding RNAs into functional groups, or classes. In recent years, the success of deep learning in various domains has led to its application to non-coding RNA classification. Multiple novel architectures have been developed, but these advances are not covered by current literature reviews. We present an exhaustive comparison of the different methods proposed in the state of the art and describe their associated datasets. Moreover, the literature lacks objective benchmarks. We perform experiments to fairly evaluate the performance of various tools for non-coding RNA classification on popular datasets. We also explore the robustness of the methods to non-functional sequences and to sequence boundary noise, and we measure computation time and CO2 emissions. Based on these results, we assess the relevance of the different architectural choices and provide recommendations for future methods.

RNA can either encode proteins, which perform diverse functions in the cell, or be non-coding. Around 98% of the genome does not code for proteins, and non-coding RNAs were long thought to be non-functional. It has since been shown that non-coding RNAs can have diverse biological functions and be involved in diseases. However, a large proportion of non-coding RNAs has not yet been studied. The function of specific non-coding RNAs can be studied experimentally, but experiments are costly and time-consuming. One way to characterize the functions of non-coding RNAs at scale is to use computational methods that classify them into functional groups, or classes. Recent computational methods for non-coding RNA classification are all based on deep learning, which offers faster runtimes and improved performance.
Our work presents and compares the different approaches adopted in the state of the art, as well as the non-coding RNA datasets they use. We also present a comprehensive benchmark measuring classification performance under different conditions, computation time, and CO2 emissions. These descriptions and comparisons are meant to guide researchers in the field, whether they want to use existing tools or develop new ones.
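The summaries above mention robustness to sequence boundary noise without defining it. As an illustration only, one common way to simulate such noise is to pad each RNA sequence with random nucleotides at both ends, so that a classifier no longer sees exact transcript boundaries. The sketch below assumes that scheme; the function `add_boundary_noise` and its `noise_level` parameter (added length as a fraction of the original length) are hypothetical and are not necessarily the procedure used in this benchmark.

```python
import random

# Alphabet used for the random padding (RNA, so U instead of T).
NUCLEOTIDES = "ACGU"


def add_boundary_noise(seq: str, noise_level: float, rng: random.Random) -> str:
    """Pad `seq` with random nucleotides on both ends (hypothetical scheme).

    The total number of added nucleotides is `noise_level` times the
    original length, split at random between the 5' and 3' ends.
    """
    if noise_level < 0:
        raise ValueError("noise_level must be non-negative")
    n_extra = round(noise_level * len(seq))
    # Randomly split the extra length between the two ends.
    n_left = rng.randint(0, n_extra)
    n_right = n_extra - n_left
    left = "".join(rng.choice(NUCLEOTIDES) for _ in range(n_left))
    right = "".join(rng.choice(NUCLEOTIDES) for _ in range(n_right))
    return left + seq + right


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    seq = "GGCUAGCUAGCUUAGC"
    noisy = add_boundary_noise(seq, 1.0, rng)
    print(len(seq), len(noisy))  # 16 32
```

With `noise_level=1.0`, the padded sequence is twice as long as the original, and the original sequence sits at a random offset inside it. Passing a seeded `random.Random` instance keeps noisy datasets reproducible across runs.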

Comparison and benchmark of deep learning methods for non-coding RNA classification