Comparison and benchmark of deep learning methods for non-coding RNA classification

Constance Creux, Farida Zehraoui, François Radvanyi, Fariza Tahi
DOI: https://doi.org/10.1371/journal.pcbi.1012446
2024-09-13
PLoS Computational Biology
Abstract: The involvement of non-coding RNAs in biological processes and diseases has made the exploration of their functions crucial. Most non-coding RNAs have yet to be studied, creating the need for methods that can rapidly classify large sets of non-coding RNAs into functional groups, or classes. In recent years, the success of deep learning in various domains led to its application to non-coding RNA classification. Multiple novel architectures have been developed, but these advancements are not covered by current literature reviews. We present an exhaustive comparison of the different methods proposed in the state-of-the-art and describe their associated datasets. Moreover, the literature lacks objective benchmarks. We perform experiments to fairly evaluate the performance of various tools for non-coding RNA classification on popular datasets. The robustness of methods to non-functional sequences and sequence boundary noise is explored. We also measure computation time and CO₂ emissions. With regard to these results, we assess the relevance of the different architectural choices and provide recommendations to consider in future methods.
RNA can either encode proteins, which perform different functions in the genome, or be non-coding. Non-coding RNAs represent around 98% of the genome, and were long thought to be non-functional. It has now been proven that non-coding RNAs can have diverse biological functions and be involved in diseases. A large proportion of non-coding RNAs has not yet been studied. The function of specific non-coding RNAs can be studied experimentally, but experiments are costly and time-consuming. One possibility to massively characterize the function of non-coding RNAs is to use computational methods to classify them into functional groups, or classes. Recent computational methods for non-coding RNA classification are all based on deep learning, as it leads to faster runtime and improved performance.
Our work presents and compares the different approaches adopted in the state-of-the-art, as well as the non-coding RNA datasets that are used. We also present a comprehensive benchmark, measuring classification performance in different conditions, computation time, and CO₂ emissions. The descriptions and comparisons provided are meant to guide researchers in the field, whether wanting to use existing tools or to develop new ones.
biochemical research methods, mathematical & computational biology