Abstract

Motivation: In recent years, circular RNAs (circRNAs), a form of RNA with a closed-loop structure, have attracted widespread attention because of their physiological significance (they can bind proteins directly), which has led to the development of numerous protein-binding site identification algorithms. Unfortunately, these methods are supervised and require a large proportion of labeled training samples to perform well. Sample labels, however, are difficult to obtain because they require a large number of biological experiments.

Results: To reduce the number of labels needed for training in the circRNA-binding site prediction task, this article proposes CircSI-SSL, a self-supervised learning algorithm for binding site identification. To the best of our knowledge, this is the first such method in this research field. Specifically, CircSI-SSL first combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data, and is then fine-tuned with only a few labeled samples. Comprehensive experiments on six widely used circRNA datasets indicate that CircSI-SSL achieves excellent performance compared with previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, a transfer experiment on six linRNA datasets, carried out without modifying the network or adjusting hyperparameters, shows that CircSI-SSL has good scalability. In summary, the self-supervised prediction algorithm proposed in this article is expected to replace previous supervised algorithms and to have broader application value.

Availability and implementation: The source code and data are available at https://github.com/cc646201081/CircSI-SSL.

Self-Supervised Representation Learning for Basecalling Nanopore Sequencing Data

MSRCall: A Multi-scale Deep Neural Network to Basecall Oxford Nanopore Sequences

Delineating the Effective Use of Self-Supervised Learning in Single-Cell Genomics

A theoretical model for the histamine H2-receptor.

Benchmarking Self-Supervised Learning for Single-Cell Data

Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

Semi-supervised learning with pseudo-labeling compares favorably with large language models for regulatory sequence prediction

Narrowing the Gap between Supervised and Unsupervised Sentence Representation Learning with Large Language Model

CircSI-SSL: circRNA-binding site identification based on self-supervised learning

Blind Biological Sequence Denoising with Self-Supervised Set Learning

RNAmigos2: Fast and accurate structure-based RNA virtual screening with semi-supervised graph learning and large-scale docking data

Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models

Improving Self-supervised Molecular Representation Learning using Persistent Homology

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

BaseNet: A Transformer-Based Toolkit for Nanopore Sequencing Signal Decoding

On the Discriminability of Self-Supervised Representation Learning

Self-Supervised Learning for Endoscopic Video Analysis

Self-Supervised Anomaly Detection in the Wild: Favor Joint Embeddings Methods

S2Snet: deep learning for low molecular weight RNA identification with nanopore

Self-Supervised Learning for Improved Synthetic Aperture Sonar Target Recognition