Abstract

Motivation: Predicting regulatory effects of genetic variants is a challenging but important problem in functional genomics. Given the relatively low sensitivity of functional assays, and the pervasiveness of class imbalance in functional genomic data, popular statistical prediction models can sharply underestimate the probability of a regulatory effect. We describe here the presence-only model (PO-EN), a type of semisupervised model, to predict regulatory effects of genetic variants at sequence-level resolution in a context of interest by integrating a large number of epigenetic features and massively parallel reporter assays (MPRAs).

Results: Using experimental data from a variety of MPRAs we show that the presence-only model produces better calibrated predicted probabilities and has increased accuracy relative to state-of-the-art prediction models. Furthermore, we show that the predictions based on pretrained PO-EN models are useful for prioritizing functional variants among candidate eQTLs and significant SNPs at GWAS loci. In particular, for the costimulatory locus, associated with multiple autoimmune diseases, we show evidence of a regulatory variant residing in an enhancer 24.4 kb downstream of CTLA4, with evidence from capture Hi-C of interaction with CTLA4. Furthermore, the risk allele of the regulatory variant is on the same risk increasing haplotype as a functional coding variant in exon 1 of CTLA4, suggesting that the regulatory variant acts jointly with the coding variant leading to increased risk to disease.

Availability and implementation: The presence-only model is implemented in the R package ‘PO.EN’, freely available on CRAN. A vignette describing a detailed demonstration of using the proposed PO-EN model can be found on GitHub at https://github.com/Iuliana-Ionita-Laza/PO.EN/

Supplementary information: Supplementary data are available at Bioinformatics online.

Semi-supervised learning improves regulatory sequence prediction with unlabeled sequences

Semi-supervised deep learning with graph neural network for cross-species regulatory sequence prediction

Semi-supervised learning with pseudo-labeling compares favorably with large language models for regulatory sequence prediction

Pre-training with pseudo-labeling compares favorably with large language models for regulatory sequence prediction

A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations

Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation

A semisupervised model to predict regulatory effects of genetic variants at single nucleotide resolution using massively parallel reporter assays

Multimodal learning of noncoding variant effects using genome sequence and chromatin structure

Advancing regulatory genomics with machine learning

Deep learning approaches for non-coding genetic variant effect prediction: current progress and future prospects

Semi-Supervised Prediction of Gene Regulatory Networks Using Machine Learning Algorithms

Fundamentals for predicting transcriptional regulations from DNA sequence patterns

Predicting gene regulatory regions with a convolutional neural network for processing double-strand genome sequence information

Predicting Functional Elements and Variants Effects in Non-Coding Regions Based on Deep Learning

Predicting exonization in the human genome with a deep learning model

Prediction of disease-associated functional variants in noncoding regions through a comprehensive analysis by integrating datasets and features

Supervised Learning-Based Tagsnp Selection for Genome-Wide Disease Classifications

Graphylo: A deep learning approach for predicting regulatory DNA and RNA sites from whole-genome multiple alignments

sscNOVA: a semi-supervised convolutional neural network for predicting functional regulatory variants in autoimmune diseases

A regulatory-sequence classifier with a neural network for genomic information processing

Mining Functionally Related Genes with Semi-Supervised Learning