Abstract: Enhancers are short non-coding DNA sequences, located outside the promoter regions of their target genes, that can be bound by specific proteins to increase a gene's transcriptional activity; they play a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers lack specific sequence motifs or structures, and their scattered distribution in the genome makes identifying them in human cell lines particularly challenging. Here we present a novel stacked multivariate fusion framework, SMFM, which enables comprehensive identification, analysis, and interpretation of enhancers from regulatory DNA sequences. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. We then implement a deep learning-based sequence network to learn a comprehensive feature representation of the enhancer sequences and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained on the refined multi-source features and dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms existing methods across several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM against other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of individual bases in enhancer sequences to the final identification results. We also conducted an interpretability analysis of the identified enhancer sequences based on the attention weights of EnhancerBERT, a fine-tuned BERT model, which provides new insights into the gene semantic information likely to underlie the discovered enhancers. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully exposed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of the proposed framework.
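
The idea of fusing several feature sources into one sequence representation can be illustrated with a minimal sketch. It assumes two hypothetical feature sources, normalized k-mer frequencies and GC content, neither of which is named in the abstract; the function names and the choice of k = 2 are illustrative only, and this is not the SMFM implementation.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequency vector over ACGT (assumed feature source)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G/C bases as a one-element vector (assumed feature source)."""
    seq = seq.upper()
    if not seq:
        return [0.0]
    return [sum(base in "GC" for base in seq) / len(seq)]


def fuse_features(seq, k=2):
    """Fuse the two feature sources by simple concatenation (hypothetical fusion)."""
    return kmer_frequencies(seq, k) + gc_content(seq)


# Example: a 10-bp toy sequence yields 4**2 k-mer features plus 1 GC feature.
features = fuse_features("ACGTACGTGG")
print(len(features))
```

Concatenation is the simplest possible fusion; in a stacked framework such a vector would be one of several inputs passed to downstream learners.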

Integrative Machine Learning Framework for the Identification of Cell-Specific Enhancers from the Human Genome

Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers

Predicting Enhancers with Deep Convolutional Neural Networks

iEnhancer-DHF: Identification of Enhancers and Their Strengths Using Optimize Deep Neural Network with Multiple Features Extraction Methods

DeepEnhancer: Predicting Enhancers by Convolutional Neural Networks.

iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach.

iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor

ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction

iEnhancer-KL: A Novel Two-Layer Predictor for Identifying Enhancers by Position Specific of Nucleotide Composition

iEnhancer-MRBF: Identifying Enhancers and Their Strength with a Multiple Laplacian-regularized Radial Basis Function Network.

In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences

A deep learning based two-layer predictor to identify enhancers and their strength

Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

iEnhancer-DCLA: Using the Original Sequence to Identify Enhancers and Their Strength Based on a Deep Learning Framework

DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers.

Genome-wide Identification and Characterization of DNA Enhancers with a Stacked Multivariate Fusion Framework

iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information.

iEnhancer-DLRA: Identification of Enhancers and Their Strengths by a Self-Attention Fusion Strategy for Local and Global Features.

iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models

Erfsvm: a Hybrid Classifier to Predict Enhancers-Integrating Random Forests with Support Vector Machines

iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module