Abstract: Enhancers, noncoding DNA fragments, play a pivotal role in gene regulation, facilitating gene transcription. Identifying enhancers is crucial for understanding genomic regulatory mechanisms, pinpointing key elements and investigating networks governing gene expression and disease-related mechanisms. Existing enhancer identification methods exhibit limitations, prompting the development of our novel multi-input deep learning framework, termed Enhancer-MDLF. Experimental results illustrate that Enhancer-MDLF outperforms the previous method, Enhancer-IF, across eight distinct human cell lines and exhibits superior performance on generic enhancer datasets and enhancer–promoter datasets, affirming the robustness of Enhancer-MDLF. Additionally, we introduce transfer learning to provide a potentially effective solution to the prediction challenges posed by enhancer specificity. Furthermore, we use model interpretation to identify transcription factor binding site motifs that may be associated with enhancer regions, with important implications for the study of enhancer regulatory mechanisms. The source code is openly accessible at https://github.com/HaoWuLab-Bioinformatics/Enhancer-MDLF.

What problem does this paper attempt to address?

The main objective of this paper is to propose a new deep learning framework, Enhancer-MDLF (Multi-input Deep Learning Framework), for identifying cell-specific enhancers. Specifically, the paper addresses the following key issues:

1. **Limitations of existing methods**: Existing enhancer identification methods rely on conserved sequences and transcription factor binding site (TFBS) data, ChIP-seq data, chromatin accessibility data, histone modification data, or enhancer RNA (eRNA) data. These methods variously suffer from high false positive rates, cannot distinguish enhancers from promoter regions, or struggle to detect enhancers that are not actively transcribed.
2. **Need for computational tools**: Because experimental methods are time-consuming and costly, reliable computational tools for identifying enhancers are needed.
3. **Issues with existing computational methods**: Although several computational methods have been proposed, they are usually built on a generic dataset compiled from nine different cell lines, which overlooks the cell specificity of enhancers. These methods may also perform poorly on sequences of unequal length and require substantial time for parameter optimization when applied to new cell lines.
4. **Improving existing frameworks**: Enhancer-IF accounts for cell specificity, but its predictive performance still leaves room for improvement, and its model lacks interpretability, making it difficult to explore the role of TFBS in enhancer regions.

To address these challenges, the paper proposes Enhancer-MDLF, a multi-input deep learning framework that combines word vector features of human genome sequences with motif features extracted from position weight matrices (PWMs).
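The two kinds of input described above can be illustrated with a small plain-Python sketch. It is not Enhancer-MDLF's code: the k-mer size (k = 3), the random toy embedding table, the hypothetical `consensus_ppm` helper and the "GATA" motif are all assumptions for illustration. In practice the word vectors would presumably come from embeddings pretrained on the human genome, and the PWMs from a curated motif collection. Only the forward strand is scanned here.

```python
import itertools
import math
import random

BASES = "ACGT"


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (the "words" that get embedded)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def consensus_ppm(consensus, p_match=0.85):
    """Hypothetical helper: a position probability matrix favoring one consensus sequence."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in BASES} for c in consensus]


def log_odds(ppm, background=0.25, pseudocount=1e-3):
    """Convert probabilities to log2-odds against a uniform background, giving a PWM."""
    return [{b: math.log2((row[b] + pseudocount) / background) for b in BASES} for row in ppm]


def max_motif_score(seq, pwm):
    """Best PWM score over all windows of seq (forward strand only); -inf if none is scorable."""
    seq, width, best = seq.upper(), len(pwm), float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set(BASES):  # skip windows containing N or other ambiguity codes
            best = max(best, sum(pwm[i][b] for i, b in enumerate(window)))
    return best


def build_inputs(seq, embedding, pwms, k=3):
    """Return the two model inputs: a k-mer embedding matrix and a motif-score vector."""
    dim = len(next(iter(embedding.values())))
    # Sequence branch: one vector per k-mer; k-mers missing from the table become zeros.
    seq_matrix = [embedding.get(t, [0.0] * dim) for t in kmer_tokens(seq, k)]
    # Motif branch: one best-match score per PWM.
    motif_vector = [max_motif_score(seq, pwm) for pwm in pwms]
    return seq_matrix, motif_vector


# Toy usage: random 4-d vectors stand in for pretrained word vectors.
rng = random.Random(0)
embedding = {"".join(p): [rng.uniform(-1, 1) for _ in range(4)]
             for p in itertools.product(BASES, repeat=3)}
gata_pwm = log_odds(consensus_ppm("GATA"))
seq_matrix, motif_vector = build_inputs("CCGATACC", embedding, [gata_pwm])
print(len(seq_matrix), len(seq_matrix[0]), motif_vector)
```

Keeping the two representations separate mirrors the multi-input idea: the embedding matrix would feed a sequence branch (for example, convolutional layers), while the motif-score vector would feed a separate branch before the two are merged.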
Comprehensive evaluations show that Enhancer-MDLF outperforms previous methods, most notably in cell-specific enhancer prediction, where it surpasses Enhancer-IF on eight human cell lines; it also performs well on generic enhancer and enhancer–promoter datasets. In addition, the framework uses transfer learning to address the cross-cell-line prediction challenges that arise from enhancer cell specificity, and it uses model interpretation to identify the most important TFBS motifs within enhancer regions.
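One common way to apply transfer learning across cell lines is to reuse layers trained on a data-rich source cell line and retrain only the classifier head on the target cell line. The sketch below assumes exactly that variant; the summary above does not say which layers Enhancer-MDLF freezes or how it is optimized, and `frozen_features`, the toy weights and the two-example target dataset are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frozen_features(x, w_shared):
    """Stand-in for layers pretrained on the source cell line: linear map + ReLU, never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_shared]


def predict(x, w_shared, head, bias):
    """Enhancer probability from frozen features and the classifier head."""
    h = frozen_features(x, w_shared)
    return sigmoid(sum(w * hi for w, hi in zip(head, h)) + bias)


def fine_tune_head(data, w_shared, head, bias, lr=0.1, epochs=200):
    """Transfer step: update only the head on target-cell-line data (SGD on log loss)."""
    head = list(head)  # copy so the source-cell-line head is left intact
    for _ in range(epochs):
        for x, y in data:
            h = frozen_features(x, w_shared)
            p = sigmoid(sum(w * hi for w, hi in zip(head, h)) + bias)
            err = p - y  # gradient of log loss with respect to the logit
            head = [w - lr * err * hi for w, hi in zip(head, h)]
            bias -= lr * err
    return head, bias


# Hypothetical weights "learned" on a source cell line.
w_shared = [[1.0, 0.0], [0.0, 1.0]]
source_head, source_bias = [0.5, -0.5], 0.0

# Tiny hypothetical target-cell-line dataset whose labels disagree with the source head.
target_data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
head, bias = fine_tune_head(target_data, w_shared, source_head, source_bias)
print([round(predict(x, w_shared, head, bias), 3) for x, _ in target_data])
```

Freezing the shared layers keeps the number of trainable parameters small, which is the usual motivation when the target cell line has few labeled enhancers; unfreezing more layers is an alternative when more target data are available.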

Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers

DeepEnhancer: Predicting Enhancers by Convolutional Neural Networks

Predicting Enhancers with Deep Convolutional Neural Networks

ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction

EnhancerBD identifing sequence feature

A deep learning based two-layer predictor to identify enhancers and their strength

DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers

DeepRegFinder: deep learning-based regulatory elements finder

A novel interpretable deep learning-based computational framework designed synthetic enhancers with broad cross-species activity

PEDLA: Predicting Enhancers with a Deep Learning-Based Algorithmic Framework

An Interpretable Deep Learning Approach for Enhancer Classification

Predicting enhancer-promoter interactions by deep learning and matching heuristic

iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module

A New Method for Enhancer Prediction Based on Deep Belief Network

Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning

A Dual-Feature Input DNABert Based Deep Learning Method for Enhancer Recognition

iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information

iEnhancer-EBLSTM: Identifying Enhancers and Strengths by Ensembles of Bidirectional Long Short-Term Memory

iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models

iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor

DeepICSH: a complex deep learning framework for identifying cell-specific silencers and their strength from the human genome