ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction
Faiza Mehmood,Shazia Arshad,Muhammad Shoaib
DOI: https://doi.org/10.1093/bib/bbae030
IF: 9.5
2024-02-24
Briefings in Bioinformatics
Abstract: Enhancers play an important role in the regulation of gene expression. In a DNA sequence, the abundance or absence of enhancers and irregularities in their strength affect the gene expression process, which leads to the initiation and propagation of diverse genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches is expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine and deep learning methods that take raw DNA sequences and predict an enhancer's presence and strength. However, they still fall short in performance and are not useful for real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies to transform DNA sequences into a statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion to predict a group of nucleotides, also known as a k-mer, from the context of the existing k-mers in a sequence. At the classification stage, it presents a novel classifier that reaps the benefits of two different architectures: a convolutional neural network and an attention mechanism. The proposed framework is evaluated on the enhancer identification benchmark dataset, where it outperforms the existing best-performing framework by 5% and 9% in terms of accuracy and MCC. Similarly, when evaluated on the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4% and 7% in terms of accuracy and MCC.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses **enhancer identification and strength prediction**. Enhancers play an important role in the regulation of gene expression. Their presence or absence, and abnormalities in their strength, affect the gene expression process and can lead to a variety of genetic diseases, such as hemophilia, bladder cancer, diabetes and congenital disorders. Identifying enhancers and their strength through experimental methods is expensive, time-consuming and error-prone. Researchers have therefore proposed a variety of computational frameworks to accelerate this work, but the existing frameworks still fall short in performance and cannot support real-time analysis.
To solve these problems, this paper proposes a new deep learning framework, **ADH-Enhancer**, which aims to improve enhancer identification and strength prediction in the following ways:
1. **Feature extraction**: A language modeling strategy transforms DNA sequences into a statistical feature space. The framework trains a language model in an unsupervised manner to predict a group of nucleotides (a k-mer) from the context of the surrounding k-mers, thereby capturing the semantic information in the DNA sequence.
2. **Classifier design**: A new classifier combines the advantages of a convolutional neural network (CNN) and an attention mechanism to extract and learn nucleotide patterns more accurately.
3. **Performance evaluation**: The framework is evaluated on two benchmark datasets, one for enhancer identification and one for strength prediction. On both tasks it outperforms the existing best-performing framework: accuracy and MCC (Matthews correlation coefficient) improve by 5% and 9% on enhancer identification, and by 4% and 7% on strength prediction.
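
Both reported metrics can be computed from a binary confusion matrix. The following is a minimal sketch of the standard accuracy and MCC formulas; the function names and the counts in the usage note are hypothetical and chosen only for illustration, and they are not results from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal total is zero,
    # which would otherwise cause a division by zero.
    return numerator / denominator if denominator else 0.0
```

For example, with the made-up counts `tp=40, tn=45, fp=5, fn=10`, `accuracy` gives 0.85 and `mcc` gives roughly 0.70. Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.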
In summary, this paper aims to improve the accuracy and efficiency of enhancer identification and strength prediction by developing an attention-based deep hybrid framework (ADH-Enhancer), thereby advancing research in related fields.
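
To make the k-mer language modeling idea from the feature extraction step concrete, here is a minimal sketch of how a DNA sequence might be split into overlapping k-mers and paired with context k-mers for unsupervised training. The summary does not state the paper's actual k, stride, window size, or training objective, so the values and the context-to-target pairing below are assumptions, and the function names `kmers` and `context_target_pairs` are hypothetical.

```python
from typing import List, Tuple


def kmers(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers taken every `stride` positions."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def context_target_pairs(
    tokens: List[str], window: int = 2
) -> List[Tuple[List[str], str]]:
    """Pair each k-mer with up to `window` neighbours on either side.

    Each pair is (context k-mers, target k-mer), the kind of example
    a model could use to learn to predict a k-mer from its context.
    This pairing scheme is an assumption, not the paper's stated method.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


tokens = kmers("ACGTAC", k=3)
pairs = context_target_pairs(tokens, window=1)
```

Here `tokens` is `["ACG", "CGT", "GTA", "TAC"]`, and the second pair is `(["ACG", "GTA"], "CGT")`. The embeddings learned from such pairs would then serve as the statistical features passed to the classifier.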