Deep learning: new computational modelling techniques for genomics

Gökcen Eraslan, Žiga Avsec, Julien Gagneur, Fabian J. Theis
DOI: https://doi.org/10.1038/s41576-019-0122-6
IF: 59.581
2019-04-10
Nature Reviews Genetics
Abstract: As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.
genetics & heredity
What problem does this paper attempt to address?
This paper reviews the application of deep learning in genomics and the modelling techniques involved. Specifically, it addresses the following problems:

1. **Data-driven scientific challenges**: With the explosive growth of genomics data, extracting dependencies and new biological hypotheses from large-scale data has become an important challenge. Traditional hypothesis-based research methods may no longer be applicable, and more flexible and powerful analysis tools are required to process these data.
2. **The importance of feature representation**: The performance of machine-learning algorithms depends largely on how the data are represented, that is, on which features are selected and how they are computed. In tumor classification, for example, manually designed features (such as cell count) may fail to capture key visual features (such as cell morphology, the distance between cells, or the position within an organ), which reduces classification accuracy. By embedding feature computation into the model itself, deep learning can automatically discover complex, relevant features and improve prediction accuracy.
3. **The application of deep neural networks**: The paper details the main types of neural networks (fully connected, convolutional, recurrent, and graph-convolutional networks) and their applications in genomics. For example, convolutional neural networks (CNNs) have been successfully applied to predict transcription factor binding sites, chromatin features, DNA contact maps, DNA methylation, gene expression, and other molecular phenotypes. Recurrent neural networks (RNNs) are used to model long-range dependencies in genomic sequences.
4. **Multi-task learning and multi-modal learning**: To integrate multiple datasets and data types, the paper discusses two modelling techniques, multi-task learning and multi-modal learning. These techniques help improve the generalization ability and prediction performance of the model.
5. **Transfer learning and model interpretation**: The paper also introduces transfer learning, which allows new models to be developed rapidly, and discusses how to interpret deep-learning models, which is crucial for understanding how a model reaches its decisions.
6. **Unsupervised-learning techniques**: Finally, the paper discusses two unsupervised-learning techniques, autoencoders and generative adversarial networks (GANs), which have been applied in single-cell genomics.

Overall, the paper aims to demonstrate the potential and advantages of deep learning in genomics, especially its ability to handle large-scale and complex data, by introducing and analyzing its main techniques and applications.
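
To make the idea of a convolutional network scanning DNA more concrete, the following is a minimal sketch in plain Python, not code from the paper. It one-hot encodes a sequence and slides a single filter along it, which is roughly what one convolutional filter in a binding-site model does. The function names (`one_hot`, `conv1d_scan`, `relu`), the `TATA` motif, the example sequence, and the +1/-1 filter weights are all illustrative assumptions; in a real model the weights would be learned from data rather than set by hand.

```python
# Toy illustration only: hypothetical helper names and hand-set weights.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            weights = kernel[offset]
            score += sum(x * w for x, w in zip(row, weights))
        scores.append(score)
    return scores

def relu(values):
    """Rectified linear activation: negative scores are clipped to zero."""
    return [max(0.0, v) for v in values]

# Assumed filter: +1 for the matching base at each motif position, -1 otherwise.
motif = "TATA"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in motif]

sequence = "GGCTATAAGC"
activations = relu(conv1d_scan(one_hot(sequence), kernel))

# Index of the window with the strongest filter response.
best_position = max(range(len(activations)), key=activations.__getitem__)
print(activations, best_position)
```

In this toy case the filter responds only at the window that starts at index 3, where the sequence reads `TATA`. A trained CNN stacks many such filters and combines their activations in later layers to predict quantities such as binding or accessibility.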