Abstract: A majority of the human genome consists of sequences that do not code for any protein, so-called non-coding DNA. These non-coding regions nonetheless play a vital role in gene expression: they contain cis-regulatory elements such as promoters and enhancers, which can be bound by transcription factor proteins that thereby control the rate of transcription of DNA to messenger RNA and so help regulate the expression of nearby genes. Next-generation sequencing (NGS) techniques make it possible to identify and study, at great sequencing depth, the genomic factors that underlie transcription, such as transcription factor binding, histone modifications and open chromatin. Furthermore, these data allow researchers to build predictive models for these events using machine learning approaches, which permit the annotation of new cell types without having to perform the experiment. In particular, convolutional neural networks seem to be well suited to modeling genomic data. A convolutional neural network (CNN) is a type of feed-forward neural network inspired by the animal visual cortex. CNNs are characterized by spatially local connections. This connectivity pattern makes CNNs effective on data with a grid-like topology, in other words, data that can be represented by nodes connected to their neighbors along one or more dimensions, where neighboring elements have statistical dependencies. Recently, algorithmic advances, great improvements in processing capabilities and tools, and better datasets have made it possible to train increasingly complex models. Indeed, deep convolutional neural networks have proven very successful on many artificial intelligence tasks, such as image classification, finding policy and value functions for game-playing AI, and drug discovery.
Typical NGS data, which include DNA sequences, open chromatin and transcription factor binding data, are all one-dimensional grids. Identifying transcription factor binding sites can greatly help researchers understand the transcription process and the factors underlying genetic diseases. In the first experiment, convolutional neural network models were built to predict transcription factor binding sites from sequence, open chromatin, gene expression and DNA shape data. We found the convolutional neural network to perform close to the state of the art on some transcription factors, while performing significantly worse on others. Building a model for each task separately resulted in better predictive performance than a multi-task network modeling all transcription factors simultaneously. In the second experiment, we took a closer look at the transcription process. The exact location of transcription initiation, the transcription start site (TSS), can be determined experimentally at base-pair resolution. Unlike translation, where the exact nucleotide triplet that starts the process is known, transcription initiation is less well understood. We studied the transcription process by building a convolutional neural network to predict the exact positions of transcription start sites. The trained models were then interpreted, which led to the finding that the area directly around the TSS is the most decisive factor in determining whether a particular base is a TSS, which to the best of our knowledge has not been reported in the literature.
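
To make the idea of a DNA sequence as a one-dimensional grid concrete, the following minimal sketch one-hot encodes a sequence and slides a single convolutional filter along it. It is an illustration only and not the architecture used in this work: the function names, the toy "TATA" filter and the example sequence are all assumptions, and a trained network would learn its filter weights from data rather than have them set by hand.

```python
# Illustrative sketch only; names, filter and sequence are hypothetical.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Unknown characters (e.g. N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence without padding.

    Each output value depends only on `width` neighboring positions,
    which is the spatially local connectivity of a convolutional layer.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(w * x
                          for k_row, x_row in zip(kernel, window)
                          for w, x in zip(k_row, x_row)))
    return scores

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

# Toy filter that scores one point per base matching the motif "TATA".
tata_kernel = one_hot("TATA")

sequence = "GCGCTATAAGGC"
activations = relu(conv1d(one_hot(sequence), tata_kernel))

# Position with the strongest filter response.
best = max(range(len(activations)), key=activations.__getitem__)
print(best, activations[best])  # prints: 4 4.0
```

The same filter weights are reused at every position, so the filter detects its motif wherever it occurs in the sequence. Other one-dimensional tracks, such as an open chromatin signal, could in principle be appended as extra columns next to the four base channels.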

Convolutional Neural Networks for Regulatory Genomics
