Abstract:

BACKGROUND: In metazoans, the regulation of gene expression is complex and occurs at many levels, including the transcriptional and post-transcriptional levels. Transcriptional regulation is mainly determined by sequence elements within the promoter regions of genes, while sequence elements within the 3' untranslated regions (3' UTRs) of mRNAs play important roles in post-transcriptional regulation, such as mRNA stability and translation efficiency. Identifying cis-regulatory elements, or motifs, is more difficult in multicellular eukaryotes than in unicellular eukaryotes because of the larger intergenic sequence space and the increased complexity of regulation. Experimental techniques for discovering functional elements are often time-consuming and not easily applied at the genome level. Consequently, computational methods are advantageous for genome-wide cis-regulatory motif detection. To decrease the search space in metazoans, many algorithms use cross-species alignment, although studies have demonstrated that a large portion of the binding sites for the same trans-acting factor do not reside in alignable regions. Therefore, a computational algorithm should account for both conserved and nonconserved cis-regulatory elements in metazoans.

RESULTS: We present CompMoby (Comparative MobyDick), software developed to identify cis-regulatory binding sites at both the transcriptional and post-transcriptional levels in metazoans without prior knowledge of the trans-acting factors. The CompMoby algorithm was previously shown to identify cis-regulatory binding sites in upstream regions of genes co-regulated in embryonic stem cells. In this paper, we extend the software to identify putative cis-regulatory motifs in 3' UTR sequences and verify our results using experimentally validated data sets in mouse and human. We also detail the implementation of CompMoby as a user-friendly tool that includes a web interface for streamlined analysis. Our software detects motifs in three categories: (1) those that are alignable and conserved; (2) those that are conserved but not alignable; and (3) those that are species specific. One of the output files from CompMoby lets users decide which category of cis-regulatory element to pursue experimentally, based on their biological problem. Using experimentally validated biological datasets, we demonstrate that CompMoby successfully detects cis-regulatory target sites of known and novel trans-acting factors at the transcriptional and post-transcriptional levels.

CONCLUSION: CompMoby is a powerful software tool for systematic de novo discovery of evolutionarily conserved and nonconserved cis-regulatory sequences involved in transcriptional or post-transcriptional regulation in metazoans. This software is freely available to users at http://genome.ucsf.edu/compmoby/.
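The three motif categories named in the abstract can be illustrated with a minimal Python sketch. It is not CompMoby's implementation or output format. The function `classify_motif`, its two inputs (whether a motif was found in aligned sequence, and the set of species in which it was found), and the toy motif strings are all hypothetical, chosen only to show how the categories differ.

```python
from enum import Enum


class MotifCategory(Enum):
    """The three categories described in the abstract."""
    ALIGNED_CONSERVED = "alignable and conserved"
    CONSERVED_NOT_ALIGNED = "conserved but not alignable"
    SPECIES_SPECIFIC = "species specific"


def classify_motif(found_in_alignment, species_hits):
    """Assign a motif to one of the three categories.

    Hypothetical inputs (assumptions, not CompMoby's actual data model):
      found_in_alignment: True if the motif occurs in cross-species aligned sequence.
      species_hits: set of species names in which the motif was detected.
    """
    if not species_hits:
        raise ValueError("a motif must be detected in at least one species")
    if found_in_alignment:
        return MotifCategory.ALIGNED_CONSERVED
    if len(species_hits) > 1:
        # Present in several species, but outside alignable regions.
        return MotifCategory.CONSERVED_NOT_ALIGNED
    return MotifCategory.SPECIES_SPECIFIC


# Made-up motifs for illustration only; they are not results from the paper.
toy_motifs = {
    "TGTAAATA": (True, {"human", "mouse"}),
    "CACGTG": (False, {"human", "mouse"}),
    "GGAAGT": (False, {"mouse"}),
}

categories = {
    motif: classify_motif(aligned, species)
    for motif, (aligned, species) in toy_motifs.items()
}

for motif, category in categories.items():
    print(f"{motif}\t{category.value}")
```

A user following up experimentally could filter such a table by category, for example keeping only species-specific candidates when studying lineage-specific regulation.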

A comparative genomic method for computational identification of prokaryotic translation initiation sites

Accuracy improvement for identifying translation initiation sites in microbial genomes

Gene Prediction by the Noise-Assisted MEMD and Wavelet Transform for Identifying the Protein Coding Regions

Hidden Markov Model Variants and their Application

New Solutions of Translation Initiation Site Prediction for Prokaryotic Genomes

MED: a New Non-Supervised Gene Prediction Algorithm for Bacterial and Archaeal Genomes

ProtiGeno: a prokaryotic short gene finder using protein language models

Prediction of Translation Initiation Site in Bacterial and Archaeal Genomes

Identification of new genes on a whole genome scale using saturated reporter transposon mutagenesis

GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition

Multivariate Entropy Distance Method for Prokaryotic Gene Identification.

Pro-SMP finder–A systematic approach for discovering small membrane proteins in prokaryotes

PreTIS: A Tool to Predict Non-canonical 5’ UTR Translational Initiation Sites in Human and Mouse

Comparative Exon Prediction Based on Heuristic Coding Region Alignment.

An Improved Algorithm On Detecting Transcription And Translation Motif In Archaeal Genomic Sequences

CompMoby: Comparative MobyDick for Detection of Cis-Regulatory Motifs

Computational approach for calculating the probability of eukaryotic translation initiation from ribo-seq data that takes into account leaky scanning

A Proteogenomics Approach Integrating Proteomics and Ribosome Profiling Increases the Efficiency of Protein Identification and Enables the Discovery of Alternative Translation Start Sites.

ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs

An Integrative and Applicable Phylogenetic Footprinting Framework for Cis-Regulatory Motifs Identification in Prokaryotic Genomes

An HMM-Based Comparative Genomic Framework for Detecting Introgression in Eukaryotes