Abstract: Regulatory elements control gene transcription, and identifying them remains a major challenge in the study of gene expression. Transcription factors (TFs) play a key role in gene regulation by binding to target promoter sequences. A set of conserved, highly similar sequence patterns bound by the same TF is called a motif. Motif discovery is a cornerstone of regulatory element identification, yet it has remained a difficult problem for decades. Recent advances in genome sequencing and high-throughput gene expression analysis have enabled the rapid development of computational methods for motif discovery. As a result, many motif-finding algorithms targeting various motif models have appeared in the past few years. However, most of them are not suitable for analyzing the large data sets generated by next-generation sequencing. To better handle large-scale ChIP-Seq data and to improve both computation time and motif detection accuracy, we propose a motif-finding algorithm called GSMC (Combining Parallel Gibbs Sampling with Maximal Cliques for hunting DNA Motif). The GSMC algorithm consists of two steps. First, we employ the commonly used Gibbs sampling to generate initial motifs. Second, we use maximal cliques to cluster these motifs according to Similarity with Position Information Contents (SPIC). As a result, detection accuracy improves substantially while computational efficiency remains comparable. In addition, GSMC finds more credible cofactor-interacting motifs.
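The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the GSMC implementation: the abstract does not define SPIC, so the `similarity` function below is a placeholder (mean per-column overlap of base probabilities), and the function names, pseudocount, iteration count, and threshold are all assumptions chosen for illustration. The Gibbs chains here run one after another; because they are independent, they could in principle be run in parallel.

```python
import math
import random
from itertools import combinations

BASES = "ACGT"

def profile(seqs, starts, w, skip=None, pseudo=1.0):
    """Column-wise base probabilities (with pseudocounts) from the current sites."""
    counts = [{b: pseudo for b in BASES} for _ in range(w)]
    for i, (s, p) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue  # leave out the sequence whose site is being resampled
        for j in range(w):
            counts[j][s[p + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]

def gibbs_motif(seqs, w, iters=200, rng=None):
    """Step 1 (sketch): site-sampling Gibbs, one site per sequence."""
    rng = rng or random.Random(0)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        prof = profile(seqs, starts, w, skip=i)
        s = seqs[i]
        # Weight every candidate start by its likelihood under the profile.
        weights = [math.prod(prof[j][s[p + j]] for j in range(w))
                   for p in range(len(s) - w + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return profile(seqs, starts, w)

def similarity(m1, m2):
    """Placeholder similarity in [0, 1]; an assumption, NOT the paper's SPIC."""
    return sum(sum(min(c1[b], c2[b]) for b in BASES)
               for c1, c2 in zip(m1, m2)) / len(m1)

def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps node -> set of neighbours."""
    out = []
    def expand(r, p, x):
        if not p and not x:
            out.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return sorted(out)

def cluster_motifs(motifs, threshold):
    """Step 2 (sketch): link similar motifs, then report maximal cliques."""
    adj = {i: set() for i in range(len(motifs))}
    for i, j in combinations(range(len(motifs)), 2):
        if similarity(motifs[i], motifs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return maximal_cliques(adj)

if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seqs = ["ACGTTGACGT", "TTGACGCCAA", "GGTTGACGTA", "CATTGACGGG"]
    motifs = [gibbs_motif(seqs, 6, rng=random.Random(seed)) for seed in range(5)]
    print(cluster_motifs(motifs, threshold=0.8))
```

Each clique returned by `cluster_motifs` is a group of motif indices whose members are all pairwise similar under the chosen threshold. A motif may belong to more than one clique, which is one reason a clique-based grouping differs from a hard partition.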
