Searching Maximal Degenerate Motifs Guided by a Compact Suffix Tree.

Hongshan Jiang, Ying Zhao, Wenguang Chen, Weimin Zheng
DOI: https://doi.org/10.1007/978-1-4419-5913-3_3
2010-01-01
Advances in experimental medicine and biology
Abstract: Compared to a mismatched consensus motif, a degenerate consensus motif is more suitable for modeling position-specific variations within motifs. In the literature, the state-of-the-art methods that use degenerate consensus motifs for de novo motif finding rely on a naïve enumeration algorithm, which is far from efficient. In this paper, we propose an efficient algorithm, based on a compact suffix tree, to extract maximal degenerate consensus motifs from a set of sequences. Our algorithm achieves a time complexity about [Formula: see text] times lower than that of naïve enumeration, where [Formula: see text] is the average length of the source sequences. To demonstrate the efficiency and effectiveness of the proposed algorithm, we applied it to finding transcription factor binding sites and validated it on a popular benchmark proposed by Tompa. The executable files of our algorithm can be accessed through http://hpc.cs.tsinghua.edu.cn/bioinfo.
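To make the distinction in the first sentence concrete, the following is a minimal sketch of how a degenerate consensus motif and a mismatched consensus motif match candidate words. It is not the suffix-tree algorithm proposed in the paper. It assumes DNA sequences and the standard IUPAC nucleotide ambiguity codes as the degenerate alphabet, which the abstract does not specify; the function names (`matches_degenerate`, `matches_mismatch`, `support`) and the example sequences are hypothetical and chosen only for illustration.

```python
# Assumption: degenerate symbols follow the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches_degenerate(motif, word):
    """True if every base of `word` is allowed by the symbol at that position."""
    return len(motif) == len(word) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, word)
    )


def matches_mismatch(consensus, word, max_mismatches):
    """True if `word` differs from `consensus` in at most `max_mismatches` positions."""
    return len(consensus) == len(word) and sum(
        a != b for a, b in zip(consensus, word)
    ) <= max_mismatches


def occurrences(motif, sequence):
    """Start positions in `sequence` where the degenerate motif matches."""
    k = len(motif)
    return [
        i for i in range(len(sequence) - k + 1)
        if matches_degenerate(motif, sequence[i:i + k])
    ]


def support(motif, sequences):
    """Number of sequences containing at least one occurrence of the motif."""
    return sum(1 for s in sequences if occurrences(motif, s))


# Hypothetical toy input, not data from the paper.
sequences = ["ACGTTGACA", "TTGATAGG", "CCCTTGAAA"]

# "TTGAYA" allows variation only at position 5 (Y = C or T).
print(support("TTGAYA", sequences))

# "TTGACA" with one mismatch tolerates a change at any position.
print(matches_mismatch("TTGACA", "ATGACA", 1))
```

In this sketch the degenerate motif `TTGAYA` accepts `TTGACA` and `TTGATA` but rejects `TTGAAA` and `ATGACA`, whereas the one-mismatch consensus `TTGACA` accepts all four, which is the sense in which a degenerate motif captures position-specific variation. Checking every candidate motif this way against every window of every sequence is the kind of exhaustive enumeration the abstract describes as inefficient.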