A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences.

Caiyan Jia, Ruqian Lu, Lusheng Chen
DOI: https://doi.org/10.1007/978-3-642-16248-0_37
2010-01-01
Abstract: Identification and characterization of gene regulatory binding motifs is one of the fundamental tasks toward systematically understanding the molecular mechanisms of transcriptional regulation. Recently, the problem has been abstracted as the planted (l, d)-motif challenge problem. Previous studies have developed numerous methods to solve it, but most of them require the motif length l to be specified in advance. In this study, we present an exact and efficient algorithm, called Apriori-Motif, that does not require l to be given. The algorithm uses breadth-first search and prunes the search space quickly by means of the downward closure property used in Apriori, a classical frequent pattern mining algorithm. An empirical study shows that Apriori-Motif performs better than some existing methods.
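The level-wise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Apriori-Motif implementation; it only assumes the usual reading of the planted (l, d)-motif problem (a length-l pattern that occurs in every input sequence with at most d mismatches) and shows how downward closure prunes a breadth-first search over growing pattern lengths. The function names and the `start_len` and `max_len` parameters are hypothetical, introduced here for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern, seq, d):
    """True if some window of seq is within Hamming distance d of pattern."""
    k = len(pattern)
    return any(hamming(pattern, seq[i:i + k]) <= d
               for i in range(len(seq) - k + 1))


def frequent(pattern, seqs, d):
    """A pattern is 'frequent' here if it occurs (<= d mismatches) in every sequence."""
    return all(occurs_within(pattern, s, d) for s in seqs)


def levelwise_motifs(seqs, d, start_len=1, max_len=20):
    """Breadth-first, Apriori-style search (illustrative sketch only).

    Returns a dict mapping each length k to the set of frequent patterns
    of that length, growing k until no pattern survives or max_len is hit.
    """
    # Level start_len: enumerate all patterns and keep the frequent ones.
    level = {"".join(p) for p in product(ALPHABET, repeat=start_len)}
    level = {p for p in level if frequent(p, seqs, d)}
    results = {start_len: level} if level else {}
    k = start_len
    while level and k < max_len:
        candidates = set()
        for p in level:
            for c in ALPHABET:
                q = p + c
                # Downward closure: if q is frequent, both its length-k
                # prefix (p) and its length-k suffix (q[1:]) must be too.
                if q[1:] in level:
                    candidates.add(q)
        level = {q for q in candidates if frequent(q, seqs, d)}
        k += 1
        if level:
            results[k] = level
    return results


if __name__ == "__main__":
    toy = ["AAACGTAA", "CCACGTCC", "GGAGGTGG"]  # made-up toy sequences
    found = levelwise_motifs(toy, d=1)
    longest = max(found)
    print(longest, sorted(found[longest]))
```

Because candidates of length k + 1 are built only from surviving patterns of length k, the motif length never has to be supplied up front; the search simply stops at the first level with no survivors. This sketch enumerates the full alphabet at the starting level and rechecks every candidate by brute force, so it is meant for toy inputs only.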