A new algorithm for mining frequent closed patterns in gene expression datasets

MIAO Yu-qing, CHEN Guo-liang, XU Yun
DOI: https://doi.org/10.3969/j.issn.0253-2778.2007.09.006
2007-01-01
Journal of University of Science and Technology of China
Abstract: Unlike traditional datasets, gene expression datasets typically contain a huge number of items and only a few transactions. Although many algorithms have been developed for mining frequent closed patterns, their running time increases exponentially with the average transaction length, which makes most current algorithms impractical for such gene expression datasets. TPclose, a new efficient algorithm for mining frequent closed patterns from gene expression datasets, is proposed. It stores the tidset of each item in a TP-tree (tidset-prefix tree). TPclose converts the problem of mining frequent closed patterns into one of mining frequent closed tidsets, adopting a top-down, divide-and-conquer search strategy to explore the transaction enumeration search space, combined with efficient pruning and effective optimization. Experiments on real-life gene expression datasets show that TPclose outperforms RERⅡ, an existing algorithm based on a bottom-up search strategy, by up to two orders of magnitude.
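
To illustrate the idea of mining over tidsets rather than itemsets, the following is a minimal sketch of brute-force transaction (row) enumeration in Python. It is not TPclose: it uses no TP-tree, no divide-and-conquer, and no pruning, and the function names `item_tidsets` and `closed_patterns_by_rows` are hypothetical. It relies only on the standard fact that the set of items shared by any group of transactions is a closed pattern, and it visits larger groups of transactions first, loosely mirroring a top-down order. It is practical only when the number of transactions is small, which is the setting the abstract describes.

```python
from itertools import combinations


def item_tidsets(transactions):
    """Map each item to its tidset: the ids of the transactions containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def closed_patterns_by_rows(transactions, min_support):
    """Illustrative brute force, NOT TPclose: enumerate groups of transactions.

    Returns a dict mapping each frequent closed pattern (a frozenset of items)
    to its support (the number of transactions that contain it).
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    tidsets = item_tidsets(transactions)
    n = len(transactions)
    closed = {}
    # Visit the largest groups of transactions first, then smaller ones.
    for size in range(n, min_support - 1, -1):
        for rows in combinations(range(n), size):
            # Items whose tidset covers every transaction in this group.
            pattern = frozenset(
                item for item, tids in tidsets.items() if tids.issuperset(rows)
            )
            if not pattern:
                continue
            # Support is the size of the intersection of the items' tidsets.
            support_tids = set.intersection(*(tidsets[i] for i in pattern))
            closed[pattern] = len(support_tids)
    return closed


if __name__ == "__main__":
    # Toy data: three transactions over items "a", "b", "c".
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
    for pattern, support in closed_patterns_by_rows(data, 2).items():
        print(sorted(pattern), support)
```

Because the loop runs over subsets of transactions, its cost grows with the number of transactions rather than with the number of items, which is why row enumeration suits datasets with many items and few transactions.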