An Effective Biclustering Algorithm for Time-Series Gene Expression Data.

Huixin Xu, Yun Xue, Zhihao Lu, Xiaohui Hu, Hongya Zhao, Zhengling Liao, TieChen Li
DOI: https://doi.org/10.1007/978-3-662-45652-1_12
2014-01-01
Abstract: Biclustering is a useful tool in the analysis of massive gene expression data. It clusters the rows and columns of the data matrix simultaneously to find subsets of genes that are coherently expressed under subsets of conditions. In the analysis of time-series gene expression data in particular, it is meaningful to restrict biclusters to contiguous time points that show coherent evolutions. In this paper, the BCCC-Bicluster is proposed as an extension of the CCC-Bicluster. An algorithm based on frequent sequential mining is proposed to find all maximal BCCC-Biclusters. The newly defined Frequent-Infrequent Tree-Array (FITA) is constructed to speed up the traversal process, with strategies derived from the Apriori property used to avoid redundant search. To make the algorithm more efficient, the bitwise XOR operation is applied to capture identical or opposite contiguous patterns between two rows. The algorithm is tested on yeast microarray data. Experimental results show that the proposed algorithm is able to find all embedded BCCC-Biclusters, and that these biclusters reveal significant GO terms involved in biological processes.
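The XOR step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how expression values are discretized, so the sketch assumes a simple two-state encoding (1 if the value rises between consecutive time points, 0 otherwise), and the names `discretize`, `contiguous_patterns`, and `min_len` are hypothetical. Under that assumption, XOR-ing two encoded rows gives 0 where the rows move the same way and 1 where they move in opposite ways, so maximal runs of equal bits mark contiguous identical or opposite patterns.

```python
from itertools import groupby


def discretize(row):
    """Assumed encoding: 1 if expression rises between consecutive time points, else 0."""
    return [1 if b > a else 0 for a, b in zip(row, row[1:])]


def contiguous_patterns(row_a, row_b, min_len=2):
    """Return (start, end, kind) for maximal runs of transitions where the two rows
    behave identically (XOR bit 0) or oppositely (XOR bit 1).

    start and end index transitions between time points; end is exclusive.
    """
    xor = [x ^ y for x, y in zip(discretize(row_a), discretize(row_b))]
    runs = []
    pos = 0
    for bit, group in groupby(xor):
        length = len(list(group))
        if length >= min_len:
            kind = "identical" if bit == 0 else "opposite"
            runs.append((pos, pos + length, kind))
        pos += length
    return runs


# Two illustrative (made-up) expression profiles over six time points.
gene_a = [1, 2, 3, 2, 1, 2]
gene_b = [5, 6, 7, 8, 9, 8]
print(contiguous_patterns(gene_a, gene_b))
```

Here the first two transitions of the two profiles agree and the last three are opposite, which is the kind of pairwise signal a search over rows could build on. Packing the bits into integers would let a single machine-level XOR compare many time points at once; the list form is used only for readability.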