Tendency based Subspace Clustering on Gene Expression Data

Jinze Liu, Wei Wang
2003-01-01
Abstract: Microarrays are one of the latest breakthroughs in experimental molecular biology. Monitoring the expression of different genes under different experiments produces a large matrix of gene expression levels across those experiments. To reveal patterns in such matrices, Ben-Dor et al. introduced a probabilistic model for discovering a strictly order-preserving submatrix (OPSM) embedded in the gene expression matrix. Their algorithm, which discovers one hidden OPSM with a designated column size s, starts by building the smallest partial models of the best quality and then iteratively grows the partial model(s) in a number of the best directions by including extra elements. It terminates when the column size of the partial model is no smaller than s. Because the model is probabilistic, the approach suffers from several drawbacks. The OPSM algorithm favors large row support, based on the intuition that OPSMs with large row support are more significant and can potentially be developed into OPSMs with more columns. However, when submatrices with both large and small row support are present, the larger submatrices may prevent the smaller ones from being discovered, since the smaller ones can be eliminated in the early stages of growth. In addition, the probabilistic model and the given algorithm can only determine the row support for one particular order of conditions, that is, for a single specific OPSM. To tackle these problems, we propose a more general model (u-Cluster) and an efficient deterministic algorithm that discovers all submatrices exhibiting tendencies in one run. Experimental studies on a yeast gene expression dataset and a drug activity dataset demonstrate that our algorithm is much more robust, more effective, and more efficient than the OPSM algorithm.
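To make the notion of row support concrete, here is a minimal Python sketch of the strict order-preserving condition described above: a row supports a given order of conditions if its expression values strictly increase along that order. This illustrates only the OPSM definition, not the u-Cluster model or the authors' algorithm; the function name `row_support` and the toy matrix values are assumptions made up for illustration.

```python
from typing import List, Sequence


def row_support(matrix: Sequence[Sequence[float]],
                column_order: Sequence[int]) -> List[int]:
    """Return indices of rows whose values strictly increase along column_order.

    Hypothetical helper: it checks the strict order-preserving condition only.
    """
    supporting = []
    for i, row in enumerate(matrix):
        # Read the row's values in the proposed order of conditions.
        values = [row[c] for c in column_order]
        # The row supports the order if each value is smaller than the next.
        if all(a < b for a, b in zip(values, values[1:])):
            supporting.append(i)
    return supporting


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [0.2, 1.5, 0.9, 3.0],
    [5.0, 7.1, 6.4, 9.9],
    [2.0, 1.0, 4.0, 3.0],
    [0.1, 0.8, 0.3, 0.2],
]

# Rows 0 and 1 differ in magnitude but share the same tendency
# under the condition order 0, 2, 1, 3.
support = row_support(expression, [0, 2, 1, 3])
print(support)
```

In this sketch the first two genes form an order-preserving submatrix over all four conditions, even though their absolute expression levels are far apart. Checking one fixed order at a time, as done here, mirrors the limitation the abstract attributes to the OPSM approach.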