On Demand Phenotype Ranking Through Subspace Clustering.

Xiang Zhang,Wei Wang,Jun Huan
DOI: https://doi.org/10.1137/1.9781611972771.72
2007-01-01
Abstract:High throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are in need to analyze these data for discovering genotype-phenotype associations and inferring possible phenotypes from genotypic attributes. In this paper, we study the problem of on demand phenotype ranking. Given a query sample, for which only its genetic information is available, we want to predict the possible phenotypes it may have, ranked in descending order of their likelihood. This problem is challenging since genotype-phenotype databases are updated often and explicitly mine and maintain all patterns is impractical. We propose an on-demand ranking algorithm that uses a modified pattern-based subspace clustering algorithm to effectively identify the subspaces where these relevant clusters may reside. Using this algorithm, we can compute the clusters and their prediction significance for any phenotypes on the fly. Our experiments demonstrate the efficiency and effectiveness of our algorithm.
What problem does this paper attempt to address?