A nonparametric variable clustering model

David A. Knowles, Konstantina Palla, Zoubin Ghahramani
2012-12-03
Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems.
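To illustrate why a disjoint partition of variables corresponds to a block-diagonal covariance, here is a minimal sketch. It is not the authors' DPVC model or inference procedure. It assumes, for illustration only, that each variable loads on exactly one unit-variance latent factor and that the partition is drawn from a Chinese restaurant process (the partition distribution induced by a Dirichlet process). The function names, `alpha`, `loadings`, and `noise_var` are all hypothetical.

```python
import numpy as np

def sample_crp_partition(num_vars, alpha, rng):
    """Draw cluster labels for num_vars variables from a Chinese restaurant process."""
    assignments = [0]   # the first variable always opens cluster 0
    counts = [1]        # number of variables currently in each cluster
    for _ in range(1, num_vars):
        # Join an existing cluster with probability proportional to its size,
        # or open a new one with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return np.array(assignments)

def implied_covariance(assignments, loadings, noise_var):
    """Covariance of x = G f + noise, assuming f ~ N(0, I) and one factor per variable."""
    num_vars = len(assignments)
    num_clusters = assignments.max() + 1
    G = np.zeros((num_vars, num_clusters))
    # Each row of G has a single nonzero entry, in the column of its cluster.
    G[np.arange(num_vars), assignments] = loadings
    return G @ G.T + noise_var * np.eye(num_vars)

rng = np.random.default_rng(0)
z = sample_crp_partition(num_vars=8, alpha=1.0, rng=rng)
g = rng.normal(size=8)                      # hypothetical per-variable loadings
cov = implied_covariance(z, g, noise_var=0.1)

# Reordering variables by cluster label exposes the block-diagonal pattern.
order = np.argsort(z, kind="stable")
cov_sorted = cov[np.ix_(order, order)]
```

Under these assumptions, any two variables in different clusters have exactly zero covariance, so `cov_sorted` is block diagonal with one block per cluster.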
Biology, Mathematics, Computer Science