Quantitative Association Analysis Using Tree Hierarchies

Feng Pan, Lynda Yang, Leonard McMillan, Fernando Pardo Manuel de Villena, David Threadgill, Wei Wang
DOI: https://doi.org/10.1109/icdm.2008.100
2008-01-01
Abstract: Association analysis arises in many important applications such as bioinformatics and business intelligence. Given a large collection of measurements over a set of samples, association analysis aims to find dependencies of target variables on subsets of measurements. Most previous algorithms adopt a two-stage approach: they first group samples based on their similarity in the subset of measurements, and then they examine the association between these groups and the specified target variables without considering the inter-group similarities or alternative groupings. This can lead to cases where the strength of association depends significantly on arbitrary clustering choices. In this paper, we propose a tree-based method for quantitative association analysis. Tree hierarchies derived from sample similarities represent many possible sample groupings. They also provide a natural way to incorporate domain knowledge such as ontologies and to identify and remove outliers. Given a tree hierarchy, our association analysis evaluates all possible groupings and selects the one with the strongest association to the target variable. We introduce an efficient algorithm, TreeQA, to systematically explore the search space of all possible groupings in a set of input trees, with integrated permutation tests. Experimental results show that TreeQA is able to handle large-scale association analysis very efficiently and is more effective and robust in association analysis than previous methods.
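As an illustration of the idea described in the abstract, the following is a minimal brute-force sketch in Python, not the TreeQA algorithm itself. It enumerates every grouping obtained by cutting a small tree, scores each grouping against a quantitative target, and estimates significance with a permutation test. Several choices here are assumptions that the abstract does not specify: the nested-tuple tree encoding, the one-way ANOVA F-statistic as the association score, and all function names. The exhaustive enumeration grows exponentially with tree size and does not reproduce the efficiency claimed for TreeQA.

```python
import random
from itertools import product

# Assumed encoding: a leaf is an int sample index, an internal node is a tuple of children.

def leaves(node):
    """Return the sample indices under a node."""
    if isinstance(node, int):
        return [node]
    return [i for child in node for i in leaves(child)]

def cuts(node):
    """Yield every grouping induced by cutting the tree, as a list of subtrees."""
    yield [node]  # keep the whole subtree as one group
    if not isinstance(node, int):
        # or split here and combine every cut of every child
        for combo in product(*(list(cuts(child)) for child in node)):
            yield [sub for part in combo for sub in part]

def f_statistic(groups, y):
    """One-way ANOVA F-statistic (an assumed score, not taken from the paper)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    if k < 2 or k >= n:  # undefined for one group or for all singletons
        return 0.0
    grand = sum(y[i] for g in groups for i in g) / n
    between = 0.0
    within = 0.0
    for g in groups:
        mean = sum(y[i] for i in g) / len(g)
        between += len(g) * (mean - grand) ** 2
        within += sum((y[i] - mean) ** 2 for i in g)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def best_grouping(tree, y):
    """Score every cut of the tree and return (best score, best grouping)."""
    best_score, best_groups = -1.0, None
    for cut in cuts(tree):
        groups = [leaves(sub) for sub in cut]
        score = f_statistic(groups, y)
        if score > best_score:
            best_score, best_groups = score, groups
    return best_score, best_groups

def permutation_p_value(tree, y, n_perm=200, seed=0):
    """Compare the observed best score with best scores under shuffled targets."""
    rng = random.Random(seed)
    observed, _ = best_grouping(tree, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_grouping(tree, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    tree = ((0, 1), (2, 3))          # hypothetical 4-sample hierarchy
    target = [1.0, 1.2, 5.0, 5.1]    # hypothetical quantitative target
    print(best_grouping(tree, target))
    print(permutation_p_value(tree, target))
```

Because the maximum score is recomputed over all groupings for each shuffled target, the permutation test in this sketch accounts for the fact that the best grouping was selected after looking at the data.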