A Tree-Based Contrast Set-Mining Approach to Detecting Group Differences

Hongyan Liu,Yinghui (Catherine) Yang,Zhuohua Chen,Yong Zheng
DOI: https://doi.org/10.1287/ijoc.2013.0558
IF: 3.288
2014-01-01
INFORMS Journal on Computing
Abstract:Understanding differences between groups in a data set is one of the fundamental tasks in data analysis. As relevant applications accumulate, data-mining methods have been developed to specifically address the problem of group difference detection. Contrast set mining discovers group differences in the form of conjunction of feature-value pairs or items. In this paper, we incorporate absolute difference, relative difference, and statistical significance in our definition of a group difference, and develop a novel method named DIFF that uses the prefix-tree structure to compress the search space, follows a tree traversal procedure to discover the complete set of significant group differences, and employs efficient pruning strategies to expedite the search process. We conducted comprehensive experiments to compare our method with existing methods on completeness of results, pruning efficiency, and computational efficiency. The experiments demonstrate that our method guarantees completeness of results and achieves higher pruning efficiency and computational efficiency compared to STUCCO. In addition, our definition of group difference is more general than STUCCO. Our method is more effective than traditional approaches, such as classification trees, in discovering the complete set of significant group differences.
What problem does this paper attempt to address?