Discovering Statistically Non-Redundant Subgroups

Jiuyong Li, Jixue Liu, Hannu Toivonen, Kenji Satou, Youqiang Sun, Bingyu Sun
DOI: https://doi.org/10.1016/j.knosys.2014.04.030
IF: 8.139
2014-01-01
Knowledge-Based Systems
Abstract: The objective of subgroup discovery is to find groups of individuals who are statistically different from others in a large data set. Most existing measures of subgroup quality are intuitive and do not precisely capture the statistical difference between a group and the rest of the data, and the results they produce contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups with respect to a given outcome, which makes it well suited to quantifying subgroup quality. In this paper, we propose a statistically sound framework for statistically non-redundant subgroup discovery: subgroup quality is measured by the odds ratio, and statistically non-redundant subgroups are defined by the error bounds of odds ratios. We show that the proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups.
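The abstract combines two ingredients: the odds ratio as a subgroup quality measure, and error bounds on that odds ratio to decide redundancy. The sketch below illustrates both under stated assumptions. It summarizes a subgroup as a 2×2 table of outcome counts (inside vs. outside the subgroup), uses Woolf's log-based 95% confidence interval as the error bound, and applies a hypothetical redundancy rule in which a more specific subgroup is treated as redundant when its interval overlaps that of its more general parent. The paper's exact definitions may differ. The names `Table2x2`, `odds_ratio`, `odds_ratio_ci`, and `redundant_refinement` are illustrative, and the counts are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Outcome counts for a subgroup S versus the rest of the data set."""
    a: int  # members of S with the outcome
    b: int  # members of S without the outcome
    c: int  # non-members with the outcome
    d: int  # non-members without the outcome

    def cells(self, correction: float = 0.5) -> tuple:
        # If any cell is zero, add a continuity correction to every cell so
        # that the odds ratio and its log standard error stay finite.
        counts = (self.a, self.b, self.c, self.d)
        if 0 in counts:
            return tuple(x + correction for x in counts)
        return tuple(float(x) for x in counts)


def odds_ratio(t: Table2x2) -> float:
    """Odds of the outcome inside S divided by the odds outside S."""
    a, b, c, d = t.cells()
    return (a * d) / (b * c)


def odds_ratio_ci(t: Table2x2, z: float = 1.96) -> tuple:
    """Woolf (log-based) confidence interval for the odds ratio."""
    a, b, c, d = t.cells()
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def redundant_refinement(child: Table2x2, parent: Table2x2, z: float = 1.96) -> bool:
    """Hypothetical rule (assumption, not the paper's definition): a more
    specific subgroup is redundant if its odds-ratio interval overlaps the
    interval of its more general parent subgroup."""
    c_lo, c_hi = odds_ratio_ci(child, z)
    p_lo, p_hi = odds_ratio_ci(parent, z)
    return c_lo <= p_hi and p_lo <= c_hi


if __name__ == "__main__":
    # Made-up data: 200 individuals with the outcome, 800 without.
    parent = Table2x2(a=60, b=40, c=140, d=760)
    child = Table2x2(a=30, b=15, c=170, d=785)   # refinement of parent
    sharp = Table2x2(a=45, b=3, c=155, d=797)    # another refinement
    print(round(odds_ratio(parent), 2))           # 8.14
    print(redundant_refinement(child, parent))    # True
    print(redundant_refinement(sharp, parent))    # False
```

Here `child` has a somewhat higher odds ratio than `parent`, but its interval overlaps the parent's, so under this rule it adds no statistically supported information. `sharp`'s lower bound lies above the parent's upper bound, so it is kept. Adding 0.5 to every cell when a count is zero is one common convention (the Haldane-Anscombe correction), not necessarily the one the paper uses.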
What problem does this paper attempt to address?