Robustness for Evaluating Rule’s Generalization Capability in Data Mining

Dianhui Wang, Tharam S. Dillon, Xiaohang Ma
DOI: https://doi.org/10.1007/978-3-540-24581-0_60
2003-01-01
Abstract: The evaluation of production rules generated by different data mining algorithms currently depends upon the data set used; thus, their generalization capability cannot be estimated. Our method consists of three steps. Firstly, we take a set of rules, copy these rules into a population of rules, and then perturb the parameters of individuals in this population. Secondly, the maximum robustness bounds for the rules are found using genetic algorithms, where the performance of each individual is measured with respect to the training data. Finally, the relationship between maximum robustness bounds and generalization capability is established by statistical analysis over a large number of rules. The significance of this relationship is that it allows the algorithms that mine rules to be compared in terms of robustness bounds, independent of the test data. This technique is applied in a case study to a protein sequence classification problem.
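The first two steps of the abstract (perturbing copies of a rule, then searching for the largest perturbation the rule tolerates on the training data) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical one-feature interval rule `(lo, hi)`, a toy training set `TRAIN`, and an accuracy tolerance `max_drop`. It also replaces the genetic algorithm with a simpler stand-in: a fixed grid of perturbation sizes, checked at the corner perturbations plus seeded random samples.

```python
import itertools
import random

# Hypothetical toy training set: (feature value, belongs to the class?).
TRAIN = [(0, False), (1, False), (4, True), (5, True), (6, True),
         (9, False), (10, False)]

def accuracy(rule, data):
    """Fraction of examples the interval rule 'lo <= x <= hi' classifies correctly."""
    lo, hi = rule
    return sum((lo <= x <= hi) == label for x, label in data) / len(data)

def worst_accuracy(rule, delta, data, samples=200, seed=0):
    """Lowest training accuracy over a population of perturbed copies of the rule.

    Each copy shifts lo and hi by at most delta. The four corner shifts are
    always included; the remaining copies are drawn at random (a stand-in for
    the genetic search described in the abstract).
    """
    rng = random.Random(seed)
    lo, hi = rule
    population = [(lo + a, hi + b)
                  for a, b in itertools.product((-delta, delta), repeat=2)]
    population += [(lo + rng.uniform(-delta, delta),
                    hi + rng.uniform(-delta, delta))
                   for _ in range(samples)]
    return min(accuracy(copy, data) for copy in population)

def robustness_bound(rule, data, max_drop=0.05, step=0.25, max_steps=8):
    """Largest grid value of delta for which no perturbed copy loses more than
    max_drop accuracy relative to the unperturbed rule."""
    baseline = accuracy(rule, data)
    bound = 0.0
    for i in range(1, max_steps + 1):
        delta = i * step
        if worst_accuracy(rule, delta, data) < baseline - max_drop:
            break
        bound = delta
    return bound
```

On this toy data, a rule with wide margins such as `(2.5, 7.5)` receives a bound of 1.25, while a rule hugging the class boundary such as `(3.9, 6.1)` receives 0.0, even though both have identical training accuracy. Under the assumptions above, that gap is the kind of difference a robustness bound is meant to expose.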