Evaluation of Mixed-Valued Features Via Set Cover Criteria

Xin Xu, Wei Wang, Guilin Zhang
2013-01-01
Abstract: Traditional feature evaluation methods, such as information gain, entropy and mutual information, generally evaluate the discriminating power of each feature independently using a wide variety of metrics; this is referred to as the TopK approach. A few feature evaluation methods, such as wrappers and criterion functions, instead evaluate the discriminating power of a subset of features, but they usually either rely on a heuristic scheme or incur a high computational cost. As a result, when applied to multi-class classification on large data sets, existing feature evaluation methods either fall into the “siren pitfall”, a surplus of discriminating features for some classes and a lack of discriminating features for the remaining classes, or become inapplicable because of poor repeatability and high computational cost. Specifically, the TopK approach overweights individually discriminating features while neglecting their collective discrimination, and the optimal feature subsets discovered by wrapper methods depend on the corresponding classifier, lack repeatability, and come at a rather high computational cost. In this paper, we propose an effective feature evaluation method for mixed-valued data sets via set cover criteria. Our set cover feature evaluation method offers several advantages in addressing the “siren pitfall” problem: its feature selection scheme is more robust and relies on little prior knowledge, its feature evaluation process is repeatable, and its computational cost is rather low. In addition, the set cover method is applicable to mixed-valued data sets and can weigh the discriminating power of features quantitatively. Experimental results indicate the effectiveness of our set cover method.
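The abstract does not spell out the set cover criteria themselves, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes, purely for illustration, that each feature "covers" the class pairs it can separate, and it uses the textbook greedy set cover heuristic. The feature names `f1` to `f4`, the toy `covers` table, and the helper `greedy_set_cover` are all hypothetical.

```python
from itertools import combinations

def greedy_set_cover(universe, covers):
    """Greedily pick features until every element of `universe` is covered.

    `covers` maps a feature name to the set of elements it covers.
    Returns (chosen, uncovered): `chosen` is a list of (feature, gain) pairs,
    where gain is the number of newly covered elements, and `uncovered` holds
    whatever no feature could cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the feature covering the most still-uncovered elements.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda f: len(covers[f] & uncovered))
        gain = len(covers[best] & uncovered)
        if gain == 0:
            break  # the remaining elements cannot be covered by any feature
        chosen.append((best, gain))
        uncovered -= covers[best]
    return chosen, uncovered

# Hypothetical toy problem: four classes, so six class pairs to separate.
universe = set(combinations("ABCD", 2))

# Assumed illustration: which class pairs each feature separates.
covers = {
    "f1": {("A", "B"), ("A", "C"), ("A", "D")},
    "f2": {("A", "B"), ("A", "C"), ("A", "D")},  # redundant with f1
    "f3": {("B", "C"), ("B", "D")},
    "f4": {("C", "D")},
}

# TopK-style ranking: score each feature on its own and keep the best two.
top_k = sorted(covers, key=lambda f: -len(covers[f]))[:2]
top_k_covered = set().union(*(covers[f] for f in top_k))

# Set-cover-style selection: score each feature by what it adds collectively.
chosen, uncovered = greedy_set_cover(universe, covers)
```

In this toy case the TopK-style ranking keeps `f1` and `f2`, which both separate only class A from the rest, while the greedy cover skips the redundant `f2` and continues until every class pair is separated. The per-feature gain recorded in `chosen` is one simple way a feature's contribution could be expressed as a number.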