Determination of the Optimal Number of Features for Quadratic Discriminant Analysis Via the Normal Approximation to the Discriminant Distribution

JP Hua,ZX Xiong,ER Dougherty
DOI: https://doi.org/10.1016/j.patcog.2004.08.007
IF: 8
2004-01-01
Pattern Recognition
Abstract:Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically, for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The problem is especially acute when sample sizes are very small and the potential number of features is very large. To obtain a general understanding of the kinds of feature-set sizes that provide good performance for a particular classification rule, performance must be evaluated based on accurate error estimation, and hence a model-based setting for optimizing the number of features is needed. This paper treats quadratic discriminant analysis (QDA) in the case of unequal covariance matrices. For two normal class-conditional distributions, the QDA classifier is determined according to a discriminant. The standard plug-in rule estimates the discriminant from a feature-label sample to obtain an estimate of the discriminant by replacing the means and covariance matrices by their respective sample means and sample covariance matrices. The unbiasedness of these estimators assures good estimation for large samples, but not for small samples.
What problem does this paper attempt to address?