An Efficient Variable Selection Method for Predictive Discriminant Analysis

A. Iduseri, J. E. Osemwenkhae
DOI: https://doi.org/10.1007/s40745-015-0061-9
2015-12-01
Annals of Data Science
Abstract: Selecting a subset of relevant predictor variables is a common preprocessing step before training a predictive model: it simplifies the model, shortens training time, and improves generalization by reducing overfitting. In predictive discriminant analysis, using classic variable selection methods as a preprocessing step may lead to “good” overall correct classification within the confusion matrix. However, in most cases the best subset obtained is not superior to all other subsets from the same historical sample, whether in the number and combination of predictor variables or in the hit rate obtained when the subset is used as the training sample. Hence the maximum hit rate of the resulting predictive discriminant function is often not optimal, even for the training sample that produced it. This paper proposes an efficient variable selection method for obtaining a subset of predictors that is superior to all other subsets from the same historical sample. In applications to real-life datasets, the predictive function obtained with the proposed method achieved an actual hit rate essentially equal to that of the all-possible-subset method, at significantly less computational expense.
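To make the cost of the all-possible-subset benchmark concrete, the following is a minimal sketch of an exhaustive search that scores every non-empty subset of predictors by its training-sample hit rate. It is not the method proposed in the paper, which the abstract does not describe. As a simplifying assumption, a nearest-centroid rule stands in for the predictive discriminant function, and the function names (`hit_rate`, `all_possible_subsets`) and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def centroid(rows):
    """Mean vector of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def hit_rate(X, y, cols):
    """Training-sample hit rate of a nearest-centroid rule on the chosen columns.

    Assumption: nearest-centroid is only a stand-in for a discriminant function.
    """
    sub = [[row[c] for c in cols] for row in X]
    labels = sorted(set(y))
    cents = {g: centroid([r for r, lab in zip(sub, y) if lab == g]) for g in labels}
    hits = 0
    for r, lab in zip(sub, y):
        # Assign the observation to the group with the closest centroid.
        pred = min(labels, key=lambda g: sum((a - b) ** 2 for a, b in zip(r, cents[g])))
        hits += pred == lab
    return hits / len(y)


def all_possible_subsets(X, y):
    """Score every non-empty predictor subset; return (best_cols, best_rate, n_evaluated)."""
    p = len(X[0])
    best_cols, best_rate, n_eval = None, -1.0, 0
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            n_eval += 1
            rate = hit_rate(X, y, cols)
            if rate > best_rate:  # strict comparison: ties keep the smaller, earlier subset
                best_cols, best_rate = cols, rate
    return best_cols, best_rate, n_eval


# Hypothetical toy sample: 6 observations, 3 predictors, 2 groups.
X = [
    [1.0, 9.0, 2.0],
    [2.0, 1.0, 1.0],
    [1.5, 5.0, 3.0],
    [5.0, 8.0, 2.5],
    [6.0, 2.0, 1.5],
    [5.5, 4.0, 3.5],
]
y = [0, 0, 0, 1, 1, 1]

best_cols, best_rate, n_eval = all_possible_subsets(X, y)
print(best_cols, best_rate, n_eval)
```

With p predictors the search evaluates 2^p − 1 subsets, so the work doubles with each added predictor; this is the computational expense that a more efficient selection method aims to avoid.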