Matching Model Versus Single Model: A Study Of The Requirement To Match Class Distribution Using Decision Trees

Km Ting
DOI: https://doi.org/10.1007/978-3-540-30115-8_40
2004-01-01
Abstract: A tacit assumption in classifier induction is that the class distribution of the training set must match the class distribution of the test set. A direct implementation is to retrain a model using a data set with a matching class distribution every time the operating condition changes (i.e., the matching model). The alternative is to adapt the decision rule of a previously trained model to the new operating condition. The latter is the single model approach, commonly used and recommended by many researchers. In this paper, we argue, with empirical support from decision trees, that learning with the matching class distribution is desirable. We also make explicit the differences and limitations of the two methods for the single model approach: rescaling and thresholding.
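The single model approach can be illustrated with a minimal sketch. The abstract does not spell out how the paper implements rescaling and thresholding, so the code below assumes the standard prior-shift correction for a binary classifier and is not necessarily the paper's procedure. The function names `rescale_posterior` and `shifted_threshold` and the example priors are hypothetical.

```python
def rescale_posterior(p, train_prior, new_prior):
    """Rescaling (assumed form): adjust a positive-class probability p,
    estimated under train_prior, to a new positive-class prior."""
    a = new_prior / train_prior                   # weight for the positive class
    b = (1.0 - new_prior) / (1.0 - train_prior)   # weight for the negative class
    return p * a / (p * a + (1.0 - p) * b)


def shifted_threshold(train_prior, new_prior, base_threshold=0.5):
    """Thresholding (assumed form): the cut-off on the ORIGINAL probability
    that is equivalent to applying base_threshold after rescaling."""
    a = new_prior / train_prior
    b = (1.0 - new_prior) / (1.0 - train_prior)
    c = base_threshold
    return c * b / (a * (1.0 - c) + c * b)


if __name__ == "__main__":
    # Hypothetical scenario: trained on balanced data, deployed where
    # only 10% of cases are positive.
    train_prior, new_prior = 0.5, 0.1
    print(rescale_posterior(0.5, train_prior, new_prior))
    print(shifted_threshold(train_prior, new_prior))
```

Under this assumed formulation the two functions yield the same decisions whenever the model outputs fine-grained probabilities. A decision tree, however, emits only one probability estimate per leaf, which is one setting where the two methods and retraining on matching data could plausibly differ.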