Improving Performance of Decision Trees with Multi-Edit-nearest-neighbor Algorithm

Ye Chenzhou (叶晨洲), Yang Jie (杨杰), Yao Lixiu (姚莉秀), Chen Nianyi (陈念贻)
DOI: https://doi.org/10.3321/j.issn:1001-0920.2003.01.022
2003-01-01
Abstract: Noise and overlapping regions in the training samples hurt the simplicity and generality of decision trees. To solve this problem, a sample selection algorithm based on the multi-edit-nearest-neighbor rule is proposed. Under ideal conditions, this algorithm can eliminate noise that satisfies certain prerequisites, purify the overlapping region according to its members' posterior probabilities, and finally form a Bayesian boundary between samples of different classes. When applied to an appropriate training dataset, it markedly reduces the size of the resulting decision trees without sacrificing accuracy. This improves both the understandability and generality of decision trees.
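The abstract does not list the steps of the sample selection procedure, so the following is only a minimal sketch of the classic multi-edit nearest-neighbor scheme (usually attributed to Devijver and Kittler), not the authors' implementation. The function names `nearest_label` and `multi_edit`, the parameters `n_subsets`, `patience`, `seed` and `max_rounds`, and the use of squared Euclidean distance are all assumptions made for illustration. Each sample is a `(features, label)` pair.

```python
import random


def nearest_label(x, reference):
    """Label of the reference sample closest to x (1-NN, squared Euclidean)."""
    closest = min(
        reference,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s[0])),
    )
    return closest[1]


def multi_edit(samples, n_subsets=3, patience=3, seed=0, max_rounds=1000):
    """Hypothetical multi-edit sketch: repeatedly drop samples that a
    1-NN classifier built on a *different* random subset misclassifies."""
    rng = random.Random(seed)
    current = list(samples)
    quiet_rounds = 0  # consecutive rounds in which nothing was removed
    for _ in range(max_rounds):
        # Stop once editing has stabilised, or when too few samples are
        # left to give every subset at least one member.
        if quiet_rounds >= patience or len(current) < n_subsets:
            break
        # Diffusion: random partition into n_subsets disjoint subsets.
        rng.shuffle(current)
        subsets = [current[i::n_subsets] for i in range(n_subsets)]
        kept = []
        for i, subset in enumerate(subsets):
            # Classification: subset i is judged by the next subset.
            reference = subsets[(i + 1) % n_subsets]
            # Editing: keep only the correctly classified samples.
            kept.extend(
                s for s in subset if nearest_label(s[0], reference) == s[1]
            )
        quiet_rounds = quiet_rounds + 1 if len(kept) == len(current) else 0
        # Confusion: pool the survivors for the next round.
        current = kept
    return current
```

The edited set returned by `multi_edit` would then be passed to any decision tree learner in place of the raw training set. Note that this sketch can also discard genuine samples of a small class when a reference subset happens to contain none of them, which is one reason the abstract restricts its claims to an "appropriate" training dataset.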