Constructing decision tree with continuous attributes for binary classification

YanHuang Jiang, Haifang Zhou, Xuejun Yang
DOI: https://doi.org/10.1109/ICMLC.2002.1174409
2002-01-01
Abstract: Continuous attributes are hard to handle and require special treatment in decision tree induction algorithms. In this paper, we present RCAT, a multisplitting algorithm for continuous attributes based on statistical information. To calculate the information gain of a continuous attribute, RCAT first splits the attribute's value range into initial intervals and estimates the probability of every class in each interval. It then finds the best threshold in this probability space and uses it to separate the initial intervals into two sets. Adjacent intervals that belong to the same set are combined, the boundary of every combined interval is optimized, and finally the information gain of the continuous attribute is obtained. We also provide a pruning method to simplify the resulting decision trees. Empirical results show that RCAT produces decision trees that are much more intelligible than those built by C4.5 while retaining comparable accuracy.
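The interval-based steps described in the abstract can be sketched as follows for a binary (0/1) class label. This is a minimal illustration under stated assumptions, not the authors' RCAT implementation. The abstract does not say how the initial intervals are formed, how class probabilities are estimated, or which criterion picks the threshold, so the sketch assumes equal-frequency initial intervals, a Laplace estimate of P(class 1), and information gain as the threshold criterion. The boundary-optimization step and the pruning method are omitted. The function name `rcat_like_split` and the parameter `k` (number of initial intervals) are hypothetical.

```python
import math


def entropy(n0, n1):
    """Binary Shannon entropy (in bits) of class counts n0 and n1."""
    total = n0 + n1
    h = 0.0
    for c in (n0, n1):
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def rcat_like_split(values, labels, k=4):
    """Hypothetical sketch of the multisplit steps for 0/1 labels.

    Returns (cut_points, information_gain). Boundary optimization is NOT done.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Step 1 (assumption): equal-frequency initial intervals; tied values may
    # end up in different intervals in this simplified version.
    size = max(1, math.ceil(n / k))
    bins = [pairs[i:i + size] for i in range(0, n, size)]
    # Step 2 (assumption): Laplace estimate of P(class 1) in each interval.
    probs = [(sum(y for _, y in b) + 1) / (len(b) + 2) for b in bins]

    n1 = sum(y for _, y in pairs)
    parent = entropy(n - n1, n1)

    def gain(groups):
        # Information gain of partitioning all examples into `groups`.
        rem = 0.0
        for g in groups:
            g1 = sum(y for _, y in g)
            rem += len(g) / n * entropy(len(g) - g1, g1)
        return parent - rem

    # Step 3 (assumed criterion): try each probability value as threshold t;
    # intervals with p <= t form one set, the rest form the other.
    best_t, best_gain = None, -1.0
    for t in sorted(set(probs))[:-1]:
        low = [pt for b, p in zip(bins, probs) if p <= t for pt in b]
        high = [pt for b, p in zip(bins, probs) if p > t for pt in b]
        g = gain([low, high])
        if g > best_gain:
            best_t, best_gain = t, g
    if best_t is None:
        return [], 0.0  # every interval has the same estimate: no useful split

    # Steps 4-5: assign each interval to a set, merge adjacent same-set runs.
    merged = []
    for b, p in zip(bins, probs):
        side = p <= best_t
        if merged and merged[-1][0] == side:
            merged[-1][1].extend(b)
        else:
            merged.append((side, list(b)))
    groups = [g for _, g in merged]

    # Cut points halfway between neighbouring merged intervals.
    cuts = [(groups[i][-1][0] + groups[i + 1][0][0]) / 2
            for i in range(len(groups) - 1)]
    # Final step: information gain of the resulting multiway split.
    return cuts, gain(groups)
```

For example, with values 1 to 8 and labels `[0, 0, 1, 1, 0, 0, 1, 1]`, the four initial intervals alternate between the two sets, nothing merges, and the sketch returns cut points `[2.5, 4.5, 6.5]` with a gain of 1 bit. With labels `[0, 0, 0, 0, 1, 1, 1, 1]`, adjacent intervals merge and a single cut at 4.5 remains, which illustrates how combining intervals keeps the split compact.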