Improved C4.5 Algorithm Based on K-Means

Honghui Li, Yikun Xi, Hailiang Lu, Xueliang Fu
DOI: https://doi.org/10.3233/jcm-193794
2020-01-01
Journal of Computational Methods in Sciences and Engineering
Abstract: When the traditional C4.5 algorithm handles big data with many multidimensional continuous attribute values, its discretization method can lead to low classification accuracy. This paper proposes a novel method for discretizing continuous data based on the k-means algorithm. The method generates data clusters by combining continuous, unfeatured data with the corresponding class labels, and then takes the approximate boundary points of the clusters as the candidate split points of the continuous attribute. The information gain ratio is then calculated from these candidates. Experimental results show that the proposed K-C4.5 algorithm improves the classification accuracy of the decision tree compared with the traditional C4.5 algorithm.
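The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' K-C4.5 implementation. It assumes a single continuous attribute, a class label encoded as a second coordinate scaled to the attribute's range, a deterministic k-means initialization, and candidate split points taken as midpoints between the value ranges of adjacent clusters. All function names (`kmeans`, `candidate_splits`, `best_split`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    # C4.5 gain ratio for the binary split "value <= threshold".
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    wl, wr = len(left) / n, len(right) / n
    gain = entropy(labels) - wl * entropy(left) - wr * entropy(right)
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))
    return gain / split_info

def kmeans(points, k, iters=50):
    # Plain Lloyd's k-means. Assumption: centers start at evenly spaced
    # picks from the sorted points, so the result is deterministic.
    pts = sorted(points)
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    assign = []
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign

def candidate_splits(values, labels, k):
    # Assumption: each sample becomes a 2-D point (value, encoded label),
    # with the label coordinate scaled to the attribute's range.
    classes = sorted(set(labels))
    span = (max(values) - min(values)) or 1.0
    step = span / max(len(classes) - 1, 1)
    points = [(x, classes.index(y) * step) for x, y in zip(values, labels)]
    assign = kmeans(points, k)
    groups = ([x for x, a in zip(values, assign) if a == j] for j in range(k))
    ranges = sorted((min(xs), max(xs)) for xs in groups if xs)
    # Approximate cluster boundaries: midpoints between adjacent value ranges.
    return sorted({(hi + lo) / 2 for (_, hi), (lo, _) in zip(ranges, ranges[1:])})

def best_split(values, labels, k=2):
    # Pick the candidate with the highest gain ratio (None if no candidates).
    cands = candidate_splits(values, labels, k)
    return max(cands, key=lambda t: gain_ratio(values, labels, t), default=None)

values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold = best_split(values, labels, k=2)
```

Standard C4.5 evaluates a threshold between every pair of adjacent sorted values of a continuous attribute. In this sketch only the cluster boundaries are evaluated, so the number of gain-ratio computations depends on `k` rather than on the number of distinct values.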