The Unsupervised Discretization Method of Continuous Attributes Study: Based on Normal Distribution Characteristics

李晓宏,孙林岩,李刚
DOI: https://doi.org/10.3969/j.issn.1003-8256.2009.06.001
2009-01-01
Abstract: Most data mining methods operate on discrete data, so continuous data must be discretized as part of data preprocessing. This paper analyzes a new unsupervised method for discretizing continuous attributes, derived from the characteristics of the normal distribution and from the distribution of different categories within the same attribute. We then study the relationship between classification accuracy on the test data and the number of cut-points, and identify a reasonable number of cut-points. Finally, experiments show that the method can improve classification accuracy on the test datasets.
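The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of normal-distribution-based unsupervised discretization, not the authors' method. It assumes the attribute is roughly normal, fits a mean and standard deviation, and places cut-points at equal-probability quantiles of the fitted distribution. The function names `normal_cut_points` and `discretize`, the equal-probability rule, and the sample values are all illustrative assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def normal_cut_points(values, n_cuts):
    """Return n_cuts cut-points that split a fitted normal distribution
    into n_cuts + 1 intervals of equal probability (an assumed rule)."""
    fitted = NormalDist(mean(values), stdev(values))
    return [fitted.inv_cdf(i / (n_cuts + 1)) for i in range(1, n_cuts + 1)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.
    No class labels are used, so the procedure is unsupervised."""
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical attribute values, used only for illustration.
values = [4.9, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.1]
cuts = normal_cut_points(values, n_cuts=3)
labels = discretize(values, cuts)
print(cuts)
print(labels)
```

The number of cut-points is left as a parameter (`n_cuts`) because the abstract treats it as the quantity to be tuned against classification accuracy; in practice one would repeat the discretization for several values and compare a downstream classifier's accuracy on held-out data.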