Synthetic Data Generator for Classification Rules Learning

Runzong Liu, Bin Fang, Yuan Yan Tang, Patrick P. K. Chan
DOI: https://doi.org/10.1109/ccbd.2016.076
2016-01-01
Abstract: A standard data set is useful for empirically evaluating classification rules learning algorithms. However, no standard data set is yet general enough to cover the variety of situations in which these algorithms are used. Real-world data sets are limited to specific applications, and their numbers of attributes, rules, and samples are fixed. A data generator is proposed here to produce synthetic data sets that can be as large as the experiments demand. The numbers of attributes, rules, and samples in the synthetic data sets can be easily changed to meet the demands of evaluating different learning algorithms. In the generator, related attributes are created first. Rules are then created based on the attributes, and samples are produced following the rules. Three decision tree algorithms are evaluated using synthetic data sets produced by the proposed data generator.
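The three-stage pipeline described in the abstract (attributes, then rules, then samples) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' generator: the function names, the use of categorical attributes, the conjunctive rule form, and the first-matching-rule labeling are all assumptions made for the example, and the sketch does not model the relations between attributes that the abstract mentions.

```python
import random

def make_attributes(n_attributes, max_values, rng):
    # Assumption: each attribute is categorical; store only its domain size.
    return [rng.randint(2, max_values) for _ in range(n_attributes)]

def make_rules(domains, n_rules, n_classes, max_conditions, rng):
    # Assumption: a rule is a conjunction of attribute == value tests
    # paired with a class label.
    rules = []
    for _ in range(n_rules):
        k = rng.randint(1, min(max_conditions, len(domains)))
        chosen = rng.sample(range(len(domains)), k)
        conditions = {a: rng.randrange(domains[a]) for a in chosen}
        rules.append((conditions, rng.randrange(n_classes)))
    return rules

def first_match(rules, sample):
    # Return the label of the first rule whose conditions all hold.
    for conditions, label in rules:
        if all(sample[a] == v for a, v in conditions.items()):
            return label
    return None

def make_samples(domains, rules, n_samples, rng):
    data = []
    for _ in range(n_samples):
        # Start from a random sample, then force it to satisfy one rule.
        sample = [rng.randrange(d) for d in domains]
        conditions, _ = rng.choice(rules)
        for a, v in conditions.items():
            sample[a] = v
        # Label by the first matching rule so overlapping rules stay consistent.
        data.append((sample, first_match(rules, sample)))
    return data

rng = random.Random(0)  # fixed seed for reproducibility
domains = make_attributes(n_attributes=6, max_values=4, rng=rng)
rules = make_rules(domains, n_rules=5, n_classes=3, max_conditions=3, rng=rng)
data = make_samples(domains, rules, n_samples=200, rng=rng)
```

Changing `n_attributes`, `n_rules`, and `n_samples` scales the data set in the three dimensions the abstract names, which is the property that makes a generator of this kind useful for comparing learning algorithms.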