Parallel ID3 Algorithm Based on Granular Computing and Hadoop

Ping Liu, Wu Zhenggang, Hao Zhou, Yang Junping, Taorong Qiu
2015-01-01
The Open Automation and Control Systems Journal
Abstract: Large-scale data processing has become a hot research topic, and efficiently extracting useful information from large amounts of data is an important research direction in data mining. In this paper, some granular concepts for decision trees are first introduced, based on the idea of granular computing. Second, with reference to granular computing, an improvement and a parallelization of the ID3 algorithm are presented. Finally, the proposed algorithms are tested on two data sets, and the results indicate that classification accuracy is improved. Tests on a Hadoop platform demonstrate that the parallel algorithm can efficiently process massive datasets.
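As a rough illustration of why ID3 lends itself to parallelization, the sketch below computes standard ID3 information gain from (attribute, value, label) counts gathered in a map step and summed in a reduce step. It is plain Python that only mimics the MapReduce pattern, not Hadoop code, and it does not reproduce the granular-computing improvement described in the paper. All function names (map_record, reduce_counts, information_gain, best_attribute) are hypothetical.

```python
from collections import Counter
from math import log2


def map_record(record, label_key):
    """Map step (assumed design): emit ((attribute, value, label), 1) pairs."""
    label = record[label_key]
    for attr, value in record.items():
        if attr != label_key:
            yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def entropy(label_counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    label_counts = list(label_counts)
    total = sum(label_counts)
    return -sum((c / total) * log2(c / total) for c in label_counts if c)


def information_gain(counts, attr):
    """Standard ID3 gain of `attr`, computed only from the reduced counts."""
    by_value = {}       # attribute value -> Counter of class labels
    labels = Counter()  # overall class distribution
    for (a, value, label), n in counts.items():
        if a == attr:
            by_value.setdefault(value, Counter())[label] += n
            labels[label] += n
    total = sum(labels.values())
    conditional = sum(
        sum(c.values()) / total * entropy(c.values()) for c in by_value.values()
    )
    return entropy(labels.values()) - conditional


def best_attribute(records, label_key):
    """Pick the split attribute with the highest information gain."""
    pairs = (p for r in records for p in map_record(r, label_key))
    counts = reduce_counts(pairs)
    attrs = sorted({a for (a, _, _) in counts})
    return max(attrs, key=lambda a: information_gain(counts, a))
```

Because the gain depends only on summed counts, the map step can run independently on separate partitions of the data, which is the property a Hadoop implementation would rely on.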