Parallel Ordinal Decision Tree Algorithm and Its Implementation in Framework of MapReduce

Shanshan Wang, Junhai Zhai, Hong Zhu, Xizhao Wang
DOI: https://doi.org/10.1007/978-3-662-45652-1_25
2014-01-01
Abstract: Ordinal decision trees (ODTs) can effectively handle monotonic classification problems. However, existing ODT algorithms have difficulty learning an ODT from large data sets. To generate ODTs from large data sets, this paper presents a parallel processing mechanism in the MapReduce framework. As in general ODT algorithms, rank mutual information (RMI) is used to select the splitting (expanded) attribute at each node. Unlike the RMI computation in previous algorithms, however, this paper computes RMI with an attribute-parallelization strategy. Experiments on large, artificially generated ordered data sets confirm that the proposed algorithm is feasible, and the results show that it is effective and efficient in terms of three measures: speed-up, scale-up, and size-up.
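To make the attribute-selection step more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used ascending RMI, RMI(a, d) = -(1/n) Σ_i log(|A_i| · |D_i| / (n · |A_i ∩ D_i|)), where A_i and D_i are the sets of samples ranked at least as high as x_i on attribute a and on decision d, computed with the natural logarithm. It also assumes that "attribute parallelization" means one map task per candidate attribute, followed by a reduce step that keeps the attribute with the largest RMI. The names `rank_mutual_information`, `map_attribute`, `reduce_best`, and `select_split_attribute` are hypothetical, and the map phase runs sequentially in one process instead of on a Hadoop cluster.

```python
import math
from functools import reduce
from typing import Dict, Sequence, Tuple


def rank_mutual_information(a: Sequence[float], d: Sequence[int]) -> float:
    """Ascending RMI between one ordinal attribute a and the ordinal decision d.

    RMI(a, d) = -(1/n) * sum_i log(|A_i| * |D_i| / (n * |A_i & D_i|)),
    with A_i = {j : a[i] <= a[j]} and D_i = {j : d[i] <= d[j]}.
    Natural log is an assumption; the cost is O(n^2) per attribute.
    """
    n = len(d)
    total = 0.0
    for i in range(n):
        a_set = {j for j in range(n) if a[i] <= a[j]}  # ascending rank set of x_i under a
        d_set = {j for j in range(n) if d[i] <= d[j]}  # ascending rank set of x_i under d
        both = len(a_set & d_set)  # at least 1, because i belongs to both sets
        total += math.log(len(a_set) * len(d_set) / (n * both))
    return -total / n


def map_attribute(task: Tuple[str, Sequence[float], Sequence[int]]) -> Tuple[str, float]:
    """Hypothetical map task: score one candidate attribute independently."""
    name, column, labels = task
    return name, rank_mutual_information(column, labels)


def reduce_best(best: Tuple[str, float], cand: Tuple[str, float]) -> Tuple[str, float]:
    """Hypothetical reduce step: keep the attribute with the larger RMI (first wins ties)."""
    return cand if cand[1] > best[1] else best


def select_split_attribute(
    columns: Dict[str, Sequence[float]], labels: Sequence[int]
) -> Tuple[str, float]:
    """Pick the splitting attribute for one node by maximum RMI."""
    tasks = [(name, col, labels) for name, col in columns.items()]
    # On a MapReduce cluster each task would be a separate map task;
    # here the built-in map simply evaluates them one after another.
    scored = map(map_attribute, tasks)
    return reduce(reduce_best, scored)
```

For example, `select_split_attribute({"a1": [1, 2, 3, 4, 5, 6], "a2": [3, 1, 2, 3, 1, 2]}, [1, 1, 2, 2, 3, 3])` returns `("a1", ...)` with an RMI of about 0.501. Because `a1` orders the samples exactly as the decision does, its RMI equals the RMI of the decision with itself. The sketch is meant only to show that the per-attribute scores are independent of one another. That independence is what allows them to be computed as separate map tasks.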