MReC4.5: C4.5 Ensemble Classification with MapReduce

Gongqing Wu, Haiguang Li, Xuegang Hu, Yuanjun Bi, Jing Zhang, Xindong Wu
DOI: https://doi.org/10.1109/chinagrid.2009.39
2009-01-01
Abstract: Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on analyses of the MapReduce computing paradigm and the process of ensemble learning, we find that the parallel and distributed computing model in MapReduce is appropriate for implementing ensemble learning. This paper combines the advantages of C4.5, ensemble learning, and the MapReduce computing model, and proposes a new method, MReC4.5, for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes improves the effectiveness of classification modeling, and that serialization operations at the model level make the MReC4.5 classifier "construct once, use anywhere".
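The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: a map step that trains a base classifier on each data partition, a reduce step that assembles the base classifiers into a voting ensemble, and a serialization step that lets the assembled model be reloaded elsewhere. Everything here is an assumption made for illustration: the function names (`train_stump`, `assemble`, `predict`) are hypothetical, a one-split decision stump stands in for a real C4.5 tree, Python's built-in `map` stands in for a MapReduce cluster, and JSON stands in for whatever serialization format the authors used.

```python
import json
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows):
    """Map step: fit a one-split classifier on one partition.

    A toy stand-in for C4.5 (assumption). rows is a list of (features, label).
    """
    labels = [y for _, y in rows]
    # Fallback model: always predict the partition's majority label.
    model = {"feature": 0, "threshold": float("inf"),
             "left": majority(labels), "right": majority(labels)}
    best_errors = sum(y != model["left"] for y in labels)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            lo = [y for x, y in rows if x[f] <= t]
            hi = [y for x, y in rows if x[f] > t]
            if not hi:  # threshold at the maximum value splits nothing
                continue
            l, r = majority(lo), majority(hi)
            errors = sum(y != l for y in lo) + sum(y != r for y in hi)
            if errors < best_errors:
                best_errors = errors
                model = {"feature": f, "threshold": t, "left": l, "right": r}
    return model


def predict_one(model, x):
    """Classify x with a single base model."""
    side = "left" if x[model["feature"]] <= model["threshold"] else "right"
    return model[side]


def assemble(models):
    """Reduce step: collect the base models into one ensemble."""
    return {"members": list(models)}


def predict(ensemble, x):
    """Classify x by majority vote over the ensemble members."""
    return majority([predict_one(m, x) for m in ensemble["members"]])


# Toy data: one numeric feature, label "neg" below 5 and "pos" otherwise.
data = [((float(i),), "neg" if i < 5 else "pos") for i in range(10)]
partitions = [data[i::3] for i in range(3)]  # three simulated nodes

ensemble = assemble(map(train_stump, partitions))

# Model-level serialization: the ensemble survives a round trip as text.
restored = json.loads(json.dumps(ensemble))
```

In this sketch the ensemble is a plain dictionary, so the serialized text can be loaded by any process that has the `predict` function, which is the spirit of "construct once, use anywhere". On a real cluster the map calls would run on separate nodes rather than sequentially.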