Study on the Cost-Sensitive Ensemble Learning Algorithm Based on the Cloud Computing Platform

张伶卫,万文强
2012-01-01
Abstract: To address the classification of large-scale imbalanced data, a distributed cost-sensitive ensemble learning algorithm based on a cloud computing platform was proposed. The large-scale data set was partitioned on the Hadoop cloud computing platform, and the partitions were learned in parallel. Following the idea of cost-sensitive learning, a weighted ensemble classifier was constructed, yielding a distributed cost-sensitive ensemble learning model on the cloud computing platform. Experimental results showed that the recall of the minority class improved significantly, and that ensemble learning on the cloud platform shortened computation time thanks to Hadoop's parallel mechanism. In addition, classification efficiency on large-scale imbalanced problems was greatly improved.
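The abstract does not name the base learner, the misclassification costs, or the ensemble weighting rule, so the following is only a minimal single-machine sketch under stated assumptions. The data is split round-robin into k chunks, standing in for Hadoop input splits, and Python's built-in `map` stands in for parallel map tasks. Each chunk trains a cost-sensitive decision stump that minimizes total misclassification cost, using a hypothetical false-negative cost of 10 and false-positive cost of 1. The stumps are then combined by a weighted vote, and the weight rule (one minus the normalized training cost) is also an assumption. All names (`train_stump`, `COST_FN`, `ensemble_predict`, and so on) are hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label): 1 = minority, 0 = majority
Stump = Tuple[int, float, int]    # (feature index, threshold, polarity)

# Hypothetical costs (assumption): missing a minority sample is 10x worse.
COST_FN = 10.0  # cost of predicting 0 for a true 1
COST_FP = 1.0   # cost of predicting 1 for a true 0


def stump_predict(stump: Stump, x: List[float]) -> int:
    feat, thr, pol = stump
    return 1 if pol * (x[feat] - thr) > 0 else 0


def sample_cost(label: int, pred: int) -> float:
    if label == pred:
        return 0.0
    return COST_FN if label == 1 else COST_FP


def train_stump(chunk: List[Sample]) -> Tuple[Stump, float]:
    """'Map' step: fit the stump with the lowest total misclassification cost."""
    best, best_cost = (0, 0.0, 1), float("inf")
    for feat in range(len(chunk[0][0])):
        for thr in sorted({x[feat] for x, _ in chunk}):
            for pol in (1, -1):
                stump = (feat, thr, pol)
                cost = sum(sample_cost(y, stump_predict(stump, x)) for x, y in chunk)
                if cost < best_cost:
                    best, best_cost = stump, cost
    # Assumed weight: 1 - (incurred cost / cost of misclassifying every sample).
    worst = sum(COST_FN if y == 1 else COST_FP for _, y in chunk)
    return best, 1.0 - best_cost / worst


def partition(data: List[Sample], k: int) -> List[List[Sample]]:
    """Round-robin split, standing in for HDFS input splits."""
    return [data[i::k] for i in range(k)]


def ensemble_predict(models: List[Tuple[Stump, float]], x: List[float]) -> int:
    """'Reduce' step: weighted vote of the per-chunk stumps."""
    score = sum(w if stump_predict(s, x) == 1 else -w for s, w in models)
    return 1 if score > 0 else 0


def minority_recall(models: List[Tuple[Stump, float]], data: List[Sample]) -> float:
    positives = [x for x, y in data if y == 1]
    return sum(ensemble_predict(models, x) for x in positives) / len(positives)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic imbalanced data: 5% minority class.
    data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(475)]
    data += [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(25)]
    rng.shuffle(data)
    models = list(map(train_stump, partition(data, 4)))  # one call per "map task"
    print(f"minority recall: {minority_recall(models, data):.2f}")
```

Raising `COST_FN` relative to `COST_FP` pushes each stump's threshold toward the majority class, trading extra false positives for higher minority recall. In a real Hadoop deployment each `train_stump` call would run as a separate map task on its own split, which is where any speedup would come from; this sketch runs them sequentially.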