DMGrid: A Data Mining System Based on Grid Computing

Yi Wang,Liutong Xu,Guanhui Geng,Xiangang Zhao,Nan Du
DOI: https://doi.org/10.1007/978-3-540-88192-6_29
2008-01-01
Abstract:Researchers in the field of data mining now confront a common problem that data mining tasks are time-consuming in that these tasks have to process large-scale datasets. Grid computing focuses on integrating distributed, heterogeneous and idle computers from the Internet to be a service system with high performance. Thus, it is possible to take advantage of grid computing to provide high performance computation capability to effectively reduce task durations. Here, we have successfully developed DMGrid, a grid handling data mining applications. In DMGrid, it not only considers efficient parallel computing as a crucial aspect, but also takes into account dynamic resource configuration. Unlike many existing data mining grids, DMGrid also provides an engine to execute the algorithm flow specified in an application. Moreover, it offers application execution monitoring. At last, we perform experiments and design two applications: Customer Churning Analysis and Customer Value Analysis through which the feasibility of DMGrid is validated.
What problem does this paper attempt to address?