Research on Service-Oriented Data Mining Engine Based on Cloud Computing

YU Yonghong, XIANG Xiaojun, GAO Yang, SHANG Lin, YANG Yubin
DOI: https://doi.org/10.3778/j.issn.1673-9418.2012.01.003
2012-01-01
Abstract: The scalability of data mining algorithms is limited when dealing with large-scale data. Application areas and their requirements for the knowledge discovery process differ significantly. It is therefore fundamental to provide effective formalisms for designing distributed data mining applications and supporting their efficient execution. This paper proposes a novel service-oriented data mining engine based on a cloud computing framework, named CloudDM. Unlike grid-based distributed data mining frameworks, CloudDM exploits the capacity of the open source cloud computing platform Hadoop for large-scale data analysis, and supports the design and execution of distributed data mining applications according to SOA (service-oriented architecture). Moreover, the paper discusses and reports the key component functions and implementation technologies. By following the design principles of SOA and of a data mining engine based on cloud computing, the proposed engine can solve problems in massive data mining systems, such as big data storage, data processing, and interactive operation of algorithms.