Large-Scale Text Clustering Based on Improved K-Means Algorithm in the Storm Platform

Sheng Hang Wu, Zhe Wang, Ming Yuan He, Huai Lin Dong
DOI: https://doi.org/10.4028/www.scientific.net/AMM.543-547.1913
2014-01-01
Applied Mechanics and Materials
Abstract: As the volume of web information increases dramatically, distributed processing of massive data on clusters has become a focus of research. An efficient distributed algorithm determines the scalability and performance of data analysis. This paper first studies the operating mechanism of Storm, a simplified distributed real-time computation platform. Based on the Storm platform, an improved K-Means algorithm suitable for data-intensive computing is then designed and implemented. Finally, the experimental results show that the K-Means clustering algorithm based on the Storm platform achieves higher performance and improves the effectiveness and accuracy of large-scale text clustering.
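The abstract does not specify what the improvement to K-Means is, so the following is only a sketch of the standard way K-Means is parallelized over text, not the authors' algorithm. It assumes documents arrive as sparse term-weight vectors (for example TF-IDF) and uses cosine similarity, i.e. spherical K-Means. Each call to the hypothetical `partial_step` stands in for one parallel worker (in Storm terms, roughly a bolt instance handling one shard of the stream), and the hypothetical `merge_step` plays the role of an aggregating stage that combines partial sums into new centroids. No Storm API is used or implied.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Sparse dot product; equals cosine similarity when both vectors are unit-length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def partial_step(shard, centroids):
    """Hypothetical per-worker step: assign each document in this shard to its
    most similar centroid and return per-cluster vector sums, counts, and labels."""
    sums = [defaultdict(float) for _ in centroids]
    counts = [0] * len(centroids)
    labels = []
    for doc in shard:
        best = max(range(len(centroids)), key=lambda k: cosine(doc, centroids[k]))
        labels.append(best)
        counts[best] += 1
        for t, w in doc.items():
            sums[best][t] += w
    return sums, counts, labels


def merge_step(partials, old_centroids):
    """Hypothetical aggregation step: add up the partial sums from all workers
    and re-normalize them into new centroids."""
    k = len(old_centroids)
    total = [defaultdict(float) for _ in range(k)]
    counts = [0] * k
    for sums, cnts, _ in partials:
        for j in range(k):
            counts[j] += cnts[j]
            for t, w in sums[j].items():
                total[j][t] += w
    # An empty cluster keeps its previous centroid.
    return [normalize(total[j]) if counts[j] else old_centroids[j] for j in range(k)]


def kmeans_sharded(shards, init_centroids, max_iter=20):
    """Run spherical K-Means over documents split into shards until the
    assignment stops changing or max_iter is reached."""
    centroids = [normalize(c) for c in init_centroids]
    shards = [[normalize(d) for d in shard] for shard in shards]
    labels = []
    for _ in range(max_iter):
        partials = [partial_step(shard, centroids) for shard in shards]
        new_labels = [label for p in partials for label in p[2]]
        centroids = merge_step(partials, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

For example, two shards holding documents about stream processing and about clustering, seeded with the centroids `{"storm": 1}` and `{"cluster": 1}`, converge to labels `[0, 1, 0, 1]`. The design choice here is that workers exchange only per-cluster sums and counts, not the documents themselves, which is what lets the assignment step scale out across machines.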