A MapReduce based distributed LSI

Yang Liu, Maozhen Li, Suhel Hammoud, Nasullah Khalid Alham, Mahesh Ponraj
DOI: https://doi.org/10.1109/FSKD.2010.5569083
2010-01-01
Abstract: Latent Semantic Indexing (LSI) is a widely used text mining technique because of its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to run the K-means algorithm to cluster the documents and then applies LSI to the clustered results. We evaluate the performance of the proposed MapReduce-based LSI and compare it with standalone LSI. The results show a great improvement in the speed of LSI.
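The two-stage pipeline described in the abstract (K-means clustering of the documents, then LSI on each cluster) can be illustrated with a minimal single-machine sketch. It assumes NumPy, a small dense document-term matrix, and truncated SVD as the LSI step; the function names `kmeans`, `lsi`, and `clustered_lsi` are hypothetical, and nothing here reproduces the authors' Hadoop implementation. The assignment and averaging steps are only labeled "map" and "reduce" to show where such a split could fall.

```python
import numpy as np


def kmeans(docs, k, iters=20, seed=0):
    """Cluster the row vectors of `docs`; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct documents picked at random.
    centroids = docs[rng.choice(len(docs), size=k, replace=False)].astype(float)
    labels = np.zeros(len(docs), dtype=int)
    for _ in range(iters):
        # "Map"-like step: assign each document to its nearest centroid.
        dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # "Reduce"-like step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = docs[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids


def lsi(docs, rank):
    """Truncated SVD of a document-term matrix; returns document coordinates."""
    u, s, _ = np.linalg.svd(docs, full_matrices=False)
    r = min(rank, len(s))
    return u[:, :r] * s[:r]


def clustered_lsi(docs, k, rank):
    """Run K-means first, then LSI separately on each non-empty cluster."""
    labels, _ = kmeans(docs, k)
    return {
        int(c): lsi(docs[labels == c], rank)
        for c in range(k)
        if np.any(labels == c)
    }


# Toy document-term matrix (rows are documents, columns are term counts).
docs = np.array(
    [[2, 1, 0, 0], [1, 2, 0, 0], [3, 2, 0, 0],
     [0, 0, 1, 2], [0, 0, 2, 1], [0, 0, 2, 3]],
    dtype=float,
)
projections = clustered_lsi(docs, k=2, rank=1)
for cluster_id, coords in projections.items():
    print(cluster_id, coords.shape)
```

Each per-cluster SVD works on a smaller matrix than a single SVD over the whole collection would, which is one plausible reading of where the reported speed-up comes from; the abstract itself does not give the details.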