AN IMPROVED TEXT CATEGORISATION ALGORITHM BASED ON CENTROID
Chen Zhen, Wu Bin, Shen Chongwei, Zhang Zhonghui, Wang Bai
DOI: https://doi.org/10.3969/j.issn.1000-386x.2013.01.010
2013-01-01
Abstract: Text categorisation is an active topic in data mining and information retrieval and has developed rapidly in recent years. The centroid-based approach is a text categorisation method that builds its model quickly and classifies well; many researchers have studied it in depth and proposed improvement strategies to raise its performance further. In this paper, we propose a novel algorithm that dynamically adjusts the centroid positions based on every sample text in the training set. In addition, to address the bottleneck posed by massive data, we put forward a parallel strategy for the algorithm that uses two current parallel computing frameworks, MapReduce and BSP. Comparative experiments against other algorithms on 5 different datasets show that the algorithm classifies quite accurately.
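To make the idea of sample-driven centroid adjustment concrete, the following is a minimal sketch only. The abstract does not state the paper's update rule, so the error-driven push/pull step, the `rate` and `epochs` parameters, and all function names below are assumptions chosen for illustration, not the authors' method. Documents are assumed to be dictionaries mapping terms to weights (for example TF-IDF values).

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term-weight vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_centroids(samples):
    """Initial centroid of each class: sum of its normalized documents."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in samples:
        for term, w in normalize(vec).items():
            sums[label][term] += w
    return {label: dict(terms) for label, terms in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is most similar to the document."""
    # sorted() only makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda c: cosine(centroids[c], vec))


def adjust_centroids(centroids, samples, rate=0.1, epochs=5):
    """Assumed update rule (not taken from the paper): for every training
    sample that is misclassified, pull the correct centroid towards the
    sample and push the wrongly chosen centroid away from it."""
    for _ in range(epochs):
        for vec, label in samples:
            predicted = classify(centroids, vec)
            if predicted == label:
                continue
            for term, w in normalize(vec).items():
                right, wrong = centroids[label], centroids[predicted]
                right[term] = right.get(term, 0.0) + rate * w
                wrong[term] = wrong.get(term, 0.0) - rate * w
    return centroids
```

A typical use would be to call `build_centroids` on the labelled training set, pass the result through `adjust_centroids`, and then label new documents with `classify`. Because the per-class sums in `build_centroids` are additive, that step is the kind of computation that frameworks such as MapReduce can distribute, although the parallel strategy described in the paper is not reproduced here.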