A New Algorithm Based on Centroid for Text Categorization

Chongwei Shen, Bin Wu
DOI: https://doi.org/10.1109/fskd.2012.6234190
2012-01-01
Abstract: Text categorization is a key technology in data mining and information retrieval and has received wide attention in recent years. The centroid-based algorithm is an effective and robust approach, but it often suffers from inductive bias or model misfit. To address this problem, researchers have proposed a number of improvement strategies that give the centroid-based algorithm better performance. This paper proposes a novel approach to adjusting the centroids, called the Weighted Margin adjusted Centroid based Algorithm (WMCA). It then presents extensive experimental comparisons with several other algorithms on 5 public corpora. The results show that WMCA achieves the best performance among the compared algorithms.
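For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a plain centroid-based text classifier. It is not WMCA: the abstract does not describe the weighted margin adjustment, so none is attempted here. The function names (`normalize`, `train_centroids`, `classify`), the sparse dict representation of documents, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term-weight dict to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def train_centroids(docs, labels):
    """Sum the unit-length documents of each class, then normalize the sum."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in normalize(doc).items():
            sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine when both are unit length."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(doc, centroids):
    """Assign the class whose centroid is most similar to the document."""
    unit_doc = normalize(doc)
    return max(centroids, key=lambda c: cosine(unit_doc, centroids[c]))


# Hypothetical toy corpus: term -> raw weight (e.g. term frequency).
docs = [
    {"goal": 2, "match": 1},
    {"goal": 1, "team": 1},
    {"stock": 2, "market": 1},
    {"market": 1, "bank": 2},
]
labels = ["sports", "sports", "finance", "finance"]

centroids = train_centroids(docs, labels)
print(classify({"team": 1, "match": 1}, centroids))
print(classify({"bank": 1, "stock": 1}, centroids))
```

In this baseline each class is represented by a single prototype vector, which is where the inductive bias or model misfit mentioned in the abstract can arise; centroid-adjustment methods modify these prototypes after the initial training step.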