News Topic Clustering Based on Topic Similarity Improvement of K-means

Long CHEN, Jian XU, Yanan YU, Jianhong HU
DOI: https://doi.org/10.3969/j.issn.1672-9722.2017.08.021
2017-01-01
Abstract: News topic clustering plays an important role in public opinion monitoring, hot topic detection, and real-time tracking. Text clustering based on K-means is widely used for news topic clustering because it is simple to implement, has low time and space complexity, and produces good clustering results. However, the traditional K-means algorithm has limitations: the initial center points are chosen at random and the number of clusters K must be specified by the user, so the algorithm tends to converge to a local optimum rather than the global optimum. To address the clustering instability caused by random selection of the initial cluster centers, an improved K-means algorithm for topic clustering is proposed. The algorithm selects the initial cluster centers based on the similarity between news reports, which ensures that the resulting news topic clusters are well discriminated. On this basis, the number of clusters K is determined according to the coverage rate of the news topics. Experimental results show that the improved algorithm generates stable, high-quality topic clusters.
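The abstract does not spell out the seeding procedure, so the following is only a minimal sketch of one way the two ideas (similarity-based initial centers, coverage-based choice of K) could fit together. It is not the authors' method. The function names, the greedy selection rule, the cosine measure over raw term counts, and the `sim_threshold` and `target_coverage` parameters are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_initial_centers(vectors, sim_threshold=0.3, target_coverage=0.9):
    """Greedy, deterministic seeding (hypothetical; assumed for illustration).

    Repeatedly picks the uncovered document that is similar to the most
    other uncovered documents, then marks everything similar to it as
    covered. Stops once the covered fraction reaches target_coverage.
    Returns the indices of the chosen seeds; K is len(result).
    """
    n = len(vectors)
    if n == 0:
        return []
    centers = []
    covered = [False] * n
    while sum(covered) / n < target_coverage:
        best, best_count = None, -1
        for i in range(n):
            if covered[i]:
                continue
            count = sum(
                1
                for j in range(n)
                if not covered[j]
                and cosine(vectors[i], vectors[j]) >= sim_threshold
            )
            if count > best_count:  # ties keep the lowest index
                best, best_count = i, count
        centers.append(best)
        covered[best] = True  # guarantees progress even for empty documents
        for j in range(n):
            if cosine(vectors[best], vectors[j]) >= sim_threshold:
                covered[j] = True
    return centers


docs = [
    "stock market falls",
    "stock market rises",
    "football match final",
    "football match result",
    "market stock crash",
]
vectors = [Counter(d.split()) for d in docs]
centers = select_initial_centers(vectors)
print("seed documents:", centers, "K =", len(centers))
```

Because every new seed is drawn from documents not yet covered, its similarity to each earlier seed is below `sim_threshold`, which is one simple reading of "well discriminated" clusters. The returned seeds and their count would then be passed to an ordinary K-means run in place of random initialization and a user-supplied K.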