A Semi-Structured Tibetan Text Clustering Algorithm Based on Swarm Intelligence
Jian KANG, Shao-Jie QIAO, Duoji GESANG, Nan HAN, Xi-Jin HONG, Zhaxi NIMA, Xiao-Gang FAN
DOI: https://doi.org/10.3969/j.issn.1003-6059.2014.07.012
2014-01-01
Abstract: To apply swarm intelligence techniques to the clustering of semi-structured Tibetan Web texts, a semi-structured Tibetan text clustering algorithm based on swarm intelligence (SCAST) is proposed. Taking both the accuracy and the efficiency of Tibetan text clustering into account, a vector space model is used to represent the Tibetan texts, and the texts and intelligent ants are placed at random in a two-dimensional text vector space. Each intelligent ant then randomly selects a Tibetan text, calculates the similarity between this text and the other texts in its local area, and computes the probability of a pick-up or drop operation to decide whether to pick up, move, or drop the text. Through iterative training of the proposed algorithm, the Tibetan texts are finally clustered according to their similarities. Experimental results on real Tibetan Web text datasets show that the proposed algorithm is more accurate than the traditional k-means clustering algorithm, with an average increase of 8.0%.
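The pick-up, move, and drop loop described in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure or the probability formulas used by SCAST, so the code below assumes cosine similarity over vector-space representations and the classical Deneubourg-style ant-clustering probabilities, p_pick = (k1 / (k1 + f))^2 and p_drop = (f / (k2 + f))^2, where f is the average similarity to texts in the local area. All function names, the constants k1 and k2, the grid size, and the neighbourhood radius are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two term-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def local_similarity(idx, pos, vectors, positions, radius):
    """Average similarity of text `idx`, if placed at `pos`, to nearby texts."""
    neighbours = [
        j for j, p in positions.items()
        if j != idx and max(abs(p[0] - pos[0]), abs(p[1] - pos[1])) <= radius
    ]
    if not neighbours:
        return 0.0
    total = sum(cosine_similarity(vectors[idx], vectors[j]) for j in neighbours)
    return max(0.0, total / len(neighbours))


def pick_probability(f, k1=0.1):
    """Assumed classical form: dissimilar surroundings make pick-up likely."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """Assumed classical form: similar surroundings make dropping likely."""
    return (f / (k2 + f)) ** 2


def ant_cluster(vectors, grid=10, radius=1, steps=2000, seed=0):
    """Simplified single-ant loop: select a text, maybe pick up, move, maybe drop."""
    rng = random.Random(seed)
    all_cells = [(x, y) for x in range(grid) for y in range(grid)]
    # Scatter the texts over distinct random cells of the 2-D grid.
    positions = dict(enumerate(rng.sample(all_cells, len(vectors))))
    for _ in range(steps):
        idx = rng.randrange(len(vectors))  # the ant randomly selects a text
        f_here = local_similarity(idx, positions[idx], vectors, positions, radius)
        if rng.random() >= pick_probability(f_here):
            continue  # the text already fits its neighbourhood
        target = (rng.randrange(grid), rng.randrange(grid))  # random move
        if target in positions.values():
            continue  # cell occupied; simplification: the text stays put
        f_there = local_similarity(idx, target, vectors, positions, radius)
        if rng.random() < drop_probability(f_there):
            positions[idx] = target  # drop the text next to similar texts
    return positions


# Toy term-weight vectors standing in for vectorized texts (illustrative only).
demo_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
layout = ant_cluster(demo_vectors, grid=6, steps=500)
```

In this simplification a text that is picked up but not dropped at the candidate cell returns to its original position, whereas in the usual formulation the ant keeps carrying it across several moves. Clusters would then be read off the final layout by grouping texts that occupy neighbouring cells.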