A Novel Text Clustering Algorithm Based on Inner Product Space Model of Semantic
PENG Jing,YANG Dong-Qing,TANG Shi-Wei,FU Yan,JIANG Han-Kui
DOI: https://doi.org/10.3321/j.issn:0254-4164.2007.08.017
2007-01-01
Chinese Journal of Computers
Abstract: Because existing clustering algorithms do not consider the latent similarity information among words, their results on text data, especially short text data, are not ideal. Considering the high dimensionality and sparsity of text, this paper proposes a novel text clustering algorithm based on a semantic inner product space model. The paper first defines similarity measures among Chinese concepts, words, and texts based on the definition of an inner product space, and then systematically analyzes the algorithm in theory. Clustering is completed through a two-phase process: a top-down divide phase followed by a bottom-up merge phase. The method has been applied to the clustering of Chinese short documents. Extensive experiments show that the method outperforms traditional algorithms.
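
The abstract does not give the paper's similarity formulas, so the following is only a minimal sketch of the general idea of measuring text similarity in an inner product space that accounts for word-to-word similarity. It assumes documents are term-count vectors and that a symmetric, positive semidefinite word-similarity matrix `S` is available; the function names, the three-word vocabulary, and the values in `S` are all hypothetical and are not taken from the paper.

```python
import math

def semantic_inner(u, v, S):
    """Inner product <u, v>_S = sum over i, j of u[i] * S[i][j] * v[j]."""
    return sum(u[i] * S[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))

def semantic_similarity(u, v, S):
    """Cosine-style similarity induced by the inner product <., .>_S."""
    denom = math.sqrt(semantic_inner(u, u, S) * semantic_inner(v, v, S))
    return semantic_inner(u, v, S) / denom if denom > 0 else 0.0

# Hypothetical vocabulary: ["car", "automobile", "banana"].
# S encodes assumed word-to-word similarity; it must be symmetric and
# positive semidefinite for <., .>_S to behave as an inner product.
S = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# With the identity matrix, the measure reduces to ordinary cosine similarity.
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

doc_a = [1, 0, 0]  # a short text containing only "car"
doc_b = [0, 1, 0]  # a short text containing only "automobile"
doc_c = [0, 0, 1]  # a short text containing only "banana"

print(semantic_similarity(doc_a, doc_b, IDENTITY))  # 0.0
print(semantic_similarity(doc_a, doc_b, S))         # 0.8
print(semantic_similarity(doc_a, doc_c, S))         # 0.0
```

The two short texts share no words, so plain cosine similarity scores them as unrelated, while the similarity-weighted inner product recovers their closeness. This illustrates why ignoring latent word similarity hurts clustering of short texts in particular, where word overlap is rare.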