The Dictionary-Based Quantified Conceptual Relations For Hard And Soft Chinese Text Clustering

Yi Hu, Ruzhan Lu, Yuquan Chen, Hui Liu, Dongyi Zhang
DOI: https://doi.org/10.1007/978-3-540-73351-5_9
2007-01-01
Abstract: In this paper we present a new text similarity measure for text clustering that combines the cosine measure with quantified conceptual relations by linear interpolation. These relations hold between dictionary entries and the words in their definitions, and they are quantified under the assumption that an entry and its definition are equivalent in meaning. Such relations are regarded as "knowledge" for text clustering. Within the framework of the k-means algorithm, the new interpolated similarity significantly improves the performance of the clustering system in terms of optimizing hard and soft criterion functions. Our results suggest that introducing conceptual knowledge from an unstructured dictionary into the similarity measure is a promising direction for text clustering.
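The interpolation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the quantification formula, so everything specific here is an assumption made for illustration: the toy dictionary, the uniform `1 / len(definition)` relation strength, the normalization of the conceptual term, and the names `build_relations`, `concept_similarity`, `interpolated_similarity`, and `lam` are all hypothetical and are not taken from the paper.

```python
import math


def norm(vec):
    """Euclidean norm of a sparse term-weight vector (dict: term -> weight)."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(u, v):
    """Standard cosine measure between two sparse term-weight vectors."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    return dot / (nu * nv)


def build_relations(dictionary):
    """Quantify entry/definition-word relations from a dictionary.

    ASSUMPTION: each definition word gets strength 1 / len(definition).
    This is only a stand-in for the paper's quantification, which the
    abstract does not specify.
    """
    relations = {}
    for entry, definition in dictionary.items():
        for word in definition:
            relations[(entry, word)] = 1.0 / len(definition)
    return relations


def concept_similarity(u, v, relations):
    """Similarity contributed by related (non-identical) term pairs.

    ASSUMPTION: relation strengths are treated as symmetric, and the sum
    is normalized by the same vector norms as the cosine measure.
    """
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    total = 0.0
    for a, wa in u.items():
        for b, wb in v.items():
            if a == b:
                continue  # identical terms are already covered by cosine
            strength = relations.get((a, b), relations.get((b, a), 0.0))
            total += wa * wb * strength
    return total / (nu * nv)


def interpolated_similarity(u, v, relations, lam=0.5):
    """Linear interpolation: lam * cosine + (1 - lam) * conceptual."""
    return lam * cosine(u, v) + (1.0 - lam) * concept_similarity(u, v, relations)


# Hypothetical toy data: two documents that share no surface terms.
toy_dictionary = {"automobile": ["car", "vehicle"]}
toy_relations = build_relations(toy_dictionary)
doc_a = {"automobile": 1.0}
doc_b = {"car": 1.0}
```

With this toy data the cosine measure between `doc_a` and `doc_b` is zero because they share no terms, while the interpolated similarity is non-zero because the dictionary relates "automobile" to "car". Setting `lam=1.0` recovers the plain cosine measure, which makes it easy to compare clustering runs with and without the conceptual term.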