Clustering Technology for High Dimensional Data Based on Semantics

刘铭,王晓龙,刘远超
DOI: https://doi.org/10.3321/j.issn:0372-2112.2009.05.003
2009-01-01
Abstract: This paper proposes a novel clustering algorithm for high-dimensional data. The algorithm first partitions the input document set into clusters by constructing feature chains. It also accounts for similar features in both similarity computation and weight adjustment, so that semantically similar documents are grouped together, and it dynamically adjusts document weights so that unbalanced documents are well trained. Experimental results show that it achieves relatively better clustering results, with high intra-cluster cohesion and inter-cluster distinctness, and that it requires fewer iterations.
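The abstract does not define feature chains or give the paper's similarity formula, so the following is only a minimal sketch of the general idea of letting similar features contribute to document similarity. It uses a soft cosine measure as a stand-in technique, not the authors' method; the `SIMILAR` table, the `feature_sim` function, and the example documents are all hypothetical.

```python
import math

def soft_cosine(a, b, feature_sim):
    """Soft cosine between sparse vectors a and b (dict: feature -> weight).

    Unlike plain cosine, every pair of features contributes to the dot
    product, scaled by feature_sim(f, g) in [0, 1].
    """
    def dot(x, y):
        return sum(wx * wy * feature_sim(fx, fy)
                   for fx, wx in x.items()
                   for fy, wy in y.items())

    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0

# Hypothetical feature-similarity table (assumption, not from the paper).
SIMILAR = {frozenset(("car", "automobile")): 0.9}

def exact_match(f, g):
    # Plain cosine: only identical features count.
    return 1.0 if f == g else 0.0

def feature_sim(f, g):
    # Identical features count fully; listed similar pairs count partially.
    if f == g:
        return 1.0
    return SIMILAR.get(frozenset((f, g)), 0.0)

doc1 = {"car": 1.0, "engine": 1.0}
doc2 = {"automobile": 1.0, "engine": 1.0}

plain = soft_cosine(doc1, doc2, exact_match)
soft = soft_cosine(doc1, doc2, feature_sim)
print(plain, soft)  # soft is higher because "car" and "automobile" now match
```

With exact matching, the two documents share only "engine"; once similar features are counted, "car" and "automobile" also contribute, so the documents score as closer.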