A Model to Enhance XML Document Clustering

Jianwu Yang, Xiaoou Chen
2001-01-01
Abstract: XML has become a main data format in e-Business, e-Learning, and e-Commerce, and the need for tools to help manage XML documents has risen with it. Conventional text miners are mostly based on the vector space model, which views a document simply as an n-dimensional vector of terms. To retain the information carried by structure and links, we have developed a structured link vector model (SLVM), which represents a document as a vector whose elements are determined by terms, element structure, and neighboring documents. For brevity and clarity, the model is described in terms of the K-means procedure: document similarity and cluster center. In our experiments, clustering based on SLVM is significantly more accurate than clustering based on the conventional vector space model: the K value of the K-measure increases from 0.65~0.73 to 0.82~0.86.
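For orientation, the conventional vector space baseline named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SLVM: it covers only the two K-means ingredients the abstract mentions (document similarity and cluster center) for plain term vectors. The whitespace tokenization, the choice of cosine similarity, the mean-vector center, and the function names term_vector, cosine_similarity, and cluster_center are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as term -> weight mappings."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cluster_center(vectors):
    """Cluster center as the component-wise mean of the member vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    n = len(vectors)
    return {t: c / n for t, c in total.items()}


if __name__ == "__main__":
    d1 = term_vector("xml document clustering")
    d2 = term_vector("xml document mining")
    center = cluster_center([d1, d2])
    print(cosine_similarity(d1, d2))
    print(cosine_similarity(d1, center))
```

Because this representation looks only at terms, two XML documents with the same words in different elements, or linked to different neighbors, receive identical vectors. That is the structure and link information the abstract says SLVM is designed to retain.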