FXProj – A Fuzzy XML Documents Projected Clustering Based on Structure and Content

Tengfei Ji, Xiaoyuan Bao, Dongqing Yang
DOI: https://doi.org/10.1007/978-3-642-25853-4_31
2011-01-01
Abstract: XML documents are inherently semi-structured, with both structural and content features. Most existing methods for clustering XML documents consider only one of these two aspects. In this paper, we propose a fuzzy projected clustering algorithm for XML documents that clusters them efficiently by combining structural and content features. Another contribution is the adoption of fuzzy techniques, such that each frequent induced substructure has a fuzzy parameter associated with each cluster. Experimental results on both synthetic and real datasets show the algorithm's effectiveness, especially when applied to large schemaless XML document collections.