A novel approach of clustering XML document based on path optimization

Houqun Yang, Zhongshi He, Jingsheng Lei, Lei Yu
2007-01-01
Journal of Computational Information Systems
Abstract: This paper proposes an XML document clustering method based on an XPath structural representation of XML documents, which encodes frequently occurring elements together with their hierarchical information. First, the approach extracts path sequences from the documents and transforms the documents into feature vectors. A path-based clustering method is then applied to group the documents according to their frequent structures. The effectiveness and average processing time of the clustering method are evaluated through experimental tests.
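The three steps named in the abstract (path extraction, feature vectors, path-based clustering) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the path optimization, the similarity measure, or the clustering procedure, so the sketch assumes binary feature vectors over frequent root-to-element paths, Jaccard similarity, and a greedy single-pass clustering. All function names and the `min_support` and `threshold` parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paths(xml_text):
    """Return the set of root-to-element tag paths (XPath-like) in one document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_paths(path_sets, min_support):
    """Keep paths that occur in at least min_support documents (sorted)."""
    counts = Counter(p for paths in path_sets for p in paths)
    return sorted(p for p, c in counts.items() if c >= min_support)


def to_vector(paths, vocabulary):
    """Binary feature vector: 1 if the document contains the frequent path."""
    return [1 if p in paths else 0 for p in vocabulary]


def jaccard(u, v):
    """Jaccard similarity of two binary vectors (assumed measure)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def cluster(vectors, threshold):
    """Greedy single-pass clustering (assumed procedure): a document joins the
    first cluster whose first member is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if jaccard(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_documents(docs, min_support=2, threshold=0.5):
    """Run the three steps: extract paths, build vectors, cluster."""
    path_sets = [extract_paths(d) for d in docs]
    vocabulary = frequent_paths(path_sets, min_support)
    vectors = [to_vector(p, vocabulary) for p in path_sets]
    return cluster(vectors, threshold)
```

Calling `cluster_documents` on a list of XML strings returns lists of document indices, one list per cluster. Restricting the vocabulary to frequent paths keeps the vectors short and lets rare elements be ignored when structures are compared.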