A Novel Path-Based Method for Clustering XML Schemas

Li Nan, Yang Weidong, Fang Fei
2011-01-01
Journal of Computer Research and Development
Abstract: Most existing XML schema clustering techniques focus mainly on element similarity, which cannot fully reveal the structural differences between schemas, particularly semantic ones. This paper presents a novel clustering method for XML schemas that captures both semantic and structural features. A new evaluation model is devised to measure schema similarity comprehensively; it takes into account both the element similarity between best-matching element pairs and the structural information associated with those pairs. Each schema is characterized by a feature vector, so schema similarity can be measured by computing the cosine similarity between the corresponding feature vectors. Extensive experiments indicate that the proposed approach outperforms existing proposals in clustering validity and quality.
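The final similarity step described in the abstract, cosine similarity between per-schema feature vectors, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the feature vectors are built, so here each vector is assumed to be a sparse mapping from a hypothetical feature key (for example, a root-to-leaf path string) to a numeric weight. The helper names `cosine_similarity` and `similarity_matrix` are illustrative only.

```python
import math
from typing import Dict, List

# Assumption: a schema's feature vector is a sparse dict {feature key: weight}.
# How the keys and weights are derived (paths, matched element pairs, etc.)
# is not specified in the abstract and is left to the caller.
FeatureVector = Dict[str, float]


def cosine_similarity(u: FeatureVector, v: FeatureVector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all-zero."""
    # Dot product over the features of u (missing features in v count as 0).
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors: List[FeatureVector]) -> List[List[float]]:
    """Pairwise cosine similarities, e.g. as input to a clustering algorithm."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical path-style features for two small schemas.
    schema_a = {"order/customer/name": 1.0, "order/item/price": 1.0}
    schema_b = {"order/customer/name": 1.0, "order/item/qty": 1.0}
    print(cosine_similarity(schema_a, schema_b))  # ≈ 0.5 (one shared feature)
```

Representing vectors as sparse dicts keeps the sketch independent of any fixed feature vocabulary; the resulting matrix could be handed to any similarity-based clustering algorithm, which the abstract does not name.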