A Distributed XML Data Placement and Access Method Considering Subtree Processing Cost

Y. Yoshino, Wenxin Liang, H. Yokota
2008-01-01
Abstract: Very large XML documents are now common. To retrieve and manage such documents efficiently, it is effective to store them in distributed relational database (RDB) systems. This approach requires a distributed XML data placement that balances the processing costs of traversal and subtree reconstruction. In this paper, we propose a method for balancing the cost of processing distributed XML data. First, we fragment an XML document into subtrees, each representing the same or a similar meaningful unit. Next, we cluster the fragmented subtrees based on the strings of specific nodes and calculate the processing cost of each cluster from the size and the number of nodes of each subtree. Then, we allocate the clusters to the distributed RDB systems so that the processing costs are equal. We also propose an index structure for deriving data location information. We evaluate the effectiveness of the proposed method through experiments that store Wikipedia XML data on multiple PostgreSQL servers.
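The cluster, cost, and allocate steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the cost formula nor the allocation algorithm, so the weighted sum in `cluster_cost` (with its `w_size` and `w_nodes` weights) and the greedy largest-cost-first assignment in `allocate` are assumptions made for illustration. The `Subtree` fields and all function names are likewise hypothetical.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Subtree:
    key: str    # string of the node used for clustering (hypothetical choice)
    size: int   # serialized size of the subtree, e.g. in bytes
    nodes: int  # number of nodes in the subtree


def cluster_by_key(subtrees):
    """Group fragmented subtrees by the string of their clustering node."""
    clusters = defaultdict(list)
    for st in subtrees:
        clusters[st.key].append(st)
    return dict(clusters)


def cluster_cost(subtrees, w_size=1.0, w_nodes=1.0):
    """Assumed cost model: weighted sum of size and node count."""
    return sum(w_size * st.size + w_nodes * st.nodes for st in subtrees)


def allocate(clusters, n_servers):
    """Greedy balancing: give the costliest remaining cluster to the
    currently least-loaded server (an assumed heuristic)."""
    costs = {key: cluster_cost(sts) for key, sts in clusters.items()}
    heap = [(0.0, server) for server in range(n_servers)]
    heapq.heapify(heap)
    placement = {}
    loads = [0.0] * n_servers
    # Sort by descending cost; break ties by key for determinism.
    for key in sorted(costs, key=lambda k: (-costs[k], k)):
        load, server = heapq.heappop(heap)
        placement[key] = server
        loads[server] = load + costs[key]
        heapq.heappush(heap, (loads[server], server))
    return placement, loads


if __name__ == "__main__":
    subtrees = [
        Subtree("a", 100, 10),
        Subtree("b", 60, 5),
        Subtree("c", 40, 5),
    ]
    placement, loads = allocate(cluster_by_key(subtrees), n_servers=2)
    print(placement, loads)
```

In this sketch the `placement` dictionary maps a cluster key to a server number, so it plays the role of a very simple location lookup. The index structure proposed in the paper is not described in the abstract and may differ substantially.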