Dynamic High Dimensional Data Mapping for Efficient Similarity Query Processing

Xiangmin Zhou, Guoren Wang, Xiaofang Zhou
2005-01-01
Abstract: For efficient processing of similarity queries, the search space is often reduced by pruning inactive query subspaces, which contain no query results, so that only the active query subspaces, which may contain query results, are examined. However, not every active query subspace actually contains query results; an active query subspace that turns out to contain none is called a false active query subspace. False active query subspaces degrade the performance of similarity query processing, and the problem becomes more serious for high-dimensional data with a non-uniform distribution. Our experiments show that the number of accesses to false active subspaces increases with the number of dimensions. To overcome this problem, we propose in this paper a space mapping approach that reduces such unnecessary data accesses: a given query space can be refined by filtering within its mapped space. A mapping strategy, maxgap, is proposed to improve the efficiency of this refinement. Based on this refinement method, we design and implement an index structure called the MS-tree, together with algorithms for index construction and query processing. Using a real data set, we compare the MS-tree with a number of existing methods in terms of range query performance.
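To make the terms concrete, the following is a minimal sketch, not the authors' method. It assumes axis-aligned rectangular subspaces, a range query given as a hyper-rectangle, and a hypothetical one-dimensional mapping (the sum of coordinates) standing in for the paper's "mapped space". It is not maxgap and not the MS-tree. All names (`Subspace`, `active_subspaces`, `false_active`, `refine`, `mapped`) are illustrative. A subspace is active if its region intersects the query box and false active if none of its points fall inside the box. The refinement step keeps an active subspace only if the mapped values of its points overlap the query's mapped interval. This never discards a subspace that holds a real result, but it can remove some false active ones.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Subspace:
    name: str
    lo: Point            # lower corner of the subspace's region
    hi: Point            # upper corner of the subspace's region
    points: List[Point]  # data points stored in this subspace


def boxes_intersect(a_lo: Point, a_hi: Point, b_lo: Point, b_hi: Point) -> bool:
    # Two axis-aligned boxes intersect iff their intervals overlap in every dimension.
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def active_subspaces(subs: List[Subspace], q_lo: Point, q_hi: Point) -> List[Subspace]:
    # Active: the subspace's region overlaps the query, so it *may* hold results.
    return [s for s in subs if boxes_intersect(s.lo, s.hi, q_lo, q_hi)]


def false_active(subs: List[Subspace], q_lo: Point, q_hi: Point) -> List[Subspace]:
    # False active: overlaps the query region, yet none of its points are results.
    return [s for s in active_subspaces(subs, q_lo, q_hi)
            if not any(in_box(p, q_lo, q_hi) for p in s.points)]


def mapped(p: Point) -> float:
    # Hypothetical 1-D mapping (coordinate sum); any point inside the query box
    # maps into [mapped(q_lo), mapped(q_hi)], so filtering on it is lossless.
    return sum(p)


def refine(subs: List[Subspace], q_lo: Point, q_hi: Point) -> List[Subspace]:
    m_lo, m_hi = mapped(q_lo), mapped(q_hi)
    kept = []
    for s in active_subspaces(subs, q_lo, q_hi):
        if not s.points:
            continue  # an empty subspace cannot contain results
        vals = [mapped(p) for p in s.points]
        # Keep only if the subspace's mapped range overlaps the query's mapped range.
        if min(vals) <= m_hi and m_lo <= max(vals):
            kept.append(s)
    return kept


# Example: four quadrants of the unit square and a small query box at the centre.
SUBSPACES = [
    Subspace("A", (0.0, 0.0), (0.5, 0.5), [(0.1, 0.1), (0.2, 0.15)]),
    Subspace("B", (0.5, 0.0), (1.0, 0.5), [(0.9, 0.45), (0.95, 0.4)]),
    Subspace("C", (0.0, 0.5), (0.5, 1.0), [(0.45, 0.55), (0.1, 0.9)]),
    Subspace("D", (0.5, 0.5), (1.0, 1.0), [(0.65, 0.52)]),
]
Q_LO, Q_HI = (0.4, 0.4), (0.6, 0.6)

if __name__ == "__main__":
    print([s.name for s in active_subspaces(SUBSPACES, Q_LO, Q_HI)])
    print([s.name for s in false_active(SUBSPACES, Q_LO, Q_HI)])
    print([s.name for s in refine(SUBSPACES, Q_LO, Q_HI)])
```

In this toy example all four quadrants are active, but only C contains a result. The coordinate-sum filter prunes the false active subspaces A and B. D survives, which shows that a weak mapping leaves some false active subspaces behind. Per the abstract, the purpose of a mapping strategy such as maxgap is to make this refinement more effective.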
What problem does this paper attempt to address?