Influence Of Data Set Splitting Methods On Similarity Indexing Performance

Xs Bai, Gy Xu, Yc Shi, Sq Yang
DOI: https://doi.org/10.1117/12.373594
2000-01-01
Abstract: Similarity indexing is the supporting technology for fast content-based retrieval in large media databases, and many similarity index structures have been proposed. Compared with the number of structures proposed, less attention has been paid to evaluating the performance of index structures and to theoretical analysis of the factors that influence index performance. In this paper, we attempt to solve part of this problem and focus on analyzing the influence of data splitting methods. To give a formal definition for index structure performance evaluation, we introduce the concept of query distribution probability and propose using average search cost to evaluate the performance of a similarity indexing structure. We choose the simplest case of similarity indexing, nearest-neighbor search, for our discussion and deduce an expression for the average search cost function. Based on analysis of this expression, we propose some criteria that may be useful in index design and implementation. We then extend these conclusions to the general similarity indexing case and use the criteria as general rules in index design and implementation. The basic ideas and analysis are detailed, along with experimental results.
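The abstract does not reproduce the paper's average search cost expression, so the following is only a minimal empirical sketch of the idea, not the authors' method. It assumes that the cost of one nearest-neighbor query can be approximated by the number of leaf bounding boxes that intersect the query's nearest-neighbor sphere, and that the query distribution is represented by a sample of query points. All function names (`split_median`, `widest_dim`, `first_dim`, `average_search_cost`) and both splitting rules are hypothetical illustrations.

```python
import random


def split_median(points, leaf_size, pick_dim):
    """Recursively split points at the median of a chosen dimension.

    Returns a list of leaves, each a list of points. `pick_dim` is the
    (hypothetical) splitting rule under comparison.
    """
    if len(points) <= leaf_size:
        return [points]
    dim = pick_dim(points)
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return (split_median(ordered[:mid], leaf_size, pick_dim)
            + split_median(ordered[mid:], leaf_size, pick_dim))


def widest_dim(points):
    """Splitting rule A: choose the dimension with the largest extent."""
    dims = range(len(points[0]))
    return max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))


def first_dim(points):
    """Splitting rule B: always split along dimension 0."""
    return 0


def bounding_box(points):
    """Axis-aligned bounding box of a leaf as (lower corner, upper corner)."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return lo, hi


def min_dist_sq(query, box):
    """Squared distance from a query point to the nearest point of a box."""
    lo, hi = box
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(query, lo, hi))


def search_cost(query, points, boxes):
    """Assumed cost model: leaves whose box intersects the NN sphere."""
    nn_sq = min(sum((a - b) ** 2 for a, b in zip(query, p)) for p in points)
    return sum(1 for box in boxes if min_dist_sq(query, box) <= nn_sq)


def average_search_cost(leaves, points, queries):
    """Mean cost over queries sampled from the query distribution."""
    boxes = [bounding_box(leaf) for leaf in leaves]
    return sum(search_cost(q, points, boxes) for q in queries) / len(queries)


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(500)]
    # Assumption: queries follow the same uniform distribution as the data.
    queries = [(random.random(), random.random()) for _ in range(200)]
    for rule in (widest_dim, first_dim):
        leaves = split_median(data, 10, rule)
        print(rule.__name__, average_search_cost(leaves, data, queries))
```

Under these assumptions, running the script prints one estimated average cost per splitting rule, which gives a rough way to compare how two splitting methods affect nearest-neighbor search over the same data and query sample. Changing how `queries` is sampled shows how the estimate depends on the query distribution.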