TOP-K Cosine Similarity Interesting Pairs Search

Shiwei Zhu, Junjie Wu, Guoping Xia
DOI: https://doi.org/10.1109/fskd.2010.5569212
2010-01-01
Abstract: Recent years have witnessed an increased interest in computing cosine similarities between documents (or commodities). Most previous studies require the user to specify a minimum similarity threshold in order to perform a cosine similarity search. In practice, however, it is usually difficult for users to provide an appropriate threshold. In this paper, we instead propose searching for the top-K most strongly related pairs of objects as measured by cosine similarity. Specifically, we first define the cosine similarity measure from the association analysis point of view and identify the monotone property of an upper bound of the cosine measure. We then exploit a diagonal traversal strategy to develop the TOP-DATA and TOP-DATA-R algorithms. Finally, experimental results demonstrate the computational efficiency of these algorithms.
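The abstract does not spell out the measure or the algorithms, so the following is only a minimal sketch under stated assumptions, not the authors' TOP-DATA or TOP-DATA-R. It assumes binary transaction data, the usual association-analysis form cos(A, B) = supp(AB) / sqrt(supp(A) * supp(B)), and the upper bound sqrt(supp(B) / supp(A)) for supp(A) >= supp(B), which follows from supp(AB) <= supp(B). The function name `top_k_cosine_pairs` and the diagonal visiting order over support-sorted items are illustrative choices, one plausible reading of the "diagonal traversal" the abstract mentions.

```python
import heapq
from math import sqrt


def top_k_cosine_pairs(transactions, k):
    """Return up to k item pairs with the highest cosine similarity.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    if k <= 0:
        return []

    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Sort items by support, descending (ties broken by item name).
    order = sorted(tidsets, key=lambda i: (-len(tidsets[i]), i))
    n = len(order)

    heap = []  # min-heap of (cosine, a, b); heap[0] is the current k-th best

    # Assumed "diagonal" order: pairs (i, i + d) for d = 1, 2, ...
    # Pairs near the diagonal have similar supports, hence larger upper bounds.
    for d in range(1, n):
        for i in range(n - d):
            a, b = order[i], order[i + d]  # supp(a) >= supp(b)
            sa, sb = len(tidsets[a]), len(tidsets[b])

            # Upper bound: supp(ab) <= supp(b) gives cos <= sqrt(sb / sa).
            upper = sqrt(sb / sa)
            if len(heap) == k and upper <= heap[0][0]:
                continue  # cannot beat the current k-th best pair

            joint = len(tidsets[a] & tidsets[b])
            cos = joint / sqrt(sa * sb)

            if len(heap) < k:
                heapq.heappush(heap, (cos, a, b))
            elif cos > heap[0][0]:
                heapq.heapreplace(heap, (cos, a, b))

    return [(a, b, cos) for cos, a, b in sorted(heap, reverse=True)]


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b", "d"}]
    for a, b, cos in top_k_cosine_pairs(data, 2):
        print(a, b, round(cos, 4))
```

Because the bound depends only on the two individual supports, a pair can be discarded without counting its joint support, which is where a threshold-free top-K search can save work. Pairs tied with the current k-th best score are skipped, so ties at the cutoff are resolved arbitrarily in this sketch.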