A Cost-Effective LSH Filter for Fast Pairwise Mining

Gang Zhao,Yun Xiong,Longbing Cao,Dan Luo,Xuchun Su,Yangyong Zhu
DOI: https://doi.org/10.1109/ICDM.2009.112
2009-01-01
Abstract:The pairwise mining problem is to discover pairwise objects having measures greater than the user-specified minimum threshold from a collection of objects. It is essential in a large variety of database and data-mining applications. Of late, there has been increasing interest in applying a Locality-Sensitive Hashing (LSH) scheme for pairwise mining. LSH-type methods have shown themselves to be simply implementable and capable of achieving significant performance gain in running time over most exact methods. However, the present LSH-type methods still suffer from some bottlenecks, such as ”the curse of threshold”. In this paper, we proposed a novel LSHbased method, namely Cost-effective LSH filter (Ce-LSH for short), for pairwise mining. Compared with previous LSH-type methods, it uses a lower fixed number of LSH functions and is thus more cost-effective. Substantial experiments evidence that our method gives significant improvement in running time over existing LSH-type methods and some recently reported method based on upper-bound. Experimental results also indicate that it scales well even for a relatively low minimum threshold and for a fairly small miss ratio.
What problem does this paper attempt to address?