Set Similarity Join Using Partition Index

HONG Yin-jie, CHEN Gang, CHEN Ke
DOI: https://doi.org/10.3785/j.issn.1008-973x.2012.02.017
2012-01-01
Abstract: To address the inefficiency of traditional indexing and filtering algorithms when processing set similarity joins online, we propose several novel filtering approaches that improve on inverted-index-based and signature-based schemes. We enhance the inverted index to reduce the search space by partitioning the index according to the position of each item and the length of each record. In addition, we design a novel weighted signature filtering scheme in which an upper bound on the overlap between two sets is estimated to improve filtering effectiveness. Set similarity joins are typically processed in a filtering-refinement framework, which generates candidates using filtering schemes and then produces the final results by refining the candidates. The proposed schemes can be seamlessly integrated into the filtering-refinement framework alongside other filtering schemes to process set similarity joins online. Extensive experiments on real datasets show the efficiency of the proposed schemes.
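The filtering-refinement framework described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Jaccard similarity, uses the standard prefix filter and length filter in place of the paper's position-based partitioning and weighted signature scheme, and partitions the inverted index only by record length. The names `jaccard` and `similarity_self_join` and the `(token, length)` index layout are hypothetical, chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_self_join(records, threshold):
    """Return (i, j, sim) for all i < j with Jaccard(records[i], records[j]) >= threshold.

    Assumption: this is a generic filter-and-refine sketch, not the paper's method.
    """
    eps = 1e-9  # guards against floating-point rounding at the bounds
    sets = [frozenset(r) for r in records]

    # Global token order: rare tokens first, so prefixes are selective.
    freq = Counter(tok for s in sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in sets]

    def prefix_len(n):
        # Standard prefix-filter length for a Jaccard threshold.
        return n - math.ceil(threshold * n - eps) + 1

    # Inverted index partitioned by record length: (token, length) -> record ids.
    index = defaultdict(list)
    for rid, toks in enumerate(ordered):
        n = len(toks)
        if n == 0:
            continue
        for tok in toks[:prefix_len(n)]:
            index[(tok, n)].append(rid)

    results = []
    for rid, toks in enumerate(ordered):
        n = len(toks)
        if n == 0:
            continue
        # Length filter: only partitions with threshold*n <= length <= n/threshold.
        lo = math.ceil(threshold * n - eps)
        hi = math.floor(n / threshold + eps)

        # Filtering step: collect candidates from the relevant partitions only.
        candidates = set()
        for tok in toks[:prefix_len(n)]:
            for length in range(lo, hi + 1):
                for other in index.get((tok, length), ()):
                    if other > rid:
                        candidates.add(other)

        # Refinement step: verify each candidate with the exact similarity.
        for other in sorted(candidates):
            sim = jaccard(sets[rid], sets[other])
            if sim >= threshold - eps:
                results.append((rid, other, sim))
    return results
```

Calling `similarity_self_join(records, 0.6)` on a list of token lists returns the verified pairs with their similarities. Keying the index on `(token, length)` means a probe touches only the length partitions that can satisfy the threshold, which is the kind of search-space reduction the abstract attributes to partitioning by record length.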