Efficient EMD-based Similarity Search Via Batch Pruning and Incremental Computation (extended Abstract)

Yu Chen,Yong Zhang,Jin Wang,Jiacheng Wu,Chunxiao Xing
DOI: https://doi.org/10.1109/tkde.2021.3100566
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:As a robust similarity measurement, Earth Mover's Distance (EMD) has been widely adopted in many real-world applications, such as machine learning, computer vision and natural language processing. In this paper, we study the problem of EMD-based similarity search, which aims at finding all histogram objects from a dataset whose EMD is within a pre-defined threshold from the given query. Since the time complexity of computing EMD is rather high, it is essential to devise effective techniques to accelerate the query processing. To this end, we propose a filter-and-verification framework: In the filter step, we devise effective strategies to prune dissimilar objects in batch by sharing the computation between multiple objects. In the verification step, we develop novel flow adjustment techniques to incrementally calculate the EMD of candidates and enable early termination. We justify our proposed framework by conducting both theoretical analysis and extensive experiments. The results on four real world datasets show that our proposed techniques achieve up to an order of magnitude performance gain than state-of-the-art approaches.
What problem does this paper attempt to address?