EMD-DSJoin: Efficient Similarity Join Over Probabilistic Data Streams Based on Earth Mover’s Distance

Jia Xu, Jiazhen Zhang, Chao Song, Qianzhen Zhang, Pin Lv, Taoshen Li, Ningjiang Chen
DOI: https://doi.org/10.1007/978-3-319-45835-9_4
2016-01-01
Abstract: Similarity joins on probabilistic data play a vital role in many practical applications, such as sensor reading monitoring and object tracking based on multiple video sources. Earth Mover’s Distance (EMD), originally proposed in computer vision, is more effective at returning similar probabilistic data because it is more consistent with human perception of similarity. However, the cubic time complexity of EMD hampers its wide application, especially in the analysis of fast incoming data streams. To the best of our knowledge, this paper makes the first attempt to address the EMD similarity join over data streams under sliding window semantics. We first design an efficient and effective index framework, named B+ Forests Index, which facilitates data pruning and offers a proper strategy for dealing with out-of-order data. We then propose an EMD similarity join algorithm, named EMD-DSJoin, based on the proposed index framework. Extensive experiments on real-world datasets verify the effectiveness and efficiency of our proposal.
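
To make the problem setting concrete, the following is a minimal sketch of a brute-force EMD similarity join over two streams with count-based sliding windows. It is not the EMD-DSJoin algorithm and does not use the B+ Forests Index: it has no pruning and no handling of out-of-order data. It also assumes one-dimensional histograms over the same ordered bins with unit cost between adjacent bins, in which case EMD reduces to the L1 distance between cumulative distributions; the general EMD is a transportation problem with the cubic cost noted in the abstract. The names `emd_1d` and `NaiveWindowJoin` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two histograms over the same ordered bins.

    Assumption: both histograms sum to 1 and moving mass between adjacent
    bins costs 1. Under that assumption, EMD equals the L1 distance
    between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


class NaiveWindowJoin:
    """Brute-force join: no index, no pruning, count-based sliding windows."""

    def __init__(self, window_size, threshold):
        self.threshold = threshold
        # One window per stream; deque(maxlen=...) evicts the oldest tuple.
        self.windows = (deque(maxlen=window_size), deque(maxlen=window_size))

    def insert(self, stream_id, key, hist):
        """Add a tuple to stream 0 or 1; return its matches in the other window."""
        other = self.windows[1 - stream_id]
        matches = [
            (key, other_key)
            for other_key, other_hist in other
            if emd_1d(hist, other_hist) <= self.threshold
        ]
        self.windows[stream_id].append((key, hist))
        return matches
```

For example, after `join = NaiveWindowJoin(window_size=2, threshold=0.5)` and `join.insert(0, "a1", [0.5, 0.5, 0.0])`, the call `join.insert(1, "b1", [0.5, 0.25, 0.25])` compares the new tuple against every tuple currently in the other window. Every arrival triggers one EMD computation per window entry, which is the cost that an index with pruning is meant to avoid.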