Scalable Top-K Spatial Keyword Search

Dongxiang Zhang, Kian-Lee Tan, Anthony K. H. Tung
DOI: https://doi.org/10.1145/2452376.2452419
2013-01-01
Abstract: In this big data era, huge amounts of spatial documents are generated every day through various location-based services. Top-k spatial keyword search is an important approach to exploring useful information in a spatial database. It retrieves k documents based on a ranking function that takes into account both textual relevance (similarity between the query and document keywords) and spatial relevance (distance between the query and document locations). Various hybrid indexes have been proposed in recent years, most of which combine the R-tree and the inverted index so that spatial pruning and textual pruning can be executed simultaneously. However, the rapid growth in data volume poses significant challenges to existing methods in terms of index maintenance cost and query processing time. In this paper, we propose a scalable integrated inverted index, named I³, which adopts the Quadtree structure to hierarchically partition the data space into cells. The basic unit of I³ is the keyword cell, which captures the spatial locality of a keyword. Moreover, we design a new storage mechanism for efficient retrieval of keyword cells and preserve additional summary information to facilitate pruning. Experiments conducted on real spatial datasets (Twitter and Wikipedia) demonstrate the superiority of I³ over existing schemes such as IR-tree and S2I in various aspects: it takes less time to build the index, it has a lower index storage cost, it is an order of magnitude faster in updates, and it is highly scalable and answers top-k spatial keyword queries efficiently.
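To make the query semantics concrete, the following is a minimal brute-force sketch of top-k spatial keyword ranking. It is not the I³ index and not the paper's ranking function, which the abstract does not specify. The weighted-sum form, the weight alpha, the normalising distance max_dist, the keyword-overlap measure, and all function and variable names are assumptions chosen for illustration only.

```python
import heapq
import math


def spatial_relevance(q_loc, d_loc, max_dist):
    # Assumed form: 1 at zero distance, falling linearly to 0 at max_dist.
    dist = math.hypot(q_loc[0] - d_loc[0], q_loc[1] - d_loc[1])
    return max(0.0, 1.0 - dist / max_dist)


def textual_relevance(q_words, d_words):
    # Assumed form: fraction of query keywords that appear in the document.
    q = set(q_words)
    if not q:
        return 0.0
    return len(q & set(d_words)) / len(q)


def top_k(docs, q_loc, q_words, k, alpha=0.5, max_dist=100.0):
    # Score every document with a weighted sum of the two relevances,
    # then keep the k highest-scoring document ids (no index, no pruning).
    scored = []
    for doc_id, loc, words in docs:
        score = (alpha * spatial_relevance(q_loc, loc, max_dist)
                 + (1.0 - alpha) * textual_relevance(q_words, words))
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical toy data: (id, (x, y), keywords).
DOCS = [
    ("a", (0.0, 0.0), ["coffee", "shop"]),
    ("b", (3.0, 4.0), ["coffee"]),
    ("c", (60.0, 80.0), ["coffee", "shop"]),
]

if __name__ == "__main__":
    print(top_k(DOCS, (0.0, 0.0), ["coffee", "shop"], k=2))
```

This scan touches every document for every query, which is the cost that hybrid indexes such as the one described in the abstract aim to avoid by pruning spatially and textually at the same time.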