An Efficient Top-K Spatial Keyword Typicality and Semantic Query

Xiaoyan Zhang, Xiangfu Meng, Jinguang Sun, Quangui Zhang, Pan Li
DOI: https://doi.org/10.1109/access.2019.2941760
IF: 3.9
2019-01-01
IEEE Access
Abstract: Existing spatial keyword query processing models mainly consider the spatial proximity and text relevancy between spatial objects and the spatial keyword query, which usually makes the top-k answer objects similar to each other. However, users hope to obtain top-k results that are typical and semantically related to their query intention. This paper proposes a top-k spatial keyword typicality and semantic querying approach that can expeditiously provide top-k typical and semantically related objects for a given query. The approach consists of two processing steps. During the offline step, we first analyze the location-semantic relationships between spatial objects by considering both the location similarity and the document semantic relevancy between them. To measure the semantic similarity between the documents associated with the spatial objects, we propose two methods: the keyword coupling relationship-based document similarity measure and the Word2Vec-CNN-based document similarity measure. Then, a Gaussian probabilistic density-based estimation method is leveraged to find a few representative objects in the dataset, and for each representative object an order (permutation) of the remaining objects in the dataset is generated. The objects in each permutation are ranked in descending order of their location-semantic relationship to the representative object. When a spatial keyword query arrives, the online processing step first computes the spatial proximity and semantic relevancy between the query and each representative object; a small number of the orders generated in the offline step are then selected and used at query time to facilitate the selection of top-k typical and semantically related objects using the threshold algorithm (TA).
Results of a preliminary user study demonstrate that our location-semantic relationship measuring method can accurately capture the location similarity and semantic relevancy between spatial objects. The efficiency of the typicality analysis and the TA-based top-k selection algorithm is also demonstrated.
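To illustrate the online selection step, the following is a minimal sketch of a generic threshold algorithm over several score-ordered lists. It is not the authors' implementation. The function name `threshold_top_k`, the input layout, and the use of a plain sum as the aggregate score are assumptions made for illustration; the abstract does not specify the paper's actual scoring function. The sketch also assumes that all scores are non-negative and that each list is sorted in descending order of score, as the offline permutations are described to be.

```python
import heapq


def threshold_top_k(ranked_lists, k):
    """Generic threshold algorithm (TA) sketch with a sum aggregate.

    ranked_lists: list of lists of (object_id, score) pairs, each sorted in
    descending order of score. Scores are assumed to be non-negative.
    Returns up to k (object_id, total_score) pairs, best first.
    """
    # Random-access lookup tables: one dict per ranked list.
    lookups = [dict(lst) for lst in ranked_lists]
    seen = set()
    heap = []  # min-heap of (total_score, object_id) holding the current top k
    depth = max((len(lst) for lst in ranked_lists), default=0)

    for pos in range(depth):
        last_scores = []
        for lst in ranked_lists:
            if pos >= len(lst):
                # Exhausted list: unseen objects score at most 0 here.
                last_scores.append(0.0)
                continue
            obj, score = lst[pos]  # sorted access
            last_scores.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch this object's score in every list.
            total = sum(lk.get(obj, 0.0) for lk in lookups)
            if len(heap) < k:
                heapq.heappush(heap, (total, obj))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, obj))

        # Upper bound on the total score of any object not yet seen.
        threshold = sum(last_scores)
        if len(heap) == k and heap[0][0] >= threshold:
            break  # no unseen object can enter the top k

    ordered = sorted(heap, key=lambda item: (-item[0], item[1]))
    return [(obj, total) for total, obj in ordered]


if __name__ == "__main__":
    # Two hypothetical orders, e.g. one per selected representative object.
    order_a = [("a", 0.875), ("b", 0.75), ("c", 0.125), ("d", 0.0625)]
    order_b = [("b", 0.75), ("c", 0.5), ("a", 0.25), ("d", 0.125)]
    print(threshold_top_k([order_a, order_b], k=2))
```

The early stop is what makes the approach attractive: once the k-th best total seen so far is at least the sum of the scores at the current scan depth, the remaining entries of every list can be skipped.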