Efficient Maximal Spatial Clique Enumeration

Chen Zhang,Ying Zhang,Wenjie Zhang,Lu Qin,Jianye Yang
DOI: https://doi.org/10.1109/ICDE.2019.00083
2019-01-01
Abstract:Maximal clique enumeration is a fundamental problem in graph database. In this paper, we investigate this problem in the context of spatial database. Given a set P of spatial objects in a 2-dimensional space (e.g., geo-locations of users or point of interests) and a distance threshold r, we can come up with a spatial neighbourhood graph Pr by connecting every pair of objects (vertices) in P within distance r. Given a clique S of Pr, namely a spatial clique, it is immediate that any pairwise distance among objects in S is bounded by r. As the maximal pairwise distance has been widely used to capture the spatial cohesiveness of a group of objects, the maximal spatial clique enumeration technique can identify groups of spatially close objects in a variety of location-based-service (LBS) applications. In addition, we show that the maximal spatial clique enumeration can also be used to identify maximal clique pattern instances in the co-location pattern mining applications. Given the existing techniques for maximal clique enumeration, which can be immediately applied on the spatial neighbourhood graph Pr, two questions naturally arise for the enumeration of maximal spatial cliques: (1) the maximal clique enumeration on general graph is NP hard, can we have a polynomial time solution on the spatial neighbourhood graph? and (2) can we exploit the geometric property of the spatial clique to speed up the computation? In this paper, we give a negative answer to the first question by an example where the number of maximal spatial cliques is exponential to the number of the objects. While the answer to the second question is rather positive: we indeed develop two pruning techniques based on geometric properties of the maximal spatial clique to significantly enhance the computing efficiency. Extensive experiments on real-life geolocation data demonstrate the superior performance of proposed methods compared with two baseline algorithms.
What problem does this paper attempt to address?