Clustering Geospatial Data for Multiple Reference Points.

Ying Zhong,Jianmin Li,Shunzhi Zhu
DOI: https://doi.org/10.1109/access.2019.2941019
IF: 3.9
2019-01-01
IEEE Access
Abstract:Data clustering plays a significant role in geospatial data management and analytics. In this light, we propose and study a novel geospatial data clustering method for multiple reference points. Given a set Q of geospatial data points, a candidate set O of reference points, and a threshold k, each data point q will be matched to its closest reference point o. The multi-reference clustering (MRC) method finds a subset A (A ⊆ O ∧ |A| ≤ k) reference points from O which define the minimum global travel distance (ΣP∀q∈Q,o∈A d(q, o)), hence the data are grouped into |A| clusters. We believe that the MRC method may benefit a lot of applications including geospatial data clustering, data classification, and data analytics in general. The MRC problem is challenging due to its high computation complexity, and there exist Σi=1k C|O|i = Σi=1k (|O|!/i!(|O|-i)! possibilities for subset A. Because the exact solution cannot be computed in real time, we develop a heuristic method to select subset A from O efficiently. The experimental results show that the accuracy of A is very close to the optimal solution. In addition, we also develop a set of optimization techniques to further enhance the efficiency. Finally, we conduct extensive experiments to study the efficiency and accuracy of the heuristic method.
What problem does this paper attempt to address?