SMART-IMPALA: Efficient Querying of hyper Massive Spatiotemporal Trajectory Data
Lianjie Zhou, Wei Tu, Qingquan Li
DOI: https://doi.org/10.1109/ieeeconf54055.2021.9687505
2021-11-03
Abstract: Efficient sharing of hyper massive spatiotemporal trajectory data (HMSTD) is the foundation for establishing large-scale perception infrastructure, such as vehicle monitoring networks in smart cities, including megacities like New York, Tokyo, Beijing, and Shanghai. The trajectory data produced by such vehicle monitoring networks is growing rapidly, reaching daily volumes of 1 billion records. Under present spatiotemporal data indexing methods, access to HMSTD in transport, the Internet of Things, and other fields is difficult and limited. We therefore propose SMART-Impala, a method that combines path-divided Hadoop Distributed File System (HDFS) data blocking (SMART) with Apache Impala to optimize access to HMSTD and improve the efficiency of hyperdata sharing. Apache Impala, a practical and powerful distributed means of accessing massive data stored in memory, is widely applied in massive data sharing. SMART-Impala extends Impala's capability to retrieve spatiotemporal trajectory data and, in addition, proposes a self-adaptive Parquet data partitioning strategy. The experimental scenario is the Shenzhen BeiDou (BD) bus network, which consists of 35809 buses equipped with BD positioning sensors and generates 1.03 billion data records each day. The distribution of buses in Shenzhen is obtained for the periods 7:00 a.m. to 9:00 a.m. and 11:00 a.m. to 1:00 p.m. At data scales of 10 million, 100 million, 500 million, and 1 billion records, SMART-Impala achieves approximately 8, 9, 29, and 110 times higher performance, respectively, than MongoDB or HBase, and its results outperform those of the average division in Impala, MongoDB, and HBase methods.
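To make the idea of path-divided blocking more concrete, the following is a minimal sketch of how a trajectory record might be mapped to a partition directory, and how a time-window query might be narrowed to a few directories. It is not the authors' implementation: the abstract does not specify the partition keys, so the day/hour/grid-cell layout, the `cell_deg` parameter, the `/hmstd` root, and the function names `partition_path` and `partitions_for_window` are all assumptions made for illustration.

```python
import math
from datetime import datetime


def partition_path(root: str, ts: datetime, lon: float, lat: float,
                   cell_deg: float = 0.05) -> str:
    """Map one trajectory point to a key=value style directory path.

    Assumption: data is divided by day, hour, and a fixed lon/lat grid cell.
    The abstract describes a self-adaptive strategy; a fixed grid is used
    here only to keep the sketch short.
    """
    gx = math.floor(lon / cell_deg)  # grid column index
    gy = math.floor(lat / cell_deg)  # grid row index
    return f"{root}/day={ts:%Y-%m-%d}/hour={ts:%H}/cell={gx}_{gy}"


def partitions_for_window(root: str, day: str, start_hour: int,
                          end_hour: int) -> list[str]:
    """List the hour-level directories a query for [start_hour, end_hour) touches.

    Only these directories would need to be scanned; all others are skipped.
    """
    return [f"{root}/day={day}/hour={h:02d}" for h in range(start_hour, end_hour)]


# Hypothetical bus position in Shenzhen during the morning window.
path = partition_path("/hmstd", datetime(2021, 11, 3, 7, 30), 114.06, 22.54)
morning = partitions_for_window("/hmstd", "2021-11-03", 7, 9)
```

Under these assumptions, a query for the 7:00 a.m. to 9:00 a.m. window reads two hour-level directories rather than the full day of records, which is the kind of pruning that directory-based partitioning is meant to enable.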