Abstract: With the explosive growth in the use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available. Such data are useful in many application areas, such as transportation, traffic management, and location-based services. As a result, many trajectory data management and analytics systems have emerged that target either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management scenarios, offline analyses of historical trajectory data can be used for traffic planning, while online analyses of streaming trajectories can be used for congestion monitoring. Existing trajectory-based systems tend to perform offline and online trajectory analysis separately, which is inefficient. In this paper, we propose a hybrid and efficient framework, called Dragoon, based on Spark, to support both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset (RDD) model, including RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner capable of efficiently distributing trajectory data and supporting both offline and online analyses. Dragoon thus provides a hybrid analysis pipeline. Support for several typical trajectory queries and mining tasks demonstrates the flexibility of Dragoon.
An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) achieves offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) reduces storage overhead by up to a factor of two compared with UlTraMan during trajectory editing; (3) improves scalability by at least 40% compared with popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) delivers, on average, a twofold performance improvement for online trajectory data analytics.

Distributed Skyline Trajectory Query Processing

Search Model of the Region with the Maximum Coverage Value Based on Trajectory Data

Real-time Detection of Traffic Congestion Based on Trajectory Data

Time-Based Trajectory Data Partitioning for Efficient Range Query

Dragoon: a Hybrid and Efficient Big Trajectory Management System for Offline and Online Analytics

Distributed trajectory similarity search

Cloud-Based Framework for Spatio-Temporal Trajectory Data Segmentation and Query

High-performance spatiotemporal trajectory matching across heterogeneous data sources

Towards Efficient Search for Activity Trajectories

Detecting Trajectory Outliers Based On Spark

Mining Massive-Scale Spatiotemporal Trajectories in Parallel: A Survey

SQUID: Subtrajectory Query in Trillion-Scale GPS Database

Popularity-aware spatial keyword search on activity trajectories

A Hilbert-GeoSOT Spatio-Temporal Meshing and Coding Method for Efficient Spatio-Temporal Range Query on Massive Trajectory Data

Optimizing segmented trajectory data storage with HBase for improved spatio-temporal query efficiency

Parallel grid-based density peak clustering of big trajectory data

A General and Parallel Platform for Mining Co-Movement Patterns over Large-scale Trajectories

Efficient Spatial Keyword Search in Trajectory Databases

Non-Uniform Spatial Partitions and Optimized Trajectory Segments for Storage and Indexing of Massive GPS Trajectory Data

Efficient Path Query Processing over Massive Trajectories on the Cloud

Skyline-Join in Distributed Databases