Abstract: With the explosive use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available. Such data are useful in many application areas, such as transportation, traffic management, and location-based services. As a result, many trajectory data management and analytics systems have emerged that target either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management scenarios, offline analyses of historical trajectory data can be used for traffic planning, while online analyses of streaming trajectories can be used for congestion monitoring. Existing trajectory-based systems tend to perform offline and online trajectory analysis separately, which is inefficient. In this paper, we propose a hybrid and efficient framework, called Dragoon, based on Spark, to support both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset (RDD) model, including RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner capable of efficiently distributing trajectory data and supporting both offline and online analyses. Dragoon therefore provides a hybrid analysis pipeline. Its support for several typical trajectory queries and mining tasks demonstrates its flexibility.
An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) achieves offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) reduces storage overhead by up to a factor of two compared with UlTraMan during trajectory editing; (3) improves scalability by at least 40% compared with popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) on average doubles performance for online trajectory data analytics.

DAMOCRO: A Data Migration Framework Using Online Classification and Reordering

Optimizing Data Migration Using Online Clustering

A Delayed Container Organization Approach to Improve Restore Speed for Deduplication Systems

DP: Dynamic Prepage in Postcopy Migration for Fixed-Size Data Load

Dominoes: Speculative Repair in Erasure-Coded Hadoop System

Reliability-Based Design Optimization for Cloud Migration

Dragoon: a Hybrid and Efficient Big Trajectory Management System for Offline and Online Analytics

Moving Big Data to The Cloud: An Online Cost-Minimizing Approach

Moving big data to the cloud

Model Transformation and Data Migration from Relational Database to MongoDB

On Data Staging Strategies for Mobile Accesses to Cloud Services

Optimal Operator State Migration for Elastic Data Stream Processing

ADOM: An Adaptive Objective Migration Strategy for Grid Platform

On Optimizing Replica Migration in Distributed Cloud Storage Systems

RDMA-driven MongoDB: an Approach of RDMA Enhanced NoSQL Paradigm for Large-Scale Data Processing

Schema Optimization and Conflict Mechanism in Relational Database System Migration

RPC: Joint Online Reducer Placement and Coflow Bandwidth Scheduling for Clusters

High-performance Migration Tool for Live Container in a Workflow

Scaling Reverse Time Migration Performance Through Reconfigurable Dataflow Engines

Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing

On Time-Aware Cross-Blockchain Data Migration