Abstract: On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. Database repartitioning algorithms based on k-way min-cut graph clustering can be used to reduce the number of DTs with an acceptable level of load balancing. In Web applications, where the DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and a k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The risk of load imbalance is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on a random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimises data migration alone, and the one-to-one version reduces data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability.
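As a rough illustration of two steps described above, the following Python sketch selects the transactions that would feed the sub-graph (all DTs plus the non-DTs sharing at least one tuple with a DT) and applies a majority-based many-to-one cluster-to-partition mapping. The function names, the dictionary-based data layout, and the tie-breaking behaviour are assumptions made for this example only; the min-cut clustering step itself is left to an external library and is not shown.

```python
from collections import Counter


def select_subgraph_transactions(transactions, tuple_to_server):
    """Pick the transactions that would feed the repartitioning sub-graph.

    transactions: dict mapping a transaction id to the set of tuple ids it accesses.
    tuple_to_server: dict mapping a tuple id to the server currently holding it.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    # A transaction is distributed (a DT) if its tuples span more than one server.
    dts = {
        txn for txn, tuples in transactions.items()
        if len({tuple_to_server[t] for t in tuples}) > 1
    }
    # All tuples touched by at least one DT.
    dt_tuples = set().union(*(transactions[txn] for txn in dts))
    # Non-DTs that share at least one tuple with a DT are also included.
    related_non_dts = {
        txn for txn, tuples in transactions.items()
        if txn not in dts and tuples & dt_tuples
    }
    return dts, related_non_dts


def many_to_one_mapping(clusters, tuple_to_partition):
    """Map each cluster to the partition that currently holds most of its tuples.

    clusters: dict mapping a cluster id to the set of tuple ids assigned to it.
    Ties are broken by Counter ordering, which is an arbitrary choice here.
    """
    mapping = {}
    for cluster_id, tuples in clusters.items():
        counts = Counter(tuple_to_partition[t] for t in tuples)
        mapping[cluster_id] = counts.most_common(1)[0][0]
    return mapping
```

Because several clusters may pick the same partition, this many-to-one variant only limits data migration; a one-to-one variant would additionally have to keep each partition assigned to a single cluster to preserve load balance.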
The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. To compare the performance of any incremental repartitioning framework without bias from the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of the new transaction arrival rate. The overall impact of DTs at any instant is estimated from the exponential moving average of the recurrence period of unique transactions to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartitioning framework for transactional workloads have been established with extensive simulations on both range partitioned and consistent hash partitioned databases.
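The exponential moving average of a transaction's recurrence period can be sketched as follows. This is a minimal illustration, not the paper's estimator: the class name, the smoothing factor `alpha`, and the choice to seed the average with the first observed gap are all assumptions.

```python
class RecurrenceTracker:
    """Track a smoothed recurrence period per unique transaction (hypothetical helper)."""

    def __init__(self, alpha=0.5):
        # alpha is an assumed smoothing factor; larger values react faster to change.
        self.alpha = alpha
        self.last_seen = {}    # transaction id -> time of its latest occurrence
        self.ema_period = {}   # transaction id -> smoothed recurrence period

    def observe(self, txn_id, now):
        """Record an occurrence and return the smoothed period (None on first sight)."""
        if txn_id in self.last_seen:
            gap = now - self.last_seen[txn_id]
            prev = self.ema_period.get(txn_id)
            if prev is None:
                # Seed the average with the first observed gap.
                self.ema_period[txn_id] = gap
            else:
                self.ema_period[txn_id] = self.alpha * gap + (1 - self.alpha) * prev
        self.last_seen[txn_id] = now
        return self.ema_period.get(txn_id)
```

A transaction that recurs with a short smoothed period is seen often, so a single unusually long or short gap shifts its estimate only partially, which is the damping of transient fluctuations that the abstract refers to.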

Optimizing Inter-data-center Large-Scale Database Parallel Replication with Workload-Driven Partitioning

Data Based Application Partitioning and Workload Balance in Distributed Environment

Load Balance Optimization with Replication Degree Customization.

Coexistence of Multiple Partition Plan Based Physical Database Design.

Efficient Task Replication for Fast Response Times in Parallel Computation

Workload-aware incremental repartitioning of shared-nothing distributed databases for scalable OLTP applications

Heterogeneous Replicas for Multi-dimensional Data Management

Optimizing Parallel I/O Accesses Through Pattern-Directed and Layout-Aware Replication

Efficient Straggler Replication in Large-Scale Parallel Computing

Optimizing Write Operation on Replica in Data Grid

An Adaptive Model For Building Service-Partition System

Cost-Based Optimization Of Logical Partitions For A Query Workload In A Hadoop Data Warehouse

A novel agent-based parallel ETL system for massive data

Lauca: A Workload Duplicator for Benchmarking Transactional Database Performance

Hihooi: A Database Replication Middleware for Scaling Transactional Databases Consistently

Software Pipeline–Based Partitioning Method with Trade-Off Between Workload Balance and Communication Optimization

New Balanced Data Allocating and Online Migrating Algorithms in Database Cluster

BDS: a Centralized Near-Optimal Overlay Network for Inter-Datacenter Data Replication

Hypergraph-partitioning-based online joint scheduling of tasks and data

Explore Data Placement Algorithm for Balanced Recovery Load Distribution.

Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)