Scaling Strongly Consistent Replication

Aleksey Charapko,Ailidani Ailijiang,Murat Demirbas
DOI: https://doi.org/10.48550/arXiv.2003.07760
2020-03-17
Distributed, Parallel, and Cluster Computing
Abstract:Strong consistency replication helps keep application logic simple and provides significant benefits for correctness and manageability. Unfortunately, the adoption of strongly-consistent replication protocols has been curbed due to their limited scalability and performance. To alleviate the leader bottleneck in strongly-consistent replication protocols, we introduce Pig, an in-protocol communication aggregation and piggybacking technique. Pig employs randomly selected nodes from follower subgroups to relay the leader's message to the rest of the followers in the subgroup, and to perform in-network aggregation of acknowledgments back from these followers. By randomly alternating the relay nodes across replication operations, Pig shields the relay nodes as well as the leader from becoming hotspots and improves throughput scalability. We showcase Pig in the context of classical Paxos protocols employed for strongly consistent replication by many cloud computing services and databases. We implement and evaluate PigPaxos, in comparison to Paxos and EPaxos protocols under various workloads over clusters of size 5 to 25 nodes. We show that the aggregation at the relay has little latency overhead, and PigPaxos can provide more than 3 folds improved throughput over Paxos and EPaxos with little latency deterioration. We support our experimental observations with the analytical modeling of the bottlenecks and show that the rotating of the relay nodes provides the most benefit for reducing the bottlenecks and that the throughput is maximized when employing only 1 randomly rotating relay node.
What problem does this paper attempt to address?