Spider: A BFT Architecture for Geo-Replicated Cloud Services

Michael Eischer, Tobias Distler
2024-05-18
Abstract: Traditionally, Byzantine fault tolerance (BFT) in geo-replicated systems is achieved by executing complex agreement protocols over large-distance communication links, and therefore typically incurs high response times. In this paper, we address this problem with Spider, a resilient and modular BFT replication architecture for geo-distributed systems that leverages characteristic features of today's public-cloud infrastructures to minimize both complexity and latency. Spider is composed of multiple largely independent replica groups, each of which is distributed across different availability zones of its respective cloud region. This design makes it possible to provide low response times by placing replica groups geographically close to clients, while at the same time enabling intra-group communication over short-distance links. To handle the interaction between groups that is necessary for strong consistency, Spider uses a novel message-channel abstraction with first-in-first-out semantics and built-in flow control that greatly simplifies system design.
Distributed, Parallel, and Cluster Computing