AVA: Fault-tolerant Reconfigurable Geo-Replication on Heterogeneous Clusters

Tejas Mane, Xiao Li, Mohammad Sadoghi, Mohsen Lesani
2024-12-03
Abstract: Fault-tolerant replicated database systems consume less energy than compute-intensive proof-of-work blockchains. Thus, they are promising technologies for the building blocks that assemble global financial infrastructure. To facilitate global scaling, clustered replication protocols are essential in orchestrating nodes into clusters based on proximity. However, the existing approaches often assume a homogeneous and fixed model in which the number of nodes across clusters is the same and fixed, and are often limited to a fail-stop fault model. This paper presents heterogeneous and reconfigurable clustered replication for the general environment with arbitrary failures. In particular, we present AVA, a fault-tolerant reconfigurable geo-replication that allows dynamic membership: replicas are allowed to join and leave clusters. We formally state and prove the safety and liveness properties of the protocol. Furthermore, our replication protocol is consensus-agnostic, meaning each cluster can utilize any local replication mechanism. In our comprehensive evaluation, we instantiate our replication with both HotStuff and BFT-SMaRt. Experiments on geo-distributed deployments on Google Cloud demonstrate that members of clusters can be reconfigured without considerably affecting transaction processing, and that heterogeneity of clusters may significantly improve throughput.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The main problem this paper addresses is the limited scalability and static membership of existing fault-tolerant clustered replication protocols. Specifically:

1. **Existing clustered replication protocols are usually homogeneous and fixed**: every cluster has the same, fixed number of nodes, which limits the flexibility and adaptability of the system.
2. **Existing systems are often limited to the fail-stop fault model**: they tolerate crashed nodes but not arbitrary (Byzantine) failures.
3. **Lack of support for heterogeneous environments**: the number of active nodes can differ from region to region, but existing systems do not accommodate this heterogeneity well.
4. **Lack of dynamic membership**: existing clustered replication protocols usually do not let nodes join or leave a cluster at runtime, which limits the decentralization and flexibility of the system.

To address these problems, the paper proposes AVA (Adaptive and Versatile Architecture), a fault-tolerant, reconfigurable geo-replication protocol that lets nodes join and leave clusters dynamically and supports clustered replication across heterogeneous clusters. Its main features are:

- **Support for heterogeneous clusters**: clusters can have different numbers of nodes, so a deployment can match the resources available in each region.
- **Dynamic membership**: replicas can join and leave clusters at runtime, improving the flexibility and decentralization of the system.
- **Safety and liveness guarantees**: the paper formally states and proves the protocol's safety and liveness properties, so the system behaves correctly even when some replicas fail arbitrarily.
- **Consensus-agnostic design**: each cluster can use any local replication mechanism; the evaluation instantiates AVA with both HotStuff and BFT-SMaRt.
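The heterogeneity and membership points above can be made concrete with a minimal Python sketch. It assumes, which the summary does not state, that each cluster runs a local BFT protocol under the classic n ≥ 3f + 1 bound, so clusters of different sizes tolerate different numbers of faulty replicas and need different quorum sizes. `ClusterConfig`, `join`, `leave`, and the epoch counter are hypothetical names; in AVA a membership change would itself have to be agreed on by the protocol, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import FrozenSet


def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (the classic BFT resilience bound)."""
    return (n - 1) // 3 if n > 0 else 0


def quorum_size(n: int) -> int:
    """Smallest quorum q such that any two quorums overlap in at least f + 1
    replicas (hence in at least one correct replica): q = ceil((n + f + 1) / 2)."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical membership record for one cluster; each join or leave
    produces a new configuration with a higher epoch number."""
    cluster_id: str
    epoch: int
    members: FrozenSet[str]

    @property
    def f(self) -> int:
        return max_byzantine_faults(len(self.members))

    @property
    def quorum(self) -> int:
        return quorum_size(len(self.members))

    def join(self, replica: str) -> "ClusterConfig":
        if replica in self.members:
            raise ValueError(f"{replica} is already a member of {self.cluster_id}")
        return ClusterConfig(self.cluster_id, self.epoch + 1, self.members | {replica})

    def leave(self, replica: str) -> "ClusterConfig":
        if replica not in self.members:
            raise ValueError(f"{replica} is not a member of {self.cluster_id}")
        remaining = self.members - {replica}
        if not remaining:
            raise ValueError("a cluster cannot become empty")
        return ClusterConfig(self.cluster_id, self.epoch + 1, remaining)


if __name__ == "__main__":
    us = ClusterConfig("us", 0, frozenset(f"u{i}" for i in range(1, 8)))  # 7 replicas
    eu = ClusterConfig("eu", 0, frozenset(f"e{i}" for i in range(1, 5)))  # 4 replicas
    # Columns: cluster, epoch, size, tolerated faults f, quorum size
    for c in (us, eu, eu.join("e5")):
        print(c.cluster_id, c.epoch, len(c.members), c.f, c.quorum)
```

Running the script prints `us 0 7 2 5`, `eu 0 4 1 3`, and `eu 1 5 1 4`. Adding a fifth replica to the smaller cluster raises its quorum from 3 to 4 without raising f, which illustrates why a single join does not always increase a cluster's fault tolerance.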
Through these improvements, AVA aims to provide an efficient, flexible, and secure solution for globally distributed systems, one especially suited to building global financial infrastructure.
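The consensus-agnostic property described above can be pictured as a narrow interface between the geo-replication layer and whatever protocol runs inside each cluster. The Python sketch below is hypothetical: `LocalReplication`, `Cluster`, and `global_round` are illustrative names, the two engines are toy stand-ins rather than HotStuff or BFT-SMaRt, and the fixed cluster-name merge is an assumed cross-cluster ordering rule, not AVA's documented mechanism.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class LocalReplication(Protocol):
    """Hypothetical plug-in point: any intra-cluster protocol that turns a
    batch of client transactions into an agreed local order."""

    def order(self, batch: Sequence[str]) -> List[str]: ...


class ArrivalOrder:
    """Toy engine: keeps transactions in arrival order. A real deployment
    would run an actual consensus protocol here; this class does not."""

    def order(self, batch: Sequence[str]) -> List[str]:
        return list(batch)


class SortedOrder:
    """Second toy engine, showing that clusters may plug in different logic."""

    def order(self, batch: Sequence[str]) -> List[str]:
        return sorted(batch)


class Cluster:
    def __init__(self, name: str, engine: LocalReplication) -> None:
        self.name = name
        self.engine = engine
        self.log: List[str] = []

    def replicate(self, batch: Sequence[str]) -> List[str]:
        # The cluster depends only on the interface, not on a specific protocol.
        ordered = self.engine.order(batch)
        self.log.extend(ordered)
        return ordered


def global_round(clusters: List[Cluster],
                 batches: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Hypothetical cross-cluster step: each cluster orders its own batch
    locally, then results are concatenated in a fixed cluster-name order
    so that every replica derives the same global sequence."""
    result: List[Tuple[str, str]] = []
    for cluster in sorted(clusters, key=lambda c: c.name):
        for tx in cluster.replicate(batches.get(cluster.name, [])):
            result.append((cluster.name, tx))
    return result


if __name__ == "__main__":
    us = Cluster("us", ArrivalOrder())
    eu = Cluster("eu", SortedOrder())
    print(global_round([us, eu], {"us": ["t2", "t1"], "eu": ["t4", "t3"]}))
```

Running it prints `[('eu', 't3'), ('eu', 't4'), ('us', 't2'), ('us', 't1')]`: each cluster applies its own engine, yet the merged sequence is deterministic. Swapping either engine for a different implementation requires no change to `Cluster` or `global_round`, which is the point of keeping the local protocol behind an interface.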