Raft with Out-of-Order Executions

GU Xiao-Song, WEI Heng-Feng, QIAO Lei, HUANG Yu
DOI: https://doi.org/10.21655/ijsi.1673-7288.00257
2021-01-01
International Journal of Software and Informatics
Abstract: PolarFS is a distributed file system developed by Alibaba with ultra-low latency and high availability. It implements a variant of the Raft consensus protocol called ParallelRaft. ParallelRaft relaxes Raft's strict serialization requirements on the commitment and execution of log entries, allowing state machines to commit and execute log entries out of order. However, ParallelRaft is not open-sourced. It is described only briefly and lacks a formal specification. Moreover, its correctness has been neither manually proven nor formally checked. The purpose of this study is to give ParallelRaft a precise formal specification and to prove its correctness. The main contributions are as follows. First, to clarify the relationship between Raft and ParallelRaft, ParallelRaft-SE (Sequential Execution) is proposed, which allows out-of-order commitment but prohibits out-of-order execution. A refinement mapping from ParallelRaft-SE to Multi-Paxos is also established. Second, it is shown that ParallelRaft, as briefly described in the literature, overlooks the so-called ghost log entries phenomenon, which may violate consistency among state machines. Therefore, building on ParallelRaft-SE, ParallelRaft-CE (Concurrent Execution) is proposed. ParallelRaft-CE avoids the ghost log entries phenomenon and ensures consistency among state machines under concurrent execution by limiting parallelism in the commitment of log entries. The correctness of ParallelRaft-CE is proved manually. Finally, formal specifications of ParallelRaft-SE and ParallelRaft-CE are written in TLA+ (TLA stands for temporal logic of actions), and both the refinement mapping from ParallelRaft-SE to Multi-Paxos and the correctness of ParallelRaft-CE are verified with the TLC model checker for small numbers of protocol participants.
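To make "out-of-order commitment and execution with limited parallelism" concrete, here is a minimal, hypothetical sketch of a single replica. It is not the ParallelRaft, ParallelRaft-SE, or ParallelRaft-CE specification from the paper. The names `Entry`, `Replica`, `window`, and `keys` are illustrative assumptions: we assume each log entry declares the set of keys it writes, that an entry may be committed only if it lies within a fixed window of the oldest uncommitted entry, and that a committed entry may be applied only after every earlier entry that touches a shared key (or whose content is unknown) has been applied. Leader election is not modeled, so this sketch does not by itself address the ghost log entries phenomenon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    index: int        # 1-based log position
    keys: frozenset   # hypothetical: the keys (e.g. blocks) this entry writes
    value: str


class Replica:
    """Hypothetical replica that commits and applies log entries out of order.

    Assumptions (not taken from the paper):
      * commitment is limited to a window of `window` entries starting at
        the lowest uncommitted index (a bound on commit parallelism);
      * a committed entry is applied only when no earlier, not-yet-applied
        entry conflicts with it (shares a key or is unknown locally).
    """

    def __init__(self, log, window):
        self.log = {e.index: e for e in log}  # entries received so far
        self.window = window
        self.committed = set()
        self.applied = set()
        self.state = {}
        self.apply_order = []

    def lowest_uncommitted(self):
        i = 1
        while i in self.committed:
            i += 1
        return i

    def commit(self, index):
        """Commit `index` if it lies inside the window; return True on success."""
        if index >= self.lowest_uncommitted() + self.window:
            return False  # too far ahead: refuse, limiting parallelism
        self.committed.add(index)
        self._apply_ready()
        return True

    def _blocked(self, entry):
        # An earlier unapplied entry that shares a key, or whose content is
        # unknown here, must be applied first to keep replicas consistent.
        for i in range(1, entry.index):
            if i in self.applied:
                continue
            earlier = self.log.get(i)
            if earlier is None or earlier.keys & entry.keys:
                return True
        return False

    def _apply_ready(self):
        # Repeatedly apply every committed entry that is no longer blocked.
        progress = True
        while progress:
            progress = False
            for idx in sorted(self.committed - self.applied):
                entry = self.log[idx]
                if not self._blocked(entry):
                    for k in entry.keys:
                        self.state[k] = entry.value
                    self.applied.add(idx)
                    self.apply_order.append(idx)
                    progress = True


log = [
    Entry(1, frozenset({"a"}), "x1"),
    Entry(2, frozenset({"b"}), "y2"),
    Entry(3, frozenset({"a"}), "x3"),
]
r = Replica(log, window=3)
r.commit(3)   # committed, but blocked behind entry 1 (both write "a")
r.commit(2)   # applied immediately: it writes only "b"
r.commit(1)   # applies 1, which unblocks 3
print(r.apply_order, r.state)
```

In this sketch entry 2 executes before entry 1 because their key sets are disjoint, while entry 3 waits for entry 1, so the final value of `a` matches sequential execution. With `window=2`, committing entry 3 first would be refused, illustrating how a commit window trades parallelism for a bounded set of outstanding entries.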