Abstract: PolarFS is a distributed file system developed by Alibaba that offers ultra-low latency and high availability. It implements a variant of the Raft consensus protocol called ParallelRaft. ParallelRaft relaxes Raft's strict serialization of log-entry commitment and execution, allowing state machines to commit and execute log entries out of order. However, ParallelRaft is not open-sourced. Only a brief description has been published, with no formal specification, and its correctness has been neither proved by hand nor formally checked. This study aims to provide a precise formal specification of ParallelRaft and to prove its correctness. It makes the following main contributions. First, to clarify the relationship between Raft and ParallelRaft, ParallelRaft-SE (Sequential Execution) is proposed, which allows out-of-order commitment but prohibits out-of-order execution. A refinement mapping from ParallelRaft-SE to Multi-Paxos is also established. Second, it is shown that ParallelRaft, as briefly described in the literature, neglects the so-called ghost log entries phenomenon, which may violate consistency among state machines. Therefore, ParallelRaft-CE (Concurrent Execution) is proposed on the basis of ParallelRaft-SE. By limiting parallelism in the commitment of log entries, ParallelRaft-CE avoids the ghost log entries phenomenon and ensures consistency among state machines under concurrent execution. The correctness of ParallelRaft-CE is proved by hand. Finally, formal specifications of ParallelRaft-SE and ParallelRaft-CE are written in TLA+ (TLA stands for temporal logic of actions), and both the refinement mapping from ParallelRaft-SE to Multi-Paxos and the correctness of ParallelRaft-CE are verified with the TLC model checker for small numbers of protocol participants.
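The distinction drawn for ParallelRaft-SE, out-of-order commitment combined with strictly in-order execution, can be illustrated with a toy model. The sketch below is not taken from the paper or from PolarFS: the class `SequentialExecutor` and its fields are hypothetical names, and it models a single node's log only, with no leader election, replication, or ghost-entry handling.

```python
class SequentialExecutor:
    """Toy model (assumed names): entries may commit in any order,
    but are executed strictly in log-index order."""

    def __init__(self):
        self.committed = {}      # log index -> command, filled in any order
        self.next_to_apply = 1   # lowest log index not yet executed
        self.applied = []        # commands in the order they were executed

    def commit(self, index, command):
        # Record the commitment, regardless of whether earlier indices
        # have been committed yet (out-of-order commitment).
        self.committed[index] = command
        # Execute the longest contiguous run of committed entries,
        # so execution never skips over a gap in the log.
        while self.next_to_apply in self.committed:
            self.applied.append(self.committed[self.next_to_apply])
            self.next_to_apply += 1


ex = SequentialExecutor()
ex.commit(3, "c")   # committed, but cannot execute: indices 1 and 2 are missing
ex.commit(1, "a")   # index 1 executes; index 2 is still a gap
ex.commit(2, "b")   # gap closed: indices 2 and 3 execute in order
print(ex.applied)
```

In this model the execution order depends only on log indices, never on the order in which commitments arrive, which is the property that separates sequential execution from the concurrent execution considered in ParallelRaft-CE.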

ECRaft: A Raft Based Consensus Protocol for Highly Available and Reliable Erasure-Coded Storage Systems

CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost

FlexRaft: Exploiting Flexible Erasure Coding for Minimum-Cost Consensus and Fast Recovery

Study on Data Redundancy Scheme in Kademlia Cloud Storage System

ACRS-Raft: A Raft Consensus Protocol for Adaptive Data Maintenance in the Metaverse Based On Cauchy Reed-Solomon Codes

Raft with Out-of-Order Executions

The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage

ASSER: an Efficient, Reliable, and Cost-Effective Storage Scheme for Object-Based Cloud Storage Systems

ESetStore: An Erasure-Coded Storage System With Fast Data Recovery

CFT-Forensics: High-Performance Byzantine Accountability for Crash Fault Tolerant Protocols

Erasure-Coded Hybrid Writes Based on Data Delta

A Comprehensive Repair Scheme for Distributed Storage Systems

Advanced Elastic Reed-Solomon Codes for Erasure-Coded Key Value Stores

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm

Dependency Preserved Raft for Transactions

When Paxos Meets Erasure Code

Rack-Aware Regenerating Codes with Multiple Erasure Tolerance

Building Efficient and Available Distributed Transaction with Paxos-based Coding Consensus

Demand-Aware Erasure Coding for Distributed Storage Systems

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Data Repair Accelerating Scheme for Erasure-Coded Storage System Based on FPGA and Hierarchical Parallel Decoding Structure