Abstract: Distributed replicated databases play a crucial role in modern computer systems, enabling scalable, fault-tolerant, and high-performance data management. However, achieving these qualities requires resolving a number of trade-offs between system properties during design and operation. This paper reviews trade-offs in distributed replicated databases and surveys recent research papers on distributed data storage. The paper first discusses the compromise between consistency and latency that arises in distributed replicated data stores and follows directly from the CAP and PACELC theorems. Consistency refers to the guarantee that all clients in a distributed system observe the same data at the same time. To ensure strong consistency, distributed systems typically employ coordination mechanisms and synchronization protocols that involve communication and agreement among distributed replicas. These mechanisms introduce additional overhead and latency and can dramatically increase the time taken to complete operations when replicas are globally distributed across the Internet. In addition, we study trade-offs between other system properties, including availability, durability, cost, energy consumption, and read and write latency. We also provide a comprehensive review and classification of recent research on distributed replicated databases. The reviewed papers cover several major areas of research, ranging from performance evaluation and comparison of various NoSQL databases to proposing new data replication strategies and new consistency models. In particular, we observed a shift towards exploring hybrid consistency models, namely causal consistency and eventual consistency with causal ordering, due to their ability to strike a balance between operation-ordering guarantees and high performance. Researchers have also proposed various consistency control algorithms and consensus quorum protocols to coordinate distributed replicas. Insights from this review can help practitioners make informed decisions when designing and managing distributed data storage systems, help identify existing gaps in the body of knowledge, and suggest further research directions.
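The consistency-latency compromise described in the abstract can be illustrated with a small quorum model. The sketch below is not taken from the paper: the names `QuorumConfig`, `quorums_overlap`, and `quorum_latency`, as well as the round-trip times, are hypothetical and chosen only for illustration. It assumes the common quorum condition R + W > N (read and write quorums intersect) and assumes that an operation completes once the k fastest replicas have answered.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical quorum settings for N replicas (illustration only)."""
    n: int  # total number of replicas
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must lie between 1 and n")

    def quorums_overlap(self) -> bool:
        # R + W > N means every read quorum shares at least one replica
        # with every write quorum, so a read contacts a replica that
        # holds the latest acknowledged write.
        return self.r + self.w > self.n


def quorum_latency(rtts_ms: Sequence[float], k: int) -> float:
    """Time to hear from the k fastest replicas: the k-th smallest RTT."""
    if not 1 <= k <= len(rtts_ms):
        raise ValueError("k must lie between 1 and the number of replicas")
    return sorted(rtts_ms)[k - 1]


if __name__ == "__main__":
    # Assumed round-trip times (ms) to three geo-distributed replicas.
    rtts = [5.0, 80.0, 150.0]
    for cfg in (QuorumConfig(3, 1, 1), QuorumConfig(3, 2, 2)):
        print(cfg, cfg.quorums_overlap(), quorum_latency(rtts, cfg.r))
```

Under these assumed round-trip times, a read with R = 1 waits only for the nearest replica (5 ms) but its quorum need not overlap with the write quorum, whereas R = W = 2 gives overlapping quorums at the cost of waiting for the second-fastest replica (80 ms). Overlapping quorums alone are one ingredient of stronger consistency, not a complete protocol.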

Quality-of-Data for Consistency Levels in Geo-replicated Cloud Data Stores

A Two-Layer Geo-Cloud Based Dynamic Replica Creation Strategy

Qos-Aware Indiscriminate Volume Storage Cloud

Consistency in Distributed Data Stores

QoSC: A QoS-Aware Storage Cloud Based on HDFS

Consistency Maintenance in Replication: A Novel Strategy Based on Diamond Topology in Cloud Storage

Consistency issue and related trade-offs in distributed replicated systems and databases: a review

Almost Strong Consistency: "Good Enough" in Distributed Storage Systems

RECODS: Replica consistency-on-demand store

An Application-Based Adaptive Replica Consistency for Cloud Storage

Stabilizer: Geo-Replication with User-defined Consistency

Flexible Consistency for Distributed Storage Systems

Minimizing Content Staleness in Dynamo-Style Replicated Storage Systems

A consistency maintenance approach for replicated data in storage grid

A novel replication consistency maintenance strategy in cloud storage system

MDCC: Multi-Data Center Consistency

Grouping-Based Consistency Protocol Design for End-Edge-Cloud Hierarchical Storage System

Latency Bounding by Trading off Consistency in NoSQL Store: A Staging and Stepwise Approach

A Unified, Practical, and Understandable Summary of Non-transactional Consistency Levels in Distributed Replication

Achieving Probabilistic Atomicity with Well-Bounded Staleness and Low Read Latency in Distributed Datastores

Probabilistically-Atomic 2-Atomicity: Enabling Almost Strong Consistency in Distributed Storage Systems