Abstract: Cloud computing refers to maximizing efficiency by sharing computational and storage resources, while data-parallel systems exploit the resources available in the cloud to perform parallel transformations over large amounts of data. Along the same lines, considerable emphasis has recently been given to two apparently disjoint research topics: data-parallel, and eventually consistent, distributed systems. Declarative networking has recently been proposed to ease the task of programming in the cloud, by allowing the programmer to express only the desired result and leave the implementation details to the run-time system. In this context, we propose a study of a logic-programming-based computational model for eventually consistent, data-parallel systems, the keystone of which is the recent finding that the class of programs that can be computed in an eventually consistent, coordination-free way is that of monotonic programs. This principle is called Consistency and Logical Monotonicity (CALM) and has been proven by Ameloot et al. for distributed, asynchronous settings. We advocate that CALM should also be employed as a basic theoretical tool for data-parallel systems, wherein computation usually proceeds synchronously in rounds and communication is assumed to be reliable. We deem this problem relevant and interesting, especially with regard to parallel dataflow optimizations. There is currently increasing interest in understanding which properties distinguish synchronous from asynchronous parallel processing, and when the latter can replace the former. The general opinion is that coordination-freedom is a major discriminating factor.
In this work, we make the case that the current form of CALM does not hold in general for data-parallel systems, and show how, using novel techniques, the CALM principle can still be satisfied, although only for the subclass of programs called connected monotonic queries. We complete the study with considerations on the relationship between our model and the one employed by Ameloot et al., showing that our techniques subsume the latter when the synchronization constraints imposed on the system are loosened.
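
As an informal illustration of the monotonicity notion that CALM relies on, the following minimal Python sketch contrasts a monotonic query with a non-monotonic one. It is not taken from the paper: the function names `transitive_closure` and `sinks` and the sample edge sets are assumptions chosen for illustration, and the sketch does not model the paper's notion of connected monotonic queries. A query Q is monotonic when I ⊆ J implies Q(I) ⊆ Q(J), so output produced from partial input never has to be retracted.

```python
def transitive_closure(edges):
    """Monotonic query: reachability over a set of directed edges."""
    closure = set(edges)
    while True:
        # Join the closure with itself and keep only pairs not seen yet.
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def sinks(edges):
    """Non-monotonic query: nodes that have no outgoing edge."""
    nodes = {n for edge in edges for n in edge}
    return {n for n in nodes if all(src != n for (src, _) in edges)}


# Hypothetical inputs: `partial` is what a node has seen so far, `full` is the whole input.
partial = {(1, 2)}
full = {(1, 2), (2, 3)}

# Facts derived from the partial input remain valid on the full input.
tc_is_preserved = transitive_closure(partial) <= transitive_closure(full)

# Node 2 is a sink of the partial input but not of the full input,
# so an answer emitted early would have to be withdrawn.
sinks_is_preserved = sinks(partial) <= sinks(full)
```

In this sketch `tc_is_preserved` is true and `sinks_is_preserved` is false: the second query needs to know that no further edges will arrive before it can answer safely, which is the kind of coordination that monotonic programs avoid.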

Fault tolerance via idempotence

Reliable Actors with Retry Orchestration

A Behavioral Theory for Distributed Systems with Weak Recovery

Fundamentals of fault-tolerant distributed computing in asynchronous environments

Reliability and Fault-Tolerance by Choreographic Design

Towards Distributed Software Resilience in Asynchronous Many-Task Programming Models

Soft Error Resilience and Failure Recovery for Continuum Dynamics Applications

Exploiting Universal Redundancy

Fault Tolerance in Distributed Systems using Fused State Machines

Invited Paper: Failure is (literally) an Option: Atomic Commitment vs Optionality in Decentralized Finance

Fault-tolerance in a distributed management system: a case study

Failover of Software Services with State Replication

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

A fault-tolerance shim for serverless computing

Application fault tolerance with armor middleware : Recovery-Oriented Computing

Non-determinism in Byzantine Fault-Tolerant Replication

Building a Fault Tolerant Application Using the GASPI Communication Layer

Algorithmic Based Fault Tolerance Applied to High Performance Computing

A model of actors and grey failures

A datalog-based computational model for coordination-free, data-parallel systems

Invalidation-Based Protocols for Replicated Datastores