Abstract: Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such a complex Edge-to-Cloud Continuum is challenging. To achieve this, one needs to perform experiments systematically, to enable their reproducibility, and to allow other researchers to replicate the study and the obtained conclusions on different infrastructures. This comes down to the tedious process of reconciling the numerous experimental requirements and constraints with low-level infrastructure design choices. To address the limitations of the main state-of-the-art approaches for distributed, collaborative experimentation, such as Google Colab, Kaggle, and Code Ocean, we propose KheOps, a collaborative environment specifically designed to enable cost-effective reproducibility and replicability of Edge-to-Cloud experiments. KheOps is composed of three core elements: (1) an experiment repository; (2) a notebook environment; and (3) a multi-platform experiment methodology. We illustrate KheOps with a real-life Edge-to-Cloud application. The evaluations explore the point of view of the authors of an experiment described in an article (who aim to make their experiments reproducible) and the perspective of their readers (who aim to replicate the experiment). The results show how KheOps helps authors to systematically perform repeatable and reproducible experiments on the Grid5000 + FIT IoT LAB testbeds.
Furthermore, KheOps helps readers to cost-effectively replicate the authors' experiments on different infrastructures, such as the Chameleon Cloud + CHI@Edge testbeds, and obtain the same conclusions with high accuracy (> 88% for all performance metrics).

Automating chaos experiments in production

A Platform for Automating Chaos Experiments

Chaos Engineering

Chaos as a Software Product Line—A platform for improving open hybrid‐cloud systems resiliency

Automated metrics calculation in a dynamic heterogeneous environment

Engineering for a Science-Centric Experimentation Platform

Chaos Engineering of Ethereum Blockchain Clients

Chaos Engineering: A Multi-Vocal Literature Review

Chaos Engineering for Enhanced Resilience of Cyber-Physical Systems

OXN -- Automated Observability Assessments for Cloud-Native Applications

Uncovering Bugs In Distributed Storage Systems During Testing (Not In Production!)

Harvesting Randomness to Optimize Distributed Systems

KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

On Evaluating Self-Adaptive and Self-Healing Systems using Chaos Engineering

Automating a Massive Open Online Course's Production

Observability and Chaos Engineering on System Calls for Containerized Applications in Docker

A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling

CHAOS: Accurate and Realtime Detection of Aging-Oriented Failure Using Entropy.

Efficient Autonomy Validation in Simulation with Adaptive Stress Testing

Online Data-Driven Safety Certification for Systems Subject to Unknown Disturbances

An Execution Environment for Robust Parallel Computing on Volunteer PC Grids.