Mutiny! How does Kubernetes fail, and what can we do about it?

Marco Barletta, Marcello Cinque, Catello Di Martino, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
2024-04-17
Abstract: In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targeting the data store that preserves the cluster state, and iii) compare the results of our fault/error injection experiments with real-world failures, showing that our fault/error injections can recreate many real-world failure patterns. The paper aims to address the lack of systematic analyses of Kubernetes failures to date.
Distributed, Parallel, and Cluster Computing