CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows

George Siachamis, Kyriakos Psarakis, Marios Fragkoulis, Arie van Deursen, Paris Carbone, Asterios Katsifodimos
2024-03-20
Abstract: Over the last decade, stream processing has seen broad adoption in both commercial and research settings. One key element of this success is the ability of modern stream processors to handle failures while ensuring exactly-once processing guarantees. At the time of writing, virtually all stream processors that guarantee exactly-once processing implement a variant of Apache Flink's coordinated checkpoints, an extension of the original Chandy-Lamport checkpoints from 1985. However, the reasons behind the prevalence of the coordinated approach remain anecdotal, as reported by practitioners in the stream processing community. At the same time, other common checkpointing approaches, such as uncoordinated and communication-induced checkpointing, remain largely unexplored.
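The abstract names coordinated checkpoints without describing their core mechanism. As a rough illustration, the following is a minimal Python sketch of barrier alignment, the step used by aligned coordinated checkpoints in the style popularized by Apache Flink. Everything here is an assumption for illustration: the `Barrier` and `AligningOperator` classes, the running-sum state, and the in-memory `snapshots` dictionary are hypothetical and are neither Flink's API nor CheckMate's implementation. The sketch also omits the coordinator that injects barriers at the sources, durable state storage, and asynchronous snapshotting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Barrier:
    """Hypothetical marker injected into every channel for one checkpoint."""
    checkpoint_id: int


@dataclass
class AligningOperator:
    """Hypothetical stateful operator (a running sum) with several input channels."""
    num_inputs: int
    state: int = 0
    snapshots: dict = field(default_factory=dict)  # checkpoint_id -> saved state
    output: list = field(default_factory=list)     # records and barriers sent downstream
    _blocked: set = field(default_factory=set)     # channels that already delivered the barrier
    _buffers: dict = field(default_factory=dict)   # channel -> items held back during alignment

    def receive(self, channel, item):
        if channel in self._blocked:
            # This channel is already past the checkpoint: hold its items
            # so they do not leak into the snapshot being aligned.
            self._buffers.setdefault(channel, deque()).append(item)
        elif isinstance(item, Barrier):
            self._on_barrier(channel, item)
        else:
            self._apply(item)

    def _apply(self, record):
        self.state += record
        self.output.append(record)

    def _on_barrier(self, channel, barrier):
        self._blocked.add(channel)
        if len(self._blocked) < self.num_inputs:
            return  # still waiting for the barrier on the other channels
        # All inputs aligned: take the snapshot, forward the barrier downstream,
        # then unblock and replay the held-back items in per-channel order.
        self.snapshots[barrier.checkpoint_id] = self.state
        self.output.append(barrier)
        self._blocked.clear()
        pending, self._buffers = self._buffers, {}
        for ch, queue in pending.items():
            while queue:
                self.receive(ch, queue.popleft())


# Two input channels; channel 0 delivers its barrier before channel 1 does.
op = AligningOperator(num_inputs=2)
op.receive(0, 1)
op.receive(0, Barrier(1))
op.receive(0, 10)          # arrives after the barrier, so it is buffered
op.receive(1, 2)
op.receive(1, Barrier(1))  # alignment completes here
print(op.snapshots, op.state)
```

In this run, the record `10` arrives on channel 0 after that channel's barrier, so it is excluded from snapshot 1 (which holds `3`) and is applied only after alignment completes, leaving the final state at `13`. This exclusion is what makes the snapshot a consistent cut. The price is that channel 0 stalls while the operator waits for the barrier on channel 1.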
Distributed, Parallel, and Cluster Computing; Databases