DXRAM's Fault-Tolerance Mechanisms Meet High Speed I/O Devices

Kevin Beineke, Stefan Nothaas, Michael Schoettner
DOI: https://doi.org/10.48550/arXiv.1807.03562
2018-07-14
Abstract: In-memory key-value stores provide consistent low-latency access to all objects, which is important for interactive large-scale applications like social media networks or online graph analytics and also opens up new application areas. However, when storing the data in RAM on thousands of servers, one has to consider server failures. Only a few in-memory key-value stores provide automatic online recovery of failed servers. The most prominent example of these systems is RAMCloud. Another system with sophisticated fault-tolerance mechanisms is DXRAM, which is optimized for small data objects. In this report, we detail the remote replication process, which is based on logs, investigate selection strategies for the reorganization of these logs, and evaluate the reorganization performance for sequential, random, Zipf, and hot-and-cold distributions in DXRAM. This is also the first time DXRAM's backup system is evaluated with high-speed I/O devices, specifically with a 56 GBit/s InfiniBand interconnect and PCI-e SSDs. Furthermore, we discuss the copyset replica distribution, which reduces the probability of data loss, and the adaptations to the original approach for DXRAM.
Distributed, Parallel, and Cluster Computing