Towards Scalable And Reliable In-Memory Storage System: A Case Study With Redis

Shanshan Chen, Xiaoxin Tang, Hongwei Wang, Han Zhao, Minyi Guo
DOI: https://doi.org/10.1109/trustcom.2016.0255
2016-01-01
Abstract: In recent years, in-memory key-value storage systems have become increasingly popular for real-time and interactive tasks. Compared with disks, memory offers much higher throughput and lower latency, which lets these systems serve data requests much faster. However, because memory has much smaller capacity than disk, expanding the capacity of an in-memory storage system while maintaining its high performance becomes a crucial problem. At the same time, because data in memory is not persistent, it may be lost when the system goes down. In this paper, we present a case study of Redis, a popular in-memory key-value storage system. We find that although the latest release of Redis supports clustering, so that data can be distributed across nodes to provide a larger storage capacity, its performance is limited by its decentralized design: clients usually need two connections to get a request served. To make the system more scalable, we propose a Client-side Key-to-Node Caching method that helps direct each request to the right service node. Experimental results show that this technique significantly improves the system's performance, by nearly 2 times. We also find that although Redis supports data replication on slave nodes to ensure data safety, it can still lose part of the data because of weak consistency between master and slave nodes: the order in which it replicates data and replies to requests is defective, so data may be lost without the client being notified. To make it more reliable, we propose a Master-slave Semi Synchronization method, which uses the TCP protocol to enforce the order of data replication and request reply, so that when a client receives an "OK" message, the corresponding data must already have been replicated. This significantly improves data reliability while keeping the performance overhead within 5%.
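The idea behind client-side key-to-node caching can be illustrated with a small simulation. The sketch below is not the authors' implementation: `ToyCluster`, `CachingClient`, and `key_slot` are hypothetical names, the cluster is an in-process stand-in rather than a real Redis deployment, and the contiguous slot-to-node layout is an assumption made for brevity. It relies only on two documented Redis Cluster behaviors: keys map to one of 16384 hash slots via CRC16, and a node that does not own a key answers with a MOVED-style redirection. Hash tags are ignored.

```python
import binascii

NUM_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM variant) modulo 16384."""
    return binascii.crc_hqx(key.encode(), 0) % NUM_SLOTS


class ToyCluster:
    """Hypothetical in-process cluster; node i owns a contiguous slot range."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes
        self.stores = [dict() for _ in range(num_nodes)]

    def owner(self, slot: int) -> int:
        return slot * self.num_nodes // NUM_SLOTS

    def request(self, node: int, op: str, key: str, value=None):
        """Serve the request, or reply ("MOVED", owner) if this node does not own the key."""
        target = self.owner(key_slot(key))
        if target != node:
            return ("MOVED", target)
        if op == "SET":
            self.stores[node][key] = value
            return ("OK", None)
        return ("OK", self.stores[node].get(key))


class CachingClient:
    """Client that optionally remembers which node served each key."""

    def __init__(self, cluster: ToyCluster, use_cache: bool = True):
        self.cluster = cluster
        self.use_cache = use_cache
        self.key_to_node = {}  # the client-side key-to-node cache
        self.round_trips = 0   # requests sent, including redirected ones

    def execute(self, op: str, key: str, value=None):
        # Without a cache entry, fall back to a default node (node 0 here).
        node = self.key_to_node.get(key, 0) if self.use_cache else 0
        while True:
            self.round_trips += 1
            status, payload = self.cluster.request(node, op, key, value)
            if status != "MOVED":
                if self.use_cache:
                    self.key_to_node[key] = node  # remember the serving node
                return payload
            node = payload  # follow the redirection; a stale entry is fixed the same way
```

With the cache enabled, the first access to a key may cost two round trips, and every later access to that key goes straight to the owning node. With the cache disabled, every request to a key that node 0 does not own pays the redirection again. The round-trip counter only illustrates this difference and says nothing about the throughput figures reported in the paper.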