When Cloud Storage Meets RDMA.
Yixiao Gao, Qiang Li, Lingbo Tang, Yongqing Xi, Pengcheng Zhang, Wenwen Peng, Bo Li, Yaohui Wu, Shaozong Liu, Lei Yan, Fei Feng, Yan Zhuang, Fan Liu, Pan Liu, Xingkui Liu, Zhongjie Wu, Junping Wu, Zheng Cao, Chen Tian, Jinbo Wu, Jiaji Zhu, Haiyong Wang, Dennis Cai, Jiesheng Wu
2021-01-01
Abstract: A production-level cloud storage system must be high performing and readily available. It should also meet a Service-Level Agreement (SLA). The rapid advancement in storage media has left networking lagging behind, resulting in a major performance bottleneck for new cloud storage generations. Remote Direct Memory Access (RDMA) running on lossless fabrics can potentially overcome this bottleneck. In this paper, we present our experience in introducing RDMA into the storage networks of Pangu, a cloud storage system developed by Alibaba. Since its introduction in 2009, Pangu has proven to be crucial for Alibaba's core businesses. In addition to the performance, availability, and SLA requirements, the deployment planning of Pangu at the production scale should consider storage volume and hardware costs. We present an RDMA-enabled Pangu system that exhibits superior performance, with the availability and SLA standards matching those of traditional TCP-backed versions. RDMA-enabled Pangu has been demonstrated to successfully serve numerous online mission-critical services across four years, including several important shopping festivals.