On Optimizing Traffic Imbalance in Large-scale Block-based Cloud Storage: Trace Analysis and Algorithm Design
Haoyu Mao, Yongkun Li, Wenzhe Zhu, Fei Li, Yinlong Xu
DOI: https://doi.org/10.1109/icpads56603.2022.00100
2022-01-01
Abstract: Cloud block storage (CBS) is the fundamental infrastructure behind modern cloud computing services such as cloud disks. Large-scale cloud block storage usually adopts a layered architecture, comprising a forwarding layer, in which a cluster of proxy servers provides the cloud disk abstraction, and a unified distributed storage engine that provides persistent data storage. However, because all I/O traffic passes through the proxy servers in the forwarding layer, traffic can become severely imbalanced across proxy servers, which ultimately degrades cloud disk performance. To investigate the traffic imbalance problem in the forwarding layer, we first conduct an in-depth analysis of the workload traces of a large-scale production cloud block storage system. We find that both the traffic of individual cloud disks and the consolidated traffic of cloud disks at proxy servers are highly skewed and fluctuate violently and frequently at a fine time granularity, causing severe traffic imbalance. To address this issue, we then develop a low-cost migration algorithm, weighted partial migration (WPM), and evaluate its effectiveness through trace-replay simulation. Experiments under real-world workloads show that for 84.3% of clusters, WPM reduces the imbalance factor to below 3 (i.e., the maximum traffic at a proxy server is within 3$\times$ of the median traffic), at a very small migration cost of only 0.1% of segments migrated.
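
The imbalance factor mentioned in the abstract can be illustrated with a minimal sketch. It assumes, based only on the parenthetical definition above, that the factor is the ratio of the maximum per-proxy traffic to the median per-proxy traffic within one time window. The function name `imbalance_factor` and the sample traffic values are hypothetical and do not come from the paper.

```python
from statistics import median


def imbalance_factor(proxy_traffic):
    """Ratio of the busiest proxy's traffic to the median proxy traffic.

    Assumption: this max/median ratio matches the abstract's description
    ("the maximum traffic at a proxy server is within 3x of the median").
    """
    if not proxy_traffic:
        raise ValueError("need traffic for at least one proxy server")
    mid = median(proxy_traffic)
    if mid == 0:
        raise ValueError("median traffic is zero; the ratio is undefined")
    return max(proxy_traffic) / mid


# Hypothetical per-proxy traffic (arbitrary units) in one time window.
traffic = [120, 80, 100, 400, 90]
factor = imbalance_factor(traffic)
print(factor)      # 4.0: the busiest proxy carries 4x the median load
print(factor < 3)  # False: this window would miss the "below 3" target
```

Under this reading, a cluster counts toward the reported 84.3% when its factor stays below 3. How the paper aggregates the factor over time windows is not stated in the abstract, so the sketch evaluates a single window only.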