Radio: Reconciling Disk I/O Interference in a Para-virtualized Cloud

Guangwen Yang, Liana Wang, Wei Xue
DOI: https://doi.org/10.1109/cloud55607.2022.00034
2022-01-01
Abstract: As more virtual machines (VMs) are consolidated in the cloud system, interference among VMs sharing underlying resources may occur more frequently than ever. In particular, the disk I/O performance of certain VMs degrades, seriously compromising the cloud services that depend on them. Existing interference analysis approaches cannot guarantee the desired results because of 1) a lack of effective techniques for characterizing disk I/O interference and 2) considerable runtime overhead in determining interference and identifying its culprits. To overcome these barriers, we present Radio, an end-to-end analysis tool for diagnosing disk I/O interference in a para-virtualized cloud. Radio quantifies the dynamic changes in I/O strength across virtual CPUs (vCPUs), constructs a performance repository to efficiently identify VMs' abnormal behaviors, and then applies interference heat maps and non-constant correlation approaches to infer the culprits of interference. Drawing on more than 10 months of deployment at the National Supercomputing Center in Wuxi, we demonstrate Radio's effectiveness in real-world use cases on a cloud system hosting more than 300 VMs. Radio analyzes interference issues within 20 seconds while incurring only 0.2% extra CPU overhead on the host machine. In practice, Radio has helped system administrators reduce the daily incidence of interference from more than 65% to less than 10% and improve the overall disk throughput of the cloud system by more than 27.5%.
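The abstract does not spell out Radio's algorithms, so the following is only a hypothetical illustration of the general two-stage pattern it describes: flag a VM whose I/O behavior deviates from a stored baseline, then rank co-located VMs as suspected culprits. It substitutes a plain z-score test for the performance repository and plain Pearson correlation for the paper's non-constant correlation approach; neither is Radio's actual method. The function names, the metrics (victim latency, neighbor IOPS), the threshold of 3 standard deviations and the 0.7 correlation cutoff are all assumptions.

```python
"""Hypothetical two-stage disk I/O interference check (not Radio's method).

Stage 1: flag a VM whose current I/O metric deviates from its historical
baseline (a simple stand-in for a "performance repository").
Stage 2: rank co-located VMs as candidate culprits by Pearson correlation
between their I/O activity and the victim's degradation.
"""
from statistics import mean, pstdev


def is_abnormal(history: list[float], current: float, k: float = 3.0) -> bool:
    """True if `current` lies more than k population standard deviations
    from the mean of `history` (z-score test)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # flat baseline: any change is abnormal
    return abs(current - mu) / sigma > k


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (vx * vy) ** 0.5


def rank_culprits(
    victim_latency: list[float],
    neighbor_iops: dict[str, list[float]],
    threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Return (vm, r) pairs with r >= threshold, strongest first."""
    scores = [(vm, pearson(victim_latency, s)) for vm, s in neighbor_iops.items()]
    suspects = [(vm, r) for vm, r in scores if r >= threshold]
    return sorted(suspects, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    baseline_ms = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8]  # made-up latency history
    victim_ms = [2.0, 2.1, 5.0, 8.0, 7.5, 2.2]    # made-up degraded window
    neighbors = {
        "vm-a": [100, 110, 900, 1500, 1400, 120],  # bursts with the victim
        "vm-b": [500, 480, 510, 495, 505, 490],    # steady background load
    }
    if is_abnormal(baseline_ms, max(victim_ms)):
        print(rank_culprits(victim_ms, neighbors))
```

With these made-up numbers, only `vm-a` clears the cutoff. Correlation alone cannot separate cause from coincidence, which is one reason a production tool would need richer signals than this sketch uses.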