Optimizing I/O Performance Through Effective vCPU Scheduling Interference Management

Liang Wang, Jinzhe Yang, Jidong Zhai, Guangwen Yang
DOI: https://doi.org/10.1109/tpds.2023.3329298
2024-01-01
Abstract: Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising the services provided by affected VMs. Existing solutions have limitations: they diagnose interference issues inefficiently or impose undesired side effects on cloud systems. To address these challenges, we present Otter, a holistic technique for optimizing I/O performance in the presence of vCPU scheduling interference. Otter employs three methods to make interference diagnosis more efficient. First, we propose lightweight methods to measure the dynamic changes in scheduling latencies for co-running vCPUs, ensuring both flexibility and accuracy. Second, we propose fine-grained quantification methods that detect interference promptly, with low false positive and false negative rates. Third, we identify interference patterns that aid in analyzing the root causes of interference and in preventing similar issues from recurring. Otter has been operational for one year in the production cloud at the National Supercomputing Center (Wuxi). It has diagnosed and helped fix more than 470 issues related to vCPU scheduling interference, resulting in a 19.6% improvement in cloud service I/O performance with negligible overhead in production.