An End-to-end and Adaptive I/O Optimization Tool for Modern HPC Storage Systems

Bin Yang, Yanliang Zou, Weiguo Liu, Wei Xue
DOI: https://doi.org/10.1109/ipdps53621.2022.00128
2022-01-01
Abstract: Real-world large-scale applications put increasing pressure on the storage services of modern supercomputers. Supercomputers have been introducing new storage devices and technologies to meet the performance requirements of various applications, leading to more complicated architectures. The high I/O demand of applications, combined with complicated and shared storage architectures, makes issues such as unbalanced load, I/O interference, system parameter misconfiguration, and node performance degradation more frequent. It is therefore challenging both to achieve high I/O performance at the application level and to utilize scarce storage resources efficiently. We propose AIOT, an end-to-end and adaptive I/O optimization tool for HPC storage systems, which introduces effective I/O performance modeling and several active tuning strategies to improve both the I/O performance of applications and the utilization of storage resources. AIOT provides a global view of the whole storage system and searches for the optimal end-to-end I/O path through flow network modeling. Moreover, AIOT tunes system parameters across multiple layers of the storage system using automatically identified application I/O behaviors and the instantaneous workload status of the storage system. Through a number of real-world cases, we verified the effectiveness of AIOT in balancing I/O load, resolving I/O interference, improving I/O performance by configuring appropriate system parameters, and avoiding I/O performance degradation caused by abnormal nodes. AIOT has helped save over ten million core-hours since being deployed on Sunway TaihuLight in July 2021. AIOT is also capable of managing other I/O optimization methods across various storage platforms.
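The abstract does not describe how the flow network is built, so the following is only a minimal sketch of the general idea, not AIOT's actual model. It assumes a hypothetical three-layer I/O path (application, forwarding nodes, storage targets) in which each edge capacity stands for available bandwidth in arbitrary units, and it uses the textbook Edmonds-Karp maximum-flow algorithm as a stand-in for whatever search AIOT performs. All node names and capacities are invented for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity: dict mapping node -> {neighbor: capacity}.
    Returns (total_flow, residual_graph).
    """
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        residual.setdefault(u, {})
        for v, c in edges.items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical topology: capacities are illustrative bandwidth units.
toy = {
    "app": {"fwd1": 4, "fwd2": 4},
    "fwd1": {"ost1": 3, "ost2": 2},
    "fwd2": {"ost2": 2, "ost3": 1},
    "ost1": {"storage": 3},
    "ost2": {"storage": 3},
    "ost3": {"storage": 2},
}

flow, residual = max_flow(toy, "app", "storage")
# Flow actually routed on each forward edge = original capacity - residual.
routed = {(u, v): c - residual[u][v] for u, e in toy.items() for v, c in e.items()}
print("max flow:", flow)
print("per-edge routing:", routed)
```

In this toy graph, the per-edge routing shows how much traffic each forwarding node and storage target would carry under a maximum-throughput assignment. A real system would presumably derive capacities from live load measurements rather than fixed constants.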