A Quantitative Study of the Spatiotemporal I/O Burstiness of HPC Application

Wenxiang Yang, Xiangke Liao, Dezun Dong, Jie Yu
DOI: https://doi.org/10.1109/ipdps53621.2022.00133
2022-01-01
Abstract: Understanding the I/O characteristics of applications on supercomputers is crucial for application optimization and system resource allocation. We collect and analyze I/O traces of applications on a production supercomputer and reconfirm that I/O bursts exist in most applications. Moreover, we find that the I/O bursts not only occur in short periods of time but also originate from a minority of adjacent compute nodes allocated to the applications, a phenomenon we call spatiotemporal I/O burstiness. This concentration of I/O traffic in both the time and space dimensions causes applications to experience poor I/O performance and makes the storage system I/O-inefficient. Although some solutions, such as burst buffers, can help alleviate this inefficiency, no existing work measures, analyzes, and predicts application I/O characteristics in terms of spatiotemporal burstiness, which we consider vital for application-aware optimizations, including but not limited to burst buffer allocation and job scheduling. In this paper, we first propose a mathematical model to measure spatiotemporal I/O burstiness. We then present a thorough analysis of the spatiotemporal I/O characteristics of all applications on the system. We further use each job's submission path to explore the similarity of I/O characteristics among jobs, and on that basis propose a machine learning classification algorithm that accurately predicts a job's spatiotemporal I/O burstiness in advance. With accurate job I/O characteristics at hand, we put forward suggestions for mitigating the impact of spatiotemporal I/O burstiness.
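The abstract does not give the paper's mathematical model, so the following is only an illustrative sketch of the underlying idea, not the authors' metric. It assumes a hypothetical trace laid out as `trace[n][t]` (bytes moved by node `n` in time bin `t`) and reports the smallest fraction of time bins, and of nodes, that together carry a chosen share of the traffic. The function names, the 80% threshold, and the toy data are all assumptions, and the sketch ignores whether the busy nodes are adjacent.

```python
from typing import Sequence


def concentration(volumes: Sequence[float], share: float = 0.8) -> float:
    """Smallest fraction of entries whose combined volume reaches `share` of the total."""
    total = sum(volumes)
    if total <= 0:
        return 1.0  # no traffic at all: treat as not concentrated
    running = 0.0
    # Take the heaviest entries first until the requested share is covered.
    for count, volume in enumerate(sorted(volumes, reverse=True), start=1):
        running += volume
        if running >= share * total:
            return count / len(volumes)
    return 1.0


def spatiotemporal_concentration(trace: Sequence[Sequence[float]], share: float = 0.8):
    """Return (temporal, spatial) concentration for trace[node][time_bin] in bytes."""
    per_node = [sum(row) for row in trace]       # total bytes per node
    per_bin = [sum(col) for col in zip(*trace)]  # total bytes per time bin
    return concentration(per_bin, share), concentration(per_node, share)


# Made-up toy trace: 4 nodes, 5 time bins; one node dominates one bin.
toy_trace = [
    [0, 90, 0, 0, 0],
    [0, 5, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
temporal, spatial = spatiotemporal_concentration(toy_trace)
print(temporal, spatial)  # 0.2 0.25
```

Lower values mean more concentrated traffic: in the toy trace a single time bin (20% of bins) and a single node (25% of nodes) each carry at least 80% of the bytes.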