Predicting Output Performance of a Petascale Supercomputer

Bing Xie, Yezhou Huang, Jeffrey S. Chase, Jong Youl Choi, Scott Klasky, Jay F. Lofstead, Sarp Oral
DOI: https://doi.org/10.1145/3078597.3078614
2017-01-01
Abstract: In this paper, we develop a predictive model for the output performance of supercomputer file systems under production load. Our target environment is Titan (the 3rd fastest supercomputer in the world) and its Lustre-based multi-stage write path. We observe from Titan that although output performance is highly variable at small time scales, the mean performance is stable and consistent over typical application run times. Moreover, we find that output performance is non-linearly related to its correlated parameters due to interference and saturation at individual stages of the path. These observations enable us to build a predictive model of expected write times for given output patterns and I/O configurations, using feature transformations to capture non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a model space with 135,000 candidate models. By searching this space for the minimum mean squared error, we identify a good model and show that it is effective.
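The model search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformation set, the two features (a burst size and a writer count), the synthetic write times, and the use of ordinary least squares scored by training error are all assumptions made for illustration. The sketch applies one transformation per feature, fits a linear model to each combination, and keeps the combination with the lowest mean squared error.

```python
import itertools
import math

# Hypothetical transformation functions; the abstract does not list the real set.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": lambda x: math.log(x),
    "sqrt": lambda x: math.sqrt(x),
    "square": lambda x: x * x,
}

def solve_linear(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_mse(rows, y):
    """Ordinary least squares via the normal equations.
    Returns (weights, training mean squared error); weights[0] is the intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    w = solve_linear(xtx, xty)
    errors = [sum(wi * xi for wi, xi in zip(w, r)) - t for r, t in zip(x, y)]
    return w, sum(e * e for e in errors) / len(y)

def search_models(samples, y):
    """Try every assignment of one transformation per feature and
    return (transform names, weights, mse) for the lowest-MSE model."""
    best = None
    for names in itertools.product(TRANSFORMS, repeat=len(samples[0])):
        rows = [[TRANSFORMS[n](v) for n, v in zip(names, s)] for s in samples]
        w, mse = fit_mse(rows, y)
        if best is None or mse < best[2]:
            best = (names, w, mse)
    return best

# Synthetic data (assumed): write time grows with log(burst size) and
# quadratically with writer count.
samples = [(b, n) for b in (1, 2, 4, 8, 16) for n in (1, 2, 3, 4)]
y = [2.0 + 3.0 * math.log(b) + 0.5 * n * n for b, n in samples]

best_names, best_w, best_mse = search_models(samples, y)
print(best_names, best_mse)
```

With four transformations and two features this toy space holds only 16 candidate models; the 135,000 candidates reported in the abstract come from the paper's own feature and transformation sets, which are not reproduced here. A real study would also score candidates on held-out measurements, not on the training data.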