Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs

Mengze Wei, Wenyi Zhao, Quan Chen, Hao Dai, Jingwen Leng, Chao Li, Wenli Zheng, Minyi Guo
DOI: https://doi.org/10.1016/j.jpdc.2020.03.009
IF: 4.542
2020-01-01
Journal of Parallel and Distributed Computing
Abstract: Predicting the performance degradation of a GPU application when it is co-located with other applications on a spatial multitasking GPU, without prior application knowledge, is essential in public clouds. Prior work mainly targets CPU co-location and is inaccurate and/or inefficient for predicting the performance of co-located applications on spatial multitasking GPUs. Our investigation shows that hardware event statistics caused by co-located applications, which can be collected with negligible overhead, strongly correlate with their slowdowns. Based on this observation, we present Themis, an online slowdown predictor that can precisely and efficiently predict application slowdown without prior application knowledge. We first train a precise slowdown model offline using hardware event statistics collected from representative co-locations. When new applications co-run, Themis collects event statistics and predicts their slowdowns simultaneously. Our evaluation shows that Themis has negligible runtime overhead and can precisely predict application-level slowdown with a prediction error smaller than 9.5%. Based on Themis, we also implement an SM allocation engine to rein in application slowdown under co-location. Case studies show that the engine successfully enforces fair sharing and QoS.
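The offline-training and online-prediction split described in the abstract can be illustrated with a minimal sketch. This is not the Themis model: the abstract does not name the model type, the hardware events used, or the formal definition of slowdown. The sketch assumes slowdown is the ratio of co-located to solo execution time, uses a single hypothetical event statistic as the only feature, and substitutes a one-variable least-squares line for the real predictor. All function names and all numbers are invented for illustration.

```python
from statistics import mean


def slowdown(co_run_time, solo_time):
    """Assumed definition: co-located execution time divided by solo time."""
    return co_run_time / solo_time


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def predict(model, event_stat):
    """Apply the fitted line to a newly observed event statistic."""
    a, b = model
    return a * event_stat + b


def relative_error(predicted, actual):
    """Prediction error relative to the measured slowdown."""
    return abs(predicted - actual) / actual


# Offline phase: fit on event statistics from representative co-locations.
# Both lists are made-up values, not measurements from the paper.
train_events = [0.10, 0.20, 0.30, 0.40]   # hypothetical event statistic
train_slowdowns = [1.2, 1.4, 1.6, 1.8]    # hypothetical measured slowdowns
model = fit_line(train_events, train_slowdowns)

# Online phase: a new application co-runs; only its event statistic is needed.
observed_event = 0.25
predicted = predict(model, observed_event)
print(f"predicted slowdown: {predicted:.3f}")
```

The point of the structure is that the online step needs only the collected event statistic, not any prior profile of the new application, which is the property the abstract emphasizes.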