A Holistic Evaluation Methodology for Configuring Production Data Centers

Yingying Wen, Yiming Zhang, Guanjie Cheng, Shuiguang Deng, Jianwei Yin
DOI: https://doi.org/10.1002/cpe.7257
2022-01-01
Abstract: Performance evaluation is the basis for choosing appropriate system-level configurations for large-scale data centers. Because a change to a system-level configuration affects many jobs in a data center, traditional load-testing benchmarks are not sufficient to support the decision: they cannot accurately reproduce the complex behavior of a large number of jobs. The system configuration therefore needs to be evaluated in the production environment itself. This raises two technical challenges: there is no holistic evaluation method that can unify the evaluation results of diverse jobs, and the production environment is uninterruptible and must not be affected by the evaluation procedure. To address these challenges, we propose a holistic performance evaluation methodology and design a platform that implements it. We introduce a simple but powerful performance metric, ERU (effectiveness of resource usage), and combine the ERU of the involved jobs into a summarized value that measures the effect of a configuration change. We validate the ERU metric by comparing it with the CPI (cycles per instruction) and QPS (queries per second) metrics, and we deploy the platform to production data centers. There we demonstrate its effectiveness in measuring system-level configuration changes in both software (a JVM compiler update) and hardware (NUMA on/off), showing in advance that these changes save 14.44% and 11% of resources, respectively.