Abstract: Resource management in data centres remains a critical problem because of growing infrastructure complexity and dynamic workload conditions. Workload and energy consumption prediction are crucial for efficient resource management decisions in cloud data centres. Existing solutions forecast only the usage of virtual machine (VM) resources such as CPU and memory; they do not consider provisioned resources (CPU and memory) or disk and network transmission rates, which also significantly affect the energy consumption of the host. VM-level energy consumption can be estimated to support automated energy management decisions in modern data centres. However, it is not easy to measure the energy used by VM devices such as CPU, memory, and disk at the software level. We therefore propose an ML-based model that predicts load and energy to aid resource management decisions. For workload prediction, we investigated several ML algorithms, namely Linear Regression (LR), Ridge Regression (RR), ARD Regression (ARDR), and ElasticNet (EN), together with a deep learning (DL) algorithm, the Gated Recurrent Unit (GRU). The models' predictions are evaluated using standard metrics such as root mean square error (RMSE). Our experimental results show that GRU performs best, achieving the lowest RMSE across all workload predictions. For energy state estimation, we propose four clustering algorithms based on transfer learning: semi-supervised affinity propagation (TSSAP), CLA (TCLA), k-means (TKmeans), and P-teda (TP-teda). Rather than estimating energy consumption for each VM individually, these algorithms discover groups of similar VMs based on features that may influence energy consumption. TSSAP achieves promising clustering accuracy in identifying the VM classes for the chosen workload, measured with the standard micro-precision metric: 87.48% and 53.80% in comparison to affinity propagation (AP) and the average of the other proposed clustering algorithms, respectively.
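The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `rmse` and `micro_precision` and all sample values are hypothetical. The micro-precision function assumes the common clustering definition in which each cluster is mapped to its majority class and the metric is the fraction of points that fall in that class; the abstract does not spell out the exact formula used.

```python
import math
from collections import Counter, defaultdict


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def micro_precision(true_classes, cluster_ids):
    """Fraction of points that belong to the majority class of their cluster.

    Assumed definition: sum over clusters of the majority-class count,
    divided by the total number of points.
    """
    if len(true_classes) != len(cluster_ids) or not true_classes:
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(list)
    for cls, cid in zip(true_classes, cluster_ids):
        members[cid].append(cls)
    correct = sum(
        Counter(classes).most_common(1)[0][1] for classes in members.values()
    )
    return correct / len(true_classes)


# Hypothetical CPU utilisation values (fractions of capacity), not real data.
cpu_actual = [0.50, 0.60, 0.70, 0.80]
cpu_forecast = [0.40, 0.60, 0.80, 0.80]

# Hypothetical energy classes per VM and the clusters assigned to them.
vm_classes = ["low", "low", "high", "high", "high", "low"]
vm_clusters = [0, 0, 0, 1, 1, 1]

print("RMSE:", rmse(cpu_actual, cpu_forecast))
print("Micro-precision:", micro_precision(vm_classes, vm_clusters))
```

A lower RMSE indicates a closer forecast, while a higher micro-precision indicates that the clusters align more closely with the known VM classes.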

Workload Failure Prediction for Data Centers

Load Prediction for Data Centers Based on Database Service.

On Workload-Aware DRAM Failure Prediction in Large-Scale Data Centers.

Three-Way Ensemble Prediction for Workload in the Data Center

Workload Forecasting and Energy State Estimation in Cloud Data Centres: ML-centric Approach

Performance Analysis of Machine Learning Centered Workload Prediction Models for Cloud

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters

Predicting machine behavior from Google cluster workload traces

Energy Prediction for MapReduce Workloads

Towards Thermal-Aware Workload Distribution in Cloud Data Centers Based on Failure Models

Prediction of the Running Time of Tasks Based on Load

Long Short Term Memory Recurrent Neural Network (LSTM-RNN) Based Workload Forecasting Model For Cloud Datacenters

Predicting Scheduling Failures in the Cloud

Energy efficient job scheduling with workload prediction on cloud data center

Reasoning Based Workload Performance Prediction in Cloud Data Centers

Holistic energy and failure aware workload scheduling in Cloud datacenters.

CWD: A Machine Learning based Approach to Detect Unknown Cloud Workloads

Cloud failure prediction based on traditional machine learning and deep learning

Analysis of Job Failure and Prediction Model for Cloud Computing Using Machine Learning

Online Job Failure Prediction in an HPC System

Job Failures in High Performance Computing Systems: A Large-Scale Empirical Study