Analyzing the impact of various parameters on job scheduling in the Google cluster dataset

DOI: https://doi.org/10.1007/s10586-024-04377-8
2024-03-30
Cluster Computing
Abstract: Cloud architecture and its operations interest both general consumers and researchers. Google, as a technology giant, offers cloud services globally. This paper analyzes the Google cluster usage trace, focusing on three key aspects: task execution times, rescheduling frequency, and the relationship between task priority and rescheduling. First, we examine how memory and processor performance affect task execution times across different machines. Next, we investigate how the number of task constraints influences rescheduling frequency and overall environmental efficiency. Finally, we analyze how task priority affects rescheduling and explore its correlation with task constraints. The results reveal that doubling the memory size can accelerate tasks by a factor of nine and that 90% of rescheduling is associated with tasks having fewer than seven constraints. We aim to enhance data center performance by identifying bottlenecks in the Google Cluster Dataset and providing recommendations for all cloud service providers. Our key findings indicate that memory plays a more significant role than the processor, and that tasks with more constraints have a less pronounced impact on rescheduling than anticipated.
computer science, information systems, theory & methods