Two stage cluster for resource optimization with Apache Mesos

Gourav Rattihalli, Pankaj Saha, Madhusudhan Govindaraju, Devesh Tiwari
DOI: https://doi.org/10.48550/arXiv.1905.09166
2019-05-22
Distributed, Parallel, and Cluster Computing
Abstract: As resource estimation for jobs is difficult, users often overestimate their requirements. Both commercial clouds and academic campus clusters suffer from low resource utilization and long wait times because the resource estimates that users provide for their jobs are inaccurate. We present an approach that statistically estimates the actual resource requirements of a job in a little cluster before the job runs in a big cluster. The initial estimation on the little cluster shows how many resources a job actually requires. This estimate allows us to allocate resources accurately for the pending jobs in the queue and thereby improve throughput and resource utilization. In our experiments, we determined resource utilization estimates with an average accuracy of 90% for memory and 94% for CPU, while improving memory utilization by an average of 22% and CPU utilization by 53%, compared to the default job submission methods on Apache Aurora and Apache Mesos.