Shed+: Optimal Dynamic Speculation to Meet Application Deadlines in Cloud

Sultan Alamro, Maotong Xu, Tian Lan, Suresh Subramaniam
DOI: https://doi.org/10.1109/tnsm.2020.2986477
2020-01-01
IEEE Transactions on Network and Service Management
Abstract: With the growing deadline sensitivity of cloud applications, meeting specific deadlines is becoming increasingly crucial, particularly in shared clusters. A few slow tasks, called stragglers, can significantly prolong job execution times. Likewise, inadequate slot allocation for data analytics applications can lead to inappropriate resource deployment, ultimately degrading system performance. One effective way to tackle stragglers is to launch extra attempts (or clones) for each straggler after a job is submitted. This paper proposes Shed+, an optimization framework based on dynamic speculation that aims to maximize jobs' PoCD (Probability of Completion before Deadline) by making full use of available resources. Notably, our work includes a new online scheduler that dynamically recomputes and reallocates resources during a job's execution. Our results show that Shed+ effectively leverages cloud resources and maximizes the percentage of jobs meeting their deadlines. Under heavy load, this percentage reaches 98% for Shed+, compared with nearly 68%, 40%, 35%, and 37% for Shed, Dolly, Hopper, and Hadoop with speculation enabled, respectively.
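To make the PoCD objective concrete, the sketch below works under simplifying assumptions that are not taken from the paper. Each attempt of task i independently finishes before the deadline with a known probability p_i. A task succeeds if at least one of its attempts does, and a job meets its deadline only if all of its tasks do. Spare slots are handed out greedily to whichever task's extra attempt raises job PoCD the most. The function names (`task_success_prob`, `job_pocd`, `greedy_allocate`) are hypothetical. Shed+ itself recomputes allocations online, and its actual completion-time model and optimizer may differ.

```python
from typing import List, Sequence


def task_success_prob(p_single: float, attempts: int) -> float:
    """P(at least one of `attempts` independent attempts meets the deadline).

    Assumption (hypothetical model): each attempt succeeds independently
    with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** attempts


def job_pocd(p_tasks: Sequence[float], attempts: Sequence[int]) -> float:
    """Job PoCD: every task must finish in time (independence assumed)."""
    prob = 1.0
    for p, r in zip(p_tasks, attempts):
        prob *= task_success_prob(p, r)
    return prob


def greedy_allocate(p_tasks: Sequence[float], spare_slots: int) -> List[int]:
    """Start with one attempt per task, then give each spare slot to the
    task whose extra attempt (clone) increases job PoCD the most."""
    attempts = [1] * len(p_tasks)
    for _ in range(spare_slots):
        base = job_pocd(p_tasks, attempts)
        best_i, best_gain = None, 0.0
        for i in range(len(p_tasks)):
            attempts[i] += 1  # tentatively add a clone to task i
            gain = job_pocd(p_tasks, attempts) - base
            attempts[i] -= 1  # undo the tentative clone
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # no extra attempt improves PoCD
            break
        attempts[best_i] += 1
    return attempts


if __name__ == "__main__":
    p = [0.9, 0.6]
    alloc = greedy_allocate(p, spare_slots=1)
    print(alloc, round(job_pocd(p, alloc), 3))
```

For example, with two tasks whose single attempts meet the deadline with probabilities 0.9 and 0.6, job PoCD is 0.54. One spare slot goes to the weaker task, which raises PoCD to 0.756. Because each task's success probability shows diminishing returns in the number of attempts, greedy marginal allocation is a natural choice for this simplified model.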