Resource allocation and aging priority-based scheduling of linear workflow applications with transient failures and selective imprecise computations

Helen D. Karatza, Georgios L. Stavrinides
DOI: https://doi.org/10.1007/s10586-023-04249-7
2024-02-01
Cluster Computing
Abstract: A wide range of applications in distributed environments have a linear structure, varying priorities, and may experience transient software failures. As the computational demands of such linear workflow (LW) jobs continue to grow, their efficient, fair, and fault-tolerant resource allocation and scheduling is becoming more challenging. To address this problem, we propose a fair and efficient scheduling approach, which considers that the priorities of the jobs age with time. We jointly use this scheduling strategy with three practical routing techniques, as well as two variants of an application-directed checkpointing scheme. The first variant of this scheme incorporates imprecise computations in a selective manner, whereas the second one does not use imprecise computations at all. Our aim is to dynamically allocate and schedule LW jobs with different priorities and transient software failures in a distributed system. Through extensive experimentation, we evaluate the system performance under the considered routing methods and checkpointing schemes, utilizing various load cases and failure probabilities. The simulation results showcase the impact of selective imprecise computations on the system performance, while providing insights into how the examined routing strategies perform in each of the investigated scenarios.
computer science, information systems, theory & methods
What problem does this paper attempt to address?