TimeLink: enabling dynamic runtime prediction for Flink iterative jobs
Xiaofei Yue, Qingyang Ding, Jianming Zhu, Yanbing Ding
DOI: https://doi.org/10.1007/s11227-024-06085-x
IF: 3.3
2024-04-14
The Journal of Supercomputing
Abstract: With the growth of data scale and computing complexity, Flink, a novel distributed computing system, has been applied in various scenarios (e.g., machine learning) due to its strong support for iterative computation. Predicting the runtime of Flink iterative jobs is critical to optimizing their performance. However, existing offline approaches generally ignore relevant runtime information, such as cluster state variations and inter-iteration dependencies, resulting in high prediction errors in practice. Online methods, on the other hand, incur a non-negligible time overhead. In light of this, we propose TimeLink, a dynamic runtime prediction algorithm for Flink iterative jobs. It works in three stages: (1) TimeLink incorporates both offline and online execution features during runtime to measure the fine-grained similarity of iterative jobs, (2) it matches historical jobs with similar performance consumption to the currently running iterative job in real time, and (3) it predicts the remaining runtime of the current job by exploiting the continuity of the runtime bias between the completed supersteps of the matched jobs and those of the current job. We implement TimeLink and evaluate it using realistic iterative workloads. The experimental results show that TimeLink exhibits relative average prediction errors of 5.91–12.86%. Moreover, it outperforms existing solutions with an improvement of over 6.24% in prediction accuracy.
computer science, theory & methods, engineering, electrical & electronic, hardware & architecture
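
The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the bias model, so cosine similarity over a feature vector and a simple ratio-based bias are assumptions made here, and all names (`cosine_similarity`, `predict_remaining_runtime`, the `features` and `steps` keys) are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_remaining_runtime(current_features, current_steps, history):
    """Estimate the remaining runtime of a running iterative job.

    current_features: feature vector of the running job (offline + online features).
    current_steps:    runtimes of the supersteps completed so far.
    history:          list of dicts with 'features' and 'steps' (per-superstep
                      runtimes) for finished jobs. Hypothetical record layout.
    """
    k = len(current_steps)
    if k == 0:
        raise ValueError("at least one completed superstep is required")

    # Only historical jobs that ran longer than k supersteps can inform the rest.
    candidates = [h for h in history if len(h["steps"]) > k]
    if not candidates:
        raise ValueError("no historical job with more supersteps than the current one")

    # Stages 1-2: pick the most similar historical job.
    best = max(candidates,
               key=lambda h: cosine_similarity(current_features, h["features"]))

    # Stage 3: assume the runtime bias observed on the completed supersteps
    # carries over to the remaining ones (ratio model, an assumption).
    matched_done = sum(best["steps"][:k])
    if matched_done == 0:
        raise ValueError("matched job has zero runtime over the completed supersteps")
    bias = sum(current_steps) / matched_done
    return bias * sum(best["steps"][k:])
```

For example, if the matched job took 10 s per superstep over four supersteps and the current job has taken 12 s for each of its first two, the sketch scales the matched job's remaining 20 s by the observed bias of 1.2.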