Design and Performance Analysis of Partial Computation Output Schemes for Accelerating Coded Machine Learning

Xinping Xu, Xiaojun Lin, Lingjie Duan
DOI: https://doi.org/10.1109/tnse.2022.3228322
IF: 6.6
2023-02-25
IEEE Transactions on Network Science and Engineering
Abstract: Coded machine learning uses codes, such as (n, k)-maximum-distance-separable ((n, k)-MDS) codes, to reduce the negative effect of stragglers by requiring only k out of n workers to complete their computation. However, the MDS scheme is significantly inefficient: it wastes the stragglers' unfinished computation and keeps faster workers idle. Accordingly, this paper proposes to fragment each worker's load into small pieces and to utilize all workers' partial computation outputs (PCO) to reduce the overall runtime. Although the PCO scheme is easy to implement, analyzing its runtime performance theoretically is challenging. We present new bounds and an asymptotic analysis to prove that our PCO scheme always reduces the overall runtime for any random distribution of workers' speeds, and that its performance gain over the MDS scheme can be arbitrarily large under high variability of workers' speeds. Moreover, our analysis shows another advantage: the PCO scheme's performance is robust and insensitive to system parameter variations, whereas the MDS scheme has to know workers' speeds to carefully optimize k. Finally, our realistic experiments validate that the PCO scheme reduces the overall runtime relative to the MDS scheme by at least …, and we implement our PCO scheme to solve a typical machine learning problem, linear regression.
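To make the MDS-versus-PCO comparison concrete, here is a minimal Python sketch under a deliberately simplified model that is an assumption of this sketch, not the paper's exact setup: the total job is normalized to 1, each of the n workers holds a coded load of 1/k, worker speeds are fixed within a run, and under PCO each load is cut into m equal pieces with any k·m finished pieces assumed sufficient for decoding. The function names `mds_runtime`, `pco_runtime`, and `simulate`, and the uniform speed distribution, are hypothetical and chosen only for illustration.

```python
import random
import statistics


def mds_runtime(speeds, k):
    """(n, k)-MDS model: each worker must finish its whole load of 1/k
    (total job normalized to 1), and the master waits for the k fastest
    workers, i.e. the k-th smallest finishing time."""
    load = 1.0 / k
    finish_times = sorted(load / s for s in speeds)
    return finish_times[k - 1]


def pco_runtime(speeds, k, m):
    """Hypothetical PCO-style model: each worker's 1/k load is split into
    m equal pieces, and (an assumption of this sketch) ANY k*m finished
    pieces across all workers suffice to recover the result."""
    piece = (1.0 / k) / m
    # Piece j (1-based) on a worker with speed s completes at j * piece / s.
    completions = sorted(j * piece / s for s in speeds for j in range(1, m + 1))
    return completions[k * m - 1]


def simulate(n=10, k=5, m=20, trials=1000, seed=0):
    """Average both runtimes over random worker speeds, drawn uniformly
    from [0.1, 1.0] purely for illustration."""
    rng = random.Random(seed)
    mds, pco = [], []
    for _ in range(trials):
        speeds = [rng.uniform(0.1, 1.0) for _ in range(n)]
        mds.append(mds_runtime(speeds, k))
        pco.append(pco_runtime(speeds, k, m))
    return statistics.mean(mds), statistics.mean(pco)


if __name__ == "__main__":
    mean_mds, mean_pco = simulate()
    print(f"mean MDS runtime: {mean_mds:.3f}")
    print(f"mean PCO runtime: {mean_pco:.3f}")
```

For example, with speeds [1.0, 0.5, 0.5], k = 2 and m = 2, the MDS rule waits for the second-fastest worker and finishes at time 1.0, while the piece-based rule finishes at 0.5 because each slower worker has completed half of its load by then. Within this toy model the piece-based rule can never be slower: at the MDS finishing time, k workers have completed all m of their pieces, so at least k·m pieces are already available.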
Engineering, Multidisciplinary; Mathematics, Interdisciplinary Applications