Online Management for Edge-Cloud Collaborative Continuous Learning: A Two-timescale Approach
Shaohui Lin, Xiaoxi Zhang, Yupeng Li, Carlee Joe-Wong, Jingpu Duan, Dongxiao Yu, Yu Wu, Xu Chen
DOI: https://doi.org/10.1109/tmc.2024.3451715
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: Deep learning (DL) powered real-time applications usually need continuous training on data streams generated over time and across different geographical locations. Offloading data among computation nodes during model training is a promising way to mitigate the problem that devices generating large datasets may have low computation capability. However, offloading can compromise model convergence and incur communication costs, which must be balanced against the long-term cost of computation and model synchronization. Therefore, this paper proposes EdgeC3, a novel framework that optimizes the frequency of model aggregation and the dynamic offloading of continuously generated data streams, navigating the trade-off between long-term accuracy and cost. We first provide a new error bound that captures the impact of data dynamics that vary over time and are heterogeneous across devices, and that quantifies the varied data heterogeneity between the local models and the global one. Based on this bound, we design a two-timescale online optimization framework. On the coarser timescale, we periodically learn the synchronization frequency to adapt to uncertain future offloading and network changes. On the finer timescale, we manage online offloading by extending Lyapunov optimization techniques to handle an unconventional setting in which the long-term global constraint depends on aggregation frequencies that are decided on the longer timescale and can change abruptly. Finally, we theoretically prove the convergence of EdgeC3 by integrating the coupled effects of our two-timescale decisions, and we demonstrate its advantage through extensive experiments on distributed DL training in different domains.
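For readers unfamiliar with the Lyapunov technique the abstract refers to, the following is a minimal sketch of the textbook drift-plus-penalty rule with a virtual queue, which is the standard way to enforce a long-term average cost constraint with per-slot decisions. It is not EdgeC3's algorithm: the function names, the two candidate actions, their penalty and cost values, the budget, and the weight V are all hypothetical and chosen only for illustration.

```python
# Generic drift-plus-penalty sketch (NOT the EdgeC3 algorithm).
# All names and numbers here are illustrative assumptions.

def update_virtual_queue(queue, cost, budget):
    """Virtual-queue update: Q <- max(Q + cost - budget, 0).

    The queue grows when a slot's cost exceeds the per-slot budget and
    drains otherwise, so a bounded queue implies the long-term average
    cost stays close to the budget.
    """
    return max(queue + cost - budget, 0.0)


def choose_action(queue, options, V):
    """Pick the option minimizing V * penalty + queue * cost.

    options: list of (name, penalty, cost) tuples (hypothetical values).
    V: weight trading off penalty (e.g. accuracy loss) against constraint slack.
    """
    return min(options, key=lambda o: V * o[1] + queue * o[2])


def run(num_slots, options, budget, V):
    """Run the per-slot rule and return (final queue, average cost)."""
    queue = 0.0
    total_cost = 0.0
    for _ in range(num_slots):
        _, _, cost = choose_action(queue, options, V)
        queue = update_virtual_queue(queue, cost, budget)
        total_cost += cost
    return queue, total_cost / num_slots


if __name__ == "__main__":
    # Hypothetical actions: training locally is free but has a high penalty;
    # offloading has a low penalty but costs one unit of communication.
    options = [("local", 1.0, 0.0), ("offload", 0.2, 1.0)]
    final_queue, avg_cost = run(100, options, budget=0.5, V=1.0)
    print(final_queue, avg_cost)
```

In this toy setting the rule offloads while the queue is small and falls back to local training once accumulated overspending makes offloading too expensive, so the average cost settles near the budget. The setting described in the abstract is harder because the constraint itself shifts whenever the coarser timescale changes the aggregation frequency, which this sketch does not model.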