Towards Energy-efficient Federated Learning Via INT8-based Training on Mobile DSPs

Jinliang Yuan, Shangguang Wang, Hongyu Li, Daliang Xu, Yuanchun Li, Mengwei Xu, Xuanzhe Liu
DOI: https://doi.org/10.1145/3589334.3645341
2024-01-01
Abstract: AI is making the Web an even cooler place, but it also introduces serious privacy risks because of extensive user data collection. Federated learning (FL), a privacy-preserving machine learning paradigm, enables mobile devices to collaboratively learn a shared prediction model while keeping all training data on the devices. However, a key obstacle to practical cross-device FL training is its high energy consumption, especially on lightweight mobile devices. In this work, we present the first analysis of improving FL performance through low-precision training on the energy-efficient Digital Signal Processor (DSP) of mobile devices. We first show that directly combining the state-of-the-art INT8 (8-bit integer) training algorithm with classic FL protocols significantly degrades model accuracy. Moreover, we observe that frequent, unavoidable on-device quantization operations still place heavy load on DSP-enabled INT8 training. To address these challenges, we present Q-FedUpdate, an FL framework that preserves model accuracy with ultra-low energy consumption. It maintains a global full-precision model so that tiny model updates accumulate continuously rather than being erased by quantization. Furthermore, it uses pipelining to parallelize CPU-based quantization with DSP-enabled training, which reduces the floating-point computation overhead of frequent data quantization. Extensive experiments show that Q-FedUpdate reduces on-device energy consumption by 21× and accelerates FL convergence by 6.1×, with only 2% accuracy loss.
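The accuracy argument above rests on a rounding effect: an update smaller than half an INT8 quantization step rounds back to the same integer, so it disappears if the model is re-quantized after every round. The following is a minimal NumPy sketch of that effect, assuming symmetric per-tensor quantization with a fixed scale and a constant, hypothetical per-round update. It is a toy illustration, not Q-FedUpdate's implementation; the names `quantize_int8`, `dequantize_int8`, and `run_rounds` are illustrative only.

```python
import numpy as np


def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric per-tensor INT8 quantization: round(x / scale), clipped to [-127, 127]."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 codes back to float32 values."""
    return q.astype(np.float32) * scale


def run_rounds(num_rounds: int = 100, update: float = 1e-3):
    """Compare an INT8-only model with a full-precision master copy (toy setup)."""
    w0 = np.array([0.5, -0.25, 1.0], dtype=np.float32)
    # Fixed scale for illustration: one INT8 step is about 0.0079,
    # so the hypothetical 1e-3 update is below half a step.
    scale = float(np.max(np.abs(w0))) / 127.0
    delta = np.full_like(w0, update)  # hypothetical tiny per-round update
    start = dequantize_int8(quantize_int8(w0, scale), scale)

    # (a) Model stored only in INT8: dequantize, apply update, re-quantize.
    q = quantize_int8(w0, scale)
    for _ in range(num_rounds):
        q = quantize_int8(dequantize_int8(q, scale) + delta, scale)
    int8_only = dequantize_int8(q, scale)

    # (b) Full-precision master copy: updates accumulate in float32.
    master = w0.copy()
    for _ in range(num_rounds):
        master = master + delta

    return w0, start, int8_only, master


w0, start, int8_only, master = run_rounds()
print("INT8-only drift:", int8_only - start)   # all zeros: every update was rounded away
print("Full-precision drift:", master - w0)    # approximately 0.1 per element
```

The fixed scale keeps the arithmetic easy to check by hand. INT8 training schemes often recompute scales per tensor or per block, which moves the erasure threshold but does not remove the basic effect that a full-precision accumulator avoids.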