Model Pruning for Distributed Learning over the Air

Zhongyuan Zhao, Kailei Xu, Wei Hong, Mugen Peng, Zhiguo Ding, Tony Q.S. Quek, Howard H. Yang
DOI: https://doi.org/10.1109/tsp.2024.3486169
IF: 4.875
2024-01-01
IEEE Transactions on Signal Processing
Abstract: Analog over-the-air (A-OTA) computing is an effective approach to achieving distributed learning among multiple end-user devices within a bandwidth-constrained spectrum. In this paradigm, users’ intermediate parameters, such as gradients, are modulated onto a set of common waveforms and concurrently transmitted to the parameter server. Benefiting from the superposition property of multi-access channels, the server can obtain an automatically aggregated global gradient from the received signal without decoding individual users’ information. Nonetheless, the scarcity of orthogonal waveforms prevents this paradigm from adopting complex deep learning models. In this paper, we develop model pruning strategies for A-OTA distributed learning that balance the tradeoff between communication efficiency and learning performance. Specifically, we design an importance measure that evaluates the contribution of each entry in the model parameter based on the noisy aggregated gradient introduced by A-OTA computing. We also derive an analytical expression for the training error bound, which shows that the proposed scheme can converge even when the aggregated gradient is corrupted by heavy-tailed interference with unbounded variance. We further improve the developed algorithm by incorporating the momentum method to (a) enhance the design of the importance measure and (b) accelerate the model convergence rate. Extensive experiments are conducted to validate the performance gains achieved by our proposed scheme and verify the correctness of the analytical results.
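The pipeline described in the abstract (superposed gradients, a noisy aggregate at the server, pruning driven by an importance score) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `ota_aggregate` and `prune_mask`, the symmetric Pareto noise used as a stand-in for heavy-tailed interference, and the score |w * g|, which is a generic first-order saliency rather than the authors' importance measure.

```python
import random


def ota_aggregate(local_grads, rng, noise_scale=0.1, tail_index=1.5):
    """Toy over-the-air aggregation: the channel sums the users' gradients
    entry by entry and adds interference. The noise model is an assumption:
    symmetric Pareto noise, whose variance is unbounded for tail_index <= 2."""
    num_users = len(local_grads)
    dim = len(local_grads[0])
    aggregated = []
    for j in range(dim):
        # Superposition: the server only ever observes the sum over users.
        superposed = sum(g[j] for g in local_grads)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        noise = noise_scale * sign * rng.paretovariate(tail_index)
        aggregated.append((superposed + noise) / num_users)
    return aggregated


def prune_mask(weights, noisy_grad, keep_ratio):
    """Keep the entries with the largest hypothetical importance score
    |w * g|; all other entries are marked for pruning."""
    scores = [abs(w * g) for w, g in zip(weights, noisy_grad)]
    k = max(1, round(keep_ratio * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(weights))]


# Small demonstration with 4 users and an 8-entry model.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(8)]
local_grads = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]

noisy_grad = ota_aggregate(local_grads, rng)
mask = prune_mask(weights, noisy_grad, keep_ratio=0.5)
pruned_weights = [w if m else 0.0 for w, m in zip(weights, mask)]
```

With `keep_ratio=0.5`, half of the entries survive, so only those coordinates would need waveforms in later rounds. This is the communication saving that the pruning strategy trades against learning performance.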