Abstract: We study federated machine learning at the wireless network edge, where power-limited wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose two schemes to implement distributed stochastic gradient descent (DSGD) over it. We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits determined by the channel condition, and transmits these bits reliably to the PS. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as compressed analog DSGD (CA-DSGD), in which the devices first sparsify their gradient estimates while accumulating the sparsification error, and then project the resulting sparse vector onto a low-dimensional vector for bandwidth reduction. Numerical results show that D-DSGD outperforms other digital approaches in the literature; in general, however, the proposed CA-DSGD algorithm converges faster than D-DSGD and the other schemes in the literature, and reaches a higher level of accuracy. We observe that the gap between the analog and digital schemes increases when the datasets of the devices are not independent and identically distributed (i.i.d.). Furthermore, the performance of the CA-DSGD scheme is shown to be robust against imperfect channel state information (CSI) at the devices. Overall, these results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
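The device-side processing in CA-DSGD (error accumulation, sparsification, projection) and the additive behaviour of the MAC can be illustrated with a minimal sketch. The abstract does not specify the sparsification rule, the projection matrix, or the recovery step at the PS, so the following are assumptions made only for illustration: top-k sparsification, a shared Gaussian projection matrix, and an idealized noiseless, unfaded channel. All function names are hypothetical.

```python
import random

def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest
    (assumed sparsification rule; the abstract does not name one)."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    sparse = [0.0] * len(vec)
    for i in keep:
        sparse[i] = vec[i]
    return sparse

def make_projection(s, d, seed=0):
    """Assumed shared s x d Gaussian projection matrix with s < d."""
    rng = random.Random(seed)
    scale = 1.0 / s ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(s)]

def project(matrix, vec):
    """Matrix-vector product: maps a length-d vector to length s."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

def device_step(gradient, error, k, matrix):
    """One device-side iteration: add the accumulated error, sparsify,
    store what was dropped as the new error, and project."""
    corrected = [g + e for g, e in zip(gradient, error)]
    sparse = top_k_sparsify(corrected, k)
    new_error = [c - sp for c, sp in zip(corrected, sparse)]
    return project(matrix, sparse), sparse, new_error

def over_the_air_sum(signals):
    """Idealized MAC (no fading, no noise): the PS observes the
    element-wise sum of the transmitted vectors."""
    return [sum(vals) for vals in zip(*signals)]

# Two devices, gradient dimension d=6, channel bandwidth s=3, sparsity k=2.
d, s, k = 6, 3, 2
A = make_projection(s, d)
grads = [[0.1, -2.0, 0.0, 0.5, 3.0, -0.2],
         [1.5, 0.0, -0.3, 0.0, 0.2, -4.0]]
errors = [[0.0] * d for _ in grads]
results = [device_step(g, e, k, A) for g, e in zip(grads, errors)]
received = over_the_air_sum([r[0] for r in results])
```

Because the projection is linear, `received` equals the projection of the sum of the devices' sparse vectors, which is why superposition on the MAC delivers an aggregate without the PS decoding each device separately. Recovering the sparse sum from `received` and handling fading, noise, and power constraints are not covered by this sketch.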

Stochastic gradient compression for federated learning over wireless network

Adaptive Batchsize Selection and Gradient Compression for Wireless Federated Learning

Secure Federated Learning over Wireless Communication Networks with Model Compression

Analog Gradient Aggregation for Federated Learning Over Wireless Networks: Customized Design and Convergence Analysis

Secure Federated Learning with Model Compression

Federated Learning over Wireless Fading Channels

Communication-Efficient Federated Learning via Quantized Compressed Sensing

Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning

Wyner-Ziv Gradient Compression for Federated Learning

Communication-efficient Federated Learning Via Quantized Clipped SGD

Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks

Data-Aware Gradient Compression for FL in Communication-Constrained Mobile Computing

AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning

Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression

Sparse Gradient Compression for Distributed SGD

Efficient Wireless Federated Learning via Low-Rank Gradient Factorization

Stochastic Controlled Averaging for Federated Learning with Communication Compression

Decentralized Federated Learning: Balancing Communication and Computing Costs

Snowball: Energy Efficient and Accurate Federated Learning with Coarse-to-Fine Compression over Heterogeneous Wireless Edge Devices

Federated Learning over Wireless Device-to-Device Networks: Algorithms and Convergence Analysis