FedDQ: Communication-Efficient Federated Learning with Descending Quantization

Linping Qu, Shenghui Song, Chi-Ying Tsui
DOI: https://doi.org/10.48550/arXiv.2110.02291
2022-11-10
Abstract: Federated learning (FL) is an emerging learning paradigm that trains models without violating users' privacy. However, large model sizes and frequent model aggregation cause a serious communication bottleneck for FL. To reduce the communication volume, techniques such as model compression and quantization have been proposed. Besides fixed-bit quantization, existing adaptive quantization schemes use ascending-trend quantization, where the quantization level increases with the training stages. In this paper, we first investigate the impact of quantization on model convergence, and show that the optimal quantization level is directly related to the range of the model updates. Given that the model is supposed to converge as training progresses, the range of the model updates will gradually shrink, indicating that the quantization level should decrease with the training stages. Based on the theoretical analysis, a descending quantization scheme named FedDQ is proposed. Experimental results show that the proposed descending quantization scheme can save up to 65.2% of the communicated bit volume and up to 68% of the communication rounds, when compared with existing schemes.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the communication bottleneck in federated learning (FL). FL requires frequent transmission of large machine-learning models between the server and the clients during training, which incurs significant communication overhead. Techniques such as model compression and quantization have been proposed to reduce this volume, but most use either fixed-bit quantization or adaptive schemes whose quantization level increases over training, and neither accounts for how the model updates change as training proceeds. The paper's main contribution is a descending quantization scheme, FedDQ, supported by a convergence analysis and by experiments indicating that it reduces the communication volume and the number of rounds needed to converge.

The following is a summary of the core content of the paper:

1. **Research Background and Problems**
   - Federated learning trains models on distributed data, but frequent communication leads to a communication bottleneck.
   - Existing methods, such as fixed-bit quantization and ascending-trend adaptive quantization, do not exploit the way model updates change during training.
2. **Theoretical Analysis**
   - The quantization level should be tied to the range of the model updates, and this range gradually shrinks as training progresses.
   - A descending scheme, in which the number of quantization bits decreases over the training stages, is therefore theoretically preferable.
3. **Proposed Method**
   - A descending quantization scheme named FedDQ is proposed.
   - The allocation of quantization bits is chosen to minimize the communication volume while maintaining high training accuracy.
4. **Experimental Results**
   - Experiments show that FedDQ can save up to 65.2% of the communicated bit volume and up to 68% of the communication rounds compared with existing schemes.

### Formula Summary

- Definition of the model update range:

  \[ \text{range}(\Delta X_i)=\Delta X_{\max,i}-\Delta X_{\min,i} \]

  where \(\Delta X_{\max,i}=\max_{1\leq j\leq d}\Delta X_i(j)\) and \(\Delta X_{\min,i}=\min_{1\leq j\leq d}\Delta X_i(j)\).

- Calculation of the quantized value, where \(h'\) and \(h''\) are the two adjacent quantization points that bracket \(\Delta X_i(j)\):

  \[ Q(\Delta X_i(j)) = \begin{cases} h', & \text{with probability }\frac{h'' - \Delta X_i(j)}{h'' - h'}\\ h'', & \text{otherwise} \end{cases} \]

- Convergence theorem:

  \[ \frac{1}{K\tau}\sum_{m = 0}^{K - 1}\sum_{t = 0}^{\tau - 1}E\left\|\nabla f(\bar{X}_{m,t})\right\|^2\leq\frac{Ld}{n^2\eta K\tau}\sum_{m = 0}^{K - 1}\sum_{i\in[n]}\left(\frac{\text{range}_i^m}{s_i^m}\right)^2+\frac{2(f(X_0)-f^*)}{\eta K\tau}+\frac{\eta^2\sigma^2(n + 1)(\tau - 1)L^2}{n}+\frac{\eta\sigma^2L}{n} \]

  Only the first term on the right-hand side depends on quantization, through the ratio \(\text{range}_i^m / s_i^m\) of client \(i\)'s update range to its quantization level in round \(m\). As the update range shrinks, a smaller \(s_i^m\) keeps this ratio, and hence the bound, unchanged.

Based on these formulas and the accompanying analysis, the paper argues that a descending quantization scheme can reduce the amount of communication while improving training efficiency.
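The range definition and the stochastic quantizer summarized above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `s` is taken to be the number of uniform intervals spanning the update range, and `choose_level` with its `resolution` parameter is a hypothetical rule that holds `range / s` roughly constant, so the level falls as the updates shrink.

```python
import numpy as np


def update_range(delta):
    """range(dX_i) = max_j dX_i(j) - min_j dX_i(j)."""
    return float(delta.max() - delta.min())


def stochastic_quantize(delta, s, rng):
    """Randomly round each entry to one of its two neighbouring grid points.

    Assumption: the grid has s uniform intervals (s + 1 points) over
    [min(delta), max(delta)]. The rounding is unbiased: E[Q(x)] = x.
    """
    lo, hi = float(delta.min()), float(delta.max())
    if hi == lo or s < 1:
        return delta.copy()
    step = (hi - lo) / s
    pos = (delta - lo) / step                   # position in units of step
    lower = np.minimum(np.floor(pos), s - 1)    # index of h' (lower neighbour)
    frac = pos - lower                          # (x - h') / (h'' - h')
    go_up = rng.random(delta.shape) < frac      # pick h'' with probability frac
    return lo + (lower + go_up) * step


def choose_level(range_value, resolution):
    """Hypothetical rule: smallest s with range / s <= resolution."""
    return max(1, int(np.ceil(range_value / resolution)))


def bits_for_level(s):
    """Bits needed to index s + 1 grid points, i.e. ceil(log2(s + 1))."""
    return int(s).bit_length()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated updates whose spread shrinks over rounds, as in a converging model.
    for round_idx, scale in enumerate([1.0, 0.5, 0.1, 0.02]):
        delta = rng.normal(scale=scale, size=10_000)
        s = choose_level(update_range(delta), resolution=0.05)
        q = stochastic_quantize(delta, s, rng)
        err = float(np.abs(q - delta).max())
        print(f"round {round_idx}: s={s}, bits={bits_for_level(s)}, max error={err:.4f}")
```

Running the script prints the chosen level and bit width per simulated round. Because the simulated range shrinks, the hypothetical rule selects fewer bits in later rounds while the per-entry error stays below one grid step.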