Distributed Pruning Towards Tiny Neural Networks in Federated Learning

Hong Huang, Lan Zhang, Chaoyue Sun, Ruogu Fang, Xiaoyong Yuan, Dapeng Wu
2023-07-11
Abstract: Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches heavily rely on training data for guiding the pruning strategies, making them ineffective for federated learning over distributed and confidential datasets. Additionally, the memory- and computation-intensive pruning process becomes infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. We introduce two key modules in FedTiny to adaptively search coarse- and finer-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module is designed to mitigate biases in pruning caused by the heterogeneity of local data. Second, a lightweight progressive pruning module aims to prune the models more finely under strict memory and computational budgets, allowing the pruning policy for each layer to be gradually determined rather than evaluating the overall model structure. The experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models to extremely sparse tiny models. FedTiny achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91% and the memory footprint by 94.01% compared to state-of-the-art methods.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to generate specialized tiny neural networks on distributed and confidential datasets in the federated learning environment to adapt to the memory and computing power of resource-constrained devices. Specifically:

1. **Existing pruning methods rely on training data**: Traditional neural network pruning methods rely heavily on training data to guide the pruning strategy, which makes them ineffective when dealing with confidential data distributed across multiple devices.
2. **Challenges of resource-constrained devices**: The pruning process itself is memory- and compute-intensive, which is not feasible for resource-constrained devices.
3. **Non-IID data problem**: The data distribution on different devices may be inconsistent, resulting in biased pruning results on the server side.

To solve these problems, the authors propose FedTiny, a distributed pruning framework for federated learning, aiming to generate specialized tiny neural networks for resource-constrained devices. FedTiny introduces two key modules:

1. **Adaptive Batch Normalization Selection Module**:
   - Through an indirect pruning method, it evaluates the server-side pruning results on the device, thereby identifying a specialized coarse-pruning model.
   - The device only evaluates the server-side pruning and feeds back the batch normalization parameters to the server to reduce the computing and communication costs.
2. **Lightweight Progressive Pruning Module**:
   - It gradually adjusts the model structure and evaluates only a part of the parameters (for example, a single layer) each time, thereby significantly reducing the memory, computing, and communication costs.
   - By iteratively growing and pruning parameters, the model structure gradually approaches the optimal structure.

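
The "iteratively growing and pruning" step of the progressive pruning module can be illustrated with a minimal sketch for a single layer. The selection criteria used here (drop the active weights with the smallest magnitude, re-activate the pruned positions with the largest gradient magnitude) are assumptions made for illustration, since the summary above does not state them; the function name `grow_and_prune_layer` and its parameters are hypothetical as well.

```python
from typing import List, Sequence


def grow_and_prune_layer(weights: Sequence[float],
                         mask: Sequence[int],
                         grad_magnitudes: Sequence[float],
                         num_adjust: int) -> List[int]:
    """Return a new 0/1 mask for one layer, keeping the number of active weights fixed.

    Assumed criteria (not taken from the source): grow by largest gradient
    magnitude, prune by smallest weight magnitude.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    inactive = [i for i, m in enumerate(mask) if m == 0]
    # Never swap more positions than are available on either side.
    n = min(num_adjust, len(active), len(inactive))

    # Grow: pruned positions whose gradients are largest in magnitude.
    grown = sorted(inactive, key=lambda i: -abs(grad_magnitudes[i]))[:n]
    # Prune: currently active positions whose weights are smallest in magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:n]

    new_mask = list(mask)
    for i in grown:
        new_mask[i] = 1
    for i in dropped:
        new_mask[i] = 0
    return new_mask
```

For example, with weights `[0.5, 0.0, -0.05, 0.0, 0.8]`, mask `[1, 0, 1, 0, 1]`, gradient magnitudes `[0.1, 0.7, 0.2, 0.05, 0.3]`, and `num_adjust=1`, the sketch swaps position 2 (smallest active weight) for position 1 (largest gradient among pruned positions) and returns `[1, 1, 0, 0, 1]`. Because only one layer's mask and gradients are touched per call, the memory needed for the adjustment scales with that layer rather than with the whole model.
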
Through these two modules, FedTiny can effectively perform pruning on resource-constrained devices and generate efficient tiny neural networks while maintaining high accuracy and low computing costs.

### Formula Summary

- Batch normalization transformation formula:
  \[ \hat{x}_i=\frac{x_i - \mu}{\sqrt{\sigma^2+\epsilon}} \]
  where \(\mu\) and \(\sigma\) are the mean and standard deviation respectively, and \(\epsilon\) is a small constant.
- Batch normalization parameter update formula:
  \[ \mu_t = \gamma\mu_{t - 1}+(1 - \gamma)\mu_i,\quad\sigma_t^2=\gamma\sigma_{t - 1}^2+(1 - \gamma)\sigma_i^2 \]
  where \(\gamma\) is the momentum coefficient and \(t\) is the number of training iterations.
- Global batch normalization parameter aggregation formula in the Adaptive Batch Normalization Selection Module:
  \[ \mu^{(c)}=\frac{\sum_{k = 1}^K|D_k|\mu_k^{(c)}}{\sum_{k = 1}^K|D_k|},\quad\sigma^{(c)}=\frac{\sum_{k = 1}^K|D_k|\sigma_k^{(c)}}{\sum_{k = 1}^K|D_k|} \]
  where \(|D_k|\) represents the number of samples of the \(k\)-th device.
- Gradient calculation formula in the Progressive Pruning Module:
  \[ \tilde{g}_{k,l}^t=\text{TopK}(g_{k,l}^t,a_l^t) \]
  where \(\text{TopK}(v,k)\) is a threshold function that replaces elements with absolute values less than the \(k\)-th largest absolute value with 0.

Through these methods, FedTiny achieves efficient and low-resource-consumption neural network pruning in the federated learning environment.
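
To make the aggregation and TopK formulas concrete, the following is a small pure-Python sketch rather than the authors' implementation. The function names `aggregate_bn_stats` and `top_k` are hypothetical, statistics are represented as plain lists of floats, and the second argument of `top_k` plays the role of \(a_l^t\) in the formula.

```python
from typing import List, Sequence


def aggregate_bn_stats(local_stats: Sequence[Sequence[float]],
                       num_samples: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of per-device statistics.

    local_stats[k] holds device k's statistic (e.g. its BN means) and
    num_samples[k] is |D_k|, the number of samples on device k.
    """
    total = sum(num_samples)
    dim = len(local_stats[0])
    return [
        sum(n * stats[j] for stats, n in zip(local_stats, num_samples)) / total
        for j in range(dim)
    ]


def top_k(values: Sequence[float], k: int) -> List[float]:
    """Zero every entry whose magnitude is below the k-th largest magnitude."""
    if k <= 0:
        return [0.0 for _ in values]
    if k >= len(values):
        return list(values)
    # The k-th largest absolute value acts as the keep threshold.
    threshold = sorted((abs(v) for v in values), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in values]
```

With two devices holding 100 and 300 samples, `aggregate_bn_stats([[0.0, 2.0], [1.0, 4.0]], [100, 300])` returns `[0.75, 3.5]`, so the larger device dominates the global statistic. `top_k([0.1, -0.9, 0.4, -0.2], 2)` returns `[0.0, -0.9, 0.4, 0.0]`. Because the rule is threshold-based, entries tied with the \(k\)-th largest magnitude are all kept, which matches the definition in the list above.
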