Abstract: In practical federated learning (FL) systems, malicious Byzantine attacks and data heterogeneity often bias the learning process. Existing Byzantine-robust methods typically achieve only a compromise between adaptability to different loss function types (both strongly convex and non-convex) and robustness to heterogeneous datasets, and they do so with a non-zero optimality gap. Moreover, this compromise often comes at the cost of high computational complexity for aggregation, which significantly slows training. To address this challenge, we propose a federated learning approach called Federated Normalized Gradients Algorithm (Fed-NGA). Fed-NGA simply normalizes the uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$, where $p$ is the dimension of the model parameters and $M$ is the number of participating clients. This complexity is the best among all existing Byzantine-robust methods. Furthermore, we prove that Fed-NGA overcomes both the trade-off between adaptability to loss function type and robustness to data heterogeneity and the non-zero optimality gap found in the existing literature. Specifically, Fed-NGA adapts to non-convex loss functions and non-IID datasets simultaneously, reaching a zero optimality gap at a rate of $\mathcal{O}(1/T^{\frac{1}{2} - \delta})$, where $T$ is the number of iterations and $\delta \in (0, \frac{1}{2})$. When the loss function is strongly convex, the rate of reaching a zero optimality gap improves to linear. Experimental results show that Fed-NGA outperforms baseline methods in both time complexity and convergence performance.
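
The aggregation step described in the abstract (normalize each uploaded gradient to a unit vector, then aggregate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform client weights, the zero-gradient handling, and the plain gradient-descent server update are all assumptions made for illustration. The point it shows is that each client contributes a vector of norm at most 1, so a single Byzantine client with a huge gradient cannot dominate the aggregate, and the cost is one pass over $M$ vectors of dimension $p$.

```python
import math


def normalize(grad, eps=1e-12):
    """Scale a gradient to unit L2 norm. Zero vectors stay zero (assumption)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm < eps:
        return [0.0] * len(grad)
    return [g / norm for g in grad]


def aggregate_normalized(grads):
    """Average the unit-normalized client gradients.

    Uniform weights 1/M are an assumption; the abstract does not state
    the weighting. Runs in O(p * M) time for M gradients of dimension p.
    """
    m = len(grads)
    if m == 0:
        raise ValueError("need at least one client gradient")
    p = len(grads[0])
    agg = [0.0] * p
    for grad in grads:
        unit = normalize(grad)
        for j in range(p):
            agg[j] += unit[j] / m
    return agg


def server_step(params, grads, lr):
    """Hypothetical server update: one descent step along the aggregate."""
    agg = aggregate_normalized(grads)
    return [x - lr * a for x, a in zip(params, agg)]


# Two honest clients and one Byzantine client sending a huge gradient.
client_grads = [[3.0, 4.0], [6.0, 8.0], [-1e9, 0.0]]
aggregate = aggregate_normalized(client_grads)
new_params = server_step([1.0, 1.0], client_grads, lr=0.1)
```

In this toy run the aggregate is approximately `[0.067, 0.533]`: the Byzantine vector is reduced to `[-1, 0]` before averaging, so it shifts the direction but cannot blow up the magnitude of the update.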

FedGSNR: Accelerating Federated Learning on Non-IID Data via Maximum Gradient Signal to Noise Ratio

FedDGP: Disentangling Global and Personal Models for Federated Learning

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

FedAgg: Adaptive Federated Learning with Aggregated Gradients

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients

Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets

Gradient-Congruity Guided Federated Sparse Training

FedVeca: Federated Vectorized Averaging on Non-IID Data with Adaptive Bi-directional Global Objective

On the Convergence of FedAvg on Non-IID Data

Fine-tuning Global Model Via Data-Free Knowledge Distillation for Non-IID Federated Learning

Collaboratively Learning Federated Models from Noisy Decentralized Data

FedLGA: Towards System-Heterogeneity of Federated Learning via Local Gradient Approximation

Node Selection Toward Faster Convergence for Federated Learning on Non-IID Data

Federated Learning on Non-Independent and Identically Distributed Data

Accelerating Federated Learning with Adaptive Extra Local Updates Upon Edge Networks

FedGK: Communication-Efficient Federated Learning through Group-Guided Knowledge Distillation

FedNorm: an Efficient Federated Learning Framework with Dual Heterogeneity Coexistence on Edge Intelligence Systems

FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction

Generalized Federated Learning via Gradient Norm-Aware Minimization and Control Variables

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data