FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

Xinyuan Ji, Zhaowei Zhu, Wei Xi, Olga Gadyatskaya, Zilong Song, Yong Cai, Yang Liu
2024-03-25
Abstract: Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning approaches. To tackle this issue, we propose FedFixer, where the personalized model is introduced to cooperate with the global model to effectively select clean client-specific samples. In the dual models, updating the personalized model solely at a local level can lead to overfitting on noisy data due to limited samples, consequently affecting both the local and global models' performance. To mitigate overfitting, we address this concern from two perspectives. Firstly, we employ a confidence regularizer to alleviate the impact of unconfident predictions caused by label noise. Secondly, a distance regularizer is implemented to constrain the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer can perform well in filtering noisy label samples on different clients, especially in highly heterogeneous label noise scenarios.
Artificial Intelligence
### What problem does this paper attempt to address?
This paper addresses the performance degradation of Federated Learning (FL) caused by heterogeneous label noise. Specifically:

1. **Heterogeneity of label noise**: In federated learning, each client's labels may be noisy, and the noise distribution differs from client to client. Because client-specific samples incur a high loss, much like noisy-label samples, the two are hard to tell apart, which weakens existing label-noise learning methods.
2. **Overfitting**: Because each client holds limited data, a personalized model trained only locally easily overfits noisy-label samples, which in turn degrades the global model.

To solve these problems, the authors propose FedFixer. Its main goal is to select clean client-specific samples effectively through a dual-model structure, in which a personalized model cooperates with the global model, and to mitigate overfitting with two regularizers. The main contributions are:

- **Dual-model structure**: A global model and a personalized model are combined so that sample selection can adapt to the different label noise on each client.
- **Regularizers**:
  - **Confidence regularizer**: Reduces the impact of unconfident predictions caused by label noise.
  - **Distance regularizer**: Constrains the difference between the personalized model and the global model, preventing the personalized model from overfitting local noisy data.
- **Experimental verification**: Extensive experiments on multiple benchmark datasets verify FedFixer's effectiveness under heterogeneous label noise. In highly heterogeneous label-noise scenarios in particular, it performs better than existing methods.
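The selection difficulty described in point 1 can be illustrated with a small Python sketch. The rule below keeps a sample when either the global or the personalized model fits its label with a loss under a threshold. It is a hypothetical illustration of why a second, personalized model helps, not necessarily FedFixer's exact selection criterion; the function names, toy probabilities, and threshold are all assumptions.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy loss of a predicted class distribution against an integer label."""
    return -math.log(max(probs[label], 1e-12))  # clamp avoids log(0)


def select_clean(global_probs, personal_probs, noisy_labels, threshold):
    """Return the selection mask v_n in {0, 1} (hypothetical dual-model rule).

    A sample is kept when at least one model fits its label with a loss below
    `threshold`: the global model covers patterns shared across clients, while
    the personalized model can also fit client-specific samples.
    """
    mask = []
    for p_glob, p_pers, y in zip(global_probs, personal_probs, noisy_labels):
        loss = min(cross_entropy(p_glob, y), cross_entropy(p_pers, y))
        mask.append(1 if loss < threshold else 0)
    return mask


if __name__ == "__main__":
    # Toy client with 3 classes (hypothetical numbers):
    #   sample 0: common clean sample, both models agree with label 0
    #   sample 1: client-specific clean sample, only the personalized model fits label 2
    #   sample 2: noisy sample, labeled 1 although both models predict class 0
    global_probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
    personal_probs = [[0.85, 0.1, 0.05], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
    labels = [0, 2, 1]

    print(select_clean(global_probs, global_probs, labels, 0.5))    # global model only -> [1, 0, 0]
    print(select_clean(global_probs, personal_probs, labels, 0.5))  # dual model -> [1, 1, 0]
```

With the global model alone, the client-specific sample is discarded together with the noisy one; adding the personalized model keeps it. The same toy also shows why overfitting matters: if the personalized model had memorized the noisy label of sample 2, that sample would pass the filter too, which is the failure the two regularizers are meant to prevent.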
### Formula summary

- **Loss function**:
  \[ F_k(w)=\frac{1}{\bar{n}_k}\sum_{n\in [n_k]}v_n\cdot\ell(x_n,\tilde{y}_n;\theta_k)+\frac{\lambda}{2}\|\theta_k - w\|^2 \]
  where \(v_n\in\{0, 1\}\) indicates whether sample \(n\) is selected as a clean sample, \(\tilde{y}_n\) is its (possibly noisy) label, and \(\ell(\cdot)\) is the loss function.
- **Confidence regularizer**:
  \[ \ell_{\mathrm{CR}}(f(x_n)) := -\beta\cdot\mathbb{E}_{\widetilde{Y}\mid\widetilde{D}}\left[\ell_{\mathrm{CE}}(f(x_n),\widetilde{Y})\right] \]
  where \(\beta\geq 0\) is a hyperparameter, \(\ell_{\mathrm{CE}}\) is the cross-entropy loss, and \(P(\widetilde{Y}\mid\widetilde{D})\) is the prior label distribution estimated from the noisy dataset \(\widetilde{D}\).
- **Distance regularizer**:
  \[ \frac{\lambda}{2}\|\theta_k - w\|^2 \]
  where \(\theta_k\) is the personalized model of client \(k\), \(w\) is the global model, and \(\lambda\in(0,+\infty)\) is a regularization parameter.

Through these components, FedFixer handles heterogeneous label noise in the federated learning setting and improves the generalization performance of the model.
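The formulas above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: model outputs are probability vectors, parameters are flat lists, the prior \(P(\widetilde{Y}\mid\widetilde{D})\) is estimated by noisy-label frequencies, \(\bar{n}_k\) is taken to be the number of selected samples, and the per-sample loss \(\ell\) in the demo is cross-entropy plus the confidence regularizer. All function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy l_CE between a predicted distribution and an integer label."""
    return -math.log(max(probs[label], 1e-12))  # clamp avoids log(0)


def label_prior(noisy_labels, num_classes):
    """Estimate P(Y~ | D~) as the frequency of each noisy label (assumption)."""
    counts = [0] * num_classes
    for y in noisy_labels:
        counts[y] += 1
    return [c / len(noisy_labels) for c in counts]


def confidence_regularizer(probs, prior, beta):
    """l_CR(f(x_n)) = -beta * E_{Y~ | D~}[ l_CE(f(x_n), Y~) ]."""
    expected_ce = sum(p_j * cross_entropy(probs, j) for j, p_j in enumerate(prior))
    return -beta * expected_ce


def distance_regularizer(theta_k, w, lam):
    """(lam / 2) * ||theta_k - w||^2 for flat parameter lists."""
    return 0.5 * lam * sum((a - b) ** 2 for a, b in zip(theta_k, w))


def local_objective(sample_losses, v, theta_k, w, lam):
    """F_k: mean loss over selected samples (v_n = 1) plus the distance term.

    Assumes n_bar_k = sum(v), the number of selected samples.
    """
    n_bar = sum(v)
    data_term = sum(vn * loss for vn, loss in zip(v, sample_losses)) / n_bar if n_bar else 0.0
    return data_term + distance_regularizer(theta_k, w, lam)


if __name__ == "__main__":
    # Toy client: 3 classes, 4 samples with noisy labels (hypothetical values).
    labels = [0, 0, 1, 2]
    outputs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]
    mask = [1, 1, 0, 1]  # selection mask v_n from some sample-selection step
    theta_k, w = [1.0, 2.0], [0.5, 1.5]
    beta, lam = 0.5, 0.1

    prior = label_prior(labels, num_classes=3)
    losses = [cross_entropy(p, y) + confidence_regularizer(p, prior, beta)
              for p, y in zip(outputs, labels)]
    print(local_objective(losses, mask, theta_k, w, lam))
```

Because \(\ell_{\mathrm{CR}}\) is more negative for confident predictions than for flat ones, adding it to the cross-entropy term rewards confident outputs, which matches its stated role of reducing the effect of unconfident predictions. The distance term grows quadratically as \(\theta_k\) drifts from \(w\), so a larger \(\lambda\) keeps the personalized model closer to the global one.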