Federated Learning with Extreme Label Skew: A Data Extension Approach

Saheed A. Tijani, Xingjun Ma, Ran Zhang, Frank Jiang, Robin Doss
DOI: https://doi.org/10.1109/IJCNN52387.2021.9533879
2021-01-01
Abstract: The real-world data sets often leveraged by Federated Learning (FL) applications are mostly non-independent and non-identically distributed (non-IID). This usually results from the diverse nature of the participating clients and their individual data-gathering contexts. An effective FL algorithm must be able to produce a joint model that generalizes and captures these diverse patterns. In this work, we show how using some wild external data samples as placeholders for missing classes on client devices can alleviate the learning difficulty often posed by imbalanced data distributions. Our exploration showed that this strategy enhances learning and can significantly boost test accuracy, particularly in extreme label skew scenarios. We recorded over 25% reduction in test error rate for the pathological non-IID partitions of the CIFAR10 data set. Our results are similar to those obtainable through bound-expanding strategies such as direct data sharing among clients. But unlike these techniques, our approach rules out the risk of exposing clients' private data.
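The placeholder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extend_with_placeholders`, the `per_class` parameter, and the rule of assigning each drawn external sample the label of a missing class are all assumptions made for illustration, since the abstract does not specify how placeholders are selected or labeled.

```python
import random
from collections import Counter


def extend_with_placeholders(samples, labels, external_pool,
                             num_classes, per_class, seed=0):
    """Fill classes absent from a client's data with external samples.

    Hypothetical helper: each missing class receives `per_class` samples
    drawn without replacement from `external_pool` and labeled with that
    class (an assumed labeling rule, not taken from the paper).
    """
    rng = random.Random(seed)
    present = set(labels)
    missing = [c for c in range(num_classes) if c not in present]

    needed = per_class * len(missing)
    if needed > len(external_pool):
        raise ValueError("external pool is too small for the missing classes")

    drawn = rng.sample(external_pool, needed)

    # Copy so the client's original lists are left untouched.
    new_samples = list(samples)
    new_labels = list(labels)
    for i, cls in enumerate(missing):
        chunk = drawn[i * per_class:(i + 1) * per_class]
        new_samples.extend(chunk)
        new_labels.extend([cls] * per_class)
    return new_samples, new_labels


# A client holding only classes 0 and 1 out of 10 (extreme label skew).
client_x = [f"img_{i}" for i in range(6)]
client_y = [0, 0, 0, 1, 1, 1]
pool = [f"ext_{i}" for i in range(100)]

ext_x, ext_y = extend_with_placeholders(
    client_x, client_y, pool, num_classes=10, per_class=3
)
counts = Counter(ext_y)
```

In this sketch the client's own samples never leave the device; only the local training set is extended, which is the property the abstract contrasts with direct data sharing.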