An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging

Liangqiong Qu, Niranjan Balachandar, Daniel L Rubin
DOI: https://doi.org/10.48550/arXiv.2107.08371
IF: 5.414
2021-07-18
Machine Learning
Abstract: Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way. However, its distributed nature often leads to significant heterogeneity in data distributions across institutions. In this paper, we investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew. We show that performance degrades as the degree of data heterogeneity increases. We present several mitigation strategies to overcome performance drops caused by data heterogeneity: weighted averaging for data quantity skew, and weighted loss and batch normalization averaging for label distribution skew. The proposed optimizations improve the ability of federated learning methods to handle heterogeneity across institutions, providing valuable guidance for deploying federated learning in real clinical applications.
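To make the first mitigation concrete, the following is a minimal sketch of sample-size-weighted parameter averaging for data quantity skew. It is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_average`, the dict-of-lists parameter layout, and the two example sites and their sizes are all hypothetical.

```python
def weighted_average(client_params, client_sizes):
    """Average per-client parameters, weighting each client by its sample count.

    client_params: list of dicts mapping a parameter name to a list of floats
                   (hypothetical layout chosen for this sketch).
    client_sizes:  list of local dataset sizes, one per client.
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")

    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Each coordinate is the size-weighted mean across clients.
        averaged[name] = [
            sum((n / total) * p[name][i]
                for p, n in zip(client_params, client_sizes))
            for i in range(length)
        ]
    return averaged


# Two hypothetical institutions with a 9:1 quantity skew.
site_a = {"w": [1.0, 1.0]}
site_b = {"w": [3.0, 5.0]}
global_params = weighted_average([site_a, site_b], client_sizes=[900, 100])
```

With these made-up numbers the aggregated `w` is approximately `[1.2, 1.4]`, whereas an unweighted mean would give `[2.0, 3.0]`; the larger site dominates the aggregate in proportion to its share of the data.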