Understanding the Bug Characteristics and Fix Strategies of Federated Learning Systems

Xiaohu Du,Xiao Chen,Jialun Cao,Ming Wen,Shing-Chi Cheung,Hai Jin
DOI: https://doi.org/10.1145/3611643.3616347
2023-01-01
Abstract:Federated learning (FL) is an emerging machine learning paradigm that aims to address the problem of isolated data islands. To preserve privacy, FL allows machine learning models and deep neural networks to be trained from decentralized data kept privately at individual devices. FL has been increasingly adopted in mission-critical fields such as finance and healthcare. However, bugs in FL systems are inevitable and may result in catastrophic consequences such as financial loss, inappropriate medical decision, and violation of data privacy ordinance. While many recent studies were conducted to understand the bugs in machine learning systems, there is no existing study to characterize the bugs arising from the unique nature of FL systems. To fill the gap, we collected 395 real bugs from six popular FL frameworks (Tensorflow Federated, PySyft, FATE, Flower, PaddleFL, and Fedlearner) in GitHub and StackOverflow, and then manually analyzed their symptoms and impacts, prone stages, root causes, and fix strategies. Furthermore, we report a series of findings and actionable implications that can potentially facilitate the detection of FL bugs.
What problem does this paper attempt to address?