What problem does this paper attempt to address?

The paper addresses how to conduct Federated Learning effectively in the presence of random and dynamic communication failures. Specifically, it asks how existing Federated Learning algorithms can be modified so that they still converge to a stationary point of the global objective function when the uplink connection between the clients and the parameter server is unreliable.

### Background and Problem Description

1. **Basic Concept of Federated Learning**:
   - Federated Learning is a distributed machine learning method in which a parameter server and multiple clients (such as mobile devices) collaboratively train a model without sharing raw data.
   - In each training round, clients process their local data and report updates to the parameter server, which aggregates these updates to produce a new model.
2. **Communication Unreliability**:
   - In practice, Federated Learning systems are often deployed in congested and uncontrollable environments, with clients such as smartphones and IoT devices.
   - Client mobility and environmental complexity make communication unreliable, and the degree of unreliability can vary significantly over time and across devices.
   - For example, when a smartphone on a train passes through a tunnel, its network connection to the base station may be lost.
3. **Limitations of Existing Research**:
   - Previous research assumes that communication failures are symmetric and have fixed statistical properties.
   - Some studies consider time-varying communication constraints but assume that the evolution of the client set follows a homogeneous Markov chain with a steady-state distribution.
   - These assumptions rarely hold in practical Federated Learning systems, because the communication capabilities of clients change dynamically, and these dynamics are unknown and arbitrary.

### Main Contributions of the Paper

1. **Problem Identification**:
   - The authors demonstrate, theoretically and numerically, that when connection probabilities are non-uniform across clients, the most widely adopted Federated Learning algorithm, Federated Averaging (FedAvg), fails to minimize the global objective function, even for simple convex loss functions.
2. **Proposed New Algorithm**:
   - The authors propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. In FedPBC, the parameter server postpones the broadcast of the global model until the end of each round.
   - With the broadcast postponed, FedPBC converges to a stationary point of the non-convex global objective function even in the presence of uplink failures.
   - The postponed broadcast introduces an implicit gossip mechanism that mixes information among the clients with active links, thereby mitigating the bias caused by non-uniform and time-varying connection probabilities.
3. **Experimental Validation**:
   - The authors conducted extensive experiments on three real-world datasets to validate the effectiveness of the algorithm.
   - The results show that FedPBC performs well under various unreliable uplink patterns, including time-varying and time-invariant Bernoulli, Markov, and periodic patterns.

### Conclusion

The paper addresses the convergence problem of Federated Learning under random and dynamic communication failures by proposing the FedPBC algorithm. Experimental results validate the effectiveness of the algorithm, which offers a new approach to reliable and robust Federated Learning in practical applications.
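The contrast between FedAvg and FedPBC described in the summary can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the scalar quadratic losses `f_i(x) = 0.5 * (x - c[i])**2`, the uplink probabilities `p`, the single local gradient step per round, and the function name `simulate` are all hypothetical choices made for illustration. The FedPBC-style branch reflects one reading of "postponing (delaying) the broadcast until the end of each round": clients continue from their own local models, and only clients whose link is active receive the new aggregate.

```python
import numpy as np


def simulate(rounds=5000, lr=0.01, seed=0):
    """Toy comparison of FedAvg and a FedPBC-style update under unreliable uplinks."""
    rng = np.random.default_rng(seed)
    n = 10
    # Hypothetical setup: client i holds f_i(x) = 0.5 * (x - c[i])**2, so the
    # global objective (the average of the f_i) is minimized at c.mean().
    c = np.arange(n, dtype=float)
    # Assumed uplink probabilities, deliberately correlated with c to expose bias.
    p = np.linspace(0.1, 0.9, n)
    # masks[t, i] is True when the uplink of client i is up in round t.
    masks = rng.random((rounds, n)) < p

    x_avg = 0.0          # FedAvg: global model held by the server
    x_loc = np.zeros(n)  # FedPBC-style: each client keeps its own local model
    x_pbc = 0.0          # FedPBC-style: latest aggregate at the server
    hist_avg, hist_pbc = [], []

    for t in range(rounds):
        active = masks[t]
        # FedAvg: every client starts the round from the broadcast global model.
        local_avg = x_avg - lr * (x_avg - c)
        # FedPBC-style: every client continues from its own local model.
        x_loc = x_loc - lr * (x_loc - c)
        if active.any():  # if no uplink is up, the server keeps its old model
            x_avg = local_avg[active].mean()
            x_pbc = x_loc[active].mean()
            # Postponed broadcast: only clients with an active link receive the
            # new aggregate, which mixes information among them.
            x_loc[active] = x_pbc
        hist_avg.append(x_avg)
        hist_pbc.append(x_pbc)

    tail = rounds // 2  # average over the second half to smooth sampling noise
    opt = c.mean()
    err_avg = abs(np.mean(hist_avg[tail:]) - opt)
    err_pbc = abs(np.mean(hist_pbc[tail:]) - opt)
    return err_avg, err_pbc


if __name__ == "__main__":
    err_avg, err_pbc = simulate()
    print(f"FedAvg distance to optimum:       {err_avg:.3f}")
    print(f"FedPBC-style distance to optimum: {err_pbc:.3f}")
```

In this toy setting, FedAvg is pulled toward the clients that connect more often, while the FedPBC-style update replaces the active clients' models by their mean, a step that leaves the sum of all local models unchanged. This is intended only to make the gossip intuition concrete; it says nothing about the non-convex guarantees or experimental results reported in the paper.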

Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics
