Abstract: Real-time machine learning (ML) has recently attracted significant interest due to its potential to support instantaneous learning, adaptation, and decision making in a wide range of application domains, including self-driving vehicles, intelligent transportation, and industrial automation. In this paper, we investigate real-time ML in a federated edge intelligence (FEI) system, an edge computing system that implements federated learning (FL) solutions based on data samples collected and uploaded from decentralized data networks, e.g., Internet-of-Things (IoT) and/or wireless sensor networks. FEI systems often exhibit heterogeneous distributions of communication and computational resources, as well as non-i.i.d. data samples arriving at different edge servers, resulting in long model training time and inefficient resource utilization. Motivated by this fact, we propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model with desirable accuracy. Training acceleration solutions are investigated for TS-FL with both synchronous coordination (TS-FL-SC) and asynchronous coordination (TS-FL-ASC). To address the straggler effect in TS-FL-SC, we develop an analytical solution to characterize the impact of selecting different subsets of edge servers on the overall model training time. A server dropping-based solution is proposed that removes slow edge servers from model training if their impact on the resulting model accuracy is limited. A joint optimization algorithm is proposed to minimize the overall time consumption of model training by selecting the participating edge servers, the local epoch number (the number of model training iterations per coordination), and the data batch size (the number of data samples for each model training iteration).
Motivated by the fact that data samples at the slowest edge server may exhibit special characteristics that cannot be removed from model training, we develop an analytical expression to characterize the impact of both the staleness effect of asynchronous coordination and the straggler effect of FL on the time consumption of TS-FL-ASC. We propose a load forwarding-based solution that allows a slow edge server to offload part of its training samples to trusted edge servers with higher processing capability. We develop a hardware prototype to evaluate the model training time of a heterogeneous FEI system. Experimental results show that our proposed TS-FL-SC and TS-FL-ASC reduce the overall model training time by up to 63% and 28%, respectively, compared with traditional FL solutions.
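The straggler effect, server dropping, and load forwarding described in the abstract can be illustrated with a toy timing model. The sketch below is not the paper's analytical solution or optimization algorithm: the linear per-sample cost model, the field names (`sec_per_sample`, `upload_sec`), the `min_data_fraction` threshold used as a stand-in for the accuracy constraint, and the time-balancing rule in `forward_load` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    samples: int           # local training samples (assumed known)
    sec_per_sample: float  # hypothetical per-sample compute cost
    upload_sec: float      # hypothetical model upload time per coordination


def local_time(s: EdgeServer, local_epochs: int) -> float:
    """Toy linear model of one server's time per coordination round."""
    return local_epochs * s.samples * s.sec_per_sample + s.upload_sec


def sync_round_time(servers, local_epochs: int) -> float:
    """Synchronous coordination waits for the slowest participant."""
    return max(local_time(s, local_epochs) for s in servers)


def drop_stragglers(servers, local_epochs: int, min_data_fraction: float):
    """Greedily drop the slowest server while enough data remains.

    min_data_fraction is an assumed proxy for "limited impact on
    accuracy"; the paper's actual criterion is not given in the abstract.
    """
    total = sum(s.samples for s in servers)
    kept = sorted(servers, key=lambda s: local_time(s, local_epochs))
    while len(kept) > 1:
        remaining = sum(s.samples for s in kept[:-1])
        if remaining / total < min_data_fraction:
            break
        kept = kept[:-1]  # remove the current slowest server
    return kept


def forward_load(slow: EdgeServer, fast: EdgeServer, local_epochs: int) -> int:
    """Number of samples to move from slow to fast so that their round
    times roughly balance (sample transfer cost is ignored here)."""
    gap = local_time(slow, local_epochs) - local_time(fast, local_epochs)
    per_sample = local_epochs * (slow.sec_per_sample + fast.sec_per_sample)
    moved = int(gap / per_sample)
    return max(0, min(slow.samples, moved))
```

With server dropping, the round time falls to that of the slowest remaining server; with load forwarding, the slow server keeps participating but trains on fewer samples, which corresponds to the case where its data cannot be excluded from training.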

DetFed: Dynamic Resource Scheduling for Deterministic Federated Learning over Time-sensitive Networks

Low-Latency Federated Learning With DNN Partition in Distributed Industrial IoT Networks

Resource Efficient Asynchronous Federated Learning for Digital Twin Empowered IoT Network

Optimizing Federated Learning in Distributed Industrial IoT: A Multi-Agent Approach

T-FedHA: A Trusted Hierarchical Asynchronous Federated Learning Framework for Internet of Things

Time-triggered Federated Learning over Wireless Networks

FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks

Spectrum and Computing Resource Management for Federated Learning in Distributed Industrial IoT

Time-sensitive Learning for Heterogeneous Federated Edge Intelligence

Resources-efficient Adaptive Federated Learning for Digital Twin-Enabled IIoT

Adaptive Training and Aggregation for Federated Learning in Multi-tier Computing Networks

Speed Up Federated Learning in Heterogeneous Environment: A Dynamic Tiering Approach

FedDdrl: Federated Double Deep Reinforcement Learning for Heterogeneous IoT with Adaptive Early Client Termination and Local Epoch Adjustment

AsyFed: Accelerated Federated Learning with Asynchronous Communication Mechanism

Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better

Optimizing Federated Learning With Deep Reinforcement Learning for Digital Twin Empowered Industrial IoT

Heterogeneous Training Intensity for Federated Learning: A Deep Reinforcement Learning Approach

Device Scheduling and Resource Allocation for Federated Learning under Delay and Energy Constraints

Digital Twin-Assisted Federated Learning Service Provisioning over Mobile Edge Networks

FedStar: Efficient Federated Learning on Heterogeneous Communication Networks

Energy-Efficient Federated Learning Framework for Digital Twin-Enabled Industrial Internet of Things