Abstract: Billions of IoT devices lacking proper security mechanisms have been manufactured and deployed in recent years, and more will come with the development of Beyond 5G technologies. Their vulnerability to malware has motivated the need for efficient techniques to detect infected IoT devices inside networks. With data privacy and integrity becoming a major concern in recent years, increasingly so with the arrival of 5G and Beyond 5G networks, new technologies such as federated learning and blockchain have emerged. They allow training machine learning models with decentralized data while preserving its privacy by design. This work investigates the possibilities enabled by federated learning for IoT malware detection and studies the security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling the network traffic of several real IoT devices while affected by malware, has been used to evaluate the proposed framework. Both supervised and unsupervised federated models (a multi-layer perceptron and an autoencoder) able to detect malware affecting seen and unseen IoT devices of N-BaIoT have been trained and evaluated. Furthermore, their performance has been compared to two traditional approaches. In the first, each participant locally trains a model using only its own data, while in the second, the participants share their data with a central entity in charge of training a global model. This comparison has shown that the use of larger and more diverse data, as done in the federated and centralized methods, has a considerable positive impact on model performance. Moreover, the federated models, while preserving the participants' privacy, achieve results similar to those of the centralized ones.
As an additional contribution, and to measure the robustness of the federated approach, an adversarial setup with several malicious participants poisoning the federated model has been considered. The averaging step used as the baseline model aggregation in most federated learning algorithms appears highly vulnerable to different attacks, even with a single adversary. The performance of other model aggregation functions acting as countermeasures is thus evaluated under the same attack scenarios. These functions provide a significant improvement against malicious participants, but more effort is still needed to make federated approaches robust.
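The vulnerability of averaging to a single adversary can be illustrated with a minimal sketch. The code below is not the framework evaluated in this work: the update vectors are hypothetical toy values, clients are assumed to be equally weighted, and the coordinate-wise median is used only as one well-known example of a robust aggregation function, not necessarily one of those evaluated here.

```python
from statistics import median


def fed_avg(updates):
    """Coordinate-wise mean of client updates (equal client weights assumed)."""
    n = len(updates)
    return [sum(coords) / n for coords in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median of client updates (illustrative robust aggregator)."""
    return [median(coords) for coords in zip(*updates)]


# Hypothetical toy data: four honest clients send similar two-parameter updates,
# and one malicious client sends an extreme update to poison the global model.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
poisoned = [100.0, -100.0]
updates = honest + [poisoned]

avg = fed_avg(updates)            # pulled far away from the honest updates
med = coordinate_median(updates)  # stays at the honest consensus

print("mean aggregation:  ", avg)
print("median aggregation:", med)
```

In this toy case the single poisoned update drags the averaged parameters far from the honest values, while the median remains at `[1.0, 2.0]`. A median only tolerates a minority of malicious participants, which is consistent with the observation that such countermeasures help without making the approach fully robust.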

Robust Federated Learning Based on Metrics Learning and Unsupervised Clustering for Malicious Data Detection

FedDMC: Efficient and Robust Federated Learning via Detecting Malicious Clients

Learning to Detect Malicious Clients for Robust Federated Learning

Distinguishing Good from Bad: Distributed-Collaborative-Representation-Based Data Fraud Detection in Federated Learning

Privacy-Preserving Federated Learning Against Label-Flipping Attacks on Non-IID Data

Robust Hierarchical Federated Learning with Anomaly Detection in Cloud-Edge-End Cooperation Networks

Provably Secure Federated Learning against Malicious Clients

Federated learning for malware detection in IoT devices

A Knowledge Transfer-based Semi-Supervised Federated Learning for IoT Malware Detection

Enabling Privacy-Preserving Cyber Threat Detection with Federated Learning

FedCC: Robust Federated Learning against Model Poisoning Attacks

Robust federated learning with voting and scaling

Robust Federated Learning: Maximum Correntropy Aggregation Against Byzantine Attacks

FedRMA: A Robust Federated Learning Resistant to Multiple Poisoning Attacks

Fed-Credit: Robust Federated Learning with Credibility Management

Mitigation of a poisoning attack in federated learning by using historical distance detection

Symbolic analysis meets federated learning to enhance malware identifier

Robust Federated Training via Collaborative Machine Teaching using Trusted Instances

Safe: Synergic Data Filtering for Federated Learning in Cloud-Edge Computing

Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense

Robust Federated Learning with Noisy Labeled Data Through Loss Function Correction