Abstract: Insider threats usually originate from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions that entities take on the resources to which they have access rights allows us to identify insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches has gained attention in the last few years. However, most techniques have employed centralized ML methods to perform ITD. Organizations operating from multiple locations cannot contribute to centralized models because the data is generated at various locations. In particular, user behavior data, which is the primary source for ITD, cannot be shared among the locations due to privacy concerns. Additionally, the data distributed across various locations results in extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, has gained much interest recently. However, FL-enabled ITD has not yet been explored, and research is still needed on the significant issues of implementing it in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data distribution to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients. In addition, we propose to utilize a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with the benchmarks to demonstrate the enhanced performance of the proposed FedAT-driven ITD scheme.

What problem does this paper attempt to address?

This paper addresses Insider Threat Detection (ITD) in a distributed environment. Specifically, it targets the following problems:

1. **Data privacy and security**:
   - In organizations distributed across multiple locations, user behavior data is the main source for ITD, but this data cannot be shared between locations for privacy and security reasons.
   - Centralized machine learning methods require all data to be gathered in one location for processing, which creates a risk of privacy leakage.
2. **Uneven data distribution (non-IID data)**:
   - The data distribution can differ greatly between locations, resulting in extreme class imbalance. For example, some malicious behaviors may occur only in specific locations while being rare or nonexistent elsewhere.
   - Such uneven data distribution seriously degrades model performance because traditional federated learning methods have difficulty handling non-independent and identically distributed (non-IID) data.
3. **Data scarcity**:
   - Insider threat events are relatively rare, so training data is scarce. In particular, an individual client may not hold enough data to train an effective model.

To solve these problems, the authors propose FedAT (Federated Adversarial Training), a method that combines Federated Learning (FL) and Adversarial Training (AT). It improves ITD in the following ways:

- **Federated Learning (FL)**: Allows multiple clients to collaboratively train a model without sharing the original data, thereby protecting data privacy.
- **Adversarial Training (AT)**: Uses Generative Adversarial Networks (GAN) to generate synthetic data, which alleviates data scarcity and class imbalance and improves the robustness and generalization ability of the model.
- **Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP)**: Used to improve the performance of the classifier and enhance the model's adaptability to different data distributions.

Through these techniques, FedAT can improve the performance of distributed ITD while protecting privacy.
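The federated and SNN-MLP components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the aggregation rule or the network architecture, so the sketch assumes sample-weighted federated averaging (FedAvg), a small MLP with SELU activations and LeCun-normal initialization, and made-up layer sizes and client sample counts. The helper names `selu`, `init_mlp`, `forward`, and `fedavg` are hypothetical, and the GAN-based data generation step is omitted.

```python
import numpy as np

# Standard SELU constants from the self-normalizing networks literature.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """SELU activation; exp() is applied only to the non-positive part."""
    x = np.asarray(x, dtype=float)
    negative_part = SELU_ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return SELU_SCALE * np.where(x > 0, x, negative_part)


def init_mlp(layer_sizes, rng):
    """LeCun-normal weights (std = 1/sqrt(fan_in)) and zero biases."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Return class logits: SELU on hidden layers, linear output layer."""
    for weights, bias in params[:-1]:
        x = selu(x @ weights + bias)
    weights, bias = params[-1]
    return x @ weights + bias


def fedavg(client_params, client_sizes):
    """Average per-layer parameters, weighting each client by its sample count."""
    total = float(sum(client_sizes))
    shares = [n / total for n in client_sizes]
    n_layers = len(client_params[0])
    return [
        (
            sum(s * p[layer][0] for s, p in zip(shares, client_params)),
            sum(s * p[layer][1] for s, p in zip(shares, client_params)),
        )
        for layer in range(n_layers)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_sizes = [8, 16, 5]  # assumed: 8 features, 16 hidden units, 5 classes

    # Stand-ins for locally trained client models; in a real round each client
    # would start from the current global model and train on its private data.
    client_params = [init_mlp(layer_sizes, rng) for _ in range(3)]
    client_sizes = [1200, 300, 50]  # assumed, deliberately unequal

    # Only parameters reach the server; raw user behavior data stays local.
    global_params = fedavg(client_params, client_sizes)
    logits = forward(global_params, rng.normal(size=(4, 8)))
    print(logits.shape)
```

Weighting by sample count means a client with very little data has correspondingly little influence on the global model, which is one reason skewed, non-IID clients are hard for plain averaging and why the paper adds synthetic data generation on top.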

FedAT: Federated Adversarial Training for Distributed Insider Threat Detection
