FedAT: Federated Adversarial Training for Distributed Insider Threat Detection

R G Gayathri, Atul Sajjanhar, Md Palash Uddin, Yong Xiang
2024-09-20
Abstract: Insider threats usually originate from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions that entities take on the resources to which they have access rights allows us to identify insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches has gained attention in the last few years. However, most techniques employ centralized ML methods to perform ITD. Organizations operating from multiple locations cannot contribute to centralized models because the data is generated at various locations. In particular, user behavior data, which is the primary source for ITD, cannot be shared among the locations due to privacy concerns. Additionally, data distributed across various locations exhibits extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, has gained much interest recently. However, FL-enabled ITD has not yet been explored, and research is still needed on the significant issues of implementing it in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data distribution to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients. In addition, we propose a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with the benchmarks to demonstrate the enhanced performance of the proposed FedAT-driven ITD scheme.
Cryptography and Security; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses Insider Threat Detection (ITD) in a distributed environment. Specifically, it targets the following problems:

1. **Data privacy and security**:
   - In organizations spread across multiple locations, user behavior data is the main source for ITD, but this data cannot be shared between locations for privacy and security reasons.
   - Centralized machine-learning methods require all data to be gathered in one location for processing, which creates a risk of privacy leakage.
2. **Uneven data distribution (non-IID data)**:
   - The data distribution can differ greatly between locations, resulting in extreme class imbalance. For example, some malicious behaviors may occur only at specific locations while being rare or absent at others.
   - Such uneven distribution seriously degrades model performance, because traditional federated learning methods have difficulty handling non-independent and identically distributed (non-IID) data.
3. **Data scarcity**:
   - Insider threat events are relatively rare, so training data is scarce. In particular, the data available on an individual client may not be sufficient to train an effective model.

To solve these problems, the authors propose FedAT (Federated Adversarial Training), a method that combines Federated Learning (FL) with Adversarial Training (AT). It improves ITD in the following ways:

- **Federated Learning (FL)**: Allows multiple clients to collaboratively train a model without sharing the original data, thereby protecting data privacy.
- **Adversarial Training (AT)**: Uses Generative Adversarial Networks (GANs) to generate synthetic data, alleviating data scarcity and class imbalance and improving the robustness and generalization ability of the model.
- **Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP)**: Used to improve the performance of the classifier and enhance the model's adaptability to different data distributions.

Through these techniques, FedAT improves the performance of distributed ITD while protecting privacy.
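
As a rough illustration of two building blocks named above, the sketch below shows a sample-size-weighted parameter average in the style of FedAvg and the SELU activation that self-normalizing networks rely on. This is only a minimal sketch under stated assumptions: the summary does not say which aggregation rule FedAT uses, so FedAvg-style averaging is assumed here, and the names `fed_avg`, `selu`, `client_a`, and `client_b` are hypothetical. The GAN-based data generation step is not shown.

```python
import numpy as np

# SELU constants as published for self-normalizing networks (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit, applied element-wise."""
    x = np.asarray(x, dtype=float)
    # Clamp the exponential branch to non-positive inputs to avoid overflow.
    negative_part = SELU_ALPHA * np.expm1(np.minimum(x, 0.0))
    return SELU_SCALE * np.where(x > 0, x, negative_part)


def fed_avg(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample counts.

    Assumption: FedAvg-style aggregation; the paper's actual rule may differ.
    Only model parameters are exchanged, never the raw user behavior data.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_layers)
    ]


# Two hypothetical clients, each holding one weight matrix and one bias vector.
client_a = [np.ones((2, 2)), np.zeros(2)]
client_b = [np.full((2, 2), 3.0), np.full(2, 4.0)]

# Client B holds three times as many samples, so it dominates the average.
global_weights = fed_avg([client_a, client_b], client_sizes=[100, 300])
```

Weighting by sample count means a location with very few records contributes little to the global model, which is one reason non-IID and scarce client data are hard for plain averaging and why the summary pairs FL with synthetic data generation.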