Abstract: Machine learning-based systems have shown steadily improving performance across a wide variety of tasks. However, some state-of-the-art models lack transparency, trustworthiness, and explainability. eXplainable Artificial Intelligence (XAI) emerged to address this problem: it is a research field that aims to make black-box models more understandable to humans. Research on this topic has grown in recent years, and many methods have been proposed, such as LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations). Machine learning-based Intrusion Detection Systems (IDS) are one of the many application domains of XAI. However, most work on model interpretation focuses on other fields, such as computer vision, natural language processing, biology, and healthcare. This poses a challenge for cybersecurity professionals tasked with analyzing IDS results and impedes their capacity to make informed decisions. To address this problem, we selected two XAI methods, LIME and SHAP. Using these methods, we retrieved explanations for the results of a black-box model that is part of an IDS solution performing intrusion detection on IoT devices, thereby increasing its interpretability. To validate the explanations, we carried out a perturbation analysis in which we tried to obtain a different classification based on the features present in the explanations. From the explanations and the perturbation analysis, we were able to draw conclusions about the negative impact that particular features have on the model results when present in the input data. This makes it easier for cybersecurity experts to analyze the model results and serves as an aid to the continuous improvement of the model. The perturbations also serve as a performance comparison between LIME and SHAP. To evaluate the degree to which interpretability increased and the explanations provided by each XAI method, and to compare the XAI methods directly, we performed a survey analysis.
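
The perturbation analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the logistic scorer, its weights, the all-zero baseline, and the sample below are hypothetical stand-ins for the black-box IDS model and its input data, and exact Shapley values are computed by brute force in place of the SHAP library. The idea shown is only the general one: rank features by attribution, reset the top-ranked feature, and check whether the classification changes.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical stand-in for the black-box IDS model: a tiny logistic scorer.
WEIGHTS = [2.0, -1.0, 0.5, 3.0]
BIAS = -3.0


def predict_proba(x):
    """Return the (toy) probability that the sample is an attack."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline):
    """Exact Shapley values; 'absent' features take their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict_proba(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def perturb_top_features(x, baseline, phi, k):
    """Reset the k features with the largest attribution to their baseline."""
    ranked = sorted(range(len(x)), key=lambda i: phi[i], reverse=True)
    perturbed = list(x)
    for i in ranked[:k]:
        perturbed[i] = baseline[i]
    return perturbed


# Hypothetical sample and baseline, chosen only for illustration.
sample = [1.0, 0.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]

phi = shapley_values(sample, baseline)
perturbed = perturb_top_features(sample, baseline, phi, k=1)
flipped = (predict_proba(sample) >= 0.5) != (predict_proba(perturbed) >= 0.5)
```

In this toy setting, resetting the single most influential feature is enough to change the predicted class, which is the kind of evidence a perturbation analysis uses to check that an explanation points at features the model actually relies on. Brute-force Shapley computation is only feasible for a handful of features; real feature sets need an approximation such as the ones SHAP provides.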

Explanation Leaks: Explanation-guided Model Extraction Attacks

Extracting Robust Models with Uncertain Examples

AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models

Privacy Implications of Explainable AI in Data-Driven Systems

AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Decision Tree Models

MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI

Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations

A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures

Privacy-preserving explainable AI: a survey

Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack

Inferring Sensitive Attributes from Model Explanations

A Comprehensive Analysis of Explainable AI for Malware Hunting

Adversarial attacks and defenses in explainable artificial intelligence: A survey

XAI and Android Malware Models

Black-box Attacks on Image Activity Prediction and its Natural Language Explanations

The privacy issue of counterfactual explanations: explanation linkage attacks

Defense Against Explanation Manipulation

Distance-Restricted Explanations: Theoretical Underpinnings & Efficient Implementation

Interpretability and Transparency of Machine Learning in File Fragment Analysis with Explainable Artificial Intelligence

XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution

Explainable AI for Intrusion Detection Systems: LIME and SHAP Applicability on Multi-Layer Perceptron