Abstract: Label manipulation attacks are a subclass of data poisoning attacks in adversarial machine learning used against different applications, such as malware detection. These attacks represent a serious threat to detection systems in environments with high noise rates or uncertainty, such as complex networks and the Internet of Things (IoT). Recent work in the literature has suggested using the K-Nearest Neighbors (KNN) algorithm to defend against such attacks. However, such an approach can suffer from low detection accuracy or incorrect detections. In this paper, we design an architecture to tackle the Android malware detection problem in IoT systems. We develop an attack mechanism based on the Silhouette clustering method, modified for mobile Android platforms. We propose two Convolutional Neural Network (CNN)-type deep learning algorithms against this Silhouette Clustering-based Label Flipping Attack (SCLFA). We show the effectiveness of these two defense algorithms, Label-based Semi-supervised Defense (LSD) and Clustering-based Semi-supervised Defense (CSD), in correcting labels that have been attacked. We evaluate the performance of the proposed algorithms by varying machine learning parameters on three Android datasets (Drebin, Contagio, and Genome) and three types of features (API, intent, and permission). Our evaluation shows that using random forest feature selection and varying ratios of features can result in an improvement of up to 19% in accuracy when compared with the state-of-the-art method in the literature.
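
To make the attack idea in the abstract concrete, here is a minimal sketch of a silhouette-guided label flip. It assumes binary labels (0 = benign, 1 = malware), treats each class as one cluster, and flips every sample whose silhouette value falls below a threshold of 0. The function names, the threshold, and the toy data are illustrative assumptions, not the paper's exact selection rule.

```python
import math

def silhouette_values(X, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), with each class label used as a cluster."""
    n = len(X)
    values = []
    for i in range(n):
        # Accumulate distances from sample i to every other sample, grouped by class.
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        # Convention: a singleton class or a single-class dataset gets silhouette 0.
        if own not in counts or len(counts) < 2:
            values.append(0.0)
            continue
        a = sums[own] / counts[own]                                  # mean distance to own class
        b = min(sums[c] / counts[c] for c in counts if c != own)     # mean distance to nearest other class
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

def silhouette_label_flip(X, labels, threshold=0.0):
    """Flip the binary label of every sample whose silhouette value is below `threshold` (assumed rule)."""
    scores = silhouette_values(X, labels)
    flipped = [1 - y if s < threshold else y for y, s in zip(labels, scores)]
    return flipped, scores

# Toy feature vectors: three benign-looking points, three malware-looking points,
# and one sample labelled benign that sits next to the malware group.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (4.5, 4.5)]
labels = [0, 0, 0, 1, 1, 1, 0]

poisoned, scores = silhouette_label_flip(X, labels)
print(poisoned)
```

In this toy setup only the last sample has a negative silhouette value, so it is the only label that changes. Raising or lowering `threshold` would be one way to control how many labels the attacker flips.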

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to propose effective defense methods against Label Flipping Attacks in malware detection systems. Specifically, label flipping attacks are a subclass of data poisoning attacks. Attackers mislead machine-learning models by tampering with the labels in the training data, thereby degrading their performance. Such attacks are particularly severe in complex network environments and Internet of Things (IoT) systems because there is a high noise rate or uncertainty in these environments.

### Main Problem Description in the Paper

1. **The Hazards of Label Flipping Attacks**: Label flipping attacks can significantly reduce the classification performance of machine-learning models by changing the labels of the training data, even if the attacker's other capabilities are limited.
2. **The Deficiencies of Existing Defense Methods**: Existing defense methods such as the KNN algorithm can be used to relabel samples, but they are not effective in dealing with label flipping attacks, especially when facing complex data sets.
3. **The Vulnerability of Deep-Learning Models**: Although deep neural networks (DNNs) perform well in classification tasks, they are very sensitive to label flipping attacks, resulting in a decline in accuracy.

### Goals of the Paper

To address the above problems, the paper proposes the following goals:

- Design an architecture to learn the flipped-label data and improve the robustness of the malware detection system.
- Propose a label-flipping-attack method based on Silhouette Clustering to evaluate the vulnerability of existing systems.
- Introduce two semi-supervised defense methods based on deep learning (LSD and CSD) to correct the attacked labels and improve classification accuracy.

### Main Contributions

1. **Proposed a New Attack Model**: Label Flipping Attack based on Silhouette Clustering (SCLFA), which selects appropriate samples for label flipping to deceive classification algorithms.
2. **Developed Two Defense Algorithms**:
   - **Label-based Semi-supervised Defense (LSD)**: Combines the Label Propagation (LP) and Label Spreading (LS) algorithms to predict and correct the flipped labels.
   - **Clustering-based Semi-supervised Defense (CSD)**: A semi-supervised defense method based on clustering algorithms, which uses four clustering metrics and validation data to relabel the contaminated labels.
3. **Experimental Verification**: Experiments were carried out on three real Android data sets (Drebin, Contagio, Genome), and the results show that the proposed defense methods improve the accuracy by up to 19% compared with the existing methods.

### Conclusion

By proposing new attack and defense methods, this paper effectively solves the problem of label flipping attacks in malware detection systems and improves the robustness and accuracy of the system.
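
The LSD idea of combining Label Propagation and Label Spreading to correct flipped labels can be illustrated with a small, simplified sketch. It is not the paper's implementation: it assumes binary labels, a small set of trusted (validation) samples, a row-normalised RBF graph for both variants, and a hypothetical agreement rule in which a training label is overwritten only when both diffusion variants predict the same class. All function names and parameter values are illustrative.

```python
import math

def transition_matrix(X, gamma=0.5):
    """Row-normalised RBF similarity graph over all samples."""
    n = len(X)
    P = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-gamma * math.dist(X[i], X[j]) ** 2)
               for j in range(n)]
        total = sum(row) or 1.0  # guard against an isolated sample
        P.append([w / total for w in row])
    return P

def diffuse(P, y, trusted, alpha, clamp, iters=50):
    """Spread the trusted labels over the graph and return a 0/1 prediction per sample.

    clamp=True,  alpha=1.0 -> label-propagation-style (trusted rows are reset every step)
    clamp=False, alpha<1.0 -> label-spreading-style (soft pull back towards the seeds)
    """
    n = len(P)
    # Only trusted samples contribute seed mass; untrusted labels are ignored here.
    seeds = [[float(i in trusted and y[i] == c) for c in (0, 1)] for i in range(n)]
    F = [row[:] for row in seeds]
    for _ in range(iters):
        PF = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in (0, 1)] for i in range(n)]
        F = [[alpha * PF[i][c] + (1 - alpha) * seeds[i][c] for c in (0, 1)] for i in range(n)]
        if clamp:
            for i in trusted:
                F[i] = seeds[i][:]
    return [0 if f[0] >= f[1] else 1 for f in F]

def relabel(X, y, trusted, gamma=0.5):
    """Overwrite an untrusted label only when both diffusion variants agree (assumed rule)."""
    P = transition_matrix(X, gamma)
    lp = diffuse(P, y, trusted, alpha=1.0, clamp=True)
    ls = diffuse(P, y, trusted, alpha=0.8, clamp=False)
    return [y[i] if i in trusted or lp[i] != ls[i] else lp[i] for i in range(len(y))]

# Toy data: samples 0 and 1 are trusted; samples 3 and 5 carry flipped labels.
X = [(0, 0), (5, 5), (0, 1), (1, 0), (5, 6), (6, 5)]
y = [0, 1, 0, 1, 1, 0]

corrected = relabel(X, y, trusted={0, 1})
print(corrected)
```

On this toy data the two flipped labels are restored because each poisoned sample sits close to a trusted sample of the opposite class. With real feature vectors, the graph kernel, `gamma`, and the agreement rule would all need tuning, and a library implementation of label propagation and label spreading would be the more practical choice.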

On Defending Against Label Flipping Attacks on Malware Detection Systems

Analysis of Label-Flip Poisoning Attack on Machine Learning Based Malware Detector

Privacy-Preserving Federated Learning Against Label-Flipping Attacks on Non-IID Data

Mitigating Label Flipping Attacks in Malicious URL Detectors Using Ensemble Trees

Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach

Label Sanitization against Label Flipping Poisoning Attacks

Fast Adversarial Label-Flipping Attack on Tabular Data

Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding

Artificial Intelligence Algorithms for Malware Detection in Android-Operated Mobile Devices

Label Poisoning is All You Need

Query-efficient label-only attacks against black-box machine learning models

Can Machine Learning Model with Static Features be Fooled: an Adversarial Machine Learning Approach

A malware detection system using a hybrid approach of multi-heads attention-based control flow traces and image visualization

hybrid-Falcon: Hybrid Pattern Malware Detection and Categorization with Network Traffic and Program Code

MalWhiteout: Reducing Label Errors in Android Malware Detection

Defending against label-flipping attacks in federated learning systems using uniform manifold approximation and projection

A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks

RecMaL: Rectify the malware family label via hybrid analysis

Label flipping attacks against Naive Bayes on spam filtering systems

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

LFighter: Defending against the label-flipping attack in federated learning