Abstract: Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change behavior to be categorized as benign while preserving malicious functionality. Research into evasion attacks has followed two paradigms: attacks in problem space, where the actual malicious instance, such as a PDF file, is modified, and attacks in feature space, where the evasion attack is abstracted into directly modifying numerical features corresponding to malicious instances, rather than the instances themselves. The feature space abstraction facilitates elegant mathematical modeling and analysis of evasion attacks, and has been the prevalent framework for designing evasion-robust classifiers. However, there exists no prior validation of the effectiveness of feature space threat models in representing real evasion attacks. We make several contributions to address this gap, using PDF malware detection as a case study, with four PDF malware detectors. First, we use iterative retraining to create a baseline for evasion-robust PDF malware detection by using an automated problem space attack generator in the retraining loop. Second, we use this baseline to demonstrate that replacing problem space attacks with feature space attacks may significantly reduce the robustness of the resulting classifier. Third, we demonstrate the existence of conserved (or invariant) features, show how these can be leveraged to design evasion-robust classifiers that are nearly as effective as those relying on the problem space attack, and present an approach for automatically identifying conserved features of PDF malware detectors. Finally, we evaluate the generalizability of evasion defense through retraining by considering two additional evasion attacks.
We show, surprisingly, that feature space retraining with conserved features can be dramatically more robust to the new attacks than classifiers retrained with the problem space model. This suggests that when we properly account for conserved features, hardening classifiers with abstract feature space models of evasion can yield more generalizable evasion robustness than using specific problem space evasion attacks.

On the Effectiveness of Adversarial Training on Malware Classifiers

Attack As Defense: Characterizing Adversarial Examples Using Robustness

On the Effectiveness of Adversarial Training Against Backdoor Attacks

ATWM: Defense against adversarial malware based on adversarial training

How to Train your Antivirus: RL-based Hardening through the Problem-Space

Adversarial Training: A Survey

Enhancing Deep Neural Networks Against Adversarial Malware Examples

On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks

Weighted Adaptive Perturbations Adversarial Training for Improving Robustness

Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly

A2: Efficient Automated Attacker for Boosting Adversarial Training

A Framework for Enhancing Deep Neural Networks Against Adversarial Malware

Assessing Vulnerabilities of Adversarial Learning Algorithm through Poisoning Attacks

Attack and Defense of Dynamic Analysis-Based, Adversarial Neural Malware Classification Models

Strength-Adaptive Adversarial Training

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing

Improving Adversarial Robustness in Android Malware Detection by Reducing the Impact of Spurious Correlations

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

On Visual Hallmarks of Robustness to Adversarial Malware

Auditing static machine learning anti-Malware tools against metamorphic attacks

On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification