Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly
Liang Tong, Bo Li, Chen Hajaj, Chaowei Xiao, Yevgeniy Vorobeychik
2017-08-28
Abstract: Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change behavior to be categorized as benign while preserving malicious functionality. Research into evasion attacks has followed two paradigms: attacks in problem space, where the actual malicious instance, such as a PDF file, is modified, and attacks in feature space, where the evasion attack is abstracted into directly modifying the numerical features corresponding to malicious instances rather than the instances themselves. The feature space abstraction facilitates elegant mathematical modeling and analysis of evasion attacks, and has been the prevalent framework for designing evasion-robust classifiers. However, there has been no prior validation of how effectively feature space threat models represent real evasion attacks. We make several contributions to address this gap, using PDF malware detection as a case study with four PDF malware detectors. First, we use iterative retraining to create a baseline for evasion-robust PDF malware detection, placing an automated problem space attack generator in the retraining loop. Second, we use this baseline to demonstrate that replacing problem space attacks with feature space attacks may significantly reduce the robustness of the resulting classifier. Third, we demonstrate the existence of conserved (or invariant) features, show how these can be leveraged to design evasion-robust classifiers that are nearly as effective as those relying on the problem space attack, and present an approach for automatically identifying conserved features of PDF malware detectors. Finally, we evaluate the generalizability of evasion defense through retraining by considering two additional evasion attacks.
We show, surprisingly, that classifiers retrained in feature space with conserved features can be dramatically more robust to the new attacks than classifiers retrained with the problem space model. This suggests that when we properly account for conserved features, hardening classifiers with abstract feature space models of evasion can yield more generalizable evasion robustness than using specific problem space evasion attacks.
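The iterative retraining loop and the role of conserved features can be illustrated with a small feature space sketch. This is not the authors' implementation: a perceptron over hypothetical binary features stands in for the PDF malware detectors, a greedy bit-flipping attack stands in for the evasion model, and feature index 0 is assumed, for illustration only, to be conserved, so the attacker may not change it. Each round retrains the model, runs the attack against every malicious training instance, and adds any successful evasions back as malicious examples until the attack no longer succeeds.

```python
from typing import List, Optional, Sequence, Set, Tuple

Vector = List[int]

# Hypothetical toy data: binary features [f0, f1, f2, f3].
# f0 is ASSUMED to be conserved (changing it would break malicious
# functionality); f1-f3 are assumed to be freely modifiable by the attacker.
X_TOY: List[Vector] = [
    [1, 1, 1, 0],  # malicious
    [1, 1, 0, 0],  # malicious
    [0, 0, 0, 1],  # benign
    [0, 1, 0, 1],  # benign
    [0, 0, 1, 0],  # benign
]
Y_TOY: List[int] = [1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign


def score(w: Sequence[int], b: int, x: Sequence[int]) -> int:
    """Linear score; a positive score means 'malicious'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_perceptron(X: List[Vector], y: List[int],
                     max_epochs: int = 1000) -> Tuple[Vector, int]:
    """Classic perceptron, trained until an epoch makes no mistakes."""
    w, b = [0] * len(X[0]), 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if (score(w, b, x) > 0) != bool(label):
                sign = 1 if label == 1 else -1
                w = [wj + sign * xj for wj, xj in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:
            return w, b
    raise RuntimeError("perceptron did not converge; data may not be separable")


def evade(w: Sequence[int], b: int, x: Sequence[int],
          conserved: Set[int], budget: int) -> Optional[Vector]:
    """Greedy feature space attack: flip up to `budget` non-conserved bits,
    each time picking the flip that lowers the score the most. Returns the
    evasive vector, or None if it is still classified as malicious."""
    x = list(x)
    for _ in range(budget):
        if score(w, b, x) <= 0:
            return x
        best_j, best_delta = None, 0
        for j, xj in enumerate(x):
            if j in conserved:
                continue  # the attacker may not touch conserved features
            delta = w[j] * (1 - 2 * xj)  # score change from flipping bit j
            if delta < best_delta:
                best_j, best_delta = j, delta
        if best_j is None:
            return None  # no flip lowers the score any further
        x[best_j] = 1 - x[best_j]
    return x if score(w, b, x) <= 0 else None


def iterative_retrain(X: List[Vector], y: List[int], conserved: Set[int],
                      budget: int = 3, max_rounds: int = 20) -> Tuple[Vector, int]:
    """Retrain, attack, and add successful evasions as malicious examples."""
    data_X, data_y = [list(x) for x in X], list(y)
    malicious = [list(x) for x, label in zip(X, y) if label == 1]
    for _ in range(max_rounds):
        w, b = train_perceptron(data_X, data_y)
        evasions = [e for e in (evade(w, b, m, conserved, budget) for m in malicious)
                    if e is not None]
        if not evasions:
            return w, b  # the attack no longer succeeds against this model
        data_X.extend(evasions)
        data_y.extend([1] * len(evasions))
    return train_perceptron(data_X, data_y)


if __name__ == "__main__":
    w0, b0 = train_perceptron(X_TOY, Y_TOY)
    print("evasion vs. plain model:", evade(w0, b0, X_TOY[0], {0}, 3))
    w, b = iterative_retrain(X_TOY, Y_TOY, conserved={0})
    print("evasion vs. retrained model:", evade(w, b, X_TOY[0], {0}, 3))
```

On this toy data the plain perceptron is evaded by flipping a single mutable feature, while the retrained model resists the same attack. The loop terminates because every evasion keeps the conserved feature intact, so the augmented data stays linearly separable and each round adds at least one vector that the previous model misclassified.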
Computer Science