Evading Classifier in the Dark: Guiding Unpredictable Morphing Using Binary-Output Blackboxes.

Hung Dang, Yue Huang, Ee-Chien Chang
2017-01-01
Abstract: Learning-based systems have been shown to be vulnerable to adversarial data manipulation attacks. These attacks have been studied under the assumption that the adversary has some knowledge of the target model's internals, its training dataset, or at least the classification scores it assigns to input samples. In this paper, we investigate a much more constrained and realistic attack scenario that does not assume any of the above-mentioned knowledge. The target classifier is minimally exposed to the adversary, revealing only its final classification decision (e.g., reject or accept an input sample). Moreover, the adversary can only manipulate malicious samples using a blackbox morpher. That is, the adversary has to evade the target classifier by morphing malicious samples “in the dark”. We present a scoring mechanism that assigns each sample a real-valued score reflecting its evasion progress, based on the limited information available. Leveraging this scoring mechanism, we propose a hill-climbing method, dubbed EvadeHC, that operates without the help of any domain-specific knowledge, and evaluate it against two PDF malware detectors, namely PDFrate and Hidost. The experimental evaluation demonstrates that the proposed evasion attacks are effective, attaining a 100% evasion rate on our dataset. Interestingly, EvadeHC outperforms the known classifier evasion technique that operates on the classification scores output by the classifiers. Although our evaluations are conducted on PDF malware classifiers, the proposed approaches are domain-agnostic and apply more widely to other learning-based systems.
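
To make the idea of hill climbing against a binary-output blackbox concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation. The scoring rule (how many random morphing steps pass before the detector's decision flips, compared with how many pass before the sample stops being malicious), the separate `is_malicious` oracle, and all names and parameters below are assumptions made for illustration only.

```python
import random


def score(sample, morph, is_rejected, is_malicious, rng, max_steps=20):
    """Assumed scoring rule: walk one random morphing path and record the
    first step at which each binary oracle flips. A sample scores higher
    when it is accepted sooner and stays malicious longer."""
    reject_flip = malice_flip = max_steps  # max_steps means "never flipped"
    current = sample
    for step in range(max_steps):
        if reject_flip == max_steps and not is_rejected(current):
            reject_flip = step
        if malice_flip == max_steps and not is_malicious(current):
            malice_flip = step
        current = morph(current, rng)
    return malice_flip - reject_flip


def evade_hc(seed, morph, is_rejected, is_malicious, rng,
             rounds=50, candidates=8):
    """Hill climbing that uses only binary decisions and a blackbox morpher.
    Returns an accepted, still-malicious sample, or None if none is found."""
    current = seed
    current_score = score(current, morph, is_rejected, is_malicious, rng)
    for _ in range(rounds):
        best, best_score = current, current_score
        for _ in range(candidates):
            cand = morph(current, rng)
            if not is_malicious(cand):
                continue  # morphing destroyed the payload, discard
            if not is_rejected(cand):
                return cand  # evasion found
            cand_score = score(cand, morph, is_rejected, is_malicious, rng)
            if cand_score >= best_score:
                best, best_score = cand, cand_score
        current, current_score = best, best_score
    return None


# Hypothetical toy domain: a sample is (benign_features, payload_strength).
def toy_morph(sample, rng):
    """Blackbox morpher: the attacker cannot choose which change is applied."""
    x, y = sample
    return (x + 1, y) if rng.random() < 0.7 else (x, y - 1)


def toy_is_rejected(sample):
    return sample[0] < 5  # hidden detector rule, unknown to the attacker


def toy_is_malicious(sample):
    return sample[1] > 0  # payload still works


result = evade_hc((0, 3), toy_morph, toy_is_rejected, toy_is_malicious,
                  random.Random(0))
```

Here `result` is either `None` or a sample that the toy detector accepts while the toy payload check still passes. The attacker code never reads a classification score; it only counts steps until a yes/no answer changes.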