Evasion and Hardening of Tree Ensemble Classifiers

Alex Kantchelian,J. D. Tygar,Anthony D. Joseph
DOI: https://doi.org/10.48550/arXiv.1509.07892
2016-05-27
Abstract:Classifier evasion consists in finding for a given instance $x$ the nearest instance $x'$ such that the classifier predictions of $x$ and $x'$ are different. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.
Machine Learning,Cryptography and Security
What problem does this paper attempt to address?