Robust Decision Trees Against Adversarial Examples

Hongge Chen, Huan Zhang, Duane Boning, Cho-Jui Hsieh
DOI: https://doi.org/10.48550/arXiv.1902.10660
2019-06-11
Abstract: Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models and how to make tree-based models robust against adversarial examples is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information gain based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the vulnerability of tree-based models to adversarial examples. Although adversarial examples and model robustness have been studied extensively for linear models and neural networks, research on tree-based models is still limited. The paper shows that tree-based models are also vulnerable to adversarial examples and proposes a new algorithm for learning robust decision trees.

Specifically, the article points out that traditional tree-based models (such as decision trees and gradient-boosted decision trees) perform well in many application scenarios but are fragile under adversarial attack. The authors aim to improve robustness by optimizing performance under the worst-case perturbation of the input features. This leads to a max-min saddle point problem, and because solving it exactly for every candidate split takes exponential time, they propose efficient tree-construction algorithms that approximate the inner minimization.

### Key Formulas

1. **Perturbation Set for Adversarial Examples**:
\[ B_\infty^\epsilon(x_i) := [x_i^{(1)} - \epsilon, x_i^{(1)} + \epsilon] \times \cdots \times [x_i^{(d)} - \epsilon, x_i^{(d)} + \epsilon] \]
where \( x_i \) is the \( d \)-dimensional input feature vector and \( \epsilon \) is the perturbation radius. This is the \( \ell_\infty \) ball of radius \( \epsilon \) around \( x_i \).

2. **Robust Score Function**:
\[ RS(j, \eta, I) := \min_{I'=\{(x'_i, y_i)\}} S(j, \eta, I') \]
such that \( x'_i \in B_\infty^\epsilon(x_i) \) for all \( (x'_i, y_i) \in I' \). Here \( j \) is the feature being split, \( \eta \) is the split threshold, \( I \) is the set of training points at the node, and \( S \) is the ordinary (non-robust) split score defined in items 3 and 4.

3. **Split Score for Information Gain Trees**:
\[ S(j, \eta, I) := IG(j, \eta) = H(y) - H(y | x^{(j)} < \eta) \]
where \( H(\cdot) \) and \( H(\cdot|\cdot) \) represent entropy and conditional entropy respectively.

4. **Split Score for GBDT Models**:
\[ S(j, \eta, I) := \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I} g_i \right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \]
where \( I_L \) and \( I_R \) are the points sent to the left and right child, \( g_i \) and \( h_i \) are the first- and second-order derivatives of the loss at point \( i \), and \( \lambda \) and \( \gamma \) are regularization parameters.

By choosing splits according to the robust score \( RS \) rather than \( S \), the authors improve the robustness of tree-based models against adversarial examples.
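
To make the score definitions above concrete, here is a minimal Python sketch for a single feature. It is not the authors' implementation: it evaluates the robust score by brute force, enumerating every way the points that an \( \epsilon \)-perturbation can push across the threshold may be assigned to the two children. That is the naive exponential-time approach the abstract mentions, so it is only usable on toy data. The function names (`entropy`, `info_gain`, `gbdt_gain`, `robust_score`), the toy dataset, and the tie rule "a perturbed value below \( \eta \) goes left" are assumptions made for illustration.

```python
from itertools import product
from math import inf, log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))


def info_gain(left, right):
    """Information-gain score S for a split; left/right are label lists."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    conditional = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(left + right) - conditional


def gbdt_gain(left, right, lam=1.0, gamma=0.0):
    """GBDT score S for a split; left/right are lists of (g_i, h_i) pairs."""
    def term(pairs):
        g_sum = sum(g for g, _ in pairs)
        h_sum = sum(h for _, h in pairs)
        return g_sum * g_sum / (h_sum + lam)
    return 0.5 * (term(left) + term(right) - term(left + right)) - gamma


def robust_score(score, xs, payloads, eta, eps):
    """Brute-force RS: minimum of `score` over all eps-perturbed datasets.

    xs       -- values of the split feature, one per point
    payloads -- what `score` consumes per point (labels, or (g, h) pairs)
    Only points that can reach both sides of eta affect the minimum, so we
    enumerate the 2**m left/right assignments of those m ambiguous points.
    """
    fixed_left, fixed_right, ambiguous = [], [], []
    for x, p in zip(xs, payloads):
        can_go_left = x - eps < eta    # some perturbed value is below eta
        can_go_right = x + eps >= eta  # some perturbed value is at/above eta
        if can_go_left and can_go_right:
            ambiguous.append(p)
        elif can_go_left:
            fixed_left.append(p)
        else:
            fixed_right.append(p)

    worst = inf
    for sides in product((0, 1), repeat=len(ambiguous)):
        left = fixed_left + [p for p, s in zip(ambiguous, sides) if s == 0]
        right = fixed_right + [p for p, s in zip(ambiguous, sides) if s == 1]
        worst = min(worst, score(left, right))
    return worst


# Toy one-feature dataset (made up for illustration).
xs = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]

clean = robust_score(info_gain, xs, ys, eta=0.5, eps=0.0)   # eps=0: plain S
robust = robust_score(info_gain, xs, ys, eta=0.5, eps=0.1)  # worst case
print("clean score:", clean)
print("robust score:", robust)
```

On this toy data the split at \( \eta = 0.5 \) separates the classes perfectly when unperturbed, but the two points nearest the threshold can be swapped across it, so the robust score comes out lower than the clean score. The same `robust_score` function accepts `gbdt_gain` when the payloads are `(g, h)` pairs. The enumeration grows as \( 2^m \) in the number of ambiguous points \( m \), which is why the paper approximates the inner minimum instead of computing it exactly.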