Abstract: Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models, and on how to make tree-based models robust against adversarial examples, is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information-gain-based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.

What problem does this paper attempt to address?

This paper addresses the vulnerability of tree-based models to adversarial examples. Adversarial examples and model robustness have been widely studied for linear models and neural networks, but research on the same problem in tree-based models is still limited. The paper shows that tree-based models are also vulnerable to adversarial examples and proposes a new algorithm to learn robust decision trees.

Specifically, the article points out that traditional tree-based models (such as decision trees and gradient-boosted decision trees) perform well in many application scenarios but are fragile under adversarial attacks. The authors aim to improve robustness by optimizing performance under the worst-case perturbation of the input features. This leads to a max-min saddle point problem: each split is chosen to maximize a score that an adversary has first minimized. Because solving this exactly takes exponential time, they propose efficient tree-construction algorithms that approximate the inner minimization.

### Key Formulas

1. **Perturbation Set (\( \ell_\infty \) Ball)**:

   \[ B_\infty^\epsilon(x_i) := [x_i^{(1)} - \epsilon, x_i^{(1)} + \epsilon] \times \cdots \times [x_i^{(d)} - \epsilon, x_i^{(d)} + \epsilon] \]

   where \( x_i \) is the input feature vector, \( d \) is the number of features, and \( \epsilon \) is the perturbation radius. An adversarial example for \( x_i \) is drawn from this set.

2. **Robust Score Function**:

   \[ RS(j, \eta, I) := \min_{I'=\{(x'_i, y_i)\}} S(j, \eta, I') \]

   such that \( x'_i \in B_\infty^\epsilon(x_i) \) for all \( x'_i \in I' \). Here \( j \) is the feature index, \( \eta \) is the split threshold, \( I \) is the set of points at the node, and \( S \) is the ordinary split score.

3. **Split Score for Information Gain Trees**:

   \[ S(j, \eta, I) := IG(j, \eta) = H(y) - H(y | x^{(j)} < \eta) \]

   where \( H(\cdot) \) and \( H(\cdot|\cdot) \) represent entropy and conditional entropy respectively.

4. **Split Score for GBDT Models**:

   \[ S(j, \eta, I) := \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I} g_i \right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \]

   where \( I_L \) and \( I_R \) are the points sent to the left and right child by the split, \( g_i \) and \( h_i \) are the first- and second-order derivatives of the loss at point \( i \), and \( \lambda \) and \( \gamma \) are regularization constants.

Substituting either split score into \( RS \) gives the robust split criterion for the corresponding model. With these methods, the authors report substantially improved robustness of tree-based models against adversarial examples in experiments on real-world datasets.
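To make the robust score concrete, the following is a minimal Python sketch of the naive, exponential-time evaluation of \( RS(j, \eta, I) \) with the GBDT split score. It is not the paper's approximation algorithm. It assumes the split sends a point left when \( x^{(j)} < \eta \), so only points with \( \eta - \epsilon \le x^{(j)} < \eta + \epsilon \) can be pushed to either side. The function names (`gbdt_score`, `natural_score`, `robust_score_bruteforce`) and the toy data are hypothetical.

```python
from itertools import product


def gbdt_score(g, h, left_mask, lam=1.0, gamma=0.0):
    """GBDT split score S for a given left/right assignment of the points.

    g, h: per-point first- and second-order loss derivatives.
    left_mask[i] is True if point i goes to the left child.
    lam > 0 keeps the denominators nonzero even for an empty child.
    """
    def leaf_term(side):
        G = sum(gi for gi, m in zip(g, left_mask) if m == side)
        H = sum(hi for hi, m in zip(h, left_mask) if m == side)
        return G * G / (H + lam)

    parent = sum(g) ** 2 / (sum(h) + lam)
    return 0.5 * (leaf_term(True) + leaf_term(False) - parent) - gamma


def natural_score(x_j, g, h, eta, **kw):
    """Ordinary (non-robust) score: points with x_j < eta go left."""
    return gbdt_score(g, h, [v < eta for v in x_j], **kw)


def robust_score_bruteforce(x_j, g, h, eta, eps, **kw):
    """Exact RS by trying every placement of the ambiguous points.

    Cost is 2**m score evaluations for m ambiguous points, which is
    why a naive search is impractical on real datasets.
    """
    ambiguous = [i for i, v in enumerate(x_j) if eta - eps <= v < eta + eps]
    base = [v < eta for v in x_j]  # placement of points that cannot move
    worst = float("inf")
    for choice in product([True, False], repeat=len(ambiguous)):
        mask = list(base)
        for i, goes_left in zip(ambiguous, choice):
            mask[i] = goes_left
        worst = min(worst, gbdt_score(g, h, mask, **kw))
    return worst


# Toy node: values of feature j, with gradients that separate well at eta = 0.5.
x_j = [0.1, 0.45, 0.55, 0.9]
g = [-1.0, -1.0, 1.0, 1.0]
h = [1.0, 1.0, 1.0, 1.0]

natural = natural_score(x_j, g, h, eta=0.5)
robust = robust_score_bruteforce(x_j, g, h, eta=0.5, eps=0.1)
print(natural, robust)
```

In this toy node the two middle points lie within \( \epsilon \) of the threshold, so an adversary can swap them across the split and cancel the gradient sums on both sides. The split looks good under the ordinary score but is worthless under the robust score, which is the kind of split a robust tree learner would avoid.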

Robust Decision Trees Against Adversarial Examples
