Approximating XGBoost with an interpretable decision tree

Omer Sagi,Lior Rokach
DOI: https://doi.org/10.1016/j.ins.2021.05.055
IF: 8.1
2021-09-01
Information Sciences
Abstract:<p>The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare, finance, and judiciary - the model consumer must understand the rationale behind the model output in order to use it when making a decision. For this reason, it is impossible to use black-box models in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered the state-of-the-art in many classification challenges, reflected by the fact that the majority of Kaggle's recent winners used GBDT methods as a part of their solution (such as XGBoost). But despite their superior predictive performance, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the tool-set available for machine learning practitioners, who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models like XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of a XGBoost model while enabling better transparency of the outputs.</p>
computer science, information systems
What problem does this paper attempt to address?