Abstract: As the complexity of machine learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model‐agnostic way to interpret such models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier to explain. Yet, rule sets can become very lengthy, with many if–else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal: providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that summarizes the behaviour of complex ML models by providing surrogate decision stumps (one‐level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. DeforestVis helps users to explore the complexity versus fidelity trade‐off by incrementally generating more stumps, to create attribute‐based explanations with weighted stumps that justify decision making, and to analyse the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case‐by‐case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
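The surrogate idea in the abstract can be illustrated with a minimal sketch. It is not the DeforestVis implementation, which the abstract does not detail. It assumes a hypothetical two-feature `black_box` function standing in for the complex model, plain discrete AdaBoost over one-level stumps, and fidelity defined as the share of held-out instances on which the surrogate agrees with the black box. All function and variable names are illustrative.

```python
import math
import random


def black_box(x):
    # Hypothetical stand-in for a complex ML model (assumption, not from the paper).
    return 1 if x[0] * x[0] + x[1] > 1.0 else -1


def fit_stump(X, y, w):
    """Find the stump with the lowest weighted error: (error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            # Weighted error of the rule "predict +1 if x[f] > t, else -1".
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if row[f] > t else -1) != yi)
            # Flipping the rule flips its error, because the weights sum to 1.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, f, t, polarity)
    return best


def fit_surrogate(X, y, n_stumps):
    """Discrete AdaBoost: returns a list of weighted stumps (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_stumps):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)  # stump weight
        ensemble.append((alpha, f, t, pol))
        # Increase the weight of instances this stump got wrong, then renormalise.
        w = [wi * math.exp(-alpha * yi * (pol if row[f] > t else -pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x, k=None):
    """Weighted vote of the first k stumps (all stumps when k is None)."""
    score = sum(a * (p if x[f] > t else -p) for a, f, t, p in ensemble[:k])
    return 1 if score >= 0 else -1


def fidelity(ensemble, X, y_bb, k=None):
    """Share of instances where the surrogate agrees with the black box."""
    return sum(predict(ensemble, x, k) == yb for x, yb in zip(X, y_bb)) / len(X)


rng = random.Random(0)
X_train = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(200)]
X_test = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(200)]
# The surrogate is trained on the black box's predictions, not on ground-truth labels.
y_train = [black_box(x) for x in X_train]
y_test = [black_box(x) for x in X_test]

ensemble = fit_surrogate(X_train, y_train, n_stumps=20)
# Complexity versus fidelity: held-out agreement as more stumps are used.
curve = [(k, fidelity(ensemble, X_test, y_test, k)) for k in (1, 5, 10, 20)]
for k, fid in curve:
    print(f"{k:2d} stumps -> test fidelity {fid:.3f}")
```

Each entry in `ensemble` is a single threshold rule on one attribute with a weight `alpha`. Summing the weights per attribute gives a rough view of which attributes drive the surrogate, loosely in the spirit of the weighted-stump explanations described above.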

Approximating XGBoost with an interpretable decision tree

Explainable decision forest: Transforming a decision forest into an interpretable tree

Implementing local-explainability in Gradient Boosting Trees: Feature Contribution

Inherently Interpretable Tree Ensemble Learning

An Explainable Bayesian Decision Tree Algorithm

GPTree: Towards Explainable Decision-Making via LLM-powered Decision Trees

Distillation Decision Tree

Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation

Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling

Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

Learning accurate and interpretable decision trees

DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps

Interpreting Models via Single Tree Approximation

Fast Interpretable Greedy-Tree Sums

Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning

Dive into Decision Trees and Forests: A Theoretical Demonstration

GBDT4CTRVis: visual analytics of gradient boosting decision tree for advertisement click-through rate prediction

Interpretability as Approximation: Understanding Black-Box Models by Decision Boundary

Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees

A Theory of Interpretable Approximations