A survey and taxonomy of methods interpreting random forest models

Maissae Haddouchi, Abdelaziz Berrado
2024-07-18
Abstract: The interpretability of random forest (RF) models is a research topic of growing interest in the machine learning (ML) community. RF is regarded in the literature as a powerful ensemble learner given its predictive performance, flexibility, and ease of use. Furthermore, the process by which an RF model is built is understandable, because it relies on an intuitive and intelligible approach to constructing the ensemble of decision trees. However, the resulting RF model is regarded as a "black box" because of its numerous deep decision trees. Gaining visibility over the entire process that produces the final decisions by exploring each decision tree is complicated, if not impossible. This complexity limits the acceptance and implementation of RF models in several fields of application. Several papers have tackled the interpretation of RF models. This paper aims to provide an extensive review of the methods used in the literature to interpret resulting RF models. We have analyzed these methods and classified them along different axes. Although this review is not exhaustive, it provides a taxonomy of various techniques that should guide users in choosing the most appropriate tools for interpreting RF models, depending on the interpretability aspects sought. It should also be valuable for researchers who aim to focus their work on the interpretability of RF or of ML black boxes in general.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the interpretability of random forest (RF) models. RF is considered a powerful ensemble learning method because of its predictive performance, flexibility, and ease of use, but the final model is often viewed as a "black box": it contains numerous deep decision trees, which makes its internal decision-making process difficult to follow. This complexity limits the acceptance and implementation of RF models in various application domains. The paper reviews the methods used in the literature to interpret RF models and classifies them along several axes: the stage of interpretation, the objective of interpretation, the type of problem being addressed, the input-output format, the methodology used to provide explanations, and the programming languages used for implementation. Through this analysis, the authors aim to guide users in selecting the interpretation tools best suited to their needs, and to provide a useful reference for researchers studying the interpretability of RF or of other machine learning models.