Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning

Dimitris Bertsimas, Matthew Peroni
2024-05-31
Abstract: As a multitude of capable machine learning (ML) models become widely available in forms such as open-source software and public APIs, central questions remain regarding their use in real-world applications, especially in high-stakes decision-making. Is there always one best model that should be used? When are the models likely to be error-prone? Should a black-box or interpretable model be used? In this work, we develop a prescriptive methodology to address these key questions, introducing a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble, along with a parameterized option to reject making a prediction. We base our methods on learning globally optimized prescriptive trees. Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs. By learning policies over different feature spaces, including the model outputs, our approach works with both structured and unstructured datasets. We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data. We demonstrate that our approach provides strong performance against baseline methods while yielding insights that help answer critical questions about which models to use, and when.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses how to select and use machine-learning models in practical applications, especially in high-stakes decision-making. Specifically, the authors attempt to answer the following key questions:

1. **Given a set of models, which one should we use?** The paper explores how to select the most suitable model for a prediction based on the characteristics of the input data.
2. **Should we use a model ensemble?** The authors study when a combination of multiple models (an ensemble) performs better than a single model.
3. **When are models prone to error?** By analyzing model performance across different regions of the feature space, the method identifies where the models are likely to perform poorly.
4. **Should we use black-box or interpretable models?** The paper examines the trade-offs between black-box and interpretable models in different application scenarios.

To address these questions, the authors introduce a tree-based method, **Optimal Predictive-Policy Trees (OP2T)**. The method adaptively selects the most appropriate predictive model or ensemble for each input and can decline to make a prediction when appropriate. OP2T learns a globally optimized decision tree, so the model-selection policy is both interpretable and adaptive. The method only requires access to model outputs and does not retrain the candidate models. Because policies can be learned over different feature spaces, including the model outputs themselves, it applies to both structured and unstructured datasets.

### Method overview

- **Problem definition**: Suppose we have a set of models \(\{h_1, h_2, \ldots, h_m\}\), where the output of each model \(h_i\) is either a class prediction or a regression value. Given an input \(x\), the goal is to select a single model \(h_i\) or an ensemble \(w^T h(x)\), where \(h(x)\) collects the outputs of the \(m\) models and \(w\) is a weight vector.
- **Reward function**: For classification tasks, the reward is based on the cross-entropy loss, \(R_{CE}(x_j, y_j, h_i)\), or on the misclassification rate, \(R_{MIS}(x_j, y_j, h_i)\). For regression tasks, it is based on the squared error, \(R_{SE}(x_j, y_j, h_i)\).
- **Decision-tree construction**: The method learns an optimal decision tree that maximizes the total reward. For each leaf node \(l\), it selects the model, or the ensemble weights \(w_l\), that maximizes the reward over the samples falling in that leaf.
- **Rejection option**: A virtual rejection model \(h_r\) is added to the candidate set. When none of the candidate models performs satisfactorily, the policy can decline to make a prediction. The reward assigned to rejection is controlled by the parameter \(\alpha\).

### Experiments and applications

The authors conducted experiments on multiple real-world datasets, covering both classification and regression tasks, and demonstrated the performance advantages of OP2T and the interpretability it provides. For example, in the hurricane forecasting case study, OP2T selects the most suitable model combination according to features such as wind speed and air pressure, improving the accuracy and robustness of the forecast.

In conclusion, this paper provides a systematic method that helps users make more informed choices when facing multiple machine-learning models, and that can avoid relying on unreliable models in some cases.
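
The steps in the method overview can be made concrete with a small Python sketch. It is not the authors' implementation: OP2T learns globally optimized trees, whereas this sketch substitutes an exhaustive search over a single split (a depth-1 tree), restricts each leaf to a single model rather than ensemble weights, and assumes the rejection option is a virtual model with constant reward \(-\alpha\). All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def squared_error_rewards(preds: Matrix, y: Sequence[float]) -> Matrix:
    """Reward R[j][i] = -(h_i(x_j) - y_j)^2, so a higher reward is better."""
    return [[-(p - y_j) ** 2 for p in row] for row, y_j in zip(preds, y)]


def add_rejection(rewards: Matrix, alpha: float) -> Matrix:
    """Append a virtual rejection model with constant reward -alpha (assumed form)."""
    return [row + [-alpha] for row in rewards]


def best_action(rewards: Matrix, idx: Sequence[int]) -> Tuple[int, float]:
    """Return the action with the highest summed reward over the rows in idx."""
    n_actions = len(rewards[0])
    totals = [sum(rewards[j][a] for j in idx) for a in range(n_actions)]
    a = max(range(n_actions), key=totals.__getitem__)
    return a, totals[a]


def fit_stump(X: Matrix, rewards: Matrix) -> dict:
    """Depth-1 policy tree: exhaustive search over (feature, threshold) splits."""
    all_idx = list(range(len(X)))
    root_action, root_total = best_action(rewards, all_idx)
    best = {"feature": None, "threshold": None,
            "left": root_action, "right": root_action, "total": root_total}
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [j for j in all_idx if X[j][f] < t]
            right = [j for j in all_idx if X[j][f] >= t]
            la, lt = best_action(rewards, left)
            ra, rt = best_action(rewards, right)
            if lt + rt > best["total"]:
                best = {"feature": f, "threshold": t,
                        "left": la, "right": ra, "total": lt + rt}
    return best


def select(policy: dict, x: Sequence[float]) -> int:
    """Index of the model (or rejection) that the policy prescribes for input x."""
    if policy["feature"] is None:
        return policy["left"]
    side = "left" if x[policy["feature"]] < policy["threshold"] else "right"
    return policy[side]


# Toy regression data: one feature, two candidate models.
# Model 0 is accurate for small feature values, model 1 for large ones.
X = [[10.0], [20.0], [30.0], [70.0], [80.0], [90.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
preds = [[1.0, 3.0], [1.1, 3.0], [0.9, 3.0], [2.0, 5.0], [2.0, 5.1], [2.0, 4.9]]

rewards = add_rejection(squared_error_rewards(preds, y), alpha=10.0)
policy = fit_stump(X, rewards)
chosen = [select(policy, x) for x in X]

# With a small alpha, rejection (the last action) wins when every model is poor.
reject_demo = best_action(add_rejection([[-4.0, -9.0]], alpha=1.0), [0])
```

In this sketch, `alpha` plays the role of the rejection parameter: a large value makes rejection unattractive, while a small value lets the policy abstain wherever every candidate model incurs a large error. A deeper tree, or leaves that hold ensemble weights, would follow the same reward-maximization pattern but would need the global optimization described in the paper rather than this brute-force search.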