Fast Linear Model Trees by PILOT

Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao
DOI: https://doi.org/10.1007/s10994-024-06590-3
2023-02-08
Abstract: Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
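To make "an $L^2$ boosting approach and a model selection rule" concrete, here is a minimal sketch of one boosting step at a single node. It is illustrative only and rests on our own assumptions: the candidate set is restricted to a constant fit and a least-squares line on one predictor, and the selection rule is a generic BIC-style score. The names `fit_constant`, `fit_linear`, `bic`, `select_node_model` and `l2_boosting_step` are hypothetical and not PILOT's actual implementation or its exact criterion. The point it shows is the boosting idea: each step fits a simple model to the current residuals and passes the new residuals on.

```python
import math


def fit_constant(y):
    """Least-squares constant fit (the mean). Returns (predict, rss, df)."""
    mean = sum(y) / len(y)
    rss = sum((v - mean) ** 2 for v in y)
    return (lambda _x: mean), rss, 1


def fit_linear(x, y):
    """Least-squares line y ~ a + b*x on one predictor. Returns (predict, rss, df)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        return fit_constant(y)  # constant predictor: no slope can be estimated
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return (lambda t: a + b * t), rss, 2


def bic(rss, n, df):
    """Assumed BIC-style score (lower is better); rss is clamped to avoid log(0)."""
    return n * math.log(max(rss, 1e-12) / n) + df * math.log(n)


def select_node_model(X, y):
    """Return (name, j, predict) for the lowest-score model among a constant fit
    and a linear fit on each predictor column X[j]."""
    n = len(y)
    predict, rss, df = fit_constant(y)
    best, best_score = ("con", None, predict), bic(rss, n, df)
    for j, col in enumerate(X):
        predict, rss, df = fit_linear(col, y)
        score = bic(rss, n, df)
        if score < best_score:  # strict comparison: ties keep the simpler model
            best, best_score = ("lin", j, predict), score
    return best


def l2_boosting_step(X, y):
    """Fit the selected model to y and return the residuals for the next step."""
    name, j, predict = select_node_model(X, y)
    fitted = [predict(None)] * len(y) if j is None else [predict(v) for v in X[j]]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return name, j, residuals


# Usage: y depends linearly on the first predictor only.
X = [[0, 1, 2, 3, 4, 5, 6, 7], [1, 0, 1, 0, 1, 0, 1, 0]]
y = [1 + 2 * v for v in X[0]]
name, j, res = l2_boosting_step(X, y)
print(name, j)  # lin 0
```

Because the line on the first predictor fits these data exactly, its residuals are zero and a real tree would stop improving at this node; with noisy data the residuals would feed the next greedy step or split.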
Machine Learning, Methodology
What problem does this paper attempt to address?
The main problem this paper addresses is that existing linear model tree (LMT) algorithms are computationally expensive, prone to overfitting, and liable to large extrapolation errors, which makes them hard to use on large data sets. Specifically:

1. **High computational cost**: Most existing linear model tree methods fit multiple regressions at leaf or internal nodes, which introduces a time complexity of \(O(p^2)\) or \(O(p^3)\), where \(p\) is the number of predictors, so these methods do not scale to large data sets.
2. **Overfitting**: Existing linear model tree methods are more prone to overfitting than standard regression trees, especially when the data set is small or has many features.
3. **Large extrapolation errors**: Existing linear model tree methods can produce large errors on test data, particularly when some predictor values in the test data fall outside the range of the training data.

To address these issues, the paper proposes a new linear model tree algorithm, PILOT (PIecewise Linear Organic Tree). Its main features are:

- **Speed**: PILOT has the same low time and space complexity as CART without CART's pruning step, and PILOT itself requires no pruning.
- **Regularization**: A model selection procedure is applied at each node to choose the best linear model without increasing the computational complexity.
- **Interpretability**: Because the linear models in the nodes are simple, the final tree remains easy to interpret, and feature importances can be computed.
- **Stable extrapolation**: Two truncation procedures prevent extreme fitted values on the training data and extreme extrapolations on the test data.
- **Theoretical support**: PILOT is consistent in an additive model setting under weak assumptions. When the data are generated by a linear model, PILOT achieves a polynomial convergence rate, which is not guaranteed for CART.
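The "stable extrapolation" point can be illustrated with a small sketch of what two truncations might look like for a single-predictor linear model in one node. This is our own illustrative reading, not the paper's exact definitions: we assume the first truncation clamps an incoming predictor value to the range seen in training, and the second clamps the fitted value to the training response range. The class `TruncatedLinearNode` and the helper `clip` are hypothetical names.

```python
from dataclasses import dataclass


def clip(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


@dataclass
class TruncatedLinearNode:
    """Hypothetical node model y ~ a + b*x with two assumed truncation steps."""
    a: float
    b: float
    x_train: list
    y_train: list

    def __post_init__(self):
        # Record the predictor and response ranges observed during training.
        self.x_lo, self.x_hi = min(self.x_train), max(self.x_train)
        self.y_lo, self.y_hi = min(self.y_train), max(self.y_train)

    def predict(self, x):
        # Truncation 1 (assumed): clamp the predictor to its training range,
        # so the line is never evaluated far outside the observed data.
        x_c = clip(x, self.x_lo, self.x_hi)
        yhat = self.a + self.b * x_c
        # Truncation 2 (assumed): clamp the fitted value to the response range.
        return clip(yhat, self.y_lo, self.y_hi)


node = TruncatedLinearNode(a=1.0, b=2.0, x_train=[0, 1, 2, 3], y_train=[1.5, 3.0, 5.0, 6.5])
print(node.predict(1.5))    # 4.0 (inside both ranges, unchanged)
print(node.predict(100.0))  # 6.5 (x clamped to 3, then 7.0 clamped to 6.5)
print(node.predict(-5.0))   # 1.5 (x clamped to 0, then 1.0 clamped to 1.5)
```

Without the truncations, the same line would predict 201.0 at x = 100, which is the kind of extreme extrapolation the feature list describes.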
With these improvements, PILOT is both efficient and stable, and in the paper's empirical study it tends to outperform standard decision trees and other linear model trees on a variety of benchmark data sets.