Practical considerations for variable screening in the Super Learner

Brian D. Williamson, Drew King, Ying Huang
2023-11-07
Abstract: Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.
Machine Learning
What problem does this paper attempt to address?
The paper examines variable screening algorithms used within the Super Learner (an ensemble learning method), focusing on how the lasso performs as a screen and how that affects the predictive performance of the ensemble as a whole. Specifically, the paper addresses the following key questions:

1. **Exploring the performance of the lasso in the Super Learner**: The researchers ask whether the lasso, which is known to perform poorly in certain situations, degrades the overall performance of the Super Learner when it is used as a screen.
2. **Evaluating the effectiveness of different screening algorithms**: Through numerical experiments, the study compares several variable screening methods (the lasso, rank correlation screening, univariate correlation screening, and random forests) to determine which screening strategies improve predictive accuracy.
3. **Proposing a diverse combination of screening algorithms**: Because any single screen may perform poorly in specific scenarios, the authors suggest building the Super Learner with multiple screening algorithms, which protects the ensemble from the adverse effects of any one screen.

The paper investigates these questions through a series of numerical experiments covering different outcome-feature relationships (linear and nonlinear), different feature correlation structures (correlated or uncorrelated), varying numbers of features (from low-dimensional to high-dimensional), and different sample sizes.

The results indicate that using the lasso alone as a screen leads to poor predictive performance when the outcome-feature relationship is nonlinear. However, if the Super Learner includes a rich set of candidate screening algorithms, including the lasso does not significantly degrade performance. This suggests that researchers should use a diverse library of screening algorithms when constructing a Super Learner.
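
The idea that one screen can miss a feature that another screen retains can be illustrated with a small, self-contained Python sketch. This is a toy example under stated assumptions, not the paper's simulation design or the `SuperLearner` package API: the data, the 0.8 threshold, the helper names (`correlation_screen`, `ranks`), and the placeholder learner names are all hypothetical. A Pearson correlation screen stands in for a screen that relies on linear association, and a Spearman rank correlation screen stands in for a screen that tolerates monotone nonlinearity. The outcome depends on the first feature only, through an exponential.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; this toy version assumes distinct values (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_screen(columns, y, corr, threshold=0.8):
    """Keep the indices of features whose |correlation| with y meets the threshold.

    The threshold of 0.8 is an arbitrary choice for this illustration.
    """
    return [j for j, col in enumerate(columns) if abs(corr(col, y)) >= threshold]


# Hypothetical toy data: feature 0 drives the outcome nonlinearly,
# feature 1 is an unrelated permutation of the same values.
signal = [float(i) for i in range(1, 13)]
noise = [3.0, 9.0, 1.0, 12.0, 5.0, 7.0, 2.0, 11.0, 6.0, 4.0, 10.0, 8.0]
y = [math.exp(v) for v in signal]
columns = [signal, noise]

# Apply each candidate screen to the same data.
screens = {"pearson": pearson, "spearman": spearman}
kept = {name: correlation_screen(columns, y, corr) for name, corr in screens.items()}

# A diverse library pairs every screen with every learner; the learner
# names here are placeholders, not fitted models.
learners = ["linear_model", "random_forest"]
library = list(product(screens, learners))
```

In this toy setting the Pearson screen discards the relevant feature because the exponential relationship weakens its linear correlation, while the rank-based screen keeps it. Pairing every screen with every learner gives the ensemble at least one candidate that sees the relevant feature, which is the protective effect the summary describes.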