Instability of Variable-selection Algorithms Used to Identify True Predictors of an Outcome in Intermediate-dimension Epidemiologic Studies

Solène Cadiou, Rémy Slama
DOI: https://doi.org/10.1097/ede.0000000000001340
2021-02-17
Epidemiology
Abstract: Machine-learning algorithms are increasingly used in epidemiology to identify true predictors of a health outcome when many potential predictors are measured. However, these algorithms can provide different outputs when repeatedly applied to the same dataset, which can compromise research reproducibility. We aimed to illustrate that commonly used algorithms are unstable and, using the example of Least Absolute Shrinkage and Selection Operator (LASSO), that stabilization method choice is crucial.
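To make the notion of instability concrete, the following is a minimal sketch of how run-to-run disagreement in variable selection could be quantified. It is not the authors' method: the function names (`selection_frequencies`, `mean_pairwise_jaccard`), the choice of the Jaccard index as a similarity measure, and the toy selections in `runs` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def selection_frequencies(runs):
    """Fraction of runs in which each variable was selected."""
    counts = Counter(v for selected in runs for v in set(selected))
    return {v: counts[v] / len(runs) for v in sorted(counts)}


def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over all pairs of runs.

    A value of 1.0 means every run selected exactly the same variables;
    lower values mean the selections disagree more from run to run.
    """
    sets = [set(r) for r in runs]
    if len(sets) < 2:
        raise ValueError("need at least two runs to compare")
    sims = []
    for a, b in combinations(sets, 2):
        union = a | b
        # Two empty selections are treated as identical.
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)


# Hypothetical selections from four repeated runs of one algorithm on the
# same dataset (made-up variable names, not results from the paper).
runs = [
    {"x1", "x2", "x3"},
    {"x1", "x2"},
    {"x1", "x3", "x7"},
    {"x1", "x2", "x3"},
]

freq = selection_frequencies(runs)
stability = mean_pairwise_jaccard(runs)
print(freq)
print(round(stability, 3))
```

In this toy input, only `x1` is selected in every run, while `x7` appears once. A stabilization scheme might, for instance, keep only variables whose selection frequency exceeds a chosen threshold, though which scheme the paper recommends is not stated in the abstract.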
public, environmental & occupational health