Why significant variables aren’t automatically good predictors

Adeline Lo, Herman Chernoff, Tian Zheng, Shaw-Hwa Lo
DOI: https://doi.org/10.1073/pnas.1518285112
IF: 11.1
2015-10-26
Proceedings of the National Academy of Sciences
Abstract (Significance statement): A recent puzzle in the big data scientific literature is that an increase in explanatory variables found to be significantly correlated with an outcome variable does not necessarily lead to improvements in prediction. This problem occurs in both simple and complex data. We offer explanations and statistical insights into why higher significance does not automatically imply stronger predictivity and why variables with strong predictivity sometimes fail to be significant. We suggest shifting the research agenda toward searching for a criterion to locate highly predictive variables rather than highly significant variables. We offer an alternative approach, the partition retention method, which was effective in reducing prediction error from 30% to 8% on a long-studied breast cancer data set.
multidisciplinary sciences
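
The abstract's central claim, that a highly significant variable can still be a weak predictor, can be illustrated with a small simulation. The sketch below is not the paper's partition retention method. It is a generic two-class example in which one feature differs between classes by a very small mean shift. The function name `simulate`, the sample size, the shift of 0.05 standard deviations, and the midpoint classification rule are all assumptions made for illustration.

```python
import math
import random


def simulate(n_per_class=100_000, shift=0.05, seed=0):
    """Return (z, p_value, accuracy) for one weakly shifted feature.

    Illustrative assumption: class 0 ~ N(0, 1) and class 1 ~ N(shift, 1).
    """
    rng = random.Random(seed)

    def draw(mean):
        return [rng.gauss(mean, 1.0) for _ in range(n_per_class)]

    def mean_and_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, v

    # Training sample: used for the significance test and to fit the rule.
    train0, train1 = draw(0.0), draw(shift)
    m0, v0 = mean_and_var(train0)
    m1, v1 = mean_and_var(train1)

    # Two-sample z-test for a difference in means (two-sided p-value).
    se = math.sqrt(v0 / n_per_class + v1 / n_per_class)
    z = (m1 - m0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Simple rule: predict class 1 when the feature exceeds the midpoint
    # of the two training means (assumes class 1 has the larger mean).
    threshold = (m0 + m1) / 2

    # Fresh test sample: used only to measure predictive accuracy.
    test0, test1 = draw(0.0), draw(shift)
    correct = sum(x <= threshold for x in test0) + sum(x > threshold for x in test1)
    accuracy = correct / (2 * n_per_class)
    return z, p_value, accuracy


if __name__ == "__main__":
    z, p_value, accuracy = simulate()
    print(f"z = {z:.2f}, p-value = {p_value:.3g}, test accuracy = {accuracy:.3f}")
```

With a sample this large, the tiny shift yields an extremely small p-value, yet the classifier built on the feature performs only slightly better than a coin flip. Under these assumptions, significance reflects how confidently an effect is detected, not how well the variable separates the classes.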