Are Thousands of Samples Really Needed to Generate Robust Gene-List for Prediction of Cancer Outcome?

Royi Jacobovic
DOI: https://doi.org/10.48550/arXiv.1701.03159
2016-12-26
Applications
Abstract: The prediction of cancer prognosis and metastatic potential immediately after the initial diagnosis is a major challenge in current clinical research. The relevance of a prognostic gene signature is clear, as it would free many patients from the agony and toxic side effects of the adjuvant chemotherapy that is automatically, and sometimes carelessly, prescribed to them. Motivated by this issue, Ein-Dor (2006) and Zuk (2007) presented a Bayesian model which leads to the following conclusion: thousands of samples are needed to generate a robust gene list for predicting outcome. This conclusion rests on several statistical assumptions. The current work raises doubts about this conclusion by showing that: (1) these assumptions are not consistent with additional assumptions such as sparsity and Gaussianity; (2) the empirical Bayes methodology that was suggested for testing the relevant assumptions does not detect severe violations of the model assumptions, and consequently the required sample size may be overestimated.