Inferactive data analysis

Nan Bi, Jelena Markovic, Lucy Xia, Jonathan Taylor
DOI: https://doi.org/10.1111/sjos.12425
2019-12-10
Scandinavian Journal of Statistics
Abstract:<p>We describe <i>inferactive data analysis</i>, so‐named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, allowing also for Bayesian data analysis. We see this as a useful step toward providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) <i>selective inference</i>, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG‐DAG (a Data Analysis Generative DAG), and a selective change of variables formula is crucial to any practical implementation of inferactive data analysis via sampling these distributions. We discuss a canonical example of an incomplete cross‐validation test statistic to discriminate between black box models, and a real HIV dataset example to illustrate inference after making multiple queries on data.</p>
statistics & probability