Elements and Principles for Characterizing Variation between Data Analyses

Stephanie C. Hicks, Roger D. Peng
DOI: https://doi.org/10.48550/arXiv.1903.07639
2019-03-18
Applications
Abstract: The data revolution has led to an increased interest in the practice of data analysis. For a given problem, there can be significant or subtle differences in how a data analyst constructs or creates a data analysis, including differences in the choice of methods, tooling, and workflow. In addition, data analysts can prioritize (or not) certain objective characteristics in a data analysis, leading to differences in the quality or experience of the data analysis, such as an analysis that is more or less reproducible or an analysis that is more or less exhaustive. However, data analysts currently lack a formal mechanism to compare and contrast what makes analyses different from each other. To address this problem, we introduce a vocabulary to describe and characterize variation between data analyses. We denote this vocabulary as the elements and principles of data analysis, and we use them to describe the fundamental concepts for the practice and teaching of creating a data analysis. This leads to two insights: it suggests a formal mechanism to evaluate data analyses based on objective characteristics, and it provides a framework to teach students how to build data analyses.