Teaching precursors to data science in introductory and second courses in statistics

Nicholas J Horton, Benjamin S Baumer, Hadley Wickham
DOI: https://doi.org/10.48550/arXiv.1401.3269
2014-01-15
Abstract: Statistics students need to develop the capacity to make sense of the staggering amount of information collected in our increasingly data-centered world. Data science is an important part of modern statistics, but our introductory and second statistics courses often neglect this fact. This paper discusses ways to provide a practical foundation for students to learn to "compute with data" as defined by Nolan and Temple Lang (2010), as well as develop "data habits of mind" (Finzer, 2013). We describe how introductory and second courses can integrate two key precursors to data science: the use of reproducible analysis tools and access to large databases. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically in the era of big data.
Computation, Computers and Society, Other Statistics