A First Course in Data Science

Donghui Yan, Gary E. Davis
DOI: https://doi.org/10.1080/10691898.2019.1623136
2019-05-08
Abstract: Data science is a discipline that provides principles, methodology, and guidelines for the analysis of data for tools, values, or insights. Driven by a huge workforce demand, many academic institutions have started to offer degrees in data science, many at the graduate level and a few at the undergraduate level. Curricula may differ across institutions because of varying levels of faculty expertise and the different disciplines (such as mathematics, computer science, and business) involved in developing the curriculum. The University of Massachusetts Dartmouth started offering degree programs in data science in Fall 2015, at both the undergraduate and the graduate level. Quite a few articles have been published on graduate data science courses, but far fewer on undergraduate ones. Our discussion will focus on undergraduate course structure and function, and specifically on a first course in data science. Our design of this course centers on a concept called the data science life cycle. That is, we view the tasks or steps in the practice of data science as forming a process, consisting of states that indicate how it comes to life and how different tasks in data science depend on or interact with others until a data product is created or a conclusion is reached. Naturally, different pieces of the data science life cycle then form individual parts of the course. Details of each piece are filled in with concepts, techniques, or skills that are popular in industry. Consequently, the design of our course is both "principled" and practical. A significant feature of our course philosophy is that, in line with activity theory, the course is based on the use of tools to transform real data in order to answer strongly motivated questions related to the data.
Other Statistics
What problem does this paper attempt to address?
The paper addresses how to design an effective introductory undergraduate course in data science. Specifically, it discusses the following aspects:

1. **Course Objectives**: The course aims to introduce students to the basic concepts and practices of data science and to spark their interest in the field. It also presents the main components of data science and teaches practical techniques and tools that students can apply in their future studies or work.
2. **Course Design Philosophy**: The course design revolves around the data science life cycle, which views the tasks and steps of data science as a process running from data collection through data analysis to drawing conclusions or generating data products. Each stage is filled in with concepts, techniques, and skills popular in industry.
3. **Practical Teaching**: The course emphasizes practice, using real datasets and industry cases to give students hands-on data analysis experience. Programming is taught in R, which is increasingly popular in industry.
4. **Theoretical Foundation**: The paper adopts activity theory as the theoretical foundation for the course design, emphasizing the use of tools to transform raw data into valuable information. The design focuses on students' actual needs and interests, so that they gain a sense of accomplishment from solving real-world problems.

In summary, the paper proposes a course design centered on the data science life cycle, aimed at cultivating students' practical abilities and their interest in data science.
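The life cycle described above treats data science as a set of stages in which each task depends on the outputs of earlier ones. As a minimal sketch, assuming a toy dataset and a simplified acyclic stage graph, the dependencies can be written down and executed in order with Python's standard `graphlib.TopologicalSorter`. The stage names (`collect`, `prepare`, `analyze`, `conclude`, `data_product`), the `prepare` step, and the `run_life_cycle` helper are all hypothetical illustrations, not the paper's definitions. The course itself uses R, not Python.

```python
from graphlib import TopologicalSorter
from statistics import mean

# Hypothetical life-cycle graph: each stage maps to the set of stages it depends on.
LIFE_CYCLE = {
    "collect": set(),
    "prepare": {"collect"},      # assumed cleaning step, not named in the paper
    "analyze": {"prepare"},
    "conclude": {"analyze"},
    "data_product": {"analyze"},
}

# Hypothetical toy tasks; each receives a dict of outputs from the stages it depends on.
TASKS = {
    # Raw readings, one of them missing.
    "collect": lambda up: [3.0, None, 5.0, 4.0],
    # Drop missing values.
    "prepare": lambda up: [x for x in up["collect"] if x is not None],
    # Compute a summary statistic.
    "analyze": lambda up: mean(up["prepare"]),
    # State a conclusion in words.
    "conclude": lambda up: f"average reading is {up['analyze']:.1f}",
    # Package the result, e.g. for a dashboard.
    "data_product": lambda up: {"mean": up["analyze"]},
}


def run_life_cycle(graph, tasks):
    """Run every stage after all the stages it depends on, and return all outputs."""
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for stage in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[stage]}
        results[stage] = tasks[stage](upstream)
    return results


if __name__ == "__main__":
    out = run_life_cycle(LIFE_CYCLE, TASKS)
    print(out["conclude"])  # average reading is 4.0
```

Because the graph must be acyclic, this sketch omits any return to earlier stages, such as collecting more data after a first analysis. Capturing that would require running the pipeline repeatedly rather than adding a cycle to the graph.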