Undergraduate data science education: Who has the microphone and what are they saying?

Mine Dogucu, Sinem Demirci, Harry Bendekgey, Federica Zoe Ricci, Catalina M. Medina
2024-03-06
Abstract: The presence of data science has been profound in the scientific community in almost every discipline. An important part of the data science education expansion has been at the undergraduate level. We conducted a systematic literature review to (1) specify current evidence and knowledge gaps in undergraduate data science education and (2) inform policymakers and data science educators/practitioners about the present status of data science education research. The majority of the publications in data science education that met our search criteria were available open-access. Our results indicate that data science education research lacks empirical data and reproducibility. Not all disciplines contribute equally to the field of data science education. Computer science and data science as a separate field emerge as the leading contributors to the literature. In contrast, fields such as statistics, mathematics, as well as other fields closely related to data science exhibit a limited presence in studies. We recommend that federal agencies and researchers (1) invest in empirical data science education research; (2) diversify research efforts to enrich the spectrum of types of studies; (3) encourage scholars in key data science fields that are currently underrepresented in the literature to contribute more to research and publications.
Other Statistics
What problem does this paper attempt to address?
This paper examines the state of research on undergraduate data science education. The authors conducted a systematic literature review to identify current evidence and knowledge gaps and to inform policymakers and educators about the state of the field. After excluding publications that did not meet their search criteria, they focused the analysis on undergraduate-level data science education.

They found that, although the literature on data science education is relatively new and mostly open access, there is a lack of data collection and clearly stated research questions, which points to a need for more empirical and reproducible research. Contributions are also uneven across disciplines: computer science, and data science as a standalone field, dominate the literature, while closely related fields such as statistics and mathematics contribute relatively little. The paper also discusses contributions by content area, such as educational technology, curriculum examples, and course activities, and identifies knowledge gaps for future research, including the lack of balanced contributions across disciplines.

The authors recommend that federal agencies and researchers invest in empirical data science education research, diversify the types of studies conducted, and encourage scholars from key data science fields that are currently underrepresented in the literature to publish more. The findings also point to a need for more interdisciplinary communication in data science education, since the current literature is concentrated in a few disciplines and lacks a comprehensive interdisciplinary perspective.