Data Science and Predictive Analytics, 2nd ed.

Xing Qiu
DOI: https://doi.org/10.1080/01621459.2024.2303323
IF: 4.369
2024-02-16
Journal of the American Statistical Association
Abstract: Data science is a new and fast-growing field that has received considerable attention recently. In the broadest sense, data science is the study of the generalizable extraction of knowledge from data (Dhar 2013). To this end, it draws on methods from statistics, informatics, computer science, and applied mathematics. Notably, there is a well-known internet meme, often attributed to Josh Wills, the former Director of Data Engineering at Slack, which humorously defines a data scientist as someone who is "better at statistics than any software engineer and better at software engineering than any statistician." While this meme is partly playful, it underscores the fact that, compared to traditional statisticians, data scientists must possess a more interdisciplinary skill set and hands-on computational abilities to navigate large, noisy, and sometimes unstructured datasets.
statistics & probability
What problem does this paper attempt to address?
This paper is a book review of "Data Science and Predictive Analytics" (2nd edition). The main problem the book addresses is how to provide a practical and comprehensive learning resource for the rapidly developing field of data science. Specifically:

1. **Practicality**: Compared with traditional statistics and machine-learning textbooks, the book places more emphasis on practical skills, such as data processing, exploratory data analysis, handling special data types (e.g., time-series data), text mining, and natural language processing. These skills are essential in a data scientist's day-to-day work.
2. **Wide Coverage**: The book covers analysis methods ranging from basic linear regression models to advanced deep-learning techniques, and reinforces understanding through practical case studies. It also includes important theoretical foundations, such as linear algebra, regression analysis, and general optimization theory and methods.
3. **Ethics and Responsibility**: The book particularly emphasizes responsible data science and ethical predictive analytics, an awareness that modern data scientists must have.
4. **Programming and Visualization**: All analysis methods are implemented in R scripts, accompanied by attractive visualizations and interpretations of the results, which help readers understand and apply what they have learned.

In conclusion, the book aims to be a practical and comprehensive guide for those who hope to pursue a career in data science, teaching not only specific skills and techniques but also the theoretical foundations and professional ethics the field requires.