Is machine learning good or bad for the natural sciences?

David W. Hogg, Soledad Villar

2024-06-01

Abstract: Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they amplify confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics.

Machine Learning; Instrumentation and Methods for Astrophysics; Data Analysis, Statistics and Probability

What problem does this paper attempt to address?

This paper asks whether machine learning (ML) is beneficial or harmful for the natural sciences. The authors point out that although ML is now widely used across scientific fields, its ontology (in which only the data exist) and its epistemology (in which a model is judged by its performance on held-out data) conflict with the standard practices and key philosophies of the natural sciences, which aim to understand and explain the world. The paper identifies places where ML is valuable: for example, in causal inference, an expressive model that represents confounders (such as foregrounds, backgrounds, or instrument calibration parameters) can make results more trustworthy. However, ML can also introduce strong, unwanted statistical biases. When ML models are used to emulate physical simulations, they amplify confirmation biases, and when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The authors emphasize that ML has a safe and necessary place in certain operational aspects of scientific projects, but that its role and value in understanding natural phenomena remain unclear. The paper calls on the scientific communities to reflect on the role and value of ML in their fields. Overall, it suggests that ML brings both benefits and potential problems to the natural sciences and should be used with caution.

The Automated Laplacean Demon: How ML Challenges Our Views on Prediction and Explanation

Machine learning and the physical sciences

The Challenges of Machine Learning: A Critical Review

(Non)-neutrality of science and algorithms: Machine Learning between fundamental physics and society

Use and Misuse of Machine Learning in Anthropology

Machine Learning and Theory Ladenness -- A Phenomenological Account

Machine learning alternative to systems biology should not solely depend on data

Social and environmental impact of recent developments in machine learning on biology and chemistry research

Understanding Biology in the Age of Artificial Intelligence

Value-laden Disciplinary Shifts in Machine Learning

Machine learning and deep learning—A review for ecologists

Reflections on the future of machine learning for materials research

Opportunities for machine learning in scientific discovery

The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?

When not to use machine learning: A perspective on potential and limitations

Can Machine Learning be Moral?

Reliability and Interpretability in Science and Deep Learning

Perspective: Machine learning in experimental solid mechanics

The Scientific Method in the Science of Machine Learning