Understanding Biology in the Age of Artificial Intelligence

Elsa Lawrence, Adham El-Shazly, Srijit Seal, Chaitanya K Joshi, Pietro Liò, Shantanu Singh, Andreas Bender, Pietro Sormanni, Matthew Greenig
2024-03-07
Abstract: Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.
Artificial Intelligence
What problem does this paper attempt to address?
This paper examines how machine learning contributes to scientific understanding in biology. Modern life science research increasingly relies on artificial intelligence, especially machine learning (ML) models, to model biological systems. These methods differ significantly from traditional scientific inquiry, yet there has been relatively little philosophical discussion of how they contribute to biological understanding. The paper first points out that although ML is very useful for recognizing patterns in large, complex datasets, its application in biology brings new challenges. Unlike the predictable laws commonly found in physics, biological models are often based on qualitative descriptions and exhibit multidimensionality, conditionality, and emergence, which makes direct deductive explanation less applicable. The paper proposes three conceptions of scientific understanding drawn from modern epistemology: information compression, qualitative intelligibility, and dependency relation modelling. Together these provide a framework for interpreting ML-mediated understanding of biological systems. Through case studies in protein structure prediction and single-cell RNA sequencing, the paper explores how these features have advanced scientific understanding, how they may guide the development of future ML models, and which key obstacles still prevent ML from realizing its potential as a tool for biological discovery. The paper also discusses how ML learns patterns and knowledge from data, and emphasizes the importance of preventing overfitting to improve generalization performance. The choice of data representation, for example principal component analysis (PCA) or nonlinear dimensionality reduction techniques, is also treated as a key factor in understanding and modeling biological phenomena.
In conclusion, the paper addresses how to apply machine learning appropriately in biological research so that it enhances scientific understanding of biological systems, and how modern philosophical theories of understanding can guide the design and application of machine learning models.
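To make the data-representation point in the summary more concrete, the sketch below shows PCA acting as a simple form of information compression. It is an illustration only and is not taken from the paper: the synthetic "cells by genes" matrix, the choice of three latent factors, and the helper name `pca_compress` are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np


def pca_compress(X, k):
    """Project rows of X onto the top-k principal components and reconstruct.

    Hypothetical helper for illustration; not from the paper.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # center each feature (gene) at zero
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T          # compressed representation, shape (n, k)
    X_hat = scores @ Vt[:k] + mean  # reconstruction from only k components
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # variance fraction kept
    return scores, X_hat, explained


# Synthetic stand-in for an expression matrix (assumption): 200 cells x 50
# genes generated from 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.05 * rng.normal(size=(200, 50))

scores, X_hat, explained = pca_compress(X, k=3)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

print("compressed shape:", scores.shape)
print("fraction of variance explained:", round(float(explained), 4))
print("relative reconstruction error:", round(float(rel_error), 4))
```

Because the synthetic data are generated from three latent factors, three components should capture almost all of the variance. Real single-cell data are noisier and often nonlinear, which is one reason nonlinear dimensionality reduction techniques are used alongside PCA.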