Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution

Mridul Khurana, Arka Daw, M. Maruf, Josef C. Uyeda, Wasila Dahdul, Caleb Charpentier, Yasin Bakış, Henry L. Bart Jr., Paula M. Mabee, Hilmar Lapp, James P. Balhoff, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Anuj Karpatne
2024-08-01
Abstract: A central problem in biology is to understand how organisms evolve and adapt to their environment by acquiring variations in the observable characteristics or traits of species across the tree of life. With the growing availability of large-scale image repositories in biology and recent advances in generative modeling, there is an opportunity to accelerate the discovery of evolutionary traits automatically from images. Toward this goal, we introduce Phylo-Diffusion, a novel framework for conditioning diffusion models with phylogenetic knowledge represented in the form of HIERarchical Embeddings (HIER-Embeds). We also propose two new experiments for perturbing the embedding space of Phylo-Diffusion: trait masking and trait swapping, inspired by counterpart experiments of gene knockout and gene editing/swapping. Our work represents a novel methodological advance in generative modeling to structure the embedding space of diffusion models using tree-based knowledge. Our work also opens a new chapter of research in evolutionary biology by using generative models to visualize evolutionary changes directly from images. We empirically demonstrate the usefulness of Phylo-Diffusion in capturing meaningful trait variations for fishes and birds, revealing novel insights about the biological mechanisms of their evolution.
Populations and Evolution, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses a core question in biology: how organisms evolve and adapt to their environment by acquiring variations in observable traits across the tree of life (i.e., the evolutionary history of species). Specifically, the research aims to use machine learning to discover evolutionary traits automatically from images and to visualize changes in these traits with generative models.

To tackle this problem, the paper proposes Phylo-Diffusion, a framework that conditions diffusion models on phylogenetic knowledge from the tree of life. Its core innovation is a strategy called "HIER-Embed" (Hierarchical Embedding), which encodes information about each species at different evolutionary levels in the tree of life. The paper also proposes two experiments for analyzing evolutionary traits, Trait Masking and Trait Swapping, which are inspired by gene knockout and gene editing/swapping experiments, respectively.

The main contributions are:
1. A new method for structuring the embedding space of diffusion models using tree-based knowledge.
2. A new research direction in evolutionary biology: visualizing evolutionary changes directly from images using generative models.
3. Empirical results showing that Phylo-Diffusion captures meaningful trait variations in fish and bird datasets, revealing new insights into their evolution.

In summary, the paper provides new tools and techniques for studying biological evolution by combining recent generative models with biological knowledge.
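To make the idea of a hierarchical embedding and its two perturbations more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a species' conditioning vector is the concatenation of one vector per tree level, that trait masking replaces one level's slice with Gaussian noise, and that trait swapping copies one level's slice from another species. The toy phylogeny, the names (`ANCESTRY`, `hier_embed`, `mask_level`, `swap_level`), the embedding width, and the random vectors standing in for learned embeddings are all hypothetical.

```python
import random

EMBED_DIM = 4  # per-level embedding width (hypothetical value)

# Hypothetical toy phylogeny: for each species, the node it belongs to at
# each of three levels, ordered from the root side down to the species.
ANCESTRY = {
    "species_a": ["clade_1", "clade_1a", "species_a"],
    "species_b": ["clade_1", "clade_1a", "species_b"],
    "species_c": ["clade_1", "clade_1b", "species_c"],
}


def build_node_table(ancestry, dim, seed=0):
    """Assign one random vector per (level, node).

    These random vectors only stand in for embeddings that a real model
    would learn during training.
    """
    rng = random.Random(seed)
    table = {}
    for path in ancestry.values():
        for level, node in enumerate(path):
            if (level, node) not in table:
                table[(level, node)] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return table


def hier_embed(species, ancestry, table):
    """Concatenate the per-level vectors along the species' path in the tree."""
    vec = []
    for level, node in enumerate(ancestry[species]):
        vec.extend(table[(level, node)])
    return vec


def mask_level(embedding, level, dim, seed=1):
    """Masking sketch (assumption): overwrite one level's slice with noise."""
    rng = random.Random(seed)
    out = list(embedding)
    out[level * dim:(level + 1) * dim] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return out


def swap_level(embedding, donor_embedding, level, dim):
    """Swapping sketch (assumption): copy one level's slice from a donor."""
    out = list(embedding)
    start, stop = level * dim, (level + 1) * dim
    out[start:stop] = donor_embedding[start:stop]
    return out


table = build_node_table(ANCESTRY, EMBED_DIM)
emb_a = hier_embed("species_a", ANCESTRY, table)
emb_c = hier_embed("species_c", ANCESTRY, table)

masked_a = mask_level(emb_a, level=1, dim=EMBED_DIM)          # hide level-1 information
swapped_a = swap_level(emb_a, emb_c, level=1, dim=EMBED_DIM)  # borrow level 1 from species_c
```

In this sketch, species that share an ancestor at a given level share that level's slice of the embedding, so perturbing a single slice targets the information associated with that level. In a real pipeline the perturbed vector would be passed to the diffusion model as its conditioning input, and the generated images would be compared with those from the unperturbed embedding.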