GENOT: Entropic (Gromov) Wasserstein Flow Matching with Applications to Single-Cell Genomics

Dominik Klein, Théo Uscidda, Fabian Theis, Marco Cuturi
2024-11-08
Abstract: Single-cell genomics has significantly advanced our understanding of cellular behavior, catalyzing innovations in treatments and precision medicine. However, single-cell sequencing technologies are inherently destructive and can only measure a limited array of data modalities simultaneously. This limitation underscores the need for new methods capable of realigning cells. Optimal transport (OT) has emerged as a potent solution, but traditional discrete solvers are hampered by scalability, privacy, and out-of-sample estimation issues. These challenges have spurred the development of neural network-based solvers, known as neural OT solvers, that parameterize OT maps. Yet, these models often lack the flexibility needed for broader life science applications. To address these deficiencies, our approach learns stochastic maps (i.e. transport plans), allows for any cost function, relaxes mass conservation constraints and integrates quadratic solvers to tackle the complex challenges posed by the (Fused) Gromov-Wasserstein problem. Utilizing flow matching as a backbone, our method offers a flexible and effective framework. We demonstrate its versatility and robustness through applications in cell development studies, cellular drug response modeling, and cross-modality cell translation, illustrating significant potential for enhancing therapeutic strategies.
Machine Learning
What problem does this paper attempt to address?
The paper attempts to address several key challenges in single-cell genomics:

1. **Limitations of Data Modalities**: Single-cell sequencing technologies are typically destructive and can only measure limited data modalities simultaneously. This limits a comprehensive understanding of cell states.
2. **Limitations of Optimal Transport (OT)**:
   - **Scalability**: Traditional discrete OT solvers face high computational complexity when handling large-scale data.
   - **Privacy Issues**: Discrete OT solvers have data privacy issues when dealing with large-scale single-cell atlases.
   - **Out-of-Sample Estimation**: Discrete OT solvers have limited out-of-sample estimation capabilities in specific scenarios.
3. **Shortcomings of Neural Network OT Solvers**:
   - **Deterministic Mapping**: Existing neural network OT solvers typically estimate deterministic mappings (Monge maps), which do not align with the stochastic nature of cell evolution processes.
   - **Choice of Cost Function**: These methods are limited in the choice of cost functions, mostly restricted to squared Euclidean costs, while single-cell genomics data are non-Euclidean.
   - **Mass Conservation Constraint**: Existing neural network OT solvers lack the ability to handle unbalanced OT, failing to model cell growth and death, and cannot automatically exclude outliers in noisy data.
   - **Limitations of Linear OT**: Existing neural network OT solvers are mainly limited to linear OT scenarios, whereas many applications in single-cell genomics require handling partially or fully incomparable spaces, i.e., Quadratic OT or Gromov-Wasserstein (GW) problems.

To address these issues, the paper proposes GENOT (Generative Entropic Neural Optimal Transport), a powerful and flexible neural OT framework that meets all the above requirements and demonstrates its effectiveness and robustness in multiple single-cell biology tasks.
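
To make the contrast between a deterministic map and a stochastic transport plan concrete, the following is a minimal sketch of discrete entropic OT solved with Sinkhorn iterations in NumPy. It is not the GENOT method and is not taken from the paper: the function name `sinkhorn_plan`, the toy point clouds, the L1 cost, and the values of `epsilon` and `n_iters` are all assumptions chosen for illustration. It only shows that an entropic plan accepts an arbitrary cost matrix and assigns each source point a full conditional distribution over targets rather than a single image.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, epsilon=0.5, n_iters=200):
    """Entropic OT plan between discrete marginals a and b (illustrative helper).

    `cost` may be any non-negative cost matrix; it is not restricted to
    squared Euclidean distances.
    """
    K = np.exp(-cost / epsilon)      # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale to match the row marginal a
        v = b / (K.T @ u)            # rescale to match the column marginal b
    return u[:, None] * K * v[None, :]


# Toy data (assumed): 5 source points and 7 target points in 2D.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(7, 2)) + 1.0

# A non-Euclidean cost as an example: L1 distance, rescaled to [0, 1]
# so that a single epsilon value behaves predictably.
cost = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)
cost = cost / cost.max()

a = np.full(5, 1 / 5)                # uniform source weights
b = np.full(7, 1 / 7)                # uniform target weights

plan = sinkhorn_plan(cost, a, b)

# Row-normalizing the plan gives, for each source point, a probability
# distribution over target points: a stochastic map, not a Monge map.
conditional = plan / plan.sum(axis=1, keepdims=True)
print(np.round(conditional[0], 3))
```

Each row of `conditional` spreads mass over several targets, which is the discrete analogue of the stochastic maps described above. This sketch keeps both marginals fixed (balanced OT) and compares points within a shared space (linear OT), so it does not cover the unbalanced or Gromov-Wasserstein settings.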