Theoretical foundations for the Human Cell Atlas

Salil Bhate
DOI: https://doi.org/10.48550/arXiv.1710.07585
2017-10-20
Abstract: In Schiebinger et al. (2017), the authors use optimal transport of measures on empirical distributions arising from biological experiments to relate the single-cell RNA sequencing profiles of induced pluripotent stem cells as they differentiate. However, such algorithms could be applied arbitrarily to any datasets from any collection of experiments. We consider here a natural question that arises: in which cases can the results of two experiments be mapped to each other in this way, consistently with conventionally accepted assumptions about biology? The answer to this question is of fundamental practical importance in developing algorithms that use this method for analysing and integrating complex datasets collected as part of the Human Cell Atlas. Here, we develop a formulation of biology in terms of sheaves of $C^*(X)$-modules for a smooth manifold $X$ equipped with certain structures that enables this question to be formally answered, leading to formal statements about experimental inference and phenotypic identifiability. These structures capture a view of biology that is consistent with a standard, widely accepted biological perspective and is mathematically intuitive. Our methods provide a framework in which to design complex experiments, and the algorithms to analyse them, in such a way that their conclusions can be believed.
Quantitative Methods