SciPhy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data

Sophie Seidel, Antoine Zwaans, Samuel Regalado, Junhong Choi, Jay Shendure, Tanja Stadler
DOI: https://doi.org/10.1101/2024.10.01.615771
2024-10-03
Abstract: CRISPR-based lineage tracing offers a promising avenue to decipher single cell lineage trees, especially in organisms that are challenging for microscopy. A recent advancement in this domain is lineage tracing based on sequential genome editing, which not only records genetic edits but also the order in which they occur. To capitalize on this enriched data, we introduce SciPhy, a simulation and inference tool integrated within the BEAST 2 framework. SciPhy utilizes a Bayesian phylogenetic approach to estimate time-scaled phylogenies and cell population parameters. After validating SciPhy using simulations, we apply it to lineage tracing data obtained from a monoclonal culture of HEK293T cells for which we estimate time-scaled trees together with cell proliferation rates. We compare SciPhy to the lineage reconstruction based on a widely used clustering method, UPGMA, and find that the UPGMA-reconstructed lineage trees differ from SciPhy trees in some key aspects of tree structure; in particular, SciPhy trees stand out for their later estimated cell division times. In addition, SciPhy reports uncertainty as well as proliferation rates, neither of which are available within a UPGMA analysis. This study showcases the application of advanced phylogenetic and phylodynamic tools to explore and quantify cell lineage trees, laying the groundwork for enhanced and confident analyses to decode the complexities of biological development in multicellular organisms. SciPhy's codebase is publicly available at https://github.com/azwaans/SciPhy.
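The idea of sequential recording described in the abstract can be illustrated with a small simulation. The Python sketch below is not SciPhy's model or code (SciPhy itself is integrated within BEAST 2). It assumes a hypothetical recording tape with a fixed number of target sites that are filled strictly from left to right, a constant editing rate, and a hypothetical set of insert sequences with fixed probabilities; all parameter names and values are illustrative.

```python
import random

# Hypothetical parameters, chosen for illustration only (not SciPhy's values).
N_SITES = 5                          # target sites on one recording tape
INSERTS = ["AAG", "CTT", "GGA"]      # possible inserted sequences
INSERT_PROBS = [0.5, 0.3, 0.2]       # probability of each insert
EDIT_RATE = 0.8                      # expected edits per unit time (assumed constant)


def edit_tape(tape, duration, rng):
    """Apply edits for `duration` time units; each edit fills the next empty site in order."""
    tape = list(tape)
    t = rng.expovariate(EDIT_RATE)   # waiting time to the first edit
    while t < duration:
        if None not in tape:         # tape saturated: no further edits possible
            break
        nxt = tape.index(None)       # sequential editing: leftmost unedited site
        tape[nxt] = rng.choices(INSERTS, weights=INSERT_PROBS)[0]
        t += rng.expovariate(EDIT_RATE)
    return tape


def simulate_division(tape, duration, rng):
    """Edit a parent tape, then let two daughters inherit it and keep editing independently."""
    parent = edit_tape(tape, duration, rng)
    left = edit_tape(parent, duration, rng)
    right = edit_tape(parent, duration, rng)
    return parent, left, right


rng = random.Random(1)
parent, left, right = simulate_division([None] * N_SITES, duration=2.0, rng=rng)
print(parent)
print(left)
print(right)
```

Because both daughters inherit the parent's filled sites and continue editing from the next empty site, related cells share an ordered prefix of edits. This shared, ordered prefix is the kind of information that sequential editing records and unordered edits do not.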
Developmental Biology
What problem does this paper attempt to address?
This paper addresses how to use data from CRISPR-based lineage tracing with sequential genome editing to reconstruct single-cell lineage trees more accurately and to estimate cell population parameters. Specifically:

1. **Exploiting ordered edits for higher-resolution trees**: Earlier CRISPR-Cas9 lineage tracing methods recorded unordered edits, which limits how finely cell lineage trees can be resolved. Sequential editing inserts nucleotide sequences into the target sites in a defined order, so the data record not only which edits occurred but also the order in which they occurred. SciPhy is designed to use this additional information.
2. **Integrating time information**: Traditional lineage reconstruction methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) rely only on pairwise distances between cells and ignore higher-order information. SciPhy instead uses a Bayesian phylogenetic framework to estimate time-scaled lineage trees, in which branch lengths correspond to elapsed time.
3. **Modeling the editing process**: The SciPhy model accounts not only for the rate at which insertions occur but also for the probabilities of the different inserted sequences and the order in which insertions fill the target sites. This lets the model reflect the actual editing process more closely.
4. **Estimating cell proliferation rates**: Beyond reconstructing lineage trees, SciPhy estimates cell proliferation rates and other population parameters and reports the uncertainty of its estimates; UPGMA provides neither.
5. **Validation and comparison**: The paper validates SciPhy on simulated data, applies it to lineage tracing data from a monoclonal culture of HEK293T cells, and compares the results with UPGMA reconstructions. The UPGMA trees differ from the SciPhy trees in key aspects of tree structure; in particular, SciPhy estimates later cell division times.
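To show what a distance-based method such as UPGMA works with, here is a minimal plain-Python UPGMA sketch; it is not the authors' implementation. It assumes a hypothetical distance, the number of sites at which two cells' tapes differ, which may not be the distance used in the paper's UPGMA comparison. The output is a single tree whose node heights are in distance units, with no time scale and no measure of uncertainty.

```python
from itertools import combinations


def tape_distance(a, b):
    """Hypothetical distance: number of sites at which two edited tapes differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(tapes):
    """Return a nested-tuple tree and its root height, built by average linkage (UPGMA)."""
    # Each cluster id maps to (subtree, member leaf names, node height).
    clusters = {name: (name, [name], 0.0) for name in tapes}
    leaf_dist = {
        frozenset((a, b)): tape_distance(tapes[a], tapes[b])
        for a, b in combinations(tapes, 2)
    }

    def cluster_distance(c1, c2):
        # Average of all leaf-to-leaf distances between the two clusters.
        m1, m2 = clusters[c1][1], clusters[c2][1]
        total = sum(leaf_dist[frozenset((x, y))] for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair; ties go to the first pair in iteration order.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        d = cluster_distance(c1, c2)
        t1, m1, _ = clusters.pop(c1)
        t2, m2, _ = clusters.pop(c2)
        # The merged node sits at half the average distance between its clusters.
        clusters[f"node{next_id}"] = ((t1, t2), m1 + m2, d / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Four hypothetical cells, each with a three-site tape (None = unedited site).
tapes = {
    "c1": ["AAG", "CTT", None],
    "c2": ["AAG", "CTT", "GGA"],
    "c3": ["AAG", "GGA", None],
    "c4": ["CTT", None, None],
}
tree, height = upgma(tapes)
print(tree, height)
```

The merges depend only on averaged pairwise distances, and ties (here c1-c2 versus c1-c3, both at distance 1) are broken by iteration order. The procedure therefore yields one point estimate rather than a distribution over trees.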
In summary, the paper develops SciPhy, a tool that uses sequential genome editing data from CRISPR-based lineage tracing to reconstruct time-scaled single-cell lineage trees, with quantified uncertainty, and to estimate cell population parameters such as proliferation rates, thereby supporting the study of developmental complexity in multicellular organisms.