Abstract: CRISPR-based lineage tracing, coupled with single-cell RNA sequencing, has emerged as a promising approach for studying cell transformations during development as well as disease progression. However, the high ratio of cells to CRISPR-induced mutations, combined with missing data from silencing or dropout, makes cell lineage tree (CLT) reconstruction difficult. As a result, this computational problem has attracted significant attention in recent years. This includes the introduction in 2023 of Star Homoplasy Parsimony (SHP), which models the specific properties of CRISPR-induced mutations, along with the Startle family of methods based on integer linear programming (ILP) or heuristic search with nearest-neighbor interchange (NNI). Here, we present Star-CDP, the first dynamic programming algorithm for SHP. Star-CDP solves SHP within a constrained search space $\Sigma$, defined as a collection of subsets of cells from which a solution CLT must draw its clades. When $\Sigma$ is the power set of the cells, Star-CDP is an exact exponential-time algorithm with time complexity $O(nm|\Sigma|^2)$, where $n$ is the number of cells, $m$ is the number of target sites, and $|\Sigma| = O(2^n)$. We show that it is possible to build clade constraints that are polynomially sized and effective in practice. Motivated by the technological challenges of producing consistent phylogenetic signal across the tree during lineage tracing, we also present algorithms to efficiently count, sample, and build consensus trees from all solutions to the clade-constrained SHP problem. In simulations, Star-CDP's strict consensus effectively reduced false positive branches while preserving many more true positives than the standard strict consensus implemented in PAUP*, a popular parsimony method from species phylogenetics.
Likewise, Star-CDP's strict consensus achieved the same or higher accuracy (F1-score) on all but one of the 15 model conditions tested, often outperforming the leading methods, Startle-ILP and Startle-NNI, while also scaling to larger data sets than Startle-ILP. Lastly, we analyzed lineage tracing data from the KP-Tracer mouse model of lung adenocarcinoma and found that Star-CDP produced plausible CLTs, often lowering the number of migration and reseeding events needed to explain metastases compared to Startle. Our analysis also showed, for the first time, that strategies for preprocessing cells with missing data, specifically cell pruning and deduplication, can have a substantial impact on CLTs reconstructed with the same method, even changing the relative performance of methods compared to previously published results. The same was true of postprocessing trees with LAML, a maximum likelihood method designed for mixed-type missing data. By exploring these different pipelines, we recovered the most plausible CLT for the largest KP-Tracer metastatic tumor, reducing the number of reseeding events from 42 to 10 without increasing the number of migrations. Star-CDP is available on GitHub: https://github.com/molloy-lab/Star-CDP.
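The abstract describes Star-CDP as a dynamic program over a clade-constrained search space $\Sigma$ that can also count, sample, and build consensus trees from all solutions. The Python below is a minimal, hypothetical sketch of that general idea and is not Star-CDP's implementation. It does not compute the SHP score. Instead, it assumes a made-up additive cost `split_cost(b, c)` for joining two child clades. All function names (`clade_dp`, `sample_tree`, `all_optimal`, `clades`, `strict_consensus`) are illustrative.

```python
import random
from functools import lru_cache


def clade_dp(leaves, sigma, split_cost):
    """DP over rooted binary trees whose clades all lie in sigma (a sketch).

    leaves: leaf labels (must be mutually comparable, e.g. strings).
    sigma: collection of leaf subsets; should contain the full leaf set and all singletons.
    split_cost(b, c): HYPOTHETICAL additive cost of joining child clades b and c
    (a stand-in only; this sketch does not compute the SHP score).
    Returns (root, solve), where solve(clade) -> (best cost, #optimal trees, optimal splits).
    """
    sigma = {frozenset(s) for s in sigma}
    root = frozenset(leaves)

    @lru_cache(maxsize=None)
    def solve(clade):
        if len(clade) == 1:
            return 0, 1, ()
        best, count, splits = float("inf"), 0, []
        anchor = min(clade)  # fix one leaf in b so each unordered split is seen once
        # Naive loop over all of sigma for every clade: O(|sigma|^2) subset checks overall.
        for b in sigma:
            if anchor not in b or not b < clade or (clade - b) not in sigma:
                continue
            c = clade - b
            cost_b, n_b, _ = solve(b)
            cost_c, n_c, _ = solve(c)
            if n_b == 0 or n_c == 0:
                continue  # a child clade cannot be resolved within sigma
            cost = cost_b + cost_c + split_cost(b, c)
            if cost < best:
                best, count, splits = cost, 0, []
            if cost == best:
                count += n_b * n_c
                splits.append((b, c, n_b * n_c))
        return best, count, tuple(splits)

    return root, solve


def sample_tree(solve, clade, rng):
    """Draw one optimal tree uniformly at random, as a nested tuple of leaves."""
    if len(clade) == 1:
        return next(iter(clade))
    splits = solve(clade)[2]
    # Weight each optimal split by the number of optimal trees that use it.
    b, c, _ = rng.choices(splits, weights=[w for _, _, w in splits])[0]
    return (sample_tree(solve, b, rng), sample_tree(solve, c, rng))


def all_optimal(solve, clade):
    """Yield every optimal tree (exponentially many in general; toy inputs only)."""
    if len(clade) == 1:
        yield next(iter(clade))
        return
    for b, c, _ in solve(clade)[2]:
        for left in all_optimal(solve, b):
            for right in all_optimal(solve, c):
                yield (left, right)


def clades(tree):
    """Return the clades with two or more leaves (root included) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = walk(node[0]) | walk(node[1])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(trees):
    """Strict consensus as a clade set: the clades present in every input tree."""
    return set.intersection(*(clades(t) for t in trees))


# Usage: four cells with a small hand-made clade constraint and zero cost everywhere.
sigma = [frozenset(x) for x in ["a", "b", "c", "d", "ab", "cd", "abc", "abcd"]]
root, solve = clade_dp("abcd", sigma, lambda b, c: 0)
best, count, _ = solve(root)
print(best, count)  # 0 2
trees = list(all_optimal(solve, root))
print(sorted("".join(sorted(s)) for s in strict_consensus(trees)))  # ['ab', 'abcd']
```

Choosing each split with probability proportional to the number of optimal trees that use it makes `sample_tree` uniform over all optimal trees, because the cost is assumed to be additive. Enumerating every optimal tree, as the usage example does before taking the strict consensus, is exponential in general and is only meant for toy inputs.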

Estimation of cell lineage trees by maximum-likelihood phylogenetics

Maximum Likelihood Inference of Time-scaled Cell Lineage Trees with Mixed-type Missing Data

Single-cell lineage tracing by integrating CRISPR-Cas9 mutations with transcriptomic data

Achieving single-cell-resolution lineage tracing in zebrafish by continuous barcoding mutations during embryogenesis

Quantitative Analysis of Synthetic Cell Lineage Tracing Using Nuclease Barcoding

Sciphy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data

Maximum likelihood phylogeographic inference of cell motility and cell division from spatial lineage tracing data

Simulation of CRISPR-Cas9 editing on evolving barcode and accuracy of lineage tracing

Bayesian phylodynamics of early vertebrate development in BEAST 2

Cell lineage tracing using nuclease barcoding

Lineage recording of zebrafish embryogenesis reveals historical and ongoing lineage commitments

Large-scale reconstruction of cell lineages using single-cell readout of transcriptomes and CRISPR–Cas9 barcodes by scGESTALT

Mapping single-cell-resolution cell phylogeny reveals cell population dynamics during organ development

Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining

Mapping Lineage-Traced Cells Across Time Points with Moslin

Reconstructing cell lineage trees with genomic barcoding: approaches and applications

Analysis of Cell Lineage Trees by Exact Bayesian Inference Identifies Negative Autoregulation of Nanog in Mouse Embryonic Stem Cells

ScisTree2: An Improved Method for Large-scale Inference of Cell Lineage Trees and Genotype Calling from Noisy Single Cell Data

Improving cellular phylogenies through the integrated use of mutation order and optimality principles

Dynamic programming algorithms for fast and accurate cell lineage tree reconstruction from CRISPR-based lineage tracing data

Startle: A star homoplasy approach for CRISPR-Cas9 lineage tracing