Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

Hong Seo Lim, Peng Qiu
DOI: https://doi.org/10.1371/journal.pcbi.1011866
2024-02-29
PLoS Computational Biology
Abstract: Among existing computational algorithms for single-cell RNA-seq analysis, clustering and trajectory inference are two major types of analysis that are routinely applied. For a given dataset, clustering and trajectory inference can generate vastly different visualizations that lead to very different interpretations of the data. To address this issue, we propose multiple scores to quantify the "clusterness" and "trajectoriness" of single-cell RNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuum of progression trajectory. The scores we introduce are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley's K, and degrees of connectivity. Using simulated datasets, we demonstrate that the proposed scores are able to effectively differentiate between cluster-like data and trajectory-like data. Using real single-cell RNA-seq datasets, we demonstrate that the scores can serve as indicators of whether clustering analysis or trajectory inference is a more appropriate choice for biological interpretation of the data.
Single-cell sequencing technologies have motivated the development of numerous computational algorithms. Two main types of these algorithms are clustering and trajectory inference. When scientists have a scRNA-seq dataset, they usually pick one of these approaches based on what they think the data shows. If they think the data has distinct clusters of cells, they will analyze the data using clustering algorithms. If they think the data shows a continuous progression, they will use trajectory inference algorithms. However, applying clustering and trajectory inference to the same data can sometimes lead to very different interpretations: clustering algorithms produce distinct cell clusters, while trajectory inference on the same data shows continuous trajectories. This makes us wonder: which way of looking at the data is more appropriate?
In this paper, we developed a pipeline for quantifying the "clusterness" and "trajectoriness" of scRNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuum of progression trajectory. We think such geometric quantification addresses an important question that should be broadly discussed in the single-cell research community.
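To make one of the listed ingredients concrete, the sketch below computes a naive Ripley's K estimate, K(r) = A / (n(n-1)) × (number of ordered pairs i ≠ j with distance ≤ r), for two toy 2D point sets: one cluster-like and one trajectory-like. This is only an illustration of the statistic itself, not the scoring procedure used in the paper, which the abstract does not specify. The function name `ripley_k`, the toy data, the radii, the unit-square area, and the omission of edge correction are all assumptions made for this example.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction; illustrative only)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all points
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep ordered pairs (i, j) with i != j
    off_diag = dist[~np.eye(n, dtype=bool)]
    return np.array(
        [area * np.sum(off_diag <= r) / (n * (n - 1)) for r in radii]
    )

rng = np.random.default_rng(0)

# Hypothetical cluster-like data: two tight blobs in the unit square
clusters = np.vstack([
    rng.normal([0.25, 0.25], 0.03, size=(100, 2)),
    rng.normal([0.75, 0.75], 0.03, size=(100, 2)),
])

# Hypothetical trajectory-like data: points along a diagonal with small noise
t = rng.uniform(0.1, 0.9, size=200)
trajectory = np.column_stack([t, t]) + rng.normal(0, 0.01, size=(200, 2))

radii = np.array([0.05, 0.1, 0.2])
k_clusters = ripley_k(clusters, radii, area=1.0)
k_trajectory = ripley_k(trajectory, radii, area=1.0)

print("radii:       ", radii)
print("K clusters:  ", k_clusters)
print("K trajectory:", k_trajectory)
```

On data like this, one would expect the cluster-like curve to level off once r exceeds the blob diameter, while the trajectory-like curve keeps growing with r. How such curves are turned into a single score is a design choice that this sketch leaves open.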
biochemical research methods, mathematical & computational biology