SCTC: inference of developmental potential from single-cell transcriptional complexity

Hai Lin, Huan Hu, Zhen Feng, Fei Xu, Jie Lyu, Xiang Li, Liyu Liu, Gen Yang, Jianwei Shuai
DOI: https://doi.org/10.1093/nar/gkae340
IF: 14.9
2024-05-08
Nucleic Acids Research
Abstract: Inferring the developmental potential of single cells from scRNA-Seq data and reconstructing the pseudo-temporal path of cell development are fundamental but challenging tasks in single-cell analysis. Although single-cell transcriptional diversity (SCTD) measured by the number of expressed genes per cell has been widely used as a hallmark of developmental potential, it may lead to incorrect estimation of differentiation states in some cases where gene expression does not decrease monotonously during the development process. In this study, we propose a novel metric called single-cell transcriptional complexity (SCTC), which draws on insights from the economic complexity theory and takes into account the sophisticated structure information of scRNA-Seq count matrix. We show that SCTC characterizes developmental potential more accurately than SCTD, especially in the early stages of development where cells typically have lower diversity but higher complexity than those in the later stages. Based on the SCTC, we provide an unsupervised method for accurate, robust, and transferable inference of single-cell pseudotime. Our findings suggest that the complexity emerging from the interplay between cells and genes determines the developmental potential, providing new insights into the understanding of biological development from the perspective of complexity theory.
biochemistry & molecular biology
What problem does this paper attempt to address?
The main problem this paper addresses is how to accurately infer the developmental potential of single cells from single-cell RNA sequencing (scRNA-Seq) data and reconstruct the pseudotime path of cell development. Specifically, the article points out:

1. **Limitations of existing methods**:
   - The widely used single-cell transcriptional diversity (SCTD), which measures developmental potential by the number of expressed genes in each cell, can misestimate the differentiation state in some cases. The number of expressed genes does not always decrease monotonically during development, so diversity-based methods (such as CytoTRACE) perform poorly, particularly in the early stages.
2. **The proposed new method**:
   - To address this, the authors introduce a new metric, single-cell transcriptional complexity (SCTC). The concept draws on economic complexity theory and considers not only how many genes are expressed but also the structural information in the gene expression pattern.
   - SCTC builds a bipartite network in which cells and genes are the nodes, and uses recursive or analytical methods to calculate the high-order complexity of cells and genes. The cell complexity index (CCI) and the gene complexity index (GCI) are then defined to quantify SCTC.
3. **Research purposes**:
   - To verify whether SCTC characterizes the developmental potential of cells more accurately than SCTD, especially in the early stages of development.
   - To provide an unsupervised method for accurate, robust, and transferable single-cell pseudotime inference.
4. **Research significance**:
   - The results suggest that the complexity arising from the interplay between cells and genes determines the developmental potential of cells, which offers a new view of biological development from the perspective of complexity theory.
   - Compared with existing SCTD-based methods, SCTC performs better on multiple single-cell datasets and is notably more reliable for pseudotime inference in the early stages of development.

### Formula summary

- **0-order complexity**:

  \[ k_{c,0}=\sum_{g}M_{cg} \]

  \[ k_{g,0}=\sum_{c}M_{cg} \]

- **N-order complexity**:

  \[ k_{c,N}=\frac{1}{k_{c,0}}\sum_{g}M_{cg}\,k_{g,N-1} \]

  \[ k_{g,N}=\frac{1}{k_{g,0}}\sum_{c}M_{cg}\,k_{c,N-1} \]

- **Definition of matrix \(\tilde{M}_{cc'}\)**:

  \[ \tilde{M}_{cc'}=\sum_{g}\frac{M_{cg}M_{c'g}}{k_{c,0}\,k_{g,0}} \]

- **Cell complexity index (CCI)**:

  \[ \mathrm{CCI}=\frac{\vec{K}-\min(\vec{K})}{\max(\vec{K})-\min(\vec{K})} \]

- **Gene complexity index (GCI)**:

  \[ Q_g=\frac{1}{k_{g,0}}\sum_{c}M_{cg}\,\mathrm{CCI}_c \]

  \[ \mathrm{GCI}=\frac{\vec{Q}-\min(\vec{Q})}{\max(\vec{Q})-\min(\vec{Q})} \]

With these formulas, SCTC captures the structural information in the gene expression pattern more fully than a simple count of expressed genes, which allows a more accurate inference of the developmental potential and pseudotime of cells.
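
The formulas above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions that the summary does not state: \(M_{cg}\) is taken to be a binary matrix (1 when gene g has a nonzero count in cell c), \(\vec{K}\) is taken to be the eigenvector of \(\tilde{M}\) with the second-largest eigenvalue (the convention used in economic complexity theory), and the sign of that eigenvector, which is mathematically arbitrary, is fixed by a purely numerical rule. The function name `sctc_indices` and its parameters are hypothetical.

```python
import numpy as np


def _min_max(x):
    """Rescale a vector to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


def sctc_indices(counts, n_reflections=2):
    """Sketch of the SCTC formulas for a cells x genes count matrix.

    Assumptions (not confirmed by the summary): M is the binarized
    count matrix, and K is the second eigenvector of M_tilde.
    """
    # Assumed binarization: M_cg = 1 if gene g is detected in cell c.
    M = (np.asarray(counts) > 0).astype(float)

    # 0-order complexity: expressed genes per cell, expressing cells per gene.
    k_c0 = M.sum(axis=1)
    k_g0 = M.sum(axis=0)
    if (k_c0 == 0).any() or (k_g0 == 0).any():
        raise ValueError("remove empty cells and undetected genes first")

    # N-order complexity by recursion; both updates use the order N-1 values.
    k_c, k_g = k_c0.copy(), k_g0.copy()
    for _ in range(n_reflections):
        k_c, k_g = (M @ k_g) / k_c0, (M.T @ k_c) / k_g0

    # M_tilde[c, c'] = sum_g M[c, g] * M[c', g] / (k_c0[c] * k_g0[g])
    M_tilde = (M / k_c0[:, None]) @ (M / k_g0[None, :]).T

    # M_tilde is similar to the symmetric matrix A = B @ B.T, so its
    # eigenvectors can be obtained with the symmetric solver eigh.
    B = M / np.sqrt(k_c0)[:, None] / np.sqrt(k_g0)[None, :]
    _, vecs = np.linalg.eigh(B @ B.T)      # eigenvalues in ascending order
    K = vecs[:, -2] / np.sqrt(k_c0)        # assumed: second-largest eigenvalue

    # Numerical sign convention only (assumption): make the entry with the
    # largest magnitude positive. Flipping the sign turns CCI into 1 - CCI.
    if K[np.argmax(np.abs(K))] < 0:
        K = -K

    cci = _min_max(K)
    Q = (M.T @ cci) / k_g0                 # mean CCI of the cells expressing g
    gci = _min_max(Q)

    return {"k_c0": k_c0, "k_g0": k_g0, "k_cN": k_c, "k_gN": k_g,
            "M_tilde": M_tilde, "cci": cci, "gci": gci}
```

Calling `sctc_indices(counts)` on a cells-by-genes array returns the 0-order and N-order complexities along with `cci` and `gci`. Because the eigenvector sign is arbitrary, the direction of CCI (whether high values mean high or low potential) would have to be set from prior biological knowledge in practice. The symmetric form is used only for numerical stability; it has the same eigenvalues as \(\tilde{M}\), and dividing by \(\sqrt{k_{c,0}}\) maps its eigenvectors back to those of \(\tilde{M}\).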