LotOfCells: data visualization and statistics of single cell metadata

Oscar Gonzalez-Velasco
DOI: https://doi.org/10.1101/2024.05.23.595582
2024-05-28
Abstract: Single-cell sequencing unveils a treasure trove into the biological and molecular characteristics of samples. Yet, within this flood of data, the challenge to draw meaningful conclusions sometimes can be time consuming and a tortuous process. Here we introduce LotOfCells: a simple R package designed to explore the intricate landscape of phenotypic data within single-cell studies. Normally, we are interested in visualizing and measuring if the differences in the proportion of number of cells across various covariates is significant or biologically relevant. As an example, one of the most common questions is the proportion of different cell types across conditions in our experiment, or the cluster composition before and after treatment (e.g., difference in cell type proportions between wild type and mutant). LotOfCells helps with the interpretation and visualization of metadata of these recurrent scenarios, including the test of proportion changes across multiple ordered stages. Additionally, it computes a symmetric divergence score to measure global deregulation of cell proportions due to a condition.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to visualize and statistically analyze phenotypic data (metadata) in single-cell studies, in particular changes in cell-type proportions across experimental conditions and whether those changes are biologically relevant. It presents LotOfCells, an R package that lets researchers visualize and test whether the proportion of a specific cell population changes significantly, for example differences in the abundance of each cell type between tumor and control groups. The package also computes a symmetric divergence score, which measures the degree of global cell-proportion imbalance caused by a condition.

### Background of the paper

Single-cell sequencing reveals valuable information about the biological and molecular characteristics of samples. Within this flood of data, however, drawing meaningful conclusions can be both time-consuming and complex. When comparing cell-type proportions or other covariates (such as clusters or tissue composition) across experimental conditions, traditional statistical tests may fail to capture subtle differences, and the limited power of classical methods makes it hard to assess how extreme or significant an observation is.

### Purpose of the paper

1. **Provide a tool**: LotOfCells is a simple R package designed specifically for exploring phenotypic data in single-cell research.
2. **Solve specific problems**:
   - Visualize and measure whether differences in cell-number proportions across covariates are statistically significant or biologically relevant.
   - Evaluate the significance and global impact of changes in cell-type proportions through methods such as Monte Carlo simulation and the symmetric divergence score.
3. **Support multiple application scenarios**: Applicable to multiple single-cell data sets, including Seurat and SingleCellExperiment objects.

### Method overview

- **Monte Carlo simulation**: Tests cell-frequency differences between two conditions against a null distribution built through random sampling and frequency calculation.
- **Symmetric divergence score**: Based on the Kullback-Leibler (KL) divergence, it measures the global dissimilarity of class distributions between two samples.
- **Kendall correlation coefficient**: Tests for a trend in proportion changes across multiple ordered conditions.

### Experimental verification

The paper uses a publicly available single-cell RNA-sequencing data set from Kim et al. on metastatic lung adenocarcinoma. It contains 208,506 cells from 44 patients, covering normal adjacent tissue and cancers from early to metastatic stages. With LotOfCells, the authors demonstrate the package's visualization and statistical analysis functions, especially for detecting significant changes in cell-type proportions between tumor and normal lung tissue.

### Main findings

- **Immune response in the tumor microenvironment**: The proportions of T-lymphocytes and B-lymphocytes increased significantly in tumor lung tissue, while the proportions of natural killer (NK) cells and myeloid cells decreased significantly.
- **Myeloid infiltration in metastatic lymph nodes**: The proportions of myeloid cells and myofibroblasts increased significantly in metastatic lymph nodes, indicating that myeloid infiltration is related to tumor progression.
- **Myofibroblasts and tumor progression**: The proportions of myofibroblasts and COL14A1+ stromal fibroblasts changed significantly across lung cancer stages, supporting the role of these cell types in promoting tissue remodeling and angiogenesis.

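
The Monte Carlo idea from the method overview can be illustrated with a small label-shuffling test. This is a minimal sketch in Python, not the package's R API or its exact sampling scheme: the function names, the toy cell labels, the number of simulations, and the choice of shuffling condition labels to build the null distribution are all assumptions made for illustration.

```python
import random


def proportion(labels, cell_type):
    """Fraction of cells in `labels` annotated as `cell_type`."""
    return sum(1 for x in labels if x == cell_type) / len(labels)


def monte_carlo_proportion_test(cells_a, cells_b, cell_type, n_sim=2000, seed=0):
    """Two-sided Monte Carlo test for a difference in cell-type proportion.

    Hypothetical helper: the null distribution is built by shuffling the
    pooled cells and re-splitting them into two groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = proportion(cells_b, cell_type) - proportion(cells_a, cell_type)
    pooled = list(cells_a) + list(cells_b)
    n_a = len(cells_a)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        diff = proportion(pooled[n_a:], cell_type) - proportion(pooled[:n_a], cell_type)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_sim + 1)
    return observed, p_value


# Toy metadata (made up): one cell-type label per cell in each condition.
control = ["T"] * 30 + ["NK"] * 50 + ["B"] * 20
tumor = ["T"] * 60 + ["NK"] * 20 + ["B"] * 20

observed, p_value = monte_carlo_proportion_test(control, tumor, "T")
print(f"difference in T-cell proportion: {observed:.2f}, p = {p_value:.4f}")
```

The add-one correction is a common convention for simulation-based p-values; whether LotOfCells applies it is not stated in this summary.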
### Conclusion

LotOfCells provides an effective way to evaluate significant changes in cell-number proportions across conditions in single-cell studies, helping to clarify and test quantitative differences that are often claimed but not statistically tested. This helps researchers better understand cell heterogeneity and the mechanisms of disease progression.
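
As a rough illustration of the symmetric divergence score described in the method overview, the sketch below sums the two directed KL divergences between the cell-type distributions of two samples. It is written in Python rather than R and is not the package's implementation: the symmetrization (a plain sum), the base-2 logarithm, the pseudocount used to avoid zero proportions, and all function names are assumptions.

```python
import math
from collections import Counter


def class_proportions(labels, classes, pseudocount=0.5):
    """Smoothed proportion of each class; the pseudocount avoids log(0)."""
    counts = Counter(labels)
    total = len(labels) + pseudocount * len(classes)
    return [(counts[c] + pseudocount) / total for c in classes]


def kl_divergence(p, q):
    """Directed Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def symmetric_divergence(labels_a, labels_b, pseudocount=0.5):
    """KL(p || q) + KL(q || p) over the union of classes in both samples."""
    classes = sorted(set(labels_a) | set(labels_b))
    p = class_proportions(labels_a, classes, pseudocount)
    q = class_proportions(labels_b, classes, pseudocount)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy metadata (made up): one cell-type label per cell in each condition.
control = ["T"] * 30 + ["NK"] * 50 + ["B"] * 20
tumor = ["T"] * 60 + ["NK"] * 20 + ["B"] * 20

print(f"control vs tumor:   {symmetric_divergence(control, tumor):.3f}")
print(f"control vs control: {symmetric_divergence(control, control):.3f}")
```

A score of zero means the two samples have identical class distributions, and larger values indicate a stronger global shift in composition. Swapping the two samples gives the same score.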