Automated cell type annotation and exploration of single cell signalling dynamics using mass cytometry

Dimitrios Kleftogiannis, Sonia Gavasso, Benedicte Sjo Tislevoll, Nisha van der Meer, Inga K. F. Motzfeldt, Monica Hellesøy, Stein-Erik Gullaksen, Emmanuel Griessinger, Oda Fagerholt, Andrea Lenartova, Yngvar Fløisand, Bjørn Tore Gjertsen, Inge Jonassen
DOI: https://doi.org/10.1101/2022.08.13.503587
2024-04-17
Abstract: Mass cytometry by time-of-flight (CyTOF) is an emerging technology allowing for in-depth characterisation of cellular heterogeneity in cancer and other diseases. However, computational identification of cell populations from CyTOF and utilisation of single cell data for biomarker discovery face several technical limitations, and although some computational approaches are available, high-dimensional analyses of single cell data remain quite demanding. Here, we deploy a bioinformatics framework that tackles two fundamental problems in CyTOF analyses, namely: a) automated annotation of cell populations guided by a reference dataset, and b) systematic utilisation of single cell data for more effective patient stratification. By applying this framework to several publicly available datasets, we demonstrate that the Scaffold approach achieves a good trade-off between sensitivity and specificity for automated cell type annotation. Additionally, a case study focusing on a cohort of 43 leukemia patients reported salient interactions between signalling proteins that are sufficient to predict short-term survival at time of diagnosis using the XGBoost algorithm. Our work introduces an automated and versatile analysis framework for CyTOF data with many applications in future precision medicine projects. Datasets and codes are publicly available at:
Cancer Biology
What problem does this paper attempt to address?
The paper addresses two problems in the analysis of mass cytometry by time-of-flight (CyTOF) data: automated cell-type annotation and the exploration of single-cell signalling dynamics. Specifically, it focuses on two core issues:

1. **Automated cell-type annotation**: Cell types are identified and annotated automatically, guided by a reference dataset. This addresses the limitations and subjectivity of manual gating in high-dimensional single-cell data analysis.
2. **Systematic use of single-cell data for patient stratification**: The dynamics of single-cell signalling proteins are analysed and combined with machine-learning algorithms to predict patient survival. In particular, the paper shows how the XGBoost algorithm combined with the DREMI score can distinguish leukemia patients with short-term survival from those with long-term survival.

### Main research contents

- **Methods**:
  - Developed a bioinformatics framework based on Scaffold for automated cell-type annotation.
  - Used the DREMI (Density Resampled Estimate of Mutual Information) method to extract features for training machine-learning models.
  - Evaluated the performance of different machine-learning algorithms in classification tasks, in particular the performance of XGBoost in predicting patient survival.
- **Experimental design**:
  - Used multiple publicly available CyTOF datasets for validation, including AML_benchmark, BMMC_benchmark and PANORAMA_benchmark.
  - Analysed CyTOF data from 43 leukemia patients in detail and combined it with clinical and genetic information for patient stratification.
- **Results**:
  - The Scaffold method achieves a good trade-off between sensitivity and specificity in cell-type annotation, and on independent datasets it is reported to perform better than other methods.
  - DREMI scores used as features for XGBoost can predict short-term versus long-term patient survival, including under class imbalance.
  - Feature-importance analysis identified key interactions between signalling proteins that are significantly related to patient survival.

### Conclusion

The paper proposes a bioinformatics framework for automated cell-type annotation and for using single-cell data in patient stratification. According to the summary, the framework improves the accuracy of cell-type annotation and provides a tool for precision medicine, particularly for survival prediction in leukemia patients.
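To make the DREMI feature-extraction step listed under Methods more concrete, the sketch below computes a simplified, histogram-based score in the spirit of DREMI for every ordered pair of markers and collects the scores into one feature vector per patient. It is an illustration under stated assumptions, not the authors' implementation: `dremi_like_score` and `pairwise_features` are hypothetical helper names, the marker names are placeholders, the data are synthetic, and the published DREMI estimator uses kernel density estimation and resampling rather than fixed bins.

```python
import itertools
import math
import random


def dremi_like_score(x, y, bins=8):
    """Simplified, histogram-based stand-in for DREMI (not the published estimator).

    Estimates p(y | x) per x bin, gives every occupied x bin equal weight,
    and returns the resulting mutual information in bits.
    """
    def bin_index(value, lo, hi):
        if hi == lo:  # constant marker: everything falls into one bin
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    counts = [[0] * bins for _ in range(bins)]  # counts[x_bin][y_bin]
    for xv, yv in zip(x, y):
        counts[bin_index(xv, x_lo, x_hi)][bin_index(yv, y_lo, y_hi)] += 1

    # Conditional density p(y | x), keeping only x bins that contain cells.
    cond = [[c / sum(row) for c in row] for row in counts if sum(row) > 0]
    n_x = len(cond)
    # Marginal of y when each occupied x bin is weighted equally.
    p_y = [sum(row[j] for row in cond) / n_x for j in range(bins)]

    score = 0.0
    for row in cond:
        for j, p in enumerate(row):
            if p > 0:
                score += (p / n_x) * math.log2(p / p_y[j])
    return score


def pairwise_features(cells, markers):
    """One score per ordered marker pair (X -> Y), as a flat feature dict."""
    return {
        f"{a}->{b}": dremi_like_score(cells[a], cells[b])
        for a, b in itertools.permutations(markers, 2)
    }


# Synthetic single-patient example; marker names are placeholders.
rng = random.Random(0)
upstream = [rng.random() for _ in range(2000)]
cells = {
    "marker_A": upstream,
    "marker_B": [v + rng.gauss(0, 0.05) for v in upstream],  # driven by A
    "marker_C": [rng.random() for _ in range(2000)],         # unrelated
}
features = pairwise_features(cells, ["marker_A", "marker_B", "marker_C"])
```

Stacking one such feature dictionary per patient gives a patients-by-pairs matrix that a classifier such as XGBoost could be trained on. For imbalanced survival classes, XGBoost's `scale_pos_weight` parameter is one common option; whether the authors used it is not stated here.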