Abstract:

Introduction: Flow cytometry is widely used in clinical and research laboratories for diagnostics, biomarker discovery, and immune system monitoring. Flow cytometry data processing still relies on gating- and clustering-based approaches that are time-consuming and subjective. Processing time increases with panel size and the number of detected populations, which complicates the search for new biomarkers. Low reproducibility and methodological limitations have so far hindered efforts to automate and standardize flow cytometry data processing, and these efforts have not yielded significant advances. Here we present a new machine learning (ML)-based algorithm for automated cell-type labeling. Our supervised ML approach classifies every event in a flow cytometry data file solely on the presence or absence of markers, without prior knowledge or assumptions about the cell population content of the sample. This approach enables the detection of rare and/or new cell populations with a high average quality metric (F1-score). Because the analysis is rapid and of high quality, the algorithm is applicable in clinical settings, particularly for detecting hematological abnormalities and cancers.

Methods: We processed 500 blood samples from a cohort of healthy donors and patients with various cancer diagnoses using 10 different 18-channel multicolor flow cytometry panels. For each panel, we split data from all or a portion of these 500 samples 3:1 into training and test datasets. To do this, we manually labeled cells by phenotype to create 10 high-quality training sets for supervised learning and 10 test datasets, one pair for each of the 10 panels. To train the cell-type classifier, we set up a two-level boosting-based model.
The first-level model filters out outliers, including dead cells, cellular debris, beads, and other undefined particles, in order to home in on the target population. The second-level model predicts cell types within the target population using one of two approaches. The population-based approach detects the major subpopulation types in a target population and predicts their precise population labels. This approach is useful for labeling a small number of previously known or predicted subpopulations. The marker-based approach is useful for target populations with large numbers of subpopulations, such as T cells harboring different combinations of cell-surface receptors. It predicts the presence or absence of specific markers on each cell to assign its phenotype. It also allows us to construct complex hierarchies in order to detect new populations that are challenging to identify manually. Figure 1 outlines our workflow.

Results: We validated our final set of 10 trained models on the corresponding test datasets. In total, the models detected 221 cell populations in the test datasets, which corresponds to the number of unique cell types predicted by our models. Table 1 shows the evaluation metrics for our algorithm for populations representing > 0.1% of whole blood cells (WBCs). The average quality metric (F1-score) across all antibody panels is 0.86. This value is the mean of the F1-scores calculated for all cell populations identified by our algorithm. The mean F1-score is highest (0.96) for large populations, lower (0.87) for mid-sized populations, and lowest but acceptable (0.77) for small populations. The mean quality score for the marker-based models is also high (0.96). Manual evaluation took approximately 1 hour per data file, whereas the algorithm completed the analysis within 10 seconds.
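The reported average is described as a mean of per-population F1-scores. A minimal sketch of that averaging, assuming an unweighted one-vs-rest F1 per population; the function names and the toy labels are illustrative and not taken from the study:

```python
from collections import Counter


def per_population_f1(true_labels, predicted_labels):
    """Return {population: F1}, computed one-vs-rest for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # event wrongly assigned to `pred`
            fn[truth] += 1  # event missed for `truth`
    scores = {}
    for population in set(true_labels) | set(predicted_labels):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[population] + fp[population] + fn[population]
        scores[population] = 2 * tp[population] / denom if denom else 0.0
    return scores


def mean_f1(scores):
    """Unweighted mean over populations (every population counts equally)."""
    return sum(scores.values()) / len(scores)


# Toy example with hypothetical labels for five events.
true_labels = ["T", "T", "B", "B", "NK"]
predicted_labels = ["T", "B", "B", "B", "NK"]
scores = per_population_f1(true_labels, predicted_labels)
print(scores, round(mean_f1(scores), 3))
```

With an unweighted mean, a small population affects the average as much as a large one, which is why a breakdown by population size is informative alongside the overall figure.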
Conclusion: Our new algorithm automates cell labeling and produces high-quality outputs comparable to manual processing, but with a much shorter turnaround time (TAT) and without requiring prior knowledge or expertise from the user. Importantly, it allows us to filter out outliers accurately, identify the target population, and divide this target population into multiple cell subtypes, including new and rare cell subpopulations, all without a priori assumptions about the cell population content of the sample. Given its high-quality cell population analysis and short TAT, our algorithm provides rapid, unbiased, and precise cell typing that will have utility for the diagnosis of hematologic malignancies and for immunoprofiling.
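The two-level, marker-based labeling described in Methods can be sketched roughly as follows. This is an illustration of the control flow only: the trained boosting-based models are replaced by hypothetical threshold rules, and the marker names, threshold values, and the `viable` field are assumptions made for the example.

```python
# Hypothetical stand-ins for the trained models. The abstract describes
# boosting-based classifiers; fixed thresholds are used here only to keep
# the sketch self-contained.
MARKER_THRESHOLDS = {"CD3": 0.5, "CD4": 0.5, "CD8": 0.5}  # illustrative values


def is_target_event(event):
    """First level: reject outliers (stand-in rule: require a viability flag)."""
    return event.get("viable", False)


def predict_markers(event):
    """Second level, marker-based: call each marker present or absent."""
    return {m: event[m] >= t for m, t in MARKER_THRESHOLDS.items()}


def phenotype(marker_calls):
    """Join per-marker calls into a label such as 'CD3+CD4+CD8-'."""
    return "".join(
        f"{marker}{'+' if present else '-'}"
        for marker, present in sorted(marker_calls.items())
    )


def label_events(events):
    """Label every event: outliers first, then a marker-derived phenotype."""
    labels = []
    for event in events:
        if not is_target_event(event):
            labels.append("outlier")
        else:
            labels.append(phenotype(predict_markers(event)))
    return labels


# Three toy events with normalized, made-up intensities.
events = [
    {"viable": True, "CD3": 0.9, "CD4": 0.8, "CD8": 0.1},
    {"viable": True, "CD3": 0.9, "CD4": 0.2, "CD8": 0.7},
    {"viable": False, "CD3": 0.0, "CD4": 0.0, "CD8": 0.0},
]
labels = label_events(events)
print(labels)
```

Because the label is assembled from independent marker calls, any marker combination can be reported, including combinations that were never defined as a named population in advance.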
