Abstract: Our knowledge of cancer biology has advanced with the characterization of distinct cell types and cell states within heterogeneous tumor environments. Many methods for imaging and sorting tumor cells require biomarker labels that alter cell characteristics and create a selection bias. The use of label-free single-cell methods should further improve cancer studies involving viable, unperturbed cells for downstream assays such as RNA-Seq, cell culture expansion, and functional assays. To eliminate biomarkers and enable broader assessment of cells, we developed REM-I, a platform that characterizes and sorts unlabeled single cells based on high-dimensional morphology. Cells are captured with brightfield imaging and processed in real time by self-supervised deep learning models to generate quantitative AI embeddings representative of cell morphology. A significant technical challenge in building REM-I was developing an AI model based on features extracted from cell images without prior knowledge of cell types, cell preparation, or other application-specific knowledge. Accordingly, we developed the Human Foundation Model (HFM), a hybrid architecture that combines self-supervised learning (SSL) and morphometrics (computer vision) to extract 115-dimensional embeddings representing cell morphology from high-resolution REM-I cell images. SSL produces a foundation model with high generalization capacity that enables hypothesis-free sample exploration and efficient generation of application-specific models. Meanwhile, computer vision extracts features that represent measurable and interpretable concepts (e.g., cell size, shape, texture, intensity). The training process for the HFM self-supervised backbone model uses the discriminatory power of supervised tasks. Using synthetic cells and three cancer cell lines, we trained and then validated the reproducibility and generalization capabilities of the resulting model.
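As an illustration of the hybrid idea described above (a learned embedding combined with interpretable morphometric features), the following is a minimal sketch and not the HFM implementation. The function names `morphometrics` and `hybrid_embedding`, the fixed intensity threshold, the five chosen features, and the placeholder learned vector are all assumptions made for this example; the abstract does not state how the 115 dimensions are split between the two feature types.

```python
import math

def morphometrics(image, threshold=0.5):
    """Compute a few interpretable features from a 2D grayscale image.

    Assumption: `image` is a list of rows of floats in [0, 1], and pixels
    brighter than `threshold` belong to the cell (foreground).
    Returns [area, perimeter, circularity, mean intensity, intensity std].
    """
    h, w = len(image), len(image[0])
    mask = [[image[r][c] > threshold for c in range(w)] for r in range(h)]
    pixels = [image[r][c] for r in range(h) for c in range(w) if mask[r][c]]
    area = len(pixels)
    if area == 0:
        return [0.0, 0.0, 0.0, 0.0, 0.0]

    mean = sum(pixels) / area
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / area)

    # Perimeter: number of foreground pixel edges that touch background
    # or the image border (4-connectivity).
    perimeter = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = rr < 0 or rr >= h or cc < 0 or cc >= w
                if outside or not mask[rr][cc]:
                    perimeter += 1

    # Circularity: 4*pi*area / perimeter^2 (shape descriptor).
    circularity = 4 * math.pi * area / perimeter ** 2
    return [float(area), float(perimeter), circularity, mean, std]

def hybrid_embedding(learned_vector, image):
    """Concatenate a learned embedding with morphometric features."""
    return list(learned_vector) + morphometrics(image)

# Toy 4x4 image with a bright 2x2 "cell" in the center.
toy_image = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.8, 0.1],
    [0.1, 0.8, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
# Placeholder standing in for the output of a self-supervised backbone.
toy_learned = [0.12, -0.40, 0.93]
toy_embedding = hybrid_embedding(toy_learned, toy_image)
```

In this sketch the interpretable features are simply appended to the learned vector, so each trailing dimension keeps a direct physical meaning (size, shape, intensity) while the leading dimensions carry whatever the backbone has learned.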
Results show that combining deep learning and morphometrics improves the interpretability of data and enables rapid characterization and classification of tumor cells with high accuracy. We also report on the Deepcell® Axon data suite, a tool to analyze data and customize reusable workflows. This includes the ability to store and manage data, visualize high-dimensional data as low-dimensional projections, and train classifiers to identify and sort cell populations. To enable compatibility with user-preferred downstream analysis pipelines, Axon provides data export options for images, plots, and embeddings. Our approach allows users of all skill levels to access and interpret AI-enabled morphological profiling. Applications of REM-I include hypothesis-free evaluation of heterogeneous tumor samples, label-free cancer cell enrichment, characterization of distinct cell states, and multi-omic integration.

Citation Format: Kiran Saini, Senzeyu Zhang, Ryan Carelli, Kevin B. Jacobs, Amy Wong-Thai, Cris Luengo, Vivian Lu, Anastasia Mavropoulos, Andreja Jovic, Jeanette Mei, Thomas Vollbrecht, Stephane C. Boutet, Mahyar Salek, Maddison Masaeli. Self-supervised foundation model captures high-dimensional morphology data from single cell brightfield images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3523.
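The classifier-training step mentioned in the abstract (training classifiers on embeddings to identify cell populations) can be illustrated with a very small sketch. This is not the Axon API, which the abstract does not describe; a nearest-centroid classifier is swapped in here purely as a simple stand-in, and the function names `fit_centroids` and `predict`, the two-dimensional toy embeddings, and the labels are all hypothetical.

```python
import math

def fit_centroids(embeddings, labels):
    """Return the mean embedding (centroid) for each label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }

def predict(centroids, vec):
    """Assign `vec` to the label whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Hypothetical 2D embeddings for two labeled cell populations.
train_embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["population_A", "population_A", "population_B", "population_B"]

centroids = fit_centroids(train_embeddings, train_labels)
predicted = predict(centroids, [0.2, 0.3])
```

A real workflow would operate on the full embedding vectors and would likely use a more expressive classifier; the point of the sketch is only that, once cells are represented as fixed-length vectors, population assignment reduces to a standard supervised learning problem.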
