Imputing Single-Cell Protein Abundance in Multiplex Tissue Imaging

Raphael Kirchgaessner,Cameron Watson,Allison L Creason,Kaya Keutler,Jeremy Goecks

DOI: https://doi.org/10.1101/2023.12.05.570058

2024-07-27

Abstract: Multiplex tissue imaging is a collection of increasingly popular single-cell spatial proteomics and transcriptomics assays for characterizing biological tissues both compositionally and spatially. However, several technical issues limit the utility of multiplex tissue imaging, including the limited number of molecules (proteins and RNAs) that can be assayed, tissue loss, and protein probe failure. In this work, we demonstrate how machine learning methods can address these limitations by imputing protein abundance at the single-cell level using multiplex tissue imaging datasets from a breast cancer cohort. We first compared machine learning methods' strengths and weaknesses for imputing single-cell protein abundance. Machine learning methods used in this work include regularized linear regression, gradient-boosted regression trees, and deep learning autoencoders. We also incorporated cellular spatial information to improve imputation performance. Using machine learning, single-cell protein expression can be imputed with mean absolute error ranging between 0.05 and 0.3 on a [0,1] scale. Finally, we used imputed data to predict whether single cells were more likely to come from pre-treatment or post-treatment biopsies. Our results demonstrate (1) the feasibility of imputing single-cell abundance levels for many proteins using machine learning; (2) how including cellular spatial information can substantially enhance imputation results; and (3) the use of single-cell protein abundance levels in a use case to demonstrate biological relevance.
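To make the reported error metric concrete, the following is a minimal sketch of how a mean absolute error on a [0,1] scale could be computed for one protein marker. It assumes that the [0,1] scale comes from per-marker min-max scaling, which the abstract does not state; the function names and the toy values are hypothetical and are not taken from the paper.

```python
def min_max_scale(values):
    """Rescale a list of raw intensities to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant marker carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_absolute_error(measured, imputed):
    """Average absolute difference between measured and imputed values."""
    if len(measured) != len(imputed):
        raise ValueError("measured and imputed must have the same length")
    return sum(abs(m - p) for m, p in zip(measured, imputed)) / len(measured)


# Toy example: four cells, one marker (hypothetical numbers).
measured_raw = [2.0, 4.0, 6.0, 10.0]
measured = min_max_scale(measured_raw)   # [0.0, 0.25, 0.5, 1.0]
imputed = [0.1, 0.2, 0.6, 0.9]           # stand-in for a model's output
mae = mean_absolute_error(measured, imputed)
print(f"MAE on [0,1] scale: {mae:.4f}")  # approximately 0.0875
```

On this scale, an MAE of 0.05 means imputed values are off by about 5% of the marker's observed range on average, and 0.3 by about 30%.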

Cancer Biology

What problem does this paper attempt to address?

This paper addresses several limitations of single-cell protein abundance measurement in multiplex tissue imaging (MTI). Although MTI can characterize the composition and spatial structure of biological tissues in detail, it has the following limitations:

1. **Limited number of measurable molecules**: Each experiment can assay only a limited number of proteins or RNAs (usually 10-150 proteins or 500-2000 RNAs), which limits how comprehensive the resulting data can be.
2. **Technical problems**: Tissue loss, probe failure, illumination artifacts, and errors in downstream image processing all reduce data quality.
3. **Missing data**: As a result of these technical problems, some protein measurements may be lost or inaccurate.

To overcome these limitations, the authors propose using machine learning methods to impute protein abundance at the single-cell level. Imputation can partially compensate for protein data that could not be measured experimentally, improving the completeness and usability of MTI data.

### Specific research content

The authors' main contributions are:

1. **Comparison of machine learning methods**: The authors compared regularized linear regression, gradient-boosted regression trees, and deep learning autoencoders for imputing single-cell protein abundance.
2. **Use of spatial information**: The authors found that adding cellular spatial information (i.e., the protein abundance of neighboring cells) can substantially improve imputation accuracy.
3. **Application-based validation**: The authors used the imputed data to predict whether a single cell is more likely to come from a pre-treatment or post-treatment biopsy, in order to assess the biological relevance of the imputed data.

### Main conclusions

1. **Feasibility**: Machine learning methods can impute single-cell protein abundance with a mean absolute error (MAE) between 0.05 and 0.3 on a [0,1] scale.
2. **Importance of spatial information**: Including cellular spatial information can substantially improve imputation accuracy.
3. **Biological applications**: Imputed single-cell protein abundance data are useful in a biological use case, namely distinguishing cells from pre-treatment and post-treatment biopsies.

Together, this work improves the usability of MTI data and provides new tools and methods for future single-cell spatial proteomics research.
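The spatial information described in this summary (the protein abundance of neighboring cells) can be sketched as an extra set of features per cell. The example below is only an illustration under stated assumptions: it defines neighbors as all other cells whose centroid lies within a fixed radius and aggregates them with a mean. The summary specifies neither choice, and the function name, radius, and toy data are hypothetical.

```python
import math


def neighbor_mean_features(centroids, abundances, radius):
    """For each cell, average every marker over the other cells within `radius`.

    centroids:  list of (x, y) cell positions
    abundances: list of per-cell marker vectors, already scaled to [0, 1]
    radius:     neighborhood radius in the same units as the centroids
    Cells without neighbors get zeros (an assumption made for this sketch).
    Brute-force O(n^2) search; a spatial index would be needed at real scale.
    """
    n_markers = len(abundances[0])
    features = []
    for i, ci in enumerate(centroids):
        neighbors = [
            abundances[j]
            for j, cj in enumerate(centroids)
            if j != i and math.dist(ci, cj) <= radius
        ]
        if neighbors:
            means = [
                sum(row[k] for row in neighbors) / len(neighbors)
                for k in range(n_markers)
            ]
        else:
            means = [0.0] * n_markers
        features.append(means)
    return features


# Toy example: three cells, two markers (hypothetical values).
centroids = [(0.0, 0.0), (3.0, 4.0), (100.0, 100.0)]
abundances = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
spatial = neighbor_mean_features(centroids, abundances, radius=10.0)

# Concatenate each cell's own markers with its neighborhood means.
augmented = [own + nb for own, nb in zip(abundances, spatial)]
```

An imputation model would then be trained on the augmented vectors, with the marker to be imputed held out as the target and excluded from the cell's own inputs.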

Learning Consistent Subcellular Landmarks to Quantify Changes in Multiplexed Protein Maps

Machine Learning-Enhanced Estimation of Cellular Protein Levels from Bright-Field Images

Expanding the coverage of spatial proteomics: a machine learning approach

Multiplex in Situ Tagging Technology for Highly Multiplexed Single-Cell Analysis

RRScell method for automated single-cell profiling of multiplexed immunofluorescence cancer tissue

Spatial mapping of protein composition and tissue organization: a primer for multiplexed antibody-based imaging

Multiplexed Barcoding Image Analysis for Immunoprofiling and Spatial Mapping Characterization in the Single-Cell Analysis of Paraffin Tissue Samples

MAPS: pathologist-level cell type annotation from tissue images through machine learning

Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

MIML: Multiplex Image Machine Learning for High Precision Cell Classification via Mechanical Traits within Microfluidic Systems

Automated assignment of cell identity from single-cell multiplexed imaging and proteomic data

Statistical analysis of multiple regions-of-interest in multiplexed spatial proteomics data

High-plex immunofluorescence imaging and traditional histology of the same tissue section for discovering image-based biomarkers

Multiplex protein imaging in tumour biology

Abstract PO2-25-05: Spatial multiplexing of protein biomarkers for immune cell profiling of the tumor microenvironment with ChipCytometry

Learning from heterogeneous data sources: an application in spatial proteomics

A highly adaptable protocol for mapping spatial features of cellular aggregates in tissues

Learning tissue representation by identification of persistent local patterns in spatial omics data

Deep-learning and transfer learning identify new breast cancer survival subtypes from single-cell imaging data

Abstract 2059: Machine learning integration of transcriptome-wide spatial sequencing data and ultra-high plex spatial proteomic data enables the prioritization of cancer drug targets