Imputing Single-Cell Protein Abundance in Multiplex Tissue Imaging

Raphael Kirchgaessner, Cameron Watson, Allison L Creason, Kaya Keutler, Jeremy Goecks
DOI: https://doi.org/10.1101/2023.12.05.570058
2024-07-27
Abstract: Multiplex tissue imaging is a collection of increasingly popular single-cell spatial proteomics and transcriptomics assays for characterizing biological tissues both compositionally and spatially. However, several technical issues limit the utility of multiplex tissue imaging, including the limited number of molecules (proteins and RNAs) that can be assayed, tissue loss, and protein probe failure. In this work, we demonstrate how machine learning methods can address these limitations by imputing protein abundance at the single-cell level using multiplex tissue imaging datasets from a breast cancer cohort. We first compared machine learning methods' strengths and weaknesses for imputing single-cell protein abundance. Machine learning methods used in this work include regularized linear regression, gradient-boosted regression trees, and deep learning autoencoders. We also incorporated cellular spatial information to improve imputation performance. Using machine learning, single-cell protein expression can be imputed with mean absolute error ranging between 0.05 and 0.3 on a [0,1] scale. Finally, we used imputed data to predict whether single cells were more likely to come from pre-treatment or post-treatment biopsies. Our results demonstrate (1) the feasibility of imputing single-cell abundance levels for many proteins using machine learning; (2) how including cellular spatial information can substantially enhance imputation results; and (3) the use of single-cell protein abundance levels in a use case to demonstrate biological relevance.
Cancer Biology
What problem does this paper attempt to address?
This paper addresses several key limitations of single-cell protein abundance measurement in multiplex tissue imaging (MTI). Although MTI can characterize the composition and spatial structure of biological tissues in detail, it has the following limitations:

1. **Limited number of measurable molecules**: Each experiment can measure only a limited number of proteins or RNAs (usually 10-150 proteins or 500-2000 RNAs), which limits how comprehensive the resulting data can be.
2. **Technical problems**: Tissue loss, probe failure, illumination artifacts, and errors in downstream image processing all reduce data quality.
3. **Missing data**: As a result of these technical problems, some protein measurements are lost or cannot be made accurately.

To overcome these limitations, the authors propose using machine learning methods to impute protein abundance at the single-cell level. Imputation can partly compensate for protein data that could not be measured in the experiment, improving the completeness and usability of MTI data.

### Specific research content

The authors' main contributions are:

1. **Comparison of machine learning methods**: The authors compared regularized linear regression, gradient-boosted regression trees, and deep learning autoencoders for imputing single-cell protein abundance.
2. **Use of spatial information**: The authors found that adding cellular spatial information (i.e., the protein abundance of neighboring cells) can substantially improve imputation accuracy.
3. **Application**: The authors used the imputed data to predict whether a single cell is more likely to come from a pre-treatment or a post-treatment biopsy, as a check on the biological relevance of the imputed data.

### Main conclusions

1. **Feasibility**: Machine learning methods can impute single-cell protein abundance, with a mean absolute error (MAE) between 0.05 and 0.3 on a [0,1] scale.
2. **Importance of spatial information**: Including cellular spatial information can substantially improve imputation accuracy.
3. **Biological application**: Imputed single-cell protein abundance data are useful in a practical biological task, namely distinguishing cells from pre-treatment and post-treatment biopsies.

Together, these results show how imputation can improve the completeness of MTI data and provide tools and methods for future single-cell spatial proteomics research.
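
The imputation setup summarized above can be illustrated with a small sketch. It is not the authors' pipeline: the data are synthetic, closed-form ridge regression stands in for the paper's regularized linear regression, and the helper names (`min_max_scale`, `neighbor_mean_features`, `fit_ridge`, `predict_ridge`), the neighborhood radius, and the random train/test split are all assumptions made for illustration. It shows one target protein being imputed from the other proteins in the same cell, with and without the mean expression of neighboring cells, and scored by MAE on a [0,1] scale.

```python
import numpy as np


def min_max_scale(x):
    """Scale each column (protein) to the [0, 1] range."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)


def neighbor_mean_features(expr, coords, radius):
    """Mean expression over cells within `radius` of each cell, excluding itself."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    is_neighbor = (dists <= radius) & ~np.eye(len(coords), dtype=bool)
    counts = is_neighbor.sum(axis=1, keepdims=True)
    sums = is_neighbor.astype(float) @ expr
    return sums / np.maximum(counts, 1)  # cells with no neighbors get zeros


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_ridge(w, X):
    """Predict and clip to [0, 1], the range of the scaled expression values."""
    return np.clip(np.column_stack([np.ones(len(X)), X]) @ w, 0.0, 1.0)


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in for one tissue sample: cell centroids plus protein
# intensities driven by a smooth spatial field (an assumption, not real data).
rng = np.random.default_rng(0)
n_cells, n_proteins = 400, 6
coords = rng.uniform(0.0, 100.0, size=(n_cells, 2))
field = np.sin(coords[:, 0] / 15.0) + np.cos(coords[:, 1] / 15.0)
loadings = rng.uniform(0.5, 1.5, size=n_proteins)
raw = field[:, None] * loadings + rng.normal(0.0, 0.5, size=(n_cells, n_proteins))
# Simplification: scaling is fit on all cells rather than on training cells only.
expr = min_max_scale(raw)

target = 0                                  # index of the protein to impute
others = np.delete(expr, target, axis=1)    # proteins assumed to be measured
y = expr[:, target]
spatial = neighbor_mean_features(others, coords, radius=10.0)

# Random half/half split of cells; a real study would hold out whole samples.
train = rng.permutation(n_cells) < n_cells // 2

results = {}
feature_sets = [
    ("expression only", others),
    ("expression + neighbors", np.column_stack([others, spatial])),
]
for name, X in feature_sets:
    w = fit_ridge(X[train], y[train], alpha=1.0)
    results[name] = mean_absolute_error(y[~train], predict_ridge(w, X[~train]))
    print(f"{name}: MAE = {results[name]:.3f}")
```

Whether the neighbor features lower the error here depends entirely on how the synthetic data were generated, so the printed numbers say nothing about the paper's results. The sketch only makes the feature construction and the evaluation metric concrete.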