Abstract: Tumor microenvironments (TMEs) contain vast amounts of information on a patient's cancer through their cellular composition and the spatial distribution of tumor cells and immune cell populations. Exploring variations in TMEs between patient groups, as well as determining the extent to which this information can predict outcomes such as patient survival or treatment success with emerging immunotherapies, is of great interest. Moreover, given the large number of cell interactions to consider, we often wish to identify the specific interactions that are useful in making such predictions. We present an approach to achieve these goals based on summarizing spatial relationships in the TME using spatial K functions, and then applying functional data analysis and random forest models to both predict outcomes of interest and identify important spatial relationships. In simulation experiments, this approach is shown to be effective at both identifying important spatial interactions and controlling the false discovery rate. We further used the proposed approach to interrogate two real data sets of Multiplexed Ion Beam Images of TMEs in triple-negative breast cancer and lung cancer patients. The proposed methods are publicly available in a companion R package, funkycells.

Spatial data on the tumor microenvironment (TME) are becoming more prevalent. Existing methods to interrogate such data often have several limitations: (1) they can rely on estimating the spatial relationships among cells by examining simple counts of cells within a single radius, (2) they may not come with ways to evaluate the statistical significance of any findings, or (3) they model individual interactions independently of other interactions.
Our approach leverages techniques in spatial statistics and uses a benchmark ensemble machine learning method to address each of these deficiencies: it (1) uses K functions to encode the relative densities of cells over all radii up to a user-selected maximum radius, (2) employs permutation and cross-validation to evaluate the statistical significance of any findings on the spatial interactions in the TME, and (3) models multiple interactions simultaneously. Our approach is freely available in an R implementation called funkycells. In the analysis of two real data sets, the method performed well and gave the expected results. We think this will be a robust tool for researchers looking to interrogate TME data.
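
To make the idea of a K function concrete, the following is a minimal Python sketch of a naive cross-type K estimate evaluated over several radii. It is an illustration only, not the funkycells implementation: the function name `cross_k`, the toy coordinates, and the window area are all assumptions, and no edge correction is applied (practical estimators usually include one).

```python
import math


def cross_k(points_a, points_b, radii, area):
    """Naive cross-type K estimate for two cell types (hypothetical helper).

    For each radius r, counts the (a, b) pairs whose distance is at most r
    and scales the count by area / (n_a * n_b). No edge correction is used,
    so values near the window boundary are biased downward.
    """
    n_a, n_b = len(points_a), len(points_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both cell types need at least one point")
    # All pairwise distances between type-a cells and type-b cells.
    dists = [math.dist(p, q) for p in points_a for q in points_b]
    scale = area / (n_a * n_b)
    return [scale * sum(d <= r for d in dists) for r in radii]


# Toy example: two "tumor" cells and three "immune" cells in a window of area 6.
tumor = [(0.0, 0.0), (1.0, 0.0)]
immune = [(0.0, 0.5), (1.0, 0.5), (0.5, 2.0)]
radii = [0.25, 0.5, 1.5, 3.0]
k_values = cross_k(tumor, immune, radii, area=6.0)
print(k_values)
```

Evaluating the estimate on a grid of radii gives one curve per pair of cell types, rather than a single count at one radius. Under complete spatial randomness the theoretical K function is πr², so curves above that value suggest clustering of the second cell type around the first, and curves below it suggest avoidance.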


Using random forests to uncover the predictive power of distance-varying cell interactions in tumor microenvironments
