Computational modelling in single-cell cancer genomics: methods and future directions

Allen W Zhang, Kieran R Campbell
DOI: https://doi.org/10.48550/arXiv.2005.01549
2020-05-04
Abstract: Single-cell technologies have revolutionized biomedical research by enabling scalable measurement of the genome, transcriptome, and proteome of multiple systems at single-cell resolution. Now widely applied to cancer models, these assays offer new insights into tumour heterogeneity, which underlies cancer initiation, progression, and relapse. However, the large quantities of high-dimensional, noisy data produced by single-cell assays can complicate data analysis, obscuring biological signals with technical artefacts. In this review article, we outline the major challenges in analyzing single-cell cancer genomics data and survey the current computational tools available to tackle these. We further outline unsolved problems that we consider major opportunities for future methods development to help interpret the vast quantities of data being generated.
Genomics, Applications
What problem does this paper attempt to address?
This paper addresses the challenges of analyzing single-cell cancer genomics data. Single-cell technologies now let researchers measure the genome, transcriptome, and proteome at single-cell resolution, which opens a new view of tumour heterogeneity. However, these experiments produce large volumes of high-dimensional, noisy data, so biological signals are easily masked by technical artefacts. The article therefore outlines the main challenges in analyzing single-cell cancer genomics data and surveys the computational tools currently available to address them. It also identifies unsolved problems that the authors regard as major opportunities for future methods development. The article focuses on three areas:

1. **Mutation Profiles and Phylogenetic Inference**: Methods for identifying mutations, clustering cells into clones, and detecting copy number variations from single-cell DNA sequencing data.
2. **Tumour Microenvironment**: How single-cell RNA sequencing data can be used to determine cell-type composition, intratumoural heterogeneity, and signatures of treatment resistance.
3. **Gene Expression**: The particular requirements that cancer samples place on quality control, dimensionality reduction, and clustering of single-cell RNA sequencing data, along with methods for linking tumour genotypes to phenotypes such as gene expression.

Overall, the paper aims to provide a reference point for the current state of single-cell cancer genomics and to encourage discussion of the future computational methods needed to realize the field's potential.