TNSim: A Tumor Sequencing Data Simulator for Incorporating Clonality Information

Yu Geng, Zhongmeng Zhao, Mingzhe Xu, Xuanping Zhang, Xiao Xiao, Jiayin Wang
DOI: https://doi.org/10.1007/978-3-319-95933-7_45
2018-01-01
Abstract: In recent years, next-generation sequencing has made it possible to obtain high-resolution landscapes of genetic changes at the single-nucleotide level. A growing number of novel methods have been proposed for efficient and effective analysis of cancer sequencing data. To facilitate such development, a data simulator is a crucial tool: it not only tests and evaluates proposed approaches but also provides feedback for further improvement. Several simulators have been released to generate next-generation sequencing data. However, to the best of our knowledge, none of them considers clonality information. Clonal heterogeneity is reported to be widespread in tumor samples. Somatic mutational events usually exhibit a wide spectrum of variant allelic frequencies, and some of them are detectable only in one or several clonal lineages. In this article, we introduce a Tumor-Normal sequencing Simulator, TNSim, which generates next-generation sequencing data that incorporate clonality information. The simulator mimics a tumor sample and its paired normal sample, in which germline variants and somatic mutations can be set separately. Tumor purity is adjustable. The clonal architecture is preassigned as one or more clonal lineages, where each lineage consists of a set of somatic mutations with similar variant allelic frequencies. A group of experiments is conducted to evaluate its performance. The statistical features of the simulated sequencing reads are comparable to those of real tumor sequencing data from a sample consisting of multiple sub-clones. The source code is available at http://github.com/lnmxgy/TNSim for academic use only.
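To make the relationship between tumor purity, clonal lineages, and variant allelic frequency (VAF) concrete, the following minimal Python sketch may help. It is not TNSim's implementation and does not use TNSim's interface. All function names and parameters are hypothetical, and it assumes heterozygous somatic mutations in copy-neutral diploid regions, so that the expected VAF is purity × cellular fraction / 2, with variant-supporting reads drawn binomially at a fixed depth.

```python
import random


def expected_vaf(purity, cellular_fraction):
    """Expected VAF of a heterozygous somatic mutation.

    Assumption (not from the paper): copy-neutral diploid locus, so only
    one of two alleles carries the mutation in each mutated tumor cell.
    """
    if not (0.0 <= purity <= 1.0 and 0.0 <= cellular_fraction <= 1.0):
        raise ValueError("purity and cellular_fraction must lie in [0, 1]")
    return purity * cellular_fraction / 2.0


def simulate_alt_reads(depth, vaf, rng):
    """Draw the number of variant-supporting reads from Binomial(depth, vaf)."""
    return sum(1 for _ in range(depth) if rng.random() < vaf)


def simulate_lineages(purity, lineages, depth, seed=0):
    """Simulate observed VAFs for each clonal lineage.

    `lineages` maps a hypothetical lineage name to a tuple
    (cellular_fraction, number_of_mutations). Mutations in the same
    lineage share one expected VAF, so their observed VAFs cluster.
    """
    rng = random.Random(seed)
    observed = {}
    for name, (cellular_fraction, n_mutations) in lineages.items():
        vaf = expected_vaf(purity, cellular_fraction)
        observed[name] = [
            simulate_alt_reads(depth, vaf, rng) / depth
            for _ in range(n_mutations)
        ]
    return observed


if __name__ == "__main__":
    # Illustrative values only: one founding clone and one sub-clone.
    example = simulate_lineages(
        purity=0.8,
        lineages={"founding": (1.0, 50), "subclone": (0.5, 50)},
        depth=100,
    )
    for name, vafs in example.items():
        print(name, round(sum(vafs) / len(vafs), 3))
```

Under these assumptions, lowering the purity shifts every lineage's VAF cluster toward zero, while lineages present in fewer tumor cells form clusters at proportionally lower frequencies.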