IntSIM: an Integrated Simulator of Next-Generation Sequencing Data.

Xiguo Yuan, Junying Zhang, Liying Yang
DOI: https://doi.org/10.1109/tbme.2016.2560939
2017-01-01
Abstract:
OBJECTIVE: Next-generation sequencing data have been widely used, through computational tools, for DNA variant discovery and tumor studies. Effective simulation of such data with many realistic features is necessary for testing existing tools and guiding the development of new ones.
METHODS: We present an integrated simulation system, IntSIM, that simulates common DNA variants and generates sequencing reads for mixture genomes. Compared with other simulation programs, IntSIM has three novel features: 1) it can simulate both germline and somatic variants in the same sequence; 2) it accounts for tumor purity, so it can generate reads corresponding to heterogeneous genomes and also produce tumor-normal matched samples; and 3) it simulates correlations among SNPs and among CNVs/CNAs using HMMs trained on real sequenced genomes, and it can simulate both broad and focal CNV/CNA events.
RESULTS: Data simulated by IntSIM reflect characteristics observed in real data and are consistent with the input parameters. The IntSIM software package is freely available at http://intsim.sourceforge.net/.
CONCLUSION: Based on extensive experiments, IntSIM performs better than other programs in some scenarios, such as the simulation of heterozygous SNPs and CNVs/CNAs, and it provides some functions that other programs lack.
SIGNIFICANCE: IntSIM simulations can be used to evaluate the performance of methods for detecting various types of variants and analyzing tumor samples, and especially to provide a realistic assessment of the effect of tumor purity on the identification of somatic mutations.
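The abstract does not describe how IntSIM mixes tumor and normal genomes, so the following is only a minimal sketch of why tumor purity matters, assuming a standard two-population mixture model (tumor cells at fraction `purity`, diploid normal cells otherwise) and simple binomial read sampling. The function names `expected_vaf` and `sample_alt_reads` are hypothetical and are not part of IntSIM. Under this assumption, a heterozygous somatic mutation in a diploid tumor region appears at a variant allele fraction of 0.5 in a pure sample but only about 0.15 at 30% purity, which is the kind of effect on somatic-mutation calling that the SIGNIFICANCE statement refers to.

```python
import random


def expected_vaf(purity: float, tumor_cn: int, tumor_alt_copies: int,
                 normal_alt_copies: int = 0, normal_cn: int = 2) -> float:
    """Expected fraction of reads carrying the variant allele in a
    tumor-normal mixture (assumed two-population model, not IntSIM's code).

    purity            -- fraction of cells in the sample that are tumor cells
    tumor_cn          -- total copy number of the locus in tumor cells
    tumor_alt_copies  -- copies carrying the variant in tumor cells
    normal_alt_copies -- 0 for a somatic variant, 1 for a heterozygous germline SNP
    normal_cn         -- copy number in normal cells (diploid by default)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    # Variant-bearing copies and total copies, each weighted by cell fraction.
    alt = purity * tumor_alt_copies + (1 - purity) * normal_alt_copies
    total = purity * tumor_cn + (1 - purity) * normal_cn
    return alt / total


def sample_alt_reads(depth: int, vaf: float, rng: random.Random) -> int:
    """Draw the number of variant-supporting reads at one site:
    each of `depth` reads independently carries the variant with probability `vaf`."""
    return sum(rng.random() < vaf for _ in range(depth))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible simulation
    for purity in (1.0, 0.6, 0.3):
        # Heterozygous somatic mutation in a diploid tumor region.
        vaf = expected_vaf(purity, tumor_cn=2, tumor_alt_copies=1)
        alt = sample_alt_reads(depth=100, vaf=vaf, rng=rng)
        print(f"purity={purity:.1f} expected VAF={vaf:.3f} alt reads at 100x={alt}")
```

Summing Bernoulli draws keeps the sketch dependent only on the standard library; a real simulator would more likely use a vectorized binomial sampler and would also model sequencing errors, which this sketch ignores.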