SVSR: A Program to Simulate Structural Variations and Generate Sequencing Reads for Multiple Platforms

Xiguo Yuan, Meihong Gao, Jun Bai, Junbo Duan
DOI: https://doi.org/10.1109/tcbb.2018.2876527
2020-01-01
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Structural variation accounts for a major fraction of mutations in the human genome and confers susceptibility to complex diseases. Next-generation sequencing, together with the rapid development of computational methods, provides a cost-effective way to detect such variations. Simulating structural variations and sequencing reads with realistic characteristics is essential for benchmarking these methods. Here, we develop a new program, SVSR, to simulate five types of structural variations (indels, tandem duplications, CNVs, inversions, and translocations) and SNPs for the human genome, and to generate sequencing reads with features from popular platforms (Illumina, SOLiD, 454, and Ion Torrent). We adopt a selection model trained on real data to predict copy number states, proceeding from the first site of a given genome to the last. Furthermore, we use microbial reference genomes to produce insertion fragments and design probabilistic models to imitate inversions and translocations. Moreover, we create platform-specific errors and base quality profiles to generate normal, tumor, or normal-tumor mixture reads. Experimental results show that SVSR can capture more realistic features and generate datasets with satisfactory quality scores. SVSR can be used to evaluate the performance of structural variation detection methods and to guide the development of new computational methods.
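To make the normal-tumor mixture idea concrete, the following is a minimal Python sketch and not SVSR's implementation. It assumes a toy setting: an inversion is modeled as a reverse complement of a segment, sequencing errors are uniform substitutions at a fixed rate, and tumor purity is the fraction of reads drawn from the tumor genome. The function names (`apply_inversion`, `sample_read`, `simulate_mixture`) and all parameters are hypothetical and chosen only for illustration; platform-specific error and quality profiles are not modeled.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def apply_inversion(seq, start, end):
    """Return seq with seq[start:end] replaced by its reverse complement."""
    segment = seq[start:end]
    inverted = "".join(COMPLEMENT[base] for base in reversed(segment))
    return seq[:start] + inverted + seq[end:]


def sample_read(genome, read_len, error_rate, rng):
    """Draw one read at a uniform position and add substitution errors."""
    pos = rng.randrange(len(genome) - read_len + 1)
    bases = []
    for base in genome[pos:pos + read_len]:
        if rng.random() < error_rate:
            # Assumption: errors are uniform substitutions to a different base.
            base = rng.choice([b for b in "ACGT" if b != base])
        bases.append(base)
    return "".join(bases)


def simulate_mixture(normal, tumor, n_reads, read_len, purity, error_rate, seed=0):
    """Sample reads from a normal-tumor mixture.

    Assumption: purity is the probability that a read comes from the tumor genome.
    Returns a list of (origin, read) pairs, where origin is "tumor" or "normal".
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        is_tumor = rng.random() < purity
        source = tumor if is_tumor else normal
        origin = "tumor" if is_tumor else "normal"
        reads.append((origin, sample_read(source, read_len, error_rate, rng)))
    return reads


if __name__ == "__main__":
    normal_genome = "ACGTTGCAAGCTTAGGCTAACCGGTTAACG"
    tumor_genome = apply_inversion(normal_genome, 5, 15)
    for origin, read in simulate_mixture(normal_genome, tumor_genome, 5, 10, 0.6, 0.01):
        print(origin, read)
```

In this sketch, setting `purity` to 0.0 or 1.0 yields pure normal or pure tumor reads, which mirrors the three output modes named in the abstract.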