GENOMICON-Seq: A comprehensive tool for the simulation of mutations in amplicon and whole exome sequencing

Milan S Stosic, Jean-Marc Costanzi, Ole Herman Ambur, Trine B Rounge
DOI: https://doi.org/10.1101/2024.08.14.607907
2024-08-17
Abstract: GENOMICON-Seq is a comprehensive genomic sequencing simulation tool that enables the assessment of laboratory and bioinformatics parameters influencing the detection of mutations. The tool generates genomes with mutations, mimicking processes such as low-frequency mutations, APOBEC3 activity in viruses, somatic mutations and single base substitution (SBS) mutational signals. GENOMICON-Seq adds amplicon and whole exome sequencing biases and errors. It outputs sequencing reads compatible with mutation detection tools and a report on mutation origin (generated mutations and PCR errors), nucleotide context, and position. GENOMICON-Seq aids in the evaluation of bioinformatics tools and experimental designs, reducing the need for costly real-world sequencing experiments.
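The abstract mentions two things: mimicking APOBEC3 activity and SBS mutational signals, and reporting each mutation's nucleotide context. A widely used way to describe the context of a single base substitution is the 96-class trinucleotide notation of the COSMIC SBS signatures. In that notation the substitution is written with its 5' and 3' neighbouring bases and reported on the strand where the reference base is a pyrimidine (C or T). The Python sketch below illustrates that convention. It also flags the TCW motif (W = A or T), which is typically associated with APOBEC3 deamination. It is an illustrative assumption only: the function names are hypothetical, and it does not reproduce GENOMICON-Seq's actual report format.

```python
# Complement table for uppercase DNA bases (input is assumed to be uppercase A/C/G/T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_context(reference, pos, alt):
    """Describe a substitution at 0-based `pos` as 5'[REF>ALT]3', reported on the
    pyrimidine (C/T) strand as in the 96-class SBS convention, e.g. 'T[C>T]A'.
    Hypothetical helper for illustration."""
    if not 0 < pos < len(reference) - 1:
        raise ValueError("need one flanking base on each side")
    tri = reference[pos - 1:pos + 2]
    if alt == tri[1]:
        raise ValueError("alt base equals the reference base")
    if tri[1] in "AG":  # purine reference: describe it on the opposite strand
        tri = reverse_complement(tri)
        alt = alt.translate(COMPLEMENT)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def is_apobec3_like(context):
    """True for C>T or C>G in a TCW motif (W = A or T), the motif typically
    associated with APOBEC3 cytidine deamination. Hypothetical helper."""
    five_prime, ref, alt, three_prime = context[0], context[2], context[4], context[6]
    return five_prime == "T" and ref == "C" and alt in "TG" and three_prime in "AT"


if __name__ == "__main__":
    print(sbs96_context("AATCAGG", 3, "T"))  # T[C>T]A  (plus-strand C>T in TCA)
    print(sbs96_context("TTGAC", 2, "A"))    # T[C>T]A  (plus-strand G>A, same class)
    print(is_apobec3_like("T[C>T]A"))        # True
```

Folding purine-reference substitutions onto the pyrimidine strand is what reduces the 192 strand-specific possibilities to 96 classes. A G>A change in TGA and a C>T change in TCA therefore count as the same event.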
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the challenge of detecting low-frequency mutations. It presents GENOMICON-Seq, a comprehensive genome sequencing simulation tool, and uses it to evaluate how laboratory and bioinformatics parameters affect mutation detection. The specific problems it targets are:

1. **Challenges of low-frequency mutation detection**
   - **Low-frequency mutations in viruses**: Low-frequency mutations within a host (intrahost single nucleotide variants, iSNVs) reveal viral genetic diversity, which is crucial for understanding virus evolution and pathogenicity.
   - **Low-frequency somatic mutations in humans**: The accumulation of low-frequency somatic mutations is a major cause of cancer development, and the frequency of somatic mutations varies across cancer types.
2. **Errors introduced by the technology**
   - **PCR and sequencing errors**: In amplicon sequencing, PCR amplification and sequencing introduce technical errors that can look like mutations. This makes it difficult to distinguish true mutations from artifacts.
   - **Errors in whole-exome sequencing (WES)**: In WES, probes capture and enrich exonic regions, and sequencing errors become the main source of bias when detecting somatic mutations.
3. **Evaluation of mutation detection strategies**
   - **Subjective detection criteria**: The criteria used to separate true mutations from technical errors are often set subjectively rather than derived from empirical data.
   - **Lack of standardized methods**: Despite many efforts to improve detection accuracy, standardized and validated methods are still lacking.
4. **Importance of simulated data**
   - **Systematic evaluation of reliability**: Simulated datasets are valuable for systematically assessing the reliability of detection strategies. They help researchers understand how different parameters affect mutation detection.

GENOMICON-Seq simulates mutation generation, PCR and probe-capture biases, and sequencing errors to produce FASTQ files that resemble real sequencing data. Researchers can use these files to evaluate mutation detection strategies, optimize experimental design, and reduce the cost of real sequencing experiments.
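The core idea behind simulation-based evaluation has three steps: plant mutations whose identity is known, corrupt the data with technical errors, and score a caller against that known truth. This can be sketched as a toy Python example. It is not GENOMICON-Seq's implementation or API, and every function name below is hypothetical. The error model is assumed to be a single uniform per-base substitution rate that lumps PCR and sequencing errors together. The sketch also works directly on per-position base counts rather than on simulated FASTQ reads. The caller is a deliberately naive fixed allele-frequency cutoff (`min_vaf`), the kind of subjectively chosen criterion described above.

```python
import random
from collections import Counter

BASES = "ACGT"


def make_reference(length, rng):
    """Random reference sequence (toy stand-in for a viral genome or exome target)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def plant_mutations(reference, n_sites, freq, rng):
    """Choose n_sites positions and give each one alternative base carried by a
    fraction `freq` of genome copies. The result is the known ground truth."""
    truth = {}
    for pos in rng.sample(range(len(reference)), n_sites):
        alt = rng.choice([b for b in BASES if b != reference[pos]])
        truth[pos] = (alt, freq)
    return truth


def pileup(reference, truth, depth, error_rate, rng):
    """Per-position base counts at a fixed depth. Each read base comes from a
    mutated copy with probability `freq`, then is miscalled with probability
    `error_rate` (one uniform rate standing in for PCR and sequencing errors)."""
    counts = []
    for pos, ref_base in enumerate(reference):
        alt, freq = truth.get(pos, (None, 0.0))
        column = Counter()
        for _ in range(depth):
            base = alt if alt and rng.random() < freq else ref_base
            if rng.random() < error_rate:
                base = rng.choice([b for b in BASES if b != base])
            column[base] += 1
        counts.append(column)
    return counts


def call_variants(reference, counts, min_vaf):
    """Naive caller: report every non-reference base whose frequency >= min_vaf."""
    calls = set()
    for pos, column in enumerate(counts):
        depth = sum(column.values())
        for base, n in column.items():
            if base != reference[pos] and n / depth >= min_vaf:
                calls.add((pos, base))
    return calls


def evaluate(truth, calls):
    """Compare calls with the planted mutations: (true pos, false pos, false neg)."""
    expected = {(pos, alt) for pos, (alt, _) in truth.items()}
    return len(expected & calls), len(calls - expected), len(expected - calls)


if __name__ == "__main__":
    rng = random.Random(42)
    ref = make_reference(500, rng)
    truth = plant_mutations(ref, n_sites=20, freq=0.02, rng=rng)
    counts = pileup(ref, truth, depth=500, error_rate=0.005, rng=rng)
    for min_vaf in (0.001, 0.01, 0.05, 0.1):
        tp, fp, fn = evaluate(truth, call_variants(ref, counts, min_vaf))
        print(f"min_vaf={min_vaf}: TP={tp} FP={fp} FN={fn}")
```

These illustrative settings use a 2% planted frequency, a 0.5% per-base error rate, and 500x depth. With them, a cutoff near zero reports nearly all planted mutations but also hundreds of error bases. A cutoff well above the planted frequency removes the errors and the true mutations alike. Because the mutation origin is known in simulated data, this trade-off can be measured rather than guessed.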