Abstract:

Background: The chimeric sequences produced by phi29 DNA polymerase, known as chimeras, impair the performance of multiple displacement amplification (MDA) and complicate sequence data processing. Although several articles have reported the existence of chimeric sequences, only one study has focused on the structure and generation mechanism of chimeras, and it was based on only hundreds of chimeras found in sequence data from the E. coli genome.

Method: We mined a set of next-generation sequencing (NGS) reads that had been used for whole-genome haplotype assembly in a previous study. We established a bioinformatics pipeline based on a subsection alignment strategy to detect all chimeras in the data and visualize their structure. We then defined two statistical indexes, the chimeric distance and the overlap length, whose regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between chimera type and average insert size, and thereby illustrated a way to decrease the proportion of wasted data during DNA library construction.

Results/Conclusion: A total of 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. In total, 40,259,438 chimeric read pairs (6.19%) were found among 650,430,811 read pairs. The chimeric sequences consist of two or more parts that map to non-consecutive but nearby positions on the chromosome. The chimeric distance between the chromosomal locations of adjacent parts followed an approximately bimodal distribution ranging from 0 to over 5,000 nt, with a peak at about 250 to 300 nt. The overlap length of adjacent parts followed an approximately Poisson distribution with a peak at 6 nt. Moreover, a linear correlation analysis indicated that unmapped chimeras, which were classified as wasted data, could be reduced by appropriately increasing the insert size.

Significance: This study profiled phi29 MDA chimeras using tens of millions of chimeric sequences and helps clarify the amplification mechanism of phi29 DNA polymerase. Our work also illustrates the value of NGS data reanalysis, both for improving data utilization efficiency and for uncovering additional genomic information.
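
To make the two statistical indexes concrete, the following is a minimal Python sketch under stated assumptions, not the pipeline used in the study. The abstract does not give formal definitions, so the `Segment` structure, the half-open coordinate convention, and the two functions are hypothetical. Here the overlap length is assumed to be the number of read bases shared by two adjacent aligned parts, and the chimeric distance is assumed to be the gap between their mapped intervals on the chromosome. The example coordinates are invented for illustration; only the read-pair counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned part of a chimeric read (hypothetical structure).

    All intervals are half-open: [start, end).
    """
    read_start: int  # offset of the part within the read
    read_end: int
    ref_start: int   # mapped position on the chromosome
    ref_end: int


def overlap_length(a: Segment, b: Segment) -> int:
    """Read bases covered by both parts (assumed definition)."""
    shared = min(a.read_end, b.read_end) - max(a.read_start, b.read_start)
    return max(0, shared)


def chimeric_distance(a: Segment, b: Segment) -> int:
    """Gap between the two mapped intervals on the chromosome (assumed definition)."""
    gap = max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end)
    return max(0, gap)


# Invented example: two adjacent parts of one 100 nt read.
first = Segment(read_start=0, read_end=60, ref_start=10_000, ref_end=10_060)
second = Segment(read_start=54, read_end=100, ref_start=10_330, ref_end=10_376)

print(overlap_length(first, second))     # read bases shared by both parts
print(chimeric_distance(first, second))  # chromosomal gap between the parts

# Proportion of chimeric read pairs, using the counts reported in the abstract.
chimeric_pairs = 40_259_438
total_pairs = 650_430_811
chimeric_percent = 100 * chimeric_pairs / total_pairs
print(f"{chimeric_percent:.2f}%")
```

With these invented coordinates, the two parts share 6 nt of the read and lie 270 nt apart on the chromosome, values chosen to fall near the reported distribution peaks. The final line reproduces the reported chimeric proportion from the two read-pair counts.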

ChimeraMiner: An Improved Chimeric Read Detection Pipeline and Its Application in Single Cell Sequencing

Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data

Systematic Characteristic Exploration Of The Chimeras Generated In Multiple Displacement Amplification Through Next Generation Sequencing Data Reanalysis

UCHIME improves sensitivity and speed of chimera detection

Hotspot Selective Preference of the Chimeric Sequences Formed in Multiple Displacement Amplification

Streamlined and quantitative detection of chimerism using digital PCR

A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing

UCHIME2: improved chimera prediction for amplicon sequencing

Quantification of the effects of chimerism on read mapping, differential expression and annotation following short-read de novo assembly.

A superior strategy for single-cell mutational screening via multiplex-targeted QPCR using the BioMark HD microfluidic platform.

MDAGenera: an Efficient and Accurate Simulator for Multiple Displacement Amplification.

1D-Reactor Decentralized MDA for Uniform and Accurate Whole Genome Amplification.

Evaluating the Benefits and Limits of Multiple Displacement Amplification with Whole-Genome Oxford Nanopore Sequencing

DEMINERS enables clinical metagenomics and comparative transcriptomic analysis by increasing throughput and accuracy of nanopore direct RNA sequencing

MDA in Capillary for Whole Genome Amplification

Using Local Alignment to Enhance Single-Cell Bisulfite Sequencing Data Efficiency.

Shortcut barcoding and early pooling for scalable multiplex single-cell reduced-representation CpG methylation sequencing at single nucleotide resolution

Mitosplitter: A Mitochondrial Variants-Based Method for Efficient Demultiplexing of Pooled Single-Cell RNA-Seq

Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications

Enhanced recovery of single-cell RNA-sequencing reads for missing gene expression data

Identification of Multi-landscape and Cell Interactions in the Tumor Microenvironment through High-Coverage Single-Cell Sequencing