Advanced Whole Genome Sequencing Using a Complete PCR-free Massively Parallel Sequencing (MPS) Workflow

Hanjie Shen, Pengjuan Liu, Zhanqing Li, Fang Chen, Hui Jiang, Shiming Shi, Yang Xi, Qiaoling Li, Xiaojue Wang, Jing Zhao, Xinming Liang, Yinlong Xie, Lin Wang, Wenlan Tian, Tam Berntsen, Yinling Luo, Meihua Gong, Jiguang Li, Chongjun Xu, Sijie Dai, Zilan Mi, Han Ren, Zhe Lin, Ao Chen, Wenwei Zhang, Feng Mu, Xun Xu, Xia Zhao, Yuan Jiang, Radoje Drmanac
DOI: https://doi.org/10.1101/2019.12.20.885517
2019-01-01
Abstract: Systematic errors can be introduced by amplification during MPS library preparation and cluster/array formation. Polymerase chain reaction (PCR)-free library preparation methods have previously demonstrated improved sequencing quality when sequenced as PCR-amplified read clusters; however, we hypothesized that some InDel errors are still introduced by the remaining PCR step. Here we sequenced PCR-free libraries on MGI's PCR-free DNBSEQ arrays to obtain, for the first time, true PCR-free whole genome sequencing (WGS). We used MGI's PCR-free WGS library preparation kits, as recommended or with some modifications, to make several NA12878 libraries. Reproducibly high-quality libraries were obtained, with low bias and less than 1% read duplication, for both ultrasonic and enzymatic DNA fragmentation. In a triplicate analysis, over 99% of SNPs and about 98% of InDels in each library were found in at least one of the other two libraries. With machine learning (ML) variant callers (DeepVariant or DNAscope), variant calling performance (SNP F-measure > 99.94%, InDel F-measure > 99.6%) exceeded widely accepted standards. The F-measure of 15X PCR-free ML-WGS was comparable to or even better than that of 30X PCR WGS analyzed with GATK. Furthermore, PCR-free WGS libraries sequenced on the PCR-free DNBSEQ platform had up to 55% fewer InDel errors compared with the NovaSeq platform, confirming that DNA clusters have PCR-generated errors. Enabled by the new PCR-free library kits, a super high-throughput sequencer, and ML-based variant calling, DNBSEQ true PCR-free WGS provides a powerful solution to improve accuracy while reducing cost and analysis time, facilitating future precision medicine, cohort studies, and large population genome projects.
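For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how an F-measure and a triplicate overlap fraction are commonly computed. It is not the authors' pipeline: the function names, the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the counts in the usage lines are assumptions made for illustration only.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for a call set scored against a truth set."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of called variants that are true
    recall = tp / (tp + fn)     # fraction of true variants that were called
    return 2 * precision * recall / (precision + recall)


def shared_fraction(library: Set[Variant], others: Iterable[Set[Variant]]) -> float:
    """Fraction of variants in `library` found in at least one of the `others`."""
    if not library:
        return 0.0
    seen_elsewhere: Set[Variant] = set().union(*others)
    return len(library & seen_elsewhere) / len(library)


# Illustrative counts only, not data from the paper.
print(f_measure(tp=8, fp=2, fn=2))

lib_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
lib_b = {("chr1", 100, "A", "G")}
lib_c: Set[Variant] = set()
print(shared_fraction(lib_a, [lib_b, lib_c]))
```

Exact tuple matching is a simplification; benchmarking tools typically normalize variant representations before comparison, which matters most for InDels.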