A Novel High-Accuracy Genome Assembly Method Utilizing a High-Throughput Workflow
Qingdong Zeng, Wenjin Cao, Liping Xing, Guowei Qin, Jianhui Wu, Michael F. Nagle, Qin Xiong, Jinhui Chen, Liming Yang, Prasad Bajaj, Annapurna Chitikineni, Yan Zhou, Yunxin Yu, Jiang Xu, Xiaojun Nie, Lin Huang, Shengjie Liu, Jan Šafář, Hana Šimková, Weining Song, Baozhu Guo, Shilin Chen, Jaroslav Doležel, Zhaodong Hao, Qiang Cheng, Jianguo Liang, Jiansong Tang, Aizhong Cao, Qiang Wang, Xiangqian Lu, Shouping Yang, Hongxiang Ma, Jiajie Liu, Xiaoting Wang, Hong Zhang, Zhonghua Wang, Wanquan Ji, Changfa Wang, Fengping Yuan, Jisen Shi, Rajeev K. Varshney, Zhensheng Kang, Dejun Han, Haibin Xu
DOI: https://doi.org/10.1101/2020.11.26.400507
2020-01-01
bioRxiv
Abstract: Across domains of biological research that use genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is often hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that uses a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. Applying this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome, we correctly assembled 98% of BAC sequences, covering 92.7% of the 2DS chromosome. We also identified 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0), corrected them, and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for the assembly of complex genomes with both high accuracy and efficiency.
Competing Interest Statement: The authors have declared no competing interest.