Comparison of the Two Up-to-date Sequencing Technologies for Genome Assembly: HiFi Reads of Pacific Biosciences Sequel II System and Ultralong Reads of Oxford Nanopore

Dandan Lang, Shilai Zhang, Pingping Ren, Fan Liang, Zongyi Sun, Guanliang Meng, Yuntao Tan, Xiaokang Li, Qihua Lai, Lingling Han, Depeng Wang, Fengyi Hu, Wen Wang, Shanlin Liu
DOI: https://doi.org/10.1093/gigascience/giaa123
IF: 7.658
2020-01-01
GigaScience
Abstract:
Background: The availability of reference genomes has revolutionized the study of biology. Multiple competing technologies have been developed over the past decade to improve the quality and robustness of genome assemblies. The 2 widely used long-read sequencing providers, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PacBio now produces high-throughput HiFi reads with base-level accuracy of >99%, and ONT generates reads as long as 2 Mb. We applied the 2 up-to-date platforms to a single rice individual and compared the 2 resulting assemblies to investigate the advantages and limitations of each.
Results: The ONT ultralong reads delivered higher contiguity, producing a total of 18 contigs, of which 10 were each assembled into a single chromosome, compared with 394 contigs and 3 chromosome-level contigs for the PacBio assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions; in the PacBio assembly, we observed 44 falsely redundant genes and 10 falsely lost genes in such regions, leading to over- or underestimation of the gene families located there. Conversely, the PacBio HiFi reads generated an assembly with considerably fewer errors at the level of single nucleotides and small insertions and deletions than the ONT assembly, which contained an average of 1.06 errors per kb and ultimately yielded 1,475 incorrect gene annotations via altered or truncated protein predictions.
Conclusions: Both PacBio HiFi reads and ONT ultralong reads have their own merits. Future reference genome construction could leverage both technologies to lessen the impact of the assembly errors and subsequent annotation mistakes rooted in each.