Design of PCI 2.2 Target Controller to Support Prefetch Request

E. Hyun,K. Han,K. Seong

ICCD

Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes

Wenjuan Yu,Haohui Luo,Jinbao Yang,Shengchen Zhang,Heling Jiang,Xianjia Zhao,Xingqi Hui,Da Sun,Liang Li,Xiu-Qing Wei,Stefano Lonardi,Weihua Pan

DOI: https://doi.org/10.1101/gr.278232.123

IF: 9.438

2024-03-22

Genome Research

Abstract:Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there are no comprehensive studies on the evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using the Quality Assessment Tool (QUAST) and Benchmarking Universal Single-Copy Orthologs (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes with HiFi data. We performed a comprehensive benchmarking study of commonly used assemblers on complex eukaryotic genomes and metagenomes. Our study will help the research community to choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.

genetics & heredity,biochemistry & molecular biology,biotechnology & applied microbiology
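The "quality value" criterion listed above is, in most assembly evaluations, a Phred-scaled error rate. As a worked example, assuming the standard Phred definition (the study's exact computation may differ), with E the per-base error rate:

$$
\mathrm{QV} = -10 \log_{10} E, \qquad E = 10^{-4}\ (0.01\%) \;\Rightarrow\; \mathrm{QV} = -10 \times (-4) = 40
$$

Under this definition, the <0.01% read error quoted for HiFi corresponds to a read-level quality above Q40; an assembly QV is computed the same way from the assembly's estimated base error rate.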
Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Marios Gavrielatos,Konstantinos Kyriakidis,Demetrios Spandidos,Ioannis Michalopoulos

DOI: https://doi.org/10.3892/mmr.2021.11890

IF: 3.423

2021-02-02

Molecular Medicine Reports

Abstract:Genome assemblers are computational tools for de novo genome assembly that operate on large volumes of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrence of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel de novo genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy based on Nanopore reads was benchmarked using Canu, and the strategy based on PacBio HiFi reads was benchmarked using Hifiasm and HiCanu. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperforms the other ones, while Hi-C scaffolding, which is based on chromatin 3D structure, is required in order to increase continuity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi-C data is also necessary when using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than the other strategies, making assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects, which are now approachable by smaller labs with limited technical and financial resources.

oncology,medicine, research & experimental
Benchmarking of Long-Read Sequencing, Assemblers and Polishers for Yeast Genome

Xue Zhang,Chen-Guang Liu,Shi-Hui Yang,Xia Wang,Feng-Wu Bai,Zhuo Wang

DOI: https://doi.org/10.1093/bib/bbac146

IF: 9.5

2022-01-01

Briefings in Bioinformatics

Abstract:BACKGROUND: The long reads of third-generation sequencing significantly benefit the quality of de novo genome assembly. However, their relatively high single-base error rate has been criticized. Sequencing accuracy and throughput continue to improve, and many advanced tools are constantly emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two up-to-date platforms with low error rates and ultralong high-throughput reads. There is therefore an urgent need to select the appropriate sequencing platforms, depths and genome assembly tools for high-quality genomes in the era of explosive data production. METHODS: We performed 455 (7 assemblers with 4 polishing pipelines or without polishing on 13 subsets with different depths) and 88 (4 assemblers with or without polishing on 11 subsets with different depths) de novo assemblies of Yeast S288C on high-coverage ONT and HiFi datasets, respectively. The assembly quality was evaluated by Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied four preferable pipelines to assemble the genome of nonreference yeast strains. RESULTS: The assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye is superior to other tools by C_score evaluation. Polishing with Pilon and Medaka improves the accuracy and continuity of the preassemblies, respectively, and their combined pipeline worked well on most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than other tools, and polishing is also necessary. Sufficient data depth is required for high-quality genome construction from ONT (>80X) and HiFi (>20X) datasets.

Benchmarking of bioinformatics tools for the hybrid de novo assembly of human whole-genome sequencing data

Adrian Munoz-Barrera,Luis A. Rubio-Rodriguez,David Jaspez,Almudena Corrales,Itahisa Marcelino-Rodriguez,Jose M. Lorenzo-Salazar,Rafaela Gonzalez-Montelongo,Carlos Flores

DOI: https://doi.org/10.1101/2024.05.28.595812

2024-05-29

Abstract:Accurate and complete de novo assembled genomes sustain variant identification and catalyze the discovery of new genomic features and biological functions. However, accurate and precise de novo assembly of large and complex genomes remains a challenging task. Long-read sequencing data, alone or in hybrid mode combined with more accurate short reads, facilitate the de novo assembly of genomes. A number of software tools exist for de novo genome assembly from long-read data, although specific performance comparisons for assembling human genomes are lacking. Here we benchmarked 11 different pipelines, including four long-read-only assemblers and three hybrid assemblers, combined with four polishing schemes for de novo genome assembly of a human reference material sequenced with Oxford Nanopore Technologies and Illumina. In addition, the best-performing choice was validated in a non-reference routine laboratory sample. Software performance was evaluated by assessing the quality of the assemblies with QUAST, BUSCO, and Merqury metrics, and the computational costs associated with each pipeline were also assessed. We found that Flye was superior to all other assemblers, especially when relying on Ratatosk error-corrected long reads. Polishing improved the accuracy and continuity of the assemblies, and the combination of two rounds of Racon and Pilon achieved the best results. The assembly of the non-reference sample showed assembly metrics comparable to those of the reference material. Based on these results, a complete, optimal analysis pipeline for assembly, polishing, and contig curation developed in Nextflow is provided, enabling efficient parallelization and built-in dependency management to further advance the generation of high-quality, chromosome-level human assemblies.

Bioinformatics
Benchmarking multi-platform sequencing technologies for human genome assembly

Jingjing Wang,Werner Pieter Veldsman,Xiaodong Fang,Yufen Huang,Xuefeng Xie,Aiping Lyu,Lu Zhang

DOI: https://doi.org/10.1093/bib/bbad300

IF: 9.5

2023-08-19

Briefings in Bioinformatics

Abstract:Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies, covering contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, even though several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performance in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long reads, but they may require further polishing to improve their quality. We recommend using short reads rather than the long reads themselves to improve the base accuracy of contigs from Oxford Nanopore long reads. Hi-C is the best choice for chromosome-level scaffolding because it captures the longest-range DNA connectedness compared to 10× linked reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly. For diploid assembly, hifiasm is the best tool for human genomes when using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.

biochemical research methods,mathematical & computational biology
Systematic Comparison of the Performances of De Novo Genome Assemblers for Oxford Nanopore Technology Reads From Piroplasm

Jinming Wang,Kai Chen,Qiaoyun Ren,Ying Zhang,Junlong Liu,Guangying Wang,Aihong Liu,Youquan Li,Guangyuan Liu,Jianxun Luo,Wei Miao,Jie Xiong,Hong Yin,Guiquan Guan

DOI: https://doi.org/10.3389/fcimb.2021.696669

IF: 6.073

2021-08-16

Frontiers in Cellular and Infection Microbiology

Abstract:Background: Emerging long-read sequencing technology has greatly changed the landscape of whole-genome sequencing, enabling scientists to decode the genetic information of non-model species. The sequences generated by PacBio or Oxford Nanopore Technology (ONT) must be assembled de novo before further analyses. Several de novo assemblers have been developed for ONT long reads, but their performance has not been completely investigated, and genome assembly remains a challenging task. Methods and Results: We systematically evaluated the performance of nine de novo assemblers for ONT on datasets of different coverage depths. Several metrics were measured to determine the performance of these tools, including N50 length, sequence coverage, runtime, ease of operation, genome accuracy and genomic completeness at varying depths of coverage. Based on our assessments, the performance of these tools is summarized as follows: 1) coverage depth has a significant effect on genome quality; 2) the contiguity of the assembled genome varies dramatically among different de novo tools; 3) the correctness of an assembled genome is closely related to its completeness. More than 30× nanopore data can be assembled into a relatively complete genome, whose quality is highly dependent on polishing with next-generation sequencing data. Conclusion: Based on our investigation, the advantages and disadvantages of each tool are summarized, and guidelines for selecting assembly tools under specific conditions are provided.

immunology,microbiology
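N50, the first contiguity metric listed in the study above (and a standard QUAST output), is the length L such that contigs of length L or longer together cover at least half of the total assembly length. The following is a minimal sketch of that standard definition, not the code used in the study; the function name `n50` is our own.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    # Walk contigs from longest to shortest, accumulating length.
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    # Not reached: the running total always ends at sum(lengths) >= half.
    return lengths[-1]


if __name__ == "__main__":
    # Total = 290 bp, half = 145; 100 + 80 = 180 >= 145, so N50 = 80.
    print(n50([100, 80, 50, 30, 20, 10]))
```

Because N50 depends only on the length distribution, it rewards contiguity but says nothing about correctness, which is why studies such as this one pair it with accuracy and completeness metrics.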
Antimalarial and antiplasmodial activities of norneolignans. Syntheses and SAR.

D. M. Skytte,S. F. Nielsen,Ming Chen,L. Zhai,C. Olsen,S. Christensen

DOI: https://doi.org/10.1021/JM0508235

IF: 8.039

2006-01-12

Journal of Medicinal Chemistry

Abstract:A systematic change of the substituents and side chain of the norneolignan hinokiresinol afforded a 10-fold improvement of the IC50 value toward inhibition of the growth of Plasmodium falciparum. The more potent compounds controlled the parasitemia in mice infected with Plasmodium berghei.

Assessment of Metagenomic Assemblers Based on Hybrid Reads of Real and Simulated Metagenomic Sequences

Ziye Wang,Ying Wang,Jed A Fuhrman,Fengzhu Sun,Shanfeng Zhu

DOI: https://doi.org/10.1093/bib/bbz025

IF: 9.5

2020-01-01

Briefings in Bioinformatics

Abstract:In metagenomic studies of microbial communities, the short reads come from mixtures of genomes. Read assembly is usually an essential first step for follow-up studies in metagenomic research. Understanding the power and limitations of various read assembly programs in practice is important for researchers choosing which programs to use in their investigations. Many studies evaluating different assembly programs used either simulated metagenomes or real metagenomes with unknown genome compositions. However, simulated datasets may not reflect the real complexity of metagenomic samples, and the assembly accuracy estimated on real metagenomes could be misleading because their genomes are unknown. Therefore, hybrid strategies are required to evaluate the various read assemblers for metagenomic studies. In this paper, we benchmark the metagenomic read assemblers by mixing reads from real metagenomic datasets with reads from known genomes and evaluating the integrity, contiguity and accuracy of the assembly using the reads from the known genomes. We selected four advanced metagenome assemblers, MEGAHIT, MetaSPAdes, IDBA-UD and Faucet, for evaluation. We showed the strengths and weaknesses of these assemblers in terms of integrity, contiguity and accuracy for different variables, including the genetic difference of the real genomes from the genome sequences in the real metagenomic datasets and the sequencing depth of the simulated datasets. Overall, MetaSPAdes performs best in terms of integrity and continuity at the species level, followed by MEGAHIT. Faucet performs best in terms of accuracy at the cost of the worst integrity and continuity, especially at low sequencing depth. MEGAHIT has the highest genome fractions at the strain level, and MetaSPAdes has the best overall performance at the strain level. MEGAHIT is the most efficient in our experiments. Availability: The source code is available at https://github.com/ziyewang/MetaAssemblyEval.

Assembly Arena: Benchmarking RNA isoform reconstruction algorithms for nanopore sequencing

Mélanie Sagniez,Anshul Budhraja,Bastien Paré,Shawn M. Simpson,Clément Vinet-Ouellette,Marieke Rozendaal,Martin A. Smith

DOI: https://doi.org/10.1101/2024.03.21.586080

2024-03-23

Abstract:Resolving the transcriptomes of higher eukaryotes is more tangible with the advent of long read sequencing, which greatly facilitates the identification of new transcripts and their splicing isoforms. However, the computational analysis of long read RNA sequencing data remains challenging as it is difficult to disentangle technical artifacts from biological information. To address this, we evaluated the performance of multiple leading transcriptome assembly algorithms on their ability to accurately reconstruct RNA transcript isoforms. We specifically focused on deep nanopore sequencing of synthetic RNA spike-in controls (Sequins™ and SIRVs) across different chemistries, including cDNA and direct RNA protocols. Our systematic comparative benchmarking exposes the strengths and limitations of the different surveyed strategies. We also highlight conceptual and technical challenges with the annotation of transcriptomes and the formalization of assembly quality metrics. Our results complement similar recent endeavors, helping forge a path towards a gold standard analytical pipeline for long read transcriptome assembly.

Bioinformatics
Technical report on best practices for hybrid and long read de novo assembly of bacterial genomes utilizing Illumina and Oxford Nanopore Technologies reads

Kay Nieselt,Simon Tim Hackl,Theresa Anisja Harbig

DOI: https://doi.org/10.1101/2022.10.25.513682

2022-10-27

bioRxiv

Abstract:The emergence of commercial long-read sequencing technologies in the 2010s and the concomitant development of new bioinformatics tools offer the potential for de novo genome assemblies of unprecedented contiguity and quality. However, these technologies still suffer from high sequencing error rates. These errors may be overcome by combining long and short reads in so-called hybrid approaches, or by increasing the throughput and thereby the coverage of sequencing runs; the latter, in particular, inevitably increases the cost of assembly. Here, current long-read and hybrid assemblers were tested on real Illumina and Oxford Nanopore Technologies whole-genome sequencing data sets, and on subsamples of these, in order to establish best practices for de novo assembly. The findings suggest that although long reads alone can be used to reconstruct complete and contiguous genomes, their single-nucleotide and indel error rates remain high compared to hybrid approaches, which can negatively impact downstream applications such as variation discovery and gene prediction.

Cerebral venous drainage pattern of the Sturge-Weber syndrome.

J. Bentson,G. Wilson,T. Newton

DOI: https://doi.org/10.1097/00004424-197005000-00031

IF: 19.7

1970-05-01

Radiology

Abstract:Carotid angiographies of 11 patients with Sturge-Weber syndrome revealed cerebral venous abnormalities in each. An abnormal cerebral venous drainage pattern was found, consisting of lack of superficial cortical veins and associated nonfilling of the superior sagittal sinus, enlargement and tortuosity of the deep subependymal and deep medullary veins, and occasionally bizarre courses of cerebral veins. The basis of the pattern appears to be nonfunction or absence of cortical veins beneath the Sturge-Weber leptomeningeal angiomatosis, with collateral flow centrally to the subependymal veins.

Pre-Assembly NGS Correction of ONT Reads Achieves HiFi-Level Assembly Quality

Evgeniy Mozheiko,Heng Yi,Anzhi Lu,Heitung Kong,Yong Hou,Yan Zhou,Hui Gao

DOI: https://doi.org/10.1101/2024.07.12.603260

2024-07-13

Abstract:Recently developed hybrid assemblies can achieve Telomere-to-Telomere (T2T) completeness for some chromosomes. However, such approaches require sequencing a large volume of both Pacific Biosciences high-fidelity (HiFi) and Oxford Nanopore Technologies (ONT) reads. At the same time, third-generation sequencing techniques are rapidly advancing, increasing the available read length and accuracy. To reduce the final cost of genome assembly, we investigated assembly from low-coverage samples using only ONT reads corrected with Next-Generation Sequencing (NGS) reads. We demonstrated that NGS-corrected ONT-based assembly approaches can achieve performance metrics comparable to more expensive hybrid approaches based on HiFi sequencing. We investigated the assembly of different chromosomes and the low-coverage performance of state-of-the-art hybrid assembly tools, including Verkko and Hifiasm, as well as ONT-based assemblers such as Shasta and Flye. We rigorously evaluated the performance of MGI, Illumina, and stLFR NGS technologies across various aspects of hybrid genome assembly, including pre-assembly correction, haplotype phasing, and polishing, and found them to be similarly effective. Additionally, we proposed two-round assembly methods that use stLFR linked-read data to achieve phasing performance comparable to that obtained with trio data.

Genomics
Benchmarking of Hi-C tools for scaffolding de novo genome assemblies

Lia Obinu,Urmi Trivedi,Andrea Porceddu

DOI: https://doi.org/10.1101/2023.05.16.540917

2024-02-15

Abstract:Incorporating Hi-C reads into genome assembly allows large regions of the genome to be ordered into scaffolds, yielding chromosome-level assemblies. Several bioinformatics tools have been developed for genome scaffolding with Hi-C, and each has pros and cons that need to be carefully evaluated before adoption. We developed assemblyQC, a bash pipeline that combines QUAST, BUSCO, Merqury and, optionally, Liftoff, plus a gene positioning validation script, to evaluate and benchmark the performance of three scaffolders, 3d-dna, SALSA2, and YaHS, on two de novo assemblies of Arabidopsis thaliana obtained from the same raw PacBio HiFi and ONT data. In our analysis, YaHS proved to be the best-performing tool for scaffolding genome assemblies.

Genomics
Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Qiong-Yi Zhao,Yi Wang,Yi-Meng Kong,Da Luo,Xuan Li,Pei Hao

DOI: https://doi.org/10.1186/1471-2105-12-s14-s2

IF: 3.307

2011-12-01

BMC Bioinformatics

Abstract:Background: With the rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective way to study transcriptomes. De novo assembly of transcripts provides an important solution to transcriptome analysis for organisms with no reference genome. However, it was not well understood how different variables affect assembly outcomes, and there was no consensus on how to approach an optimal solution by selecting a software tool and a suitable strategy based on the properties of RNA-Seq data. Results: To reveal the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads. Seven program conditions, four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK), were tested. While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, the MK strategy worked well across almost all expression quintiles. Among SK tools, Trinity performed well across various conditions but took the longest running time. Oases consumed the most memory, whereas SOAPdenovo required the shortest runtime but performed poorly in reconstructing full-length CDS. ABySS showed a good balance between resource usage and assembly quality. Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Some practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved using some of the optimized methods.

biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Keith R. Bradnam,Joseph N. Fass,Anton Alexandrov,Paul Baranay,Michael Bechner,İnanç Birol,Sébastien Boisvert,Jarrod A. Chapman,Guillaume Chapuis,Rayan Chikhi,Hamidreza Chitsaz,Wen-Chi Chou,Jacques Corbeil,Cristian Del Fabbro,T. Roderick Docking,Richard Durbin,Dent Earl,Scott Emrich,Pavel Fedotov,Nuno A. Fonseca,Ganeshkumar Ganapathy,Richard A. Gibbs,Sante Gnerre,Élénie Godzaridis,Steve Goldstein,Matthias Haimel,Giles Hall,David Haussler,Joseph B. Hiatt,Isaac Y. Ho,Jason Howard,Martin Hunt,Shaun D. Jackman,David B Jaffe,Erich Jarvis,Huaiyang Jiang,Sergey Kazakov,Paul J. Kersey,Jacob O. Kitzman,James R. Knight,Sergey Koren,Tak-Wah Lam,Dominique Lavenier,François Laviolette,Yingrui Li,Zhenyu Li,Binghang Liu,Yue Liu,Ruibang Luo,Iain MacCallum,Matthew D MacManes,Nicolas Maillet,Sergey Melnikov,Bruno Miguel Vieira,Delphine Naquin,Zemin Ning,Thomas D. Otto,Benedict Paten,Octávio S. Paulo,Adam M. Phillippy,Francisco Pina-Martins,Michael Place,Dariusz Przybylski,Xiang Qin,Carson Qu,Filipe J Ribeiro,Stephen Richards,Daniel S. Rokhsar,J. Graham Ruby,Simone Scalabrin,Michael C. Schatz,David C. Schwartz,Alexey Sergushichev,Ted Sharpe,Timothy I. Shaw,Jay Shendure,Yujian Shi,Jared T. Simpson,Henry Song,Fedor Tsarev,Francesco Vezzi,Riccardo Vicedomini,Jun Wang,Kim C. Worley,Shuangye Yin,Siu-Ming Yiu,Jianying Yuan,Guojie Zhang,Hao Zhang,Shiguo Zhou,Ian F. Korf

DOI: https://doi.org/10.1186/2047-217X-2-10

2013-06-28

Abstract:Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

Genomics
New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Laura Gonzalez-Garcia,David Guevara-Barrientos,Daniela Lozano-Arce,Juanita Gil,Jorge Díaz-Riaño,Erick Duarte,Germán Andrade,Juan Camilo Bojacá,Maria Camila Hoyos-Sanchez,Christian Chavarro,Natalia Guayazan,Luis Alberto Chica,Maria Camila Buitrago Acosta,Edwin Bautista,Miller Trujillo,Jorge Duitama

DOI: https://doi.org/10.26508/lsa.202201719

2023-02-22

Abstract:Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
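The graph construction described above starts from minimizers. As a hedged illustration of the generic (w, k)-minimizer idea only: in every window of w consecutive k-mers, the k-mer with the smallest hash is kept. The hash here is an assumption (2-bit packing plus a fixed multiplicative mix), not the k-mer-distribution-derived hash the authors describe, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
# 2-bit nucleotide encoding; other characters (e.g. N) raise KeyError.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer):
    """Deterministic 64-bit hash: 2-bit packing, then multiplicative mixing.

    Assumed stand-in hash. Multiplying by an odd constant modulo 2**64 is
    a bijection, so distinct k-mers with k <= 32 get distinct hashes.
    """
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return (value * 0x9E3779B97F4A7C15) & MASK64


def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as (w, k)-minimizers.

    For every window of w consecutive k-mers, the k-mer with the smallest
    hash is selected (leftmost position on ties).
    """
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        selected.add(best)
    return [(i, seq[i:i + k]) for i in sorted(selected)]


if __name__ == "__main__":
    read = "ACGTTGCATGCAAGTCCGATAGCT"
    for pos, kmer in minimizers(read, k=5, w=4):
        print(pos, kmer)
```

Every window of w k-mers is guaranteed to contain at least one selected position, so two reads sharing a long enough exact overlap share minimizers; this property is what makes minimizers usable as anchors when deciding which reads to connect in an overlap graph.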
Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms

Nicolas Cerveau,Daniel J. Jackson

DOI: https://doi.org/10.1186/s12859-016-1406-x

IF: 3.307

2016-12-01

BMC Bioinformatics

Abstract:Background: Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms, one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. The challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset: with the same assembly parameters, these tools can generate markedly different assembly outputs. Results: In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach is not a new assembler; rather, it combines the outputs of a variety of established assembly packages and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose-built BUSCO databases) also confirmed that our approach yields more information. Conclusions: Our approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than those of any of the three individual assembly packages we investigated. This approach (freely available as a simple Perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.

biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline

Sara Wattanasombat,Siripong Tongjai

DOI: https://doi.org/10.12688/f1000research.149577.1

2024-05-31

Abstract:Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, with QUAST and BLASTN used for quality assessment. Results: Our findings show that assembler choice significantly impacts assembly time, whereas CPU and memory usage have minimal effect. Assembler selection also influences contig size, and a minimum read length of 2,000 nucleotides is required for quality assembly; a read length of 4,000 nucleotides improves quality further. Canu was efficient among the de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates; RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences; and HaploDMF, which uses machine learning, had fewer errors with a slightly longer runtime. Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers the flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing the monitoring and understanding of viral evolution.

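
Two quantities in this abstract are easy to compute directly: the minimum read length used as an input filter, and contig size, which QUAST summarizes with contiguity statistics such as N50. The sketch below is a hypothetical illustration, not part of the HIV-64148 pipeline; the helper names `filter_reads` and `n50` are assumptions, and the 2,000-nucleotide default simply mirrors the threshold reported above.

```python
from typing import Iterable, List


def filter_reads(reads: Iterable[str], min_len: int = 2000) -> List[str]:
    """Keep reads that are at least `min_len` nucleotides long."""
    return [r for r in reads if len(r) >= min_len]


def n50(contig_lengths: Iterable[int]) -> int:
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For example, `n50([100, 80, 50, 30, 20])` returns 80, because 100 + 80 = 180 is the first running total that reaches half of the 280-nt assembly.
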
Phasing or purging: tackling the genome assembly of a highly heterozygous animal species in the era of high-accuracy long reads

Nadège Guiglielmoni,Philipp H Schiffer

DOI: https://doi.org/10.1101/2024.06.16.599187

2024-06-17

Abstract:The revolution of high-accuracy long reads offers unprecedented quality and contiguity in genome assembly. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies have made significant strides in improving their sequencing technologies, yielding reads with error rates below 1% and lengths ranging from kilobases to megabases. These advancements have prompted the development of assembly tools tailored to leverage the enhanced accuracy of long reads. However, the challenge of collapsing haplotypes into high-quality haploid assemblies persists, especially for highly heterozygous genomes. This raises questions about the feasibility and desirability of phased assemblies versus collapsed haploid assemblies. To address these challenges, we benchmarked five assembly tools on ultra-low input PacBio HiFi and Nanopore R10.4 reads from the parthenogenetic nematode species Plectus sambesii. We propose a comprehensive methodology for assessing phased assemblies, repurposing existing evaluation programs to collect haplotype-relevant statistics. Our evaluation criteria include assembly size, contiguity, and completeness, with a focus on assessing the accuracy of phased assemblies by examining duplicated BUSCO orthologs and k-mer spectra. Additionally, we present strategies for generating collapsed assemblies by purging haplotigs. This study provides valuable insights and guidelines for generating high-quality phased and collapsed de novo genome assemblies from highly accurate long reads, particularly beneficial for non-model species genome assembly projects.
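
A k-mer spectrum, one of the evaluation criteria above, is the histogram of how many distinct k-mers occur at each multiplicity in the reads. In a heterozygous diploid, k-mers from heterozygous sites typically form a peak at roughly half the coverage of the homozygous peak, which is what makes the spectrum useful for judging whether an assembly is phased or collapsed. The sketch below only builds the read spectrum; it is a hypothetical helper (`kmer_spectrum`), not the study's workflow, and dedicated tools such as KAT or Merqury perform this comparison against assemblies at scale.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_spectrum(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Return the k-mer spectrum: multiplicity -> number of distinct k-mers.

    Canonical k-mers (the lexicographic minimum of a k-mer and its reverse
    complement) are counted so that both strands are merged.
    """
    complement = str.maketrans("ACGT", "TGCA")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            rev_comp = kmer.translate(complement)[::-1]
            counts[min(kmer, rev_comp)] += 1
    # Histogram of multiplicities across distinct canonical k-mers.
    return dict(Counter(counts.values()))
```

For instance, `kmer_spectrum(["ACGT"], 2)` returns `{2: 1, 1: 1}`: `AC` and its reverse complement `GT` collapse into one canonical k-mer seen twice, and the palindromic `CG` is seen once.
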

Genomics
DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Chengxi Ye,Chris Hill,Shigang Wu,Jue Ruan,Zhanshan

DOI: https://doi.org/10.1038/srep31900

2016-09-04

Abstract:(An updated version of this manuscript has been accepted to Scientific Reports in 2016; please refer to http://www.nature.com/articles/srep31900.) The highly anticipated transition from next-generation sequencing (NGS) to third-generation sequencing (3GS) has been difficult, primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error-correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) compact representation of the long reads leads to efficient alignments; (ii) base-level errors can be skipped, whereas structural errors need to be detected and corrected; (iii) structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, which establishes an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces the sequencing requirements of both. Experiments show that our software assembles mammalian-sized genomes orders of magnitude more efficiently in time than existing methods, while saving about half of the sequencing cost.
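
Principle (i) can be illustrated with a toy version of the compact representation: k-mers that occur in exactly one preassembled NGS contig act as anchors, each long read is rewritten as the ordered list of contig IDs it hits, and overlaps are then detected between these short ID lists instead of between raw bases. This is a simplified sketch under stated assumptions, not the DBG2OLC implementation: it ignores reverse complements, uses exact k-mer matches, and the helpers `build_anchor_index`, `compact_read`, and `id_overlap` are hypothetical.

```python
from typing import Dict, List


def build_anchor_index(contigs: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer that occurs in exactly one contig to that contig's ID."""
    owner: Dict[str, str] = {}
    ambiguous = set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if kmer in owner and owner[kmer] != cid:
                ambiguous.add(kmer)  # shared between contigs: not a unique anchor
            owner.setdefault(kmer, cid)
    return {kmer: cid for kmer, cid in owner.items() if kmer not in ambiguous}


def compact_read(read: str, index: Dict[str, str], k: int) -> List[str]:
    """Rewrite a long read as the ordered list of contig IDs its k-mers hit."""
    ids: List[str] = []
    for i in range(len(read) - k + 1):
        cid = index.get(read[i : i + k])
        # Collapse consecutive hits to the same contig into a single entry.
        if cid is not None and (not ids or ids[-1] != cid):
            ids.append(cid)
    return ids


def id_overlap(a: List[str], b: List[str]) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


contigs = {"c1": "AAACCC", "c2": "GGGTTT", "c3": "CATCAT"}
index = build_anchor_index(contigs, k=4)
read1 = compact_read("AAACCCAGGGTTT", index, k=4)
read2 = compact_read("GGGTTTGCATCAT", index, k=4)
print(read1, read2, id_overlap(read1, read2))
```

Bases between anchors (including sequencing errors there) never enter the comparison, which loosely reflects principle (ii): base-level differences are skipped, while the order of contig IDs still captures the read's structure.
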

Genomics