Comparison of genomic assembly and annotation based on two clones of avian pathogenic Escherichia coli

Yufei Zhao, John Elmerdahl Olsen, Louise Poulsen, Henrik Christensen
DOI: https://doi.org/10.1101/2024.11.22.624809
2024-11-22
Abstract: Methods for assembly and annotation of whole-genome sequences were compared for six strains of avian pathogenic Escherichia coli (APEC). Two vertically transferred clones, represented by three isolates all belonging to pulsed-field gel electrophoresis (PFGE) type 65-sequence type (ST)95 and three isolates belonging to PFGE type 47-ST131, were selected for Illumina short-read sequencing. There was no significant difference between SPAdes and CLC Genomics Workbench in the benchmark parameters for assembly of the short reads. The six strains were also sequenced by long-read sequencing (Nanopore), and these reads were hybrid assembled with the short reads. Unicycler produced fewer contigs and a higher N50 than Flye. No significant differences in total genome length were found among the four assemblers. At least 2.1% and 0.9% of coding gene sequences (CDSs) annotated with RAST and PROKKA, respectively, were wrongly annotated. The errors were most often associated with shorter CDSs (< 150 nt) with functions such as transposases or mobile genetic elements, or annotated as hypothetical. The investigation points out the importance of checking automatic annotations and suggests further work to improve annotation of strains not belonging to the K12 or B lineages.
Bioinformatics