Abstract: The availability of a large number of genome sequences, resulting from inexpensive, high-throughput next-generation sequencing platforms, has created the need for an integrated, fully automated, rapid, and high-throughput annotation capability that is also easy to use. Here, we present a web-based software application, Annotation of Genome Sequences (AGeS), which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. The current version of AGeS provides annotations for bacterial genome sequences and serves as a readily accessible resource for Department of Defense (DoD) scientists to store, annotate, and visualize genomes of newly sequenced pathogens of interest. The AGeS system is composed of two major components. The first component is a web-based application that provides a graphical user interface for managing users’ input genomes, submitting annotation jobs, and visualizing results. Sequence contigs are uploaded as a multi-FASTA input file and submitted for annotation, and the resulting annotations are visualized through GBrowse. The input genome sequences and the annotation results are stored in a secure, customized database. The second component is a high-throughput annotation pipeline that finds the genomic regions coding for proteins, RNAs, and other genomic elements through a Do-It-Yourself Annotation framework. The pipeline also functionally annotates the protein-coding regions using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation. The annotation pipeline has been deployed on the Mana Linux cluster at the Maui High Performance Computing Center. The two components are connected through the DoD user interface toolkit application programming interface. The AGeS system was evaluated for the scaling of its parallel execution and for its annotation performance. AGeS scaled with super-linear speedup for up to 128 processors, after which performance degraded. A 2.2-Mbp bacterial genome sequence can be annotated in ~1 hr using 128 processors. AGeS annotations of draft and complete genomes were compared with the original annotations from three different sources and were found to be in general agreement with them.
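The scaling claim can be made concrete with the usual definitions: speedup on p processors is the baseline run time divided by the run time on p processors, and speedup is called super-linear when it exceeds p. The following Python sketch only illustrates those definitions. The function names and the timing values are hypothetical and are not measurements from AGeS; the timings were chosen to mimic the pattern the abstract describes (super-linear up to 128 processors, degraded beyond that).

```python
import math


def speedup(t_base: float, t_p: float) -> float:
    """Speedup relative to the baseline run: S(p) = T_base / T_p."""
    return t_base / t_p


def efficiency(t_base: float, t_p: float, p: int) -> float:
    """Parallel efficiency: E(p) = S(p) / p (values above 1.0 are super-linear)."""
    return speedup(t_base, t_p) / p


def classify(t_base: float, t_p: float, p: int) -> str:
    """Label a run as super-linear, linear, or sub-linear."""
    s = speedup(t_base, t_p)
    if math.isclose(s, p):
        return "linear"
    return "super-linear" if s > p else "sub-linear"


if __name__ == "__main__":
    # HYPOTHETICAL wall-clock times in minutes, keyed by processor count.
    # These are illustrative values, not AGeS benchmark results.
    timings = {1: 6400.0, 16: 380.0, 128: 45.0, 256: 60.0}
    t_base = timings[1]
    for p, t_p in sorted(timings.items()):
        print(
            f"p={p:>3}  speedup={speedup(t_base, t_p):7.2f}  "
            f"efficiency={efficiency(t_base, t_p, p):5.2f}  "
            f"{classify(t_base, t_p, p)}"
        )
```

Super-linear speedup is often attributed to per-processor working sets fitting better in cache or memory as the problem is divided. The abstract itself does not state the cause.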

DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

DNAscan: a fast, computationally and memory efficient bioinformatics pipeline for the analysis of DNA next-generation-sequencing data

Construction of an Open-Access Database That Integrates Cross-Reference Information from the Transcriptome and Proteome of Immune Cells

DDBJ update in 2024: the DDBJ Group Cloud service for sharing pre-publication data

A Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing

Metapipeline-DNA: A Comprehensive Germline & Somatic Genomics Nextflow Pipeline

Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery

An open-sourced bioinformatic pipeline for the processing of Next-Generation Sequencing derived nucleotide reads: Identification and authentication of ancient metagenomic DNA

A Web-based High-Throughput Tool for Next-Generation Sequence Annotation

DNA Data Bank of Japan (DDBJ) update report 2021

DNA Data Bank of Japan (DDBJ) update report 2022

DNAscan2: a versatile, scalable, and user-friendly analysis pipeline for human next-generation sequencing data

A graphical, interactive and GPU-enabled workflow to process long-read sequencing data

DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata

Cloud Based Short Read Mapping Service

A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies

Next-generation sequencing analysis with a population-specific reference genome

Read Annotation Pipeline for High-Throughput Sequencing Data

inGAP: an Integrated Next-Generation Genome Analysis Pipeline

miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines

Next-generation sequencing analysis with a population-specific human reference genome