Pathosphere.org: pathogen detection and characterization through a web-based, open source informatics platform

Andy Kilianski, Patrick Carcel, Shijie Yao, Pierce Roth, Josh Schulte, Greg B Donarum, Ed T Fochler, Jessica M Hill, Alvin T Liem, Michael R Wiley, Jason T Ladner, Bradley P Pfeffer, Oliver Elliot, Alexandra Petrosov, Dereje D Jima, Tyghe G Vallard, Melanie C Melendrez, Evan Skowronski, Phenix-Lan Quan, W Ian Lipkin, Henry S Gibbons, David L Hirschberg, Gustavo F Palacios, C Nicole Rosenzweig
DOI: https://doi.org/10.1186/s12859-015-0840-5
IF: 3.307
2015-12-29
BMC Bioinformatics
Abstract: Background: The detection of pathogens in complex sample backgrounds has been revolutionized by wide access to next-generation sequencing (NGS) platforms. However, analytical methods to support NGS platforms are not as uniformly available. Pathosphere (found at Pathosphere.org) is a cloud-based, open-source community tool that allows for communication, collaboration and sharing of NGS analytical tools and data amongst scientists working in academia, industry and government. The architecture allows users to upload data and run available bioinformatics pipelines without the need for onsite processing hardware or technical support.
Results: The pathogen detection capabilities hosted on Pathosphere were tested by analyzing pathogen-containing samples sequenced by NGS, including spiked human samples as well as samples with human and zoonotic host backgrounds. Pathosphere analytical pipelines developed by Edgewood Chemical Biological Center (ECBC) identified spiked pathogens within a common sample analyzed by 454, Ion Torrent, and Illumina sequencing platforms. ECBC pipelines also correctly identified pathogens in human samples containing arenavirus, in addition to animal samples containing flavivirus and coronavirus. These analytical methods were less effective at detecting sequences with limited homology to previous annotations within NCBI databases, such as parvovirus. Utilizing the pipeline-hosting adaptability of Pathosphere, the analytical suite was supplemented by analytical pipelines designed by the United States Army Medical Research Institute of Infectious Diseases and Walter Reed Army Institute of Research (USAMRIID-WRAIR). These pipelines were implemented and detected parvovirus sequence in the sample that the ECBC iterative analysis previously failed to identify.
Conclusions: By accurately detecting pathogens in a variety of samples, this work demonstrates the utility of Pathosphere and provides a platform for utilizing, modifying and creating pipelines for a variety of NGS technologies developed to detect pathogens in complex sample backgrounds. These results showcase the existing pipelines and web-based interface of Pathosphere, as well as the plug-in adaptability that allows for integration of newer NGS analytical software as it becomes available.