Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data

Allyson L Byrd, Joseph F Perez-Rogers, Solaiappan Manimaran, Eduardo Castro-Nallar, Ian Toma, Tim McCaffrey, Marc Siegel, Gary Benson, Keith A Crandall, William Evan Johnson

DOI: https://doi.org/10.1186/1471-2105-15-262

Impact factor (IF): 3.307

2014-08-04

BMC Bioinformatics

Abstract

Background: The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.

Results: Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.

Conclusions: Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.
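The subtraction step described in the abstract (trim reads, align them, then discard those that look like host rather than pathogen) can be sketched as follows. This is a minimal illustration under stated assumptions, not Clinical PathoScope's code: it assumes reads have already been aligned by an external aligner to a target library of pathogen genomes and to a host (filter) library, that alignment scores are comparable integers where higher is better, and that a read is discarded when its best host score matches or exceeds its best target score. The names `Read`, `trim_3prime` and `subtract_host`, the tie rule, and the quality and length thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Read:
    name: str
    seq: str
    qual: List[int]  # Phred quality score for each base


def trim_3prime(read: Read, min_q: int = 20, min_len: int = 30) -> Optional[Read]:
    """Hypothetical trimming rule: cut low-quality bases off the 3' end,
    and drop the read entirely if what remains is shorter than min_len."""
    end = len(read.seq)
    while end > 0 and read.qual[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return Read(read.name, read.seq[:end], read.qual[:end])


def subtract_host(
    target_scores: Dict[str, Dict[str, int]],
    host_scores: Dict[str, int],
) -> Dict[str, Dict[str, int]]:
    """Keep reads whose best target (pathogen) score beats their best host score.

    target_scores: read name -> {genome id -> alignment score}, each dict non-empty
    host_scores:   read name -> best alignment score against the host/filter library
    Reads absent from host_scores did not align to the host and are kept.
    """
    kept = {}
    for name, hits in target_scores.items():
        best_target = max(hits.values())
        best_host = host_scores.get(name)
        # Assumed tie rule: an equally good host alignment removes the read.
        if best_host is None or best_target > best_host:
            kept[name] = hits
    return kept
```

Under this tie rule, a read scoring 90 against a pathogen genome and 90 against the host is removed; keeping ties instead would trade specificity for sensitivity, and the right choice depends on the aligner's scoring scheme.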

biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology

What problem does this paper attempt to address?

The paper addresses how to identify pathogens in clinical samples rapidly and accurately. Specifically, it aims to overcome several limitations of current pathogen identification methods:

1. **Removal of host contamination**: Clinical samples contain genomic sequences from multiple sources, which complicates pathogen identification. The proposed method removes host contamination effectively, improving the accuracy of pathogen identification.
2. **Identification of multiple pathogens**: Existing methods usually require a large number of pathogen reads and include computationally intensive steps such as genome assembly, multi-genome alignment, extensive homology search and/or phylogenetic estimation. They perform poorly on mixed samples. The proposed method can identify multiple pathogens in a single clinical sample.
3. **High sensitivity and specificity**: Existing methods are often inaccurate at the subspecies level and may assign ambiguously aligned reads to a higher taxonomic level, leading to non-specific or incorrect diagnoses. The proposed method outperforms existing pathogen identification methods in computational speed, sensitivity and specificity.
4. **Handling very few reads**: The proposed method can distinguish very closely related species and strains even when a sample contains very few reads from the pathogen.

Because it does not rely on genome assembly, Clinical PathoScope can analyze clinical samples faster than assembly-based methods, which supports its use in personalized medicine.
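Item 3 contrasts pushing ambiguous reads up to a higher taxonomic level with reassigning them to a specific genome. One common way to reassign such reads is an expectation-maximization (EM) mixture model: estimate how abundant each candidate genome is, then split each ambiguous read across its candidates in proportion to that abundance. The sketch below is a deliberately simplified version of that idea, not the model used by Clinical PathoScope. It assumes each read already has a positive likelihood for every genome it aligned to (for example, derived from alignment scores), and the functions `reassign` and `best_hit` are hypothetical.

```python
from typing import Dict

Likelihoods = Dict[str, Dict[str, float]]  # read -> {genome -> P(read | genome) > 0}


def reassign(likelihoods: Likelihoods, n_iter: int = 50, tol: float = 1e-8) -> Dict[str, float]:
    """Estimate genome proportions by EM over (possibly ambiguous) read alignments."""
    genomes = sorted({g for hits in likelihoods.values() for g in hits})
    if not genomes:
        return {}
    pi = {g: 1.0 / len(genomes) for g in genomes}  # start from uniform proportions
    n_reads = len(likelihoods)
    for _ in range(n_iter):
        counts = {g: 0.0 for g in genomes}
        # E-step: split each read across its candidate genomes.
        for hits in likelihoods.values():
            weights = {g: pi[g] * lk for g, lk in hits.items()}
            total = sum(weights.values())
            for g, w in weights.items():
                counts[g] += w / total
        # M-step: new proportions are expected read counts over total reads.
        new_pi = {g: c / n_reads for g, c in counts.items()}
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    return pi


def best_hit(likelihoods: Likelihoods, pi: Dict[str, float]) -> Dict[str, str]:
    """Assign each read to the genome with the highest posterior weight."""
    return {r: max(hits, key=lambda g: pi[g] * hits[g]) for r, hits in likelihoods.items()}
```

With three reads unique to genome A and one read that aligns equally well to A and B, the estimated share of B shrinks by a factor of four on every iteration, so the shared read ends up assigned to A rather than being reported only at a level that covers both genomes.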
