Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data

Allyson L Byrd, Joseph F Perez-Rogers, Solaiappan Manimaran, Eduardo Castro-Nallar, Ian Toma, Tim McCaffrey, Marc Siegel, Gary Benson, Keith A Crandall, William Evan Johnson
DOI: https://doi.org/10.1186/1471-2105-15-262
IF: 3.307
2014-08-04
BMC Bioinformatics
Abstract:
Background: The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.
Results: Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.
Conclusions: Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the rapid and accurate identification of pathogens in clinical samples from unassembled sequencing data. Specifically, it targets four challenges facing existing pathogen identification methods:

1. **Removal of host contamination**: Genomic sequences in a clinical sample come from multiple sources, chiefly the host, which complicates pathogen identification. The proposed pipeline removes host reads before identification, improving accuracy.
2. **Identification of multiple pathogens**: Existing methods usually require a large number of pathogen reads and rely on computationally intensive steps such as genome assembly, multi-genome alignment, extensive homology search, and/or phylogenetic estimation, and they perform poorly on mixed samples. The proposed method can identify multiple pathogens in a single clinical sample.
3. **High sensitivity and specificity**: Existing methods are often inaccurate at the subspecies level and may assign ambiguously aligned reads to a higher taxonomic level, leading to non-specific or incorrect diagnoses. The paper reports that Clinical PathoScope outperforms previously published methods in computational speed, sensitivity, and specificity.
4. **Ability to handle a small number of reads**: The proposed method can distinguish very closely related species and strains even when the sample contains very few reads per pathogen.

Because it does not require genome assembly, Clinical PathoScope can complete the analysis of a clinical sample more quickly than assembly-based methods, which supports its use in personalized medicine.
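
The two ideas named in the abstract, computational subtraction of host reads and reassignment of ambiguously aligned reads, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the `read_hits` data layout (read ID mapped to reference ID and a positive alignment weight), the reference names, and the simple expectation-maximization loop are hypothetical and are not taken from the Clinical PathoScope code, whose actual statistical model is not described in this summary.

```python
from collections import defaultdict


def subtract_host(read_hits, host_ids):
    """Computational subtraction: drop every read that aligns to a host reference.

    read_hits: dict mapping read ID -> {reference ID: positive alignment weight}
    host_ids:  set of reference IDs treated as host (hypothetical names)
    """
    return {
        read: hits
        for read, hits in read_hits.items()
        if not any(ref in host_ids for ref in hits)
    }


def reassign_ambiguous(read_hits, n_iter=50):
    """Toy EM: share multi-mapped reads among genomes by estimated abundance.

    Returns a dict mapping genome ID -> estimated proportion of reads.
    """
    genomes = sorted({g for hits in read_hits.values() for g in hits})
    if not genomes:
        return {}
    # Start from a uniform guess of genome proportions.
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidate genomes.
        for hits in read_hits.values():
            total = sum(pi[g] * w for g, w in hits.items())
            for g, w in hits.items():
                counts[g] += pi[g] * w / total
        # M-step: re-estimate proportions from the fractional counts.
        n = sum(counts.values())
        pi = {g: counts[g] / n for g in genomes}
    return pi


# Hypothetical alignments: r3 is ambiguous between two closely related strains.
alignments = {
    "r1": {"human_chr1": 1.0},
    "r2": {"ecoli_K12": 1.0},
    "r3": {"ecoli_K12": 1.0, "ecoli_O157": 1.0},
    "r4": {"ecoli_K12": 1.0},
    "r5": {"human_chr1": 1.0, "ecoli_O157": 0.5},
}
kept = subtract_host(alignments, host_ids={"human_chr1"})
abundance = reassign_ambiguous(kept)
```

In this toy input, the reads that hit the host reference are discarded, and the ambiguous read `r3` is pulled toward the strain that is also supported by uniquely aligned reads. That is the intuition behind reassigning ambiguous reads rather than pushing them up to a higher taxonomic level.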