Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data
- Equal contributors
1 Department of Bioinformatics, Boston University, Boston, MA, USA
2 Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
3 Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
4 Computational Biology Institute, George Washington University, Ashburn, VA, USA
5 Division of Genomic Medicine, George Washington University, Washington, DC, USA
6 Division of Infectious Disease, George Washington University, Washington, DC, USA
7 Department of Computer Science, Boston University, Boston, MA, USA
8 Department of Biology, Boston University, Boston, MA, USA
BMC Bioinformatics 2014, 15:262. doi:10.1186/1471-2105-15-262. Published: 4 August 2014
Using sequencing technologies to investigate the microbiome of a sample can improve patient care by revealing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from many different sources, which complicates the identification of pathogens.
Here we present Clinical PathoScope, a pipeline that rapidly and accurately removes host contamination, isolates microbial reads, and identifies potential disease-causing pathogens. We accomplished three essential tasks in developing Clinical PathoScope. First, we developed an optimized framework for pathogen identification that combines a computational subtraction methodology with read trimming and ambiguous read reassignment. Second, using real clinical sequencing data, we demonstrated that our approach can identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens. Finally, we showed that Clinical PathoScope outperforms previously published pathogen identification methods in computational speed, sensitivity, and specificity.
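The abstract does not spell out how subtraction and reassignment are carried out, so the following is only a rough sketch of the two general ideas under simplifying assumptions. Computational subtraction is modeled as discarding every read that has any alignment to a host reference. Ambiguous read reassignment is modeled as a basic expectation-maximization (EM) mixture model in which each genome's relative abundance is re-estimated from fractional read assignments. This is a generic EM scheme, not Clinical PathoScope's exact statistical model. The function names `subtract_host` and `reassign_reads`, the `(read, target, score)` tuple format, the positive likelihood-like scores, and the toy reads are all hypothetical.

```python
from collections import defaultdict

def subtract_host(alignments, host_targets):
    """Computational subtraction (sketch): drop every read with ANY hit to a host target."""
    host_reads = {read for read, target, _ in alignments if target in host_targets}
    return [(r, t, s) for r, t, s in alignments if r not in host_reads]

def reassign_reads(alignments, n_iter=50):
    """Simplified EM reassignment; scores are assumed to be positive likelihood-like weights."""
    by_read = defaultdict(dict)
    for read, genome, score in alignments:
        # Keep the best score if a read aligns to the same genome more than once.
        by_read[read][genome] = max(score, by_read[read].get(genome, 0.0))
    genomes = sorted({g for hits in by_read.values() for g in hits})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting abundances
    n_reads = len(by_read)
    for _ in range(n_iter):
        counts = dict.fromkeys(genomes, 0.0)
        for hits in by_read.values():
            # E-step: split each read across its hits in proportion to pi[g] * score.
            total = sum(pi[g] * s for g, s in hits.items())
            for g, s in hits.items():
                counts[g] += pi[g] * s / total
        # M-step: abundances are the expected read counts divided by the number of reads.
        pi = {g: counts[g] / n_reads for g in genomes}
    # Final call: assign each read to its most probable genome.
    best = {read: max(hits, key=lambda g: pi[g] * hits[g]) for read, hits in by_read.items()}
    return pi, best

# Hypothetical toy data: h1 and r6 touch the host; r4 is ambiguous between genomes A and B.
alignments = [
    ("h1", "human", 1.0),
    ("r6", "human", 1.0), ("r6", "A", 0.5),
    ("r1", "A", 1.0), ("r2", "A", 1.0), ("r3", "A", 1.0),
    ("r4", "A", 1.0), ("r4", "B", 1.0),
    ("r5", "B", 1.0),
]
microbial = subtract_host(alignments, host_targets={"human"})
pi, best = reassign_reads(microbial)
print(pi, best["r4"])
```

With three reads unique to A, one unique to B, and one tied between them, the estimates converge to roughly 0.75 for A and 0.25 for B, so the tied read r4 is assigned to A. Read r6 is removed entirely by subtraction even though it also hits A, which reflects the conservative choice this sketch makes.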
Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens in mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, because Clinical PathoScope does not rely on genome assembly, it can analyze a clinical sample more rapidly than current assembly-based methods. Clinical PathoScope is freely available at http://sourceforge.net/projects/pathoscope/.