Methodology article

Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data

Allyson L Byrd (1,2), Joseph F Perez-Rogers (1,3), Solaiappan Manimaran (3), Eduardo Castro-Nallar (4), Ian Toma (5), Tim McCaffrey (5), Marc Siegel (6), Gary Benson (1,7,8), Keith A Crandall (4)* and William Evan Johnson (1,3)*

Author Affiliations

1 Department of Bioinformatics, Boston University, Boston, MA, USA

2 Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA

3 Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA

4 Computational Biology Institute, George Washington University, Ashburn, VA, USA

5 Division of Genomic Medicine, George Washington University, Washington, DC, USA

6 Division of Infectious Disease, George Washington University, Washington, DC, USA

7 Department of Computer Science, Boston University, Boston, MA, USA

8 Department of Biology, Boston University, Boston, MA, USA


BMC Bioinformatics 2014, 15:262  doi:10.1186/1471-2105-15-262

Published: 4 August 2014



The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.


Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification that combines computational subtraction with read trimming and ambiguous read reassignment. Second, using real clinical sequencing data, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.
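To make the two central ideas concrete, the following minimal Python sketch illustrates computational subtraction followed by ambiguous read reassignment. It is not the Clinical PathoScope implementation. The function names (`subtract_host`, `reassign_ambiguous`), the input layout (a mapping from read identifier to per-genome alignment scores), and the simple expectation-maximization update are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict


def subtract_host(target_hits, host_read_ids):
    """Computational subtraction (illustrative): drop every read that also
    aligns to the host/filter library.

    target_hits   -- dict: read id -> {genome name: positive alignment score}
    host_read_ids -- set of read ids that aligned to the host library
    """
    return {
        read_id: hits
        for read_id, hits in target_hits.items()
        if read_id not in host_read_ids
    }


def reassign_ambiguous(read_hits, iterations=50):
    """Ambiguous read reassignment (illustrative): estimate genome proportions
    with a basic EM loop so that reads hitting several genomes are shared out
    according to the evidence from uniquely mapped reads.

    Returns a dict: genome name -> estimated proportion of reads.
    """
    genomes = sorted({g for hits in read_hits.values() for g in hits})
    # Start from a uniform guess over all candidate genomes.
    proportions = {g: 1.0 / len(genomes) for g in genomes}
    n_reads = len(read_hits)

    for _ in range(iterations):
        expected_counts = defaultdict(float)
        # E-step: split each read across its candidate genomes.
        for hits in read_hits.values():
            weights = {g: proportions[g] * score for g, score in hits.items()}
            norm = sum(weights.values())
            for g, w in weights.items():
                expected_counts[g] += w / norm
        # M-step: update proportions from the expected read counts.
        proportions = {g: expected_counts[g] / n_reads for g in genomes}

    return proportions


# Hypothetical toy input: r4 also aligns to the host and is subtracted;
# r3 is ambiguous between strain_A and strain_B.
toy_hits = {
    "r1": {"strain_A": 1.0},
    "r2": {"strain_A": 1.0},
    "r3": {"strain_A": 1.0, "strain_B": 1.0},
    "r4": {"strain_B": 1.0},
}
microbial = subtract_host(toy_hits, host_read_ids={"r4"})
estimates = reassign_ambiguous(microbial)
```

In this toy input the ambiguous read `r3` ends up attributed almost entirely to `strain_A`, because the uniquely mapped reads support only that genome. This is the intuition behind distinguishing closely related strains from very few reads.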


Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available.