Identification of pathogen genomic variants through an integrated pipeline
1 Department of Pediatrics, University of California, San Diego, School of Medicine, 9500 Gilman Drive 0741, La Jolla, California 92093, USA
2 Biomedical Sciences Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
3 Immunology and Infectious Diseases, Harvard School of Public Health, 665 Huntington Avenue, Boston, Massachusetts 02115, USA
4 Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, California 92121, USA
BMC Bioinformatics 2014, 15:63 doi:10.1186/1471-2105-15-63
Published: 3 March 2014
Whole-genome sequencing represents a powerful experimental tool for pathogen research. We present methods for the analysis of small eukaryotic genomes, including a streamlined system (called Platypus) for finding single nucleotide and copy number variants as well as recombination events.
We have validated our pipeline using four Plasmodium falciparum drug-resistance data sets containing 26 clones from the 3D7 and Dd2 background strains, identifying an average of 11 single nucleotide variants (SNVs) per clone. We also identify 8 copy number variants (CNVs) that contribute to resistance, and report for the first time that all analyzed amplification events are in tandem.
The Platypus pipeline provides malaria researchers with a powerful tool for analyzing short-read sequencing data. It detects SNVs accurately using established software packages and introduces a novel methodology for detecting CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We bundle the methods into a freely available package.
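To make the idea of filtering spurious calls concrete, the following is a minimal sketch of one common approach: keep only calls in a resistant clone that pass a quality threshold and are absent from its parent strain. It is not the Platypus implementation. The `Variant` record, the `clone_specific_snvs` function, the `min_qual` threshold of 30, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """A single nucleotide call (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def clone_specific_snvs(clone: List[Variant],
                        parent: List[Variant],
                        min_qual: float = 30.0) -> List[Variant]:
    """Return clone calls that pass min_qual and are not seen in the parent.

    Assumption: a call shared with the parent strain reflects background
    variation or a systematic artifact, not a newly acquired mutation.
    """
    parent_keys = {(v.chrom, v.pos, v.ref, v.alt) for v in parent}
    return [
        v for v in clone
        if v.qual >= min_qual
        and (v.chrom, v.pos, v.ref, v.alt) not in parent_keys
    ]


# Made-up calls for illustration only.
parent_calls = [Variant("chr1", 100, "A", "T", 50.0)]
clone_calls = [
    Variant("chr1", 100, "A", "T", 55.0),  # shared with parent: dropped
    Variant("chr2", 200, "G", "C", 12.0),  # low quality: dropped
    Variant("chr3", 300, "C", "A", 60.0),  # clone-specific: kept
]
kept = clone_specific_snvs(clone_calls, parent_calls)
```

In this toy input only the call at `chr3:300` survives. A real analysis would read calls from the variant caller's output and would likely apply further filters, such as read depth or strand bias.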