Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Methodology article

V-Phaser 2: variant inference for viral populations

Xiao Yang*, Patrick Charlebois, Alex Macalalad, Matthew R Henn and Michael C Zody

Author Affiliations

Broad Institute of MIT & Harvard, 7 Cambridge Center, Cambridge, MA 02142 USA

For all author emails, please log on.

BMC Genomics 2013, 14:674  doi:10.1186/1471-2164-14-674

Published: 3 October 2013

Abstract

Background

Massively parallel sequencing offers the possibility of revolutionizing the study of viral populations by providing ultra deep sequencing (tens to hundreds of thousand fold coverage) of complete viral genomes. However, differentiation of true low frequency variants from sequencing errors remains challenging.

Results

We developed a software package, V-Phaser 2, for inferring intrahost diversity within viral populations. This program adds three major new methodologies to the state of the art: a technique to efficiently utilize paired end read data for calling phased variants, a new strategy to represent and infer length polymorphisms, and an in line filter for erroneous calls arising from systematic sequencing artifacts. We have also heavily optimized memory and run time performance. This combination of algorithmic and technical advances allows V-Phaser 2 to fully utilize extremely deep paired end sequencing data (such as generated by Illumina sequencers) to accurately infer low frequency intrahost variants in viral populations in reasonable time on a standard desktop computer. V-Phaser 2 was validated and compared to both QuRe and the original V-Phaser on three datasets obtained from two viral populations: a mixture of eight known strains of West Nile Virus (WNV) sequenced on both 454 Titanium and Illumina MiSeq and a mixture of twenty-four known strains of WNV sequenced only on 454 Titanium. V-Phaser 2 outperformed the other two programs in both sensitivity and specificity while using more than five fold less time and memory.

Conclusions

We developed V-Phaser 2, a publicly available software tool (V-Phaser 2 can be accessed via: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/v-phaser-2 webcite and is freely available for academic use) that enables the efficient analysis of ultra-deep sequencing data produced by common next generation sequencing platforms for viral populations.

Keywords:
Viral population; Variant calling; Length polymorphisms; Phasing; Next generation sequencing