Viral quasispecies inference from 454 pyrosequencing
- Equal contributors
1 Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
2 Center for Emerging and Neglected Infectious Diseases, Mahidol University, Bangkok, Thailand
3 Division of Bioinformatics and Data Management for Research, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
4 Medical Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Bangkok, Thailand
5 Dengue Hemorrhagic Fever Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
6 Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
7 NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, Singapore
8 Life Sciences Institute, National University of Singapore, Singapore, Singapore
9 Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
10 Center for Infectious Disease Epidemiology and Research, National University of Singapore, Singapore, Singapore
BMC Bioinformatics 2013, 14:355 doi:10.1186/1471-2105-14-355
Published: 5 December 2013
Many potentially life-threatening infectious viruses are highly mutable. Characterizing the fittest variants within a quasispecies from infected patients is expected to open unprecedented opportunities to investigate the relationship between quasispecies diversity and disease epidemiology. Next-generation sequencing technologies have made high-throughput studies of virus diversity possible, but these methods come with higher error rates, which can artificially inflate the observed diversity.
Here we introduce a novel computational approach that incorporates base quality scores from next-generation sequencers to reconstruct viral genome sequences while simultaneously inferring the number of variants present within a quasispecies. Comparisons on simulated and clinical dengue virus data suggest that the approach infers the underlying number of variants within the quasispecies more accurately, which is vital for clinical efforts to map within-host viral diversity. Sequence alignments generated by our approach are also found to exhibit lower error rates.
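To illustrate why base quality scores can help separate sequencing errors from genuine variants, the following is a minimal sketch and not the method introduced in this paper. It assumes the standard Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10), and weights each observed base at an alignment column by its probability of being correct. The function names and the example column are hypothetical.

```python
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def weighted_base_counts(observations):
    """Sum, per base, the probability that each call is correct.

    observations: list of (base, phred_quality) pairs seen at one
    alignment column (hypothetical input format for this sketch).
    """
    weights = defaultdict(float)
    for base, q in observations:
        weights[base] += 1.0 - phred_to_error_prob(q)
    return dict(weights)


def consensus_base(observations):
    """Return the base with the largest quality-weighted support."""
    weights = weighted_base_counts(observations)
    return max(weights, key=weights.get)


# Hypothetical column: two high-quality 'A' calls and three low-quality 'G' calls.
column = [("A", 40), ("A", 35), ("G", 2), ("G", 2), ("G", 2)]
best = consensus_base(column)
```

In this example an unweighted majority vote would favour 'G' (three reads against two), whereas the quality-weighted support favours 'A', because each low-quality 'G' call contributes little evidence.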
The ability to infer the viral quasispecies colony present within a human host offers the potential for a more accurate classification of the viral phenotype. Understanding the genomics of viruses will be relevant not just to studying how to control or even eradicate these viral infectious diseases, but also to learning about the human host's innate protection against the viruses.