
Open Access Highly Accessed Research article

Copy number variation detection using next generation sequencing read counts

Heng Wang1,2,3*, Dan Nettleton4 and Kai Ying5

Author Affiliations

1 Lyman Briggs College, Michigan State University, East Lansing, USA

2 Department of Statistics and Probability, Michigan State University, East Lansing, USA

3 Department of Animal Science, Michigan State University, East Lansing, USA

4 Department of Statistics, Iowa State University, Ames, USA

5 Genome Technology Branch, The National Human Genome Research Institute, NIH, Bethesda, USA


BMC Bioinformatics 2014, 15:109. doi:10.1186/1471-2105-15-109

Published: 14 April 2014

Abstract

Background

A copy number variation (CNV) is a difference between genotypes in the number of copies of a genomic region. Next generation sequencing (NGS) technologies provide sensitive and accurate tools for detecting genomic variations, including CNVs. However, few statistical approaches exist for identifying CNVs from NGS data. We propose a new methodology for detecting CNVs using NGS data. This method (henceforth denoted m-HMM) is based on a hidden Markov model whose emission probabilities are governed by mixture distributions. We use the Expectation-Maximization (EM) algorithm to estimate the model parameters.
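The abstract does not specify the model's hidden states, the number of mixture components, or how copy-number calls are decoded. As a minimal sketch under those unknowns, the block below assumes three hypothetical states (loss, normal, gain), each emitting per-window read counts from a two-component Poisson mixture with hand-picked parameters, and recovers the most likely state path with the standard Viterbi algorithm. The function names, state labels, and all numbers are illustrative assumptions; in the m-HMM the parameters would be estimated with EM rather than fixed by hand, and the authors' decoding step may differ.

```python
import math

# Hypothetical sketch: an HMM whose per-state emission distribution is a Poisson
# mixture, decoded with the Viterbi algorithm. Parameters are hand-picked, not estimated.

def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam), with lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def mixture_logpmf(k, weights, rates):
    """log of sum_j w_j * Poisson(k; rate_j), computed with log-sum-exp for stability."""
    terms = [math.log(w) + poisson_logpmf(k, lam) for w, lam in zip(weights, rates)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def viterbi(counts, init, trans, emissions):
    """Most likely hidden state path for a sequence of read counts.

    init[s]:      initial probability of state s
    trans[s][t]:  probability of moving from state s to state t
    emissions[s]: (weights, rates) of state s's Poisson mixture
    """
    n_states = len(init)
    # delta[s]: best log-probability of any path ending in state s at the current window
    delta = [math.log(init[s]) + mixture_logpmf(counts[0], *emissions[s])
             for s in range(n_states)]
    backptr = []
    for k in counts[1:]:
        new_delta, ptr = [], []
        for t in range(n_states):
            # Best predecessor state for landing in state t at this window
            best_s = max(range(n_states), key=lambda s: delta[s] + math.log(trans[s][t]))
            ptr.append(best_s)
            new_delta.append(delta[best_s] + math.log(trans[best_s][t])
                             + mixture_logpmf(k, *emissions[t]))
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy usage with hypothetical parameters
STATES = ["loss", "normal", "gain"]
init = [0.1, 0.8, 0.1]
trans = [
    [0.90, 0.09, 0.01],  # from loss
    [0.05, 0.90, 0.05],  # from normal
    [0.01, 0.09, 0.90],  # from gain
]
emissions = [
    ([0.5, 0.5], [1.0, 5.0]),    # loss: low counts
    ([0.7, 0.3], [20.0, 30.0]),  # normal
    ([0.6, 0.4], [40.0, 60.0]),  # gain: elevated counts
]
counts = [22, 19, 25, 2, 0, 3, 21, 45, 52, 48, 20]  # toy read counts per window
path = viterbi(counts, init, trans, emissions)
print([STATES[s] for s in path])
```

Working in log space keeps the path probabilities from underflowing over long runs of windows. The keywords also mention a Gamma-Poisson mixture; swapping such a component in for a Poisson one would only change `mixture_logpmf`, while the decoding loop stays the same.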

Results

A simulation study demonstrates that our proposed m-HMM approach has greater power than existing methods for detecting copy number gains and losses. Furthermore, we applied the m-HMM to DNA sequencing data from the two maize inbred lines B73 and Mo17 to identify CNVs that may contribute to phenotypic differences between these lines; the results are concordant with those of previous array-based efforts to identify CNVs.

Conclusions

The new m-HMM method is a powerful and practical approach for identifying CNVs from NGS data.

Keywords:
Count data; Gamma-Poisson mixture; Hidden Markov model; Plant genomics; Poisson mixture model