Methodology article (Open Access)

DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing

John C Castle(1,2)*, Matthew Biery(1,3), Heather Bouzek(1,4), Tao Xie(1,5), Ronghua Chen(1,6), Kira Misura(1,7), Stuart Jackson(1,6), Christopher D Armour(1,3), Jason M Johnson(1,6), Carol A Rohl(1,6) and Christopher K Raymond(1,3)*

Author Affiliations

1 Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck & Co., Inc., Seattle, Washington 98109, USA

2 Institute for Translational Oncology and Immunology (TrOn), Mainz, Germany

3 NuGEN Technologies, Inc., Seattle, Washington, USA

4 University of Washington, Seattle, Washington, USA

5 Pfizer, Inc., San Diego, California, USA

6 Merck Research Laboratories, Boston, Massachusetts, USA

7 Amgen, Inc., Seattle, Washington, USA


BMC Genomics 2010, 11:244  doi:10.1186/1471-2164-11-244

Published: 16 April 2010

Abstract

Background

DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number.

Results

We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies. Relative to a pool of non-diseased, blood-derived DNA, UMC-11 contained 5 times more mitochondrial DNA and one quarter as much telomeric sequence. The data also showed that UMC-11 was derived from a male individual.

Conclusion

The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.
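
To illustrate how resolution could trade off against the number of sequence reads, the following is a minimal back-of-the-envelope sketch and not the authors' method. It assumes reads fall uniformly across a genome of roughly 3.1 billion base pairs, that every read maps uniquely, and that counts per bin follow a Poisson distribution. The function names and the genome size constant are hypothetical and are introduced here only for illustration.

```python
import math

# Assumption: approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3.1e9


def expected_reads_per_bin(total_reads, bin_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Expected number of reads in one bin under uniform coverage (an assumption)."""
    return total_reads * bin_size_bp / genome_size_bp


def relative_counting_error(expected_reads):
    """Poisson coefficient of variation for a count with the given mean: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_reads)


if __name__ == "__main__":
    total_reads = 5_000_000  # read count quoted in the abstract
    for bin_size_bp in (10_000, 100_000, 1_000_000):
        mean_reads = expected_reads_per_bin(total_reads, bin_size_bp)
        cv = relative_counting_error(mean_reads)
        print(f"bin = {bin_size_bp:>9,} bp  expected reads = {mean_reads:8.1f}  relative error = {cv:.3f}")
```

Under these assumptions, doubling the number of reads allows the bin size to be halved at the same expected count per bin, which is one way to read the statement that resolution and cost are tunable.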