Open Access Software

Mitochondrial genome sequence analysis: A custom bioinformatics pipeline substantially improves Affymetrix MitoChip v2.0 call rate and accuracy

Hongbo M Xie1, Juan C Perin1, Theodore G Schurr2, Matthew C Dulik2, Sergey I Zhadanov2, Joseph A Baur3, Michael P King4, Emily Place5, Colleen Clarke5, Michael Grauer1, Jonathan Schug6, Avni Santani7, Anthony Albano8, Cecilia Kim8, Vincent Procaccio9, Hakon Hakonarson10,8, Xiaowu Gai1,11* and Marni J Falk12,5*

Author Affiliations

1 Center for Biomedical Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

2 Department of Anthropology, University of Pennsylvania School of Arts and Sciences, Philadelphia, PA 19104, USA

3 Department of Physiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

4 Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA

5 Division of Human Genetics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

6 Computational Biology and Informatics Lab, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

7 Molecular Genetics Laboratory, Department of Pathology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

8 Center for Applied Genomics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

9 Department of Biochemistry and Genetics, Angers University Hospital, School of Medicine, Angers, F-49000, France

10 Division of Pulmonary Medicine, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

11 Department of Molecular Pharmacology and Therapeutics, Loyola University Chicago Stritch School of Medicine, Maywood, IL, 60153, USA

12 Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

BMC Bioinformatics 2011, 12:402  doi:10.1186/1471-2105-12-402

Published: 19 October 2011

Abstract

Background

Mitochondrial genome sequence analysis is critical to the diagnostic evaluation of mitochondrial disease. Existing methodologies differ widely in throughput, complexity, cost efficiency, and sensitivity of heteroplasmy detection. Affymetrix MitoChip v2.0, which uses a sequencing-by-genotyping technology, allows potentially accurate and high-throughput sequencing of the entire human mitochondrial genome to be completed in a cost-effective fashion. However, the relatively low call rate achieved using existing software tools has limited the wide adoption of this platform for either clinical or research applications. Here, we report the design and development of a custom bioinformatics software pipeline that achieves a much improved call rate and accuracy for the Affymetrix MitoChip v2.0 platform. We used this custom pipeline to analyze MitoChip v2.0 data from 24 DNA samples representing a broad range of tissue types (18 whole blood, 3 skeletal muscle, 3 cell lines), mutations (a 5.8 kilobase pair deletion and 6 known heteroplasmic mutations), and haplogroup origins. All results were compared to those obtained by at least one other mitochondrial DNA sequence analysis method, including Sanger sequencing, denaturing HPLC-based heteroduplex analysis, and/or the Illumina Genome Analyzer II next generation sequencing platform.

Results

An average call rate of 99.75% was achieved across all samples with our custom pipeline. Comparison of calls for 15 samples characterized previously by Sanger sequencing revealed a total of 29 discordant calls, which translates to an estimated 0.012% for the base call error rate. We successfully identified 4 known heteroplasmic mutations and 24 other potential heteroplasmic mutations across 20 samples that passed quality control.
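As a rough check of the error-rate arithmetic above, the following sketch recomputes the percentage from the reported counts. The denominator is an assumption: the abstract does not state how many positions were compared per sample, so the sketch uses the 16,569 bp length of the revised Cambridge Reference Sequence as an approximation. The function and constant names are hypothetical and are not part of the MFP pipeline.

```python
# Assumption: every Sanger-characterized sample contributes roughly one full
# mitochondrial genome (16,569 bp, the rCRS length) of compared base calls.
# The abstract does not give the exact denominator used by the authors.
MTDNA_LENGTH_BP = 16_569
N_SANGER_SAMPLES = 15   # samples previously characterized by Sanger sequencing
N_DISCORDANT = 29       # discordant calls reported in the Results


def base_call_error_rate(discordant: int, samples: int, bases_per_sample: int) -> float:
    """Return discordant calls as a percentage of all compared base calls."""
    total_calls = samples * bases_per_sample
    return 100.0 * discordant / total_calls


rate = base_call_error_rate(N_DISCORDANT, N_SANGER_SAMPLES, MTDNA_LENGTH_BP)
print(f"{rate:.3f}%")  # prints "0.012%"
```

Under this assumption, 29 discordant calls out of 248,535 compared positions gives about 0.012%, which is consistent with the reported estimate.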

Conclusions

Affymetrix MitoChip v2.0 analysis using our optimized MitoChip Filtering Protocol (MFP) bioinformatics pipeline now offers the high sensitivity and accuracy needed for reliable, high-throughput and cost-efficient whole mitochondrial genome sequencing. This approach provides a viable alternative to traditional Sanger and other emerging sequencing technologies for whole mitochondrial genome analysis, with potential utility in both clinical diagnostic and research applications.