Open Access Software

Mitochondrial genome sequence analysis: A custom bioinformatics pipeline substantially improves Affymetrix MitoChip v2.0 call rate and accuracy

Hongbo M Xie1, Juan C Perin1, Theodore G Schurr2, Matthew C Dulik2, Sergey I Zhadanov2, Joseph A Baur3, Michael P King4, Emily Place5, Colleen Clarke5, Michael Grauer1, Jonathan Schug6, Avni Santani7, Anthony Albano8, Cecilia Kim8, Vincent Procaccio9, Hakon Hakonarson10,8, Xiaowu Gai1,11* and Marni J Falk12,5*

Author Affiliations

1 Center for Biomedical Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

2 Department of Anthropology, University of Pennsylvania School of Arts and Sciences, Philadelphia, PA 19104, USA

3 Department of Physiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

4 Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA

5 Division of Human Genetics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

6 Computational Biology and Informatics Lab, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

7 Molecular Genetics Laboratory, Department of Pathology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

8 Center for Applied Genomics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

9 Department of Biochemistry and Genetics, Angers University Hospital, School of Medicine, Angers, F-49000, France

10 Division of Pulmonary Medicine, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA

11 Department of Molecular Pharmacology and Therapeutics, Loyola University Chicago Stritch School of Medicine, Maywood, IL, 60153, USA

12 Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA

BMC Bioinformatics 2011, 12:402  doi:10.1186/1471-2105-12-402

Published: 19 October 2011

Additional files

Additional file 1:

Source Code.

Format: TXT Size: 19KB

Additional file 2:

Structural variant detection capacity analysis in MFP. (A and B) Quality score plots with 25 bp moving window for simulated data sets with deletion segments of different sizes (marked on the left). The deleted segment is highlighted in red in each plot. (C) Sensitivity plot for deletions of various sizes based on simulation tests.

Format: PPT Size: 553KB
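
The 25 bp moving window used for the quality score plots in Additional file 2 can be illustrated with a minimal sketch. This is not the MFP source code: the function name `moving_average`, the toy score values, and the choice to treat the sequence as linear rather than circular are all assumptions made for illustration.

```python
def moving_average(scores, window=25):
    """Return the mean of every run of `window` consecutive per-base scores.

    Hypothetical helper: the sequence is treated as linear, so the result
    has len(scores) - window + 1 values.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    # Slide the window one base at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


# Toy data (assumed values): high scores with a low-scoring stretch
# standing in for a simulated deletion.
scores = [30.0] * 100 + [5.0] * 50 + [30.0] * 100
smoothed = moving_average(scores)
print(min(smoothed), max(smoothed))
```

In a sketch like this, a deletion longer than the window pulls the smoothed score down to the low baseline, whereas a shorter one only produces a partial dip.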

Additional file 3:

Box plot of ratios between the highest and second highest signal intensities of all bases located in the large deleted region of sample #14. 12.7% of bases in the 5791 bp deleted region would fall above this cutoff.

Format: PPT Size: 125KB
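
The ratio summarized in Additional file 3 can be sketched as follows, assuming each base position has four signal intensities, one per candidate base. The function names, the toy intensities, and the cutoff of 1.4 are hypothetical placeholders; the actual cutoff is the one defined in the article, not here.

```python
def top_two_ratio(intensities):
    """Ratio of the highest to the second-highest intensity at one position."""
    first, second = sorted(intensities, reverse=True)[:2]
    if second == 0:
        return float("inf")  # avoid division by zero for a silent second probe
    return first / second


def fraction_above(ratios, cutoff):
    """Share of positions whose ratio exceeds the cutoff."""
    return sum(r > cutoff for r in ratios) / len(ratios)


# Toy intensities (assumed values) for four positions.
probes = [
    (1200.0, 300.0, 250.0, 200.0),
    (500.0, 480.0, 450.0, 400.0),
    (900.0, 600.0, 100.0, 50.0),
    (310.0, 300.0, 290.0, 305.0),
]
ratios = [top_two_ratio(p) for p in probes]
share = fraction_above(ratios, cutoff=1.4)  # 1.4 is a placeholder cutoff
print(ratios, share)
```

A ratio near 1 means no single base dominates the signal, which is what one would expect at positions with no target DNA, such as inside a deleted region.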

Additional file 4:

Single nucleotide variant discrepancies within 15 DNA samples analyzed both by Affymetrix MitoChip v2.0 with the MFP analysis algorithm and by either Sanger or Illumina Genome Analyzer II sequencing. Whole mtDNA genome sequence data were compared for all 15 high-quality samples detailed in Table 2, as well as for 3 samples found to have poor quality (#14, #17, and #18). No sequence discrepancies were noted between MitoChip v2.0 and 7 of the high-quality samples (#10, #11, #13, #15, #16, #21, #22), nor in sample #14 at any base position outside its large deleted region.

Format: XLS Size: 34KB

Additional file 5:

Alignment of Illumina GA next generation sequencing reads from position 3433 in sample #15. Mitochondrial genome position 3433, visualized in Tablet, shows no indication of heteroplasmy by next generation sequencing.

Format: PPT Size: 328KB

Additional file 6:

Alignment of Illumina GA next generation sequencing reads from position 15940 to 15944 in sample #15. The 2 base pair deletion located between nucleotide positions (np) 15940 and 15944 of the mitochondrial genome, visualized in Tablet, indicates that these are potentially heteroplasmic sites.

Format: PPT Size: 366KB

Additional file 7:

Comparison of validation data set results between MFP and Sanger sequencing. (A) Call rate improvement when comparing MFP analysis to GSEQ 4.1 for each of 5 validation samples. (B) Accuracy between total calls made by MFP analysis and Sanger sequencing for each of 5 validation samples. (C) Call type details for a total of 5 discrepant calls between MFP analysis and Sanger sequencing seen in 3 of the validation samples. Interestingly, 3 of the 5 discrepant calls involved the same SNP (A12307G), which MFP detected in three distinct samples (C5, C9, C11). (D) The Sanger-based electropherogram shows the A > G variation not at position 12307 but at the neighboring position 12308, as shown here for validation sample C5.

Format: PPT Size: 176KB