Research article

Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data

David Mosén-Ansorena1*, Ana María Aransay1 and Naiara Rodríguez-Ezpeleta2

Author Affiliations

1 Genome Analysis Platform, CIC bioGUNE - CIBERehd, Technologic Park of Bizkaia, building 502, 48160, Derio, Spain

2 Marine Research Division, AZTI-Tecnalia, Txatxarramendiugartea z/g, 48395, Sukarrieta, Spain

BMC Bioinformatics 2012, 13:192  doi:10.1186/1471-2105-13-192

Published: 7 August 2012

Additional files

Additional file 1:

Code to generate the synthetic samples.

Format: R Size: 3KB

Additional file 2:

Synthetic regions without noise. Synthetic BAF (left graph) and LRR (right graph) signals of some example regions generated with CnaGen at different contamination levels, without probe-specific or autocorrelated noise: a heterozygous deletion (first column), a normal diploid region (second column), the various heterozygous CNA events up to copy number 5 (third to seventh columns) and two specific cases of 2- and 3-subclone CNAs (last two columns). Each SNP probe provides a measurement of the proportion of one of the alleles (BAF) and the total intensity coming from the two alleles (LRR).

Format: PDF Size: 902KB

Additional file 3:

Recall rates by method, contamination and alteration length. Recall rates (y-axis) of each of the assessed methods, calculated by contamination and alteration length over each of the 5 synthetic sample sets. Colour code: GAP (orange), updated GAP (golden), ASCAT (purple), GPHMM (black), OncoSNP (blue), GenoCNA (green), MixHMM (grey).

Format: PDF Size: 106KB

Additional file 4:

Recall rates, considering LOH status, by method, contamination, and alteration copy number and length. Recall rates (y-axis) of calls made with correct copy number and LOH status. By: (i) normal cell contamination (x-axis), (ii) contamination and copy number (x-axis), and (iii) contamination and alteration length (x-axis) over each of the 5 synthetic sample sets. Colour code: GAP (orange), updated GAP (golden), ASCAT (purple), GPHMM (black), OncoSNP (blue), GenoCNA (green), MixHMM (grey).

Format: PDF Size: 236KB

Additional file 5:

Method version and parameterization details. Versions of the methods used in this study and details of parameterization when the default parameters were not used. Details of the PFB and GC content files used as input, where required, are also given.

Format: DOC Size: 26KB

Additional file 6:

FDRs on synthetic samples. Overall false discovery rates on synthetic samples, broken down by normal cell contamination level and called/real copy number. Cell colour represents the amount of incorrect calls for which the predicted copy number (y-axis) differs from the actual copy number (x-axis). There are no copy number 0 regions in the samples, but some methods make copy number 0 calls. The total FDR for a given method and contamination level is indicated in the lower right corner of each plot, and is the sum of all the corresponding cell values. Good performance is reflected in the symmetry and narrowness of the distribution of wrong calls along the diagonal of correct calls. Departure from such symmetry indicates some kind of bias.
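
As a rough illustration of how per-cell values can add up to a total FDR, the following Python sketch takes a hypothetical matrix of call counts indexed as `counts[predicted][actual]`. It assumes that each off-diagonal cell is normalised by the total number of calls; the normalisation actually used for the plots may differ. The function names `fdr_cells` and `total_fdr` are illustrative and do not come from the study's code.

```python
def fdr_cells(counts):
    """Return per-cell fractions of wrong calls.

    counts[p][a] is the number of calls made with predicted copy
    number p whose actual copy number is a (hypothetical layout).
    Cells on the diagonal (correct calls) are set to 0.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no calls to evaluate")
    return [
        [(c / total if p != a else 0.0) for a, c in enumerate(row)]
        for p, row in enumerate(counts)
    ]


def total_fdr(counts):
    """Total FDR as the sum of all off-diagonal cell values."""
    return sum(sum(row) for row in fdr_cells(counts))


# Hypothetical counts for predicted (rows) vs. actual (columns) copy numbers.
example = [
    [8, 1, 0],
    [1, 10, 2],
    [0, 0, 3],
]
cells = fdr_cells(example)
overall = total_fdr(example)
```

In this toy matrix, wrong calls above and below the diagonal are not balanced, which is the kind of asymmetry the description associates with bias.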

Format: PDF Size: 340KB

Additional file 7:

Generation of hybrid samples.

Format: R Size: 2KB

Additional file 8:

Recall rates by method, contamination, and alteration length. (A) Recall rates (y-axis) of each of the assessed methods, calculated by contamination over each of the 3 hybrid sample sets. Colour code: GAP (orange), updated GAP (golden), ASCAT (purple), GPHMM (black), OncoSNP (blue), GenoCNA (green), MixHMM (grey). (B) Recall rates (y-axis) of each of the assessed methods, calculated by contamination and alteration length over each of the 3 hybrid sample sets. Alteration lengths (x-axis) are grouped into increasingly large bins (10–19 SNPs, 20–39, 40–79, 80–159, 160–319, 320–639, 640–1279 and from 1280 SNPs on) and each bin is represented by the shortest length within it. Alterations shorter than 10 SNPs were not assessed. Colour code: GAP (orange), updated GAP (golden), ASCAT (purple), GPHMM (black), OncoSNP (blue), GenoCNA (green), MixHMM (grey).
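
The length binning described for panel (B) can be sketched in Python as follows. This is a minimal illustration, not the code used in the study; the names `BIN_STARTS` and `length_bin` are hypothetical. Each alteration is mapped to the lower bound of its bin, and alterations shorter than 10 SNPs return `None` because they were not assessed.

```python
from bisect import bisect_right

# Lower bounds of the bins listed in the caption; the last bin is open-ended.
BIN_STARTS = [10, 20, 40, 80, 160, 320, 640, 1280]


def length_bin(n_snps):
    """Return the lower bound of the bin that contains n_snps.

    Returns None for alterations shorter than 10 SNPs, which the
    caption says were not assessed.
    """
    if n_snps < BIN_STARTS[0]:
        return None
    # bisect_right counts how many bin starts are <= n_snps.
    return BIN_STARTS[bisect_right(BIN_STARTS, n_snps) - 1]


# Example: label a few hypothetical alteration lengths.
labels = [length_bin(n) for n in (9, 10, 19, 20, 639, 1279, 1280, 5000)]
```

Because each bin starts at twice the previous lower bound, the bins are evenly spaced on a logarithmic length scale.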

Format: PDF Size: 100KB

Additional file 9:

Cell-line data and method calls. LRR (top graph) and BAF (bottom graph) signals for the cell-line sample at 21% contamination. Chromosomes 6, 16 and X are excluded for the reasons described in the main text. Shown in the middle are the calls made by the seven methods, including MixHMM with manually set global parameters (LRR shift and contamination), and the reference true calls. Any calls made with copy numbers higher than 4 are displayed as copy number 4.

Format: ZIP Size: 3.5MB

Additional file 10:

Parameter relationships. Tables and regression plots that show the relationship between: (i) coefficient of LRR contraction and degree of normal cell contamination; and (ii) DNA index and baseline shift.

Format: XLS Size: 30KB