<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-409</ui>
   <ji>1471-2105</ji>
   <fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Staaf</snm>
					<fnm>Johan</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>johan.staaf@med.lu.se</email>
				</au>
				<au id="A2">
					<snm>Vallon-Christersson</snm>
					<fnm>Johan</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>johan.vallon-christersson@med.lu.se</email>
				</au>
				<au id="A3">
					<snm>Lindgren</snm>
					<fnm>David</fnm>
					<insr iid="I1"/>
					<email>david.lindgren@med.lu.se</email>
				</au>
				<au id="A4">
					<snm>Juliusson</snm>
					<fnm>Gunnar</fnm>
					<insr iid="I3"/>
					<email>gunnar.juliusson@med.lu.se</email>
				</au>
				<au id="A5">
					<snm>Rosenquist</snm>
					<fnm>Richard</fnm>
					<insr iid="I4"/>
					<email>richard.rosenquist@genpat.uu.se</email>
				</au>
				<au id="A6">
					<snm>H&#246;glund</snm>
					<fnm>Mattias</fnm>
					<insr iid="I1"/>
					<email>mattias.hoglund@med.lu.se</email>
				</au>
				<au id="A7">
					<snm>Borg</snm>
					<fnm>&#197;ke</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<insr iid="I3"/>
					<email>ake.borg@med.lu.se</email>
				</au>
				<au ca="yes" id="A8">
					<snm>Ringn&#233;r</snm>
					<fnm>Markus</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>markus.ringner@med.lu.se</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Oncology, Clinical Sciences, Lund University, SE-22185 Lund, Sweden</p>
				</ins>
				<ins id="I2">
					<p>CREATE Health Strategic Centre for Clinical Cancer Research, Lund University, SE-22184 Lund, Sweden</p>
				</ins>
				<ins id="I3">
					<p>Lund Strategic Research Center for Stem Cell Biology and Cell Therapy, Lund University, SE-22184 Lund, Sweden</p>
				</ins>
				<ins id="I4">
					<p>Department of Genetics and Pathology, Uppsala University, SE-75185 Uppsala, Sweden</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<issn>1471-2105</issn>
			<pubdate>2008</pubdate>
			<volume>9</volume>
			<issue>1</issue>
			<fpage>409</fpage>
			<url>http://www.biomedcentral.com/1471-2105/9/409</url>
			<xrefbib>
				<pubidlist>
					<pubid idtype="pmpid">18831757</pubid>
					<pubid idtype="doi">10.1186/1471-2105-9-409</pubid>
				</pubidlist>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>03</day>
					<month>6</month>
					<year>2008</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>02</day>
					<month>10</month>
					<year>2008</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>02</day>
					<month>10</month>
					<year>2008</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2008</year>
			<collab>Staaf et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.</p>
				</sec>
			</sec>
		</abs>
	</fm>
   <bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Genomic copy number alterations (CNA) and allelic imbalances are common events in the development of cancer and certain genetic disorders <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. The introduction of whole genome genotyping (WGG) arrays based on single nucleotide polymorphism (SNP) genotyping <abbrgrp>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
				</abbrgrp> allows for combined DNA copy number (SNP-CGH) and loss-of-heterozygosity (LOH) analysis at high resolution <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. Currently, two major SNP array platforms are in use, Affymetrix GeneChip arrays <abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp> and Illumina BeadChips <abbrgrp>
					<abbr bid="B7">7</abbr>
				</abbrgrp>. The Infinium assay for Illumina BeadChips is based on allele-specific hybridization coupled with primer extension of genomic DNA using primers directly surrounding the SNP on randomly ordered bead arrays <abbrgrp>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. The Infinium assay has been further developed into allele-specific single base extension using two color labeling with the Cy3 and Cy5 fluorescent dyes (Infinium II) <abbrgrp>
					<abbr bid="B8">8</abbr>
				</abbrgrp>. Current generations of Infinium II arrays are able to interrogate more than 1 million SNPs simultaneously.</p>
			<p>Infinium II is a two-channel assay and data consist of two intensity values (X, Y) for each SNP, with one intensity channel for each of the fluorescent dyes associated with the two alleles of the SNP. SNP markers are present at a high redundancy on Infinium II assays and the allele-specific intensities (X, Y) are summarized estimates from replicate markers. The alleles measured by the X channel (Cy5 dye) are arbitrarily, with respect to haplotypes, called the A alleles, whereas the alleles measured by the Y channel (Cy3 dye) are called the B alleles. The allele-specific intensities are normalized using a proprietary algorithm in the Illumina BeadStudio software. The normalization algorithm is applied on a sub-bead pool level and is designed to adjust for channel-dependent background and global intensity differences, and to scale the data. A sub-bead pool is a set of beads that were manufactured together and are located in roughly the same analytical location (stripe) on a BeadChip. The algorithm uses a 6-degree-of-freedom affine transformation with 5 main steps: outlier removal, background estimation, rotational estimation, shear estimation, and scaling estimation <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. After normalization, data should be as canonical as possible with homozygous SNPs positioned along the transformed X and Y intensity axes. Normalized allele intensities are transformed to a combined SNP intensity, R (R = X + Y), and an allelic intensity ratio, theta (&#952; = (2/&#960;) &#183; arctan(Y/X)).</p>
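			<p>As a minimal sketch of the transformation just described, the following Python snippet computes R and theta for a single SNP from its normalized intensities. The function names are illustrative and are not taken from BeadStudio. Using atan2 instead of arctan(Y/X) is an assumption made here so that X = 0 does not cause a division by zero; for non-negative intensities the two agree.</p>

```python
import math


def combined_intensity(x, y):
    """Combined SNP intensity R = X + Y."""
    return x + y


def allelic_intensity_ratio(x, y):
    """Allelic intensity ratio theta = (2/pi) * arctan(Y/X).

    atan2(y, x) equals arctan(y/x) for non-negative intensities and
    stays defined when x is zero (assumption made for this sketch).
    """
    return 2.0 * math.atan2(y, x) / math.pi


# Signal only in X, equal signal in both channels, signal only in Y.
theta_aa = allelic_intensity_ratio(1.0, 0.0)
theta_ab = allelic_intensity_ratio(1.0, 1.0)
theta_bb = allelic_intensity_ratio(0.0, 1.0)
```

			<p>With this definition a SNP with signal only in X gives theta = 0, equal signal in both channels gives theta = 0.5, and signal only in Y gives theta = 1.</p>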
			<p>R values are calibrated to generate copy number estimates (CN) by comparison to either a matched reference sample analyzed simultaneously or to canonical genotype clusters <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. Canonical genotype clusters are generated from a large panel of normal samples and the clusters for a SNP indicate the R and theta values expected for each genotype (AA, AB and BB). Theta values are calibrated to generate B allele frequencies (BAF) using canonical genotype clusters. BAF is a value between 0 and 1 and represents the proportion contributed by one SNP allele (B) to the total copy number: BAF is an estimate of N<sub>B</sub>/(N<sub>A</sub>+ N<sub>B</sub>), where N<sub>A </sub>and N<sub>B </sub>are the number of A and B alleles, respectively. When canonical genotype clusters are used for calibration, copy number estimates are calculated per SNP by taking the log<sub>2 </sub>of the SNP intensity (R) divided by the SNP intensity expected from the canonical genotype clusters. Thus, copy number estimates may be regarded as a combination of two individual one-channel measurements of the amount of genetic material for a given SNP. Normalization of one-channel array data has been extensively explored, incorporating various algorithms, among which quantile normalization (QN) has been reported to perform consistently well <abbrgrp>
					<abbr bid="B9">9</abbr>
				</abbrgrp> and has been widely used to normalize between arrays <abbrgrp>
					<abbr bid="B10">10</abbr>
					<abbr bid="B11">11</abbr>
					<abbr bid="B12">12</abbr>
				</abbrgrp>. Recently, QN was applied, as one of several analysis steps, to Illumina Sentrix SNP BeadArrays to correct for an observed dye bias in copy number analysis <abbrgrp>
					<abbr bid="B13">13</abbr>
				</abbrgrp>.</p>
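			<p>The calibration of R into a copy number estimate can be sketched as follows. The function name is hypothetical, and the idealized assumption that the expected intensity scales linearly with copy number is ours; it is used only to give a feel for the magnitudes involved, and measured intensities need not follow this ideal.</p>

```python
import math


def log_r_ratio(r_observed, r_expected):
    """log2 of the observed SNP intensity over the intensity expected
    from the canonical genotype clusters."""
    if not (r_observed > 0 and r_expected > 0):
        raise ValueError("intensities must be positive")
    return math.log2(r_observed / r_expected)


# Idealized assumption (not from the article): R is proportional to
# the number of copies, with two copies as the expected state.
single_copy_gain = log_r_ratio(3.0, 2.0)
single_copy_loss = log_r_ratio(1.0, 2.0)
unchanged = log_r_ratio(2.0, 2.0)
```

			<p>Under this idealization an unchanged region sits at 0, a single copy loss at -1, and a single copy gain at log<sub>2</sub>(3/2).</p>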
			<p>Allelic imbalances in samples can be conveniently visualized in BAF plots <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. A BAF value of 0.5 indicates a heterozygous genotype (AB), whereas 0 and 1 indicate homozygous genotypes (AA and BB, respectively). The allelic intensity ratio may, in the Infinium II assay, be regarded as a comparative dual channel measurement of the allelic proportion for a given SNP, similar to, e.g., two-channel gene expression data. Several reports have underlined the importance of intensity-based normalization, e.g., lowess <abbrgrp>
					<abbr bid="B14">14</abbr>
				</abbrgrp>, to correct for dye specific differences both for gene expression profiling <abbrgrp>
					<abbr bid="B15">15</abbr>
					<abbr bid="B16">16</abbr>
				</abbrgrp> and array comparative genomic hybridization (aCGH) <abbrgrp>
					<abbr bid="B17">17</abbr>
					<abbr bid="B18">18</abbr>
					<abbr bid="B19">19</abbr>
				</abbrgrp> in two-channel microarray data. Since alleles for SNPs are arbitrarily called A or B, a set of genomically consecutive SNPs will appear in BAF plots as horizontal bands that are expected to be symmetrically positioned around 0.5. For example, a region of single copy number gain in all cells will, in addition to the two bands of homozygous SNPs at BAF = 0 and BAF = 1, result in two bands: one at BAF = 0.33 with SNPs having genotype AAB and one at BAF = 0.67 with SNPs having genotype ABB.</p>
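			<p>The expected band positions follow directly from the definition of BAF. The short sketch below enumerates them for a given total copy number, assuming that the aberration is present in all cells and that every combination of A and B alleles occurs among the SNPs in the region. Function names are illustrative.</p>

```python
def expected_baf_bands(total_copies):
    """BAF values N_B / (N_A + N_B) for every possible number of B
    alleles at a fixed total copy number."""
    if not total_copies > 0:
        raise ValueError("total copy number must be positive")
    return [n_b / total_copies for n_b in range(total_copies + 1)]


def is_symmetric_around_half(bands, tol=1e-9):
    """True if the bands mirror each other around BAF = 0.5."""
    pairs = zip(bands, reversed(bands))
    return all(tol > abs(low + high - 1.0) for low, high in pairs)


normal_bands = expected_baf_bands(2)  # AA, AB, BB
gain_bands = expected_baf_bands(3)    # AAA, AAB, ABB, BBB
```

			<p>For a single copy gain this yields bands at 0, 1/3, 2/3 and 1, which mirror each other around 0.5 as stated above.</p>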
			<p>Here we demonstrate that BAF plots for tumor samples analyzed on Infinium II BeadChips often display bands that are not symmetrically positioned around 0.5. We show that these asymmetrical allelic ratios are caused by a bias between the two dyes used in the Infinium II assay, and that this dye intensity bias also hampers copy number estimates. Dye bias can potentially be both global and SNP-specific. We propose using a quantile normalization-based strategy applied to summarized bead type data within arrays for global correction of this dye intensity bias. The strategy corrects asymmetries that remain between intensity channels after the conventionally used BeadStudio normalization for both allelic intensity ratios and copy number estimates in normal as well as in tumor samples. Note that whereas quantile normalization is widely applied to single channel arrays to normalize between arrays, we instead apply it to normalize between channels within Infinium II arrays. Of key importance for the success of the strategy is the generation of new normalized reference data sets for the calibration of theta and R into B allele frequency and log R ratio: the data set analyzed and the data set used for calibration should both be normalized in the same way. We investigated the performance of the normalization strategy using 535 individual hybridizations from 10 different data sets generated on four different Infinium II platforms. The investigated data sets contain normal blood samples as well as breast tumor, colon tumor, urothelial tumor and chronic lymphocytic leukemia (CLL) samples. The included tumors display a large number of different copy number imbalances, but also variation in tumor heterogeneity and normal cell contamination. We conclude that the normalization strategy improves Infinium II data for samples of many different types.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Occurrence of asymmetrical B allele frequencies and copy number estimates in tumor specimens</p>
				</st>
				<p>Allelic imbalances in tumor samples may conveniently be displayed using B allele frequency plots, which illustrate the presence and location of genomic regions of apparently the same allelic proportion (Figure <figr fid="F1">1a</figr>). In contrast to the expected symmetrical behavior of the B allele frequency, SNPs in regions of allelic imbalance appear to have bands of BAF values that are asymmetrically positioned around 0.5 for the analyzed urothelial tumor (Figure <figr fid="F1">1a</figr>). The asymmetry becomes even more apparent when BAF is mirrored along the 0.5 axis to give mBAF (Figure <figr fid="F1">1b</figr>). Importantly, the asymmetry also affects genotyping, as indicated by the higher number of failed genotype calls for lower BAF values (AA) compared to higher BAF values (BB) for the region 1q32.1 to qter (Figure <figr fid="F1">1c</figr>). In this region there are a total of 6421 SNPs, approximately evenly distributed between the upper BAF &gt; 0.5 (3295 SNPs) and the lower BAF &lt; 0.5 (3125 SNPs) parts. Of these 6421 SNPs, 927 SNPs have not been assigned a genotype by BeadStudio. Of these 927 failed calls, 757 have a BAF value below 0.5, showing that the cause of the observed asymmetry in BAF also affects genotyping. The BAF asymmetry also influences analysis methods for detecting allelic imbalance. Recently, the SOMATICs algorithm was proposed as a solution for detecting allelic imbalances in heterogeneous tumor samples using Infinium II data <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>. The algorithm divides the BAF profile into three bands (red, green, and blue) based on fixed BAF cut-offs. Asymmetry in BAF estimates causes regions of apparently identical allelic imbalance close to the fixed cut-offs to be identified in different bands (see Additional file <supplr sid="S1">1</supplr>). Asymmetry also exists in copy number estimates for regions of CN loss and CN gain (Figure <figr fid="F1">1d</figr>). The asymmetry appears to be caused by an uncorrected curvature between the X and Y intensities for the two alleles (Figure <figr fid="F1">1e</figr>), and an unequal distribution of X and Y values (Figure <figr fid="F1">1f</figr>). We conclude that there seems to be a dye intensity bias between the two channels used in the Infinium II assay and that the bias remains after using the normalization in BeadStudio.</p>
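				<p>The mirror transformation can be written compactly. The sketch below assumes the definition mBAF = 0.5 + |BAF - 0.5|, which this section does not spell out, and the band positions are invented for illustration rather than read from Figure 1.</p>

```python
def mirrored_baf(baf):
    """Reflect BAF values below 0.5 onto the upper half of the axis.

    Assumed definition for this sketch: mBAF = 0.5 + |BAF - 0.5|.
    """
    return 0.5 + abs(baf - 0.5)


# Invented band positions for a region of allelic imbalance.
symmetric_bands = (0.33, 0.67)
asymmetric_bands = (0.28, 0.67)

mirrored_symmetric = [mirrored_baf(b) for b in symmetric_bands]
mirrored_asymmetric = [mirrored_baf(b) for b in asymmetric_bands]
```

				<p>Symmetric bands collapse onto a single mBAF level, whereas asymmetric bands remain separated after mirroring, which is why the asymmetry stands out in an mBAF plot.</p>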
				<suppl id="S1">
					<title>
						<p>Additional file 1</p>
					</title>
					<text>
						<p>
							<b>Supplementary figures.</b> This file contains supplementary figures on the effect of BAF asymmetry on downstream analysis, a comparison of CN estimates before and after tQN, and a comparison of BAF asymmetry for regions of allelic imbalance before and after tQN.</p>
					</text>
					<file name="1471-2105-9-409-S1.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Occurrence of asymmetrical B allele frequencies and copy number estimates</p>
					</caption>
					<text>
						<p>
							<b>Occurrence of asymmetrical B allele frequencies and copy number estimates.</b> Urothelial tumor UC152_I hybridized on an Infinium 370 k BeadChip is shown. CNV probes have been removed. (a) B allele frequency for chromosome 1. (b) Mirrored B allele frequency (mBAF) for chromosome 1, with individual SNPs colored according to BAF values: less than 0.5 (orange), above 0.5 (blue) showing the asymmetry of BAF values around 0.5. (c) BAF profile of chromosome 1, with individual SNPs colored according to genotype calls: AA (green), AB (yellow), BB (red) and no calls (gray). The cause of the BAF asymmetry also affects genotyping as seen for SNPs not assigned to a genotype (gray), which in the region 1q32.1 to qter (highlighted with a light blue background) predominantly are present with BAF &lt; 0.5. (d) Copy number estimates (Log R ratio) for chromosome 1, with individual SNPs colored according to genotype. The cause of the BAF asymmetry also affects copy number estimates as seen for regions of gain and loss, where AA and BB SNPs do not overlap. (e) Scatter plot of normalized allele intensities X and Y with individual SNPs colored according to genotype. A lowess regression line (solid) for heterozygous SNPs and the expected X = Y line (dashed) are superimposed. (f) Boxplots of the distributions of allele intensities X (green) and Y (red).</p>
					</text>
					<graphic file="1471-2105-9-409-1"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Correction of dye intensity bias in HapMap samples using quantile normalization</p>
				</st>
				<p>Since the two alleles for each SNP are, with respect to haplotypes, arbitrarily associated with the X and Y intensities, normalized X and Y intensities should, in contrast to figure <figr fid="F1">1f</figr>, be expected to have essentially equal intensity distributions. Quantile normalization can be used to generate identical distributions from a set of distributions <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp>. To investigate the effect of within sample QN of X and Y intensities from normal samples, we performed QN on X and Y intensities from HapMap samples used to generate the reference data sets for the Illumina 300 k version 1 (n = 111), 300 k version 2 (n = 120), 370 k (n = 123) and 550 k (n = 120) BeadChips. For each sample and SNP we calculated new normalized theta and R values thereby generating QN reference data sets. QN has been extensively used to normalize one-channel microarray expression data such that identical intensity distributions are generated for a set of arrays (between array normalization) <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp>. Here we instead propose to use QN between channels within two-channel SNP arrays.</p>
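				<p>A minimal sketch of quantile normalization between the two channels of one sample is given below. It follows the generic QN recipe (sort each channel, average the values at each rank, and map the averages back to the original SNP order) and is not the authors' implementation; the function name and the handling of ties by input order are assumptions.</p>

```python
def quantile_normalize_channels(x, y):
    """Give the X and Y intensities of one sample identical
    distributions by mapping each value to the mean of the two
    channels at the same rank."""
    if len(x) != len(y):
        raise ValueError("X and Y must hold one value per SNP")

    # Target distribution: rank-wise mean of the sorted channels.
    target = [(a + b) / 2.0 for a, b in zip(sorted(x), sorted(y))]

    def map_to_target(values):
        # Indices of the SNPs ordered by increasing intensity;
        # ties keep their input order (assumption of this sketch).
        order = sorted(range(len(values)), key=lambda i: values[i])
        normalized = [0.0] * len(values)
        for rank, snp_index in enumerate(order):
            normalized[snp_index] = target[rank]
        return normalized

    return map_to_target(x), map_to_target(y)


# Toy intensities for three SNPs (invented values).
x_qn, y_qn = quantile_normalize_channels([1.0, 4.0, 2.0], [6.0, 2.0, 4.0])
```

				<p>After this step both channels contain the same set of values, here 1.5, 3.0 and 5.0, while each SNP keeps its rank within its own channel. New theta and R values would then be computed from the normalized intensities.</p>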
				<p>For each reference data set we computed new BAF and CN estimates and compared these estimates to BeadStudio data. Using QN we obtained CN estimates with significantly lower standard deviations (SD) for three of four reference data sets (Table <tblr tid="T1">1</tblr>). The mean decrease in SD for CN estimates was 15&#8211;26% for the 300 k v2, 370 k and 550 k data sets. For the Illumina 300 k v1 set, QN did not show any effect. Intriguingly, the single-sample 300 k v1 BeadChips have a significantly lower variation of CN estimates than the Illumina version 2 Duo 300 k BeadChips (Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Comparison of Log R ratio standard deviations between BeadStudio and quantile normalized HapMap data.</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="center">
								<p>Platform</p>
							</c>
							<c ca="center">
								<p>HapMap samples</p>
							</c>
							<c ca="center">
								<p>Mean Log R ratio SD* BeadStudio</p>
							</c>
							<c ca="center">
								<p>Mean Log R ratio SD* QN</p>
							</c>
							<c ca="center">
								<p>p-value SD<sub>QN </sub>&lt; SD<sub>BeadStudio </sub>**</p>
							</c>
							<c ca="center">
								<p>Mean decrease in SD (%) ***</p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>300 k v1</p>
							</c>
							<c ca="center">
								<p>111</p>
							</c>
							<c ca="center">
								<p>0.134</p>
							</c>
							<c ca="center">
								<p>0.136</p>
							</c>
							<c ca="center">
								<p>0.99</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>300 k v2</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>0.197</p>
							</c>
							<c ca="center">
								<p>0.168</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>370 k</p>
							</c>
							<c ca="center">
								<p>123</p>
							</c>
							<c ca="center">
								<p>0.262</p>
							</c>
							<c ca="center">
								<p>0.193</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
							<c ca="center">
								<p>26</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>550 k</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>0.209</p>
							</c>
							<c ca="center">
								<p>0.160</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*: The SD of Log R ratio was calculated for each sample. The mean SD for all samples is shown.</p>
						<p>**: Paired two-sided t-test. H0: &#916; = mean Log R ratio SD<sub>QN </sub>&#8211; mean Log R ratio SD<sub>BeadStudio </sub>=0</p>
						<p>***: (1- (mean Log R ratio SD<sub>QN</sub>)/(mean Log R ratio SD<sub>BeadStudio</sub>))*100</p>
					</tblfn>
				</tbl>
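				<p>The percentage in the last column of Table 1 can be reproduced from the two mean SD columns with the footnote formula. The sketch below applies it to the rounded means of the three data sets for which a decrease is reported; the function name is illustrative.</p>

```python
def mean_decrease_percent(sd_qn, sd_beadstudio):
    """(1 - SD_QN / SD_BeadStudio) * 100, as in the table footnote."""
    return (1.0 - sd_qn / sd_beadstudio) * 100.0


# Rounded mean Log R ratio SDs from Table 1: (BeadStudio, QN).
table_1 = {
    "300 k v2": (0.197, 0.168),
    "370 k": (0.262, 0.193),
    "550 k": (0.209, 0.160),
}

decrease = {
    platform: round(mean_decrease_percent(qn, beadstudio))
    for platform, (beadstudio, qn) in table_1.items()
}
```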
				<p>QN also showed a positive effect on allelic intensity ratios, generating lower standard deviations and more centralized theta positions for heterozygous SNPs (Table <tblr tid="T2">2</tblr>). Interestingly, it can be observed in table <tblr tid="T2">2</tblr> that the average theta value for heterozygous SNPs differs from the expected 0.5 for all uncorrected and QN reference data sets. QN shows the least deviation from the expected value for all data sets, and also a clearly significant decrease in theta SD for samples across all data sets compared to BeadStudio data (Table <tblr tid="T2">2</tblr>).</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Comparison of effects on allelic intensity ratios between Illumina BeadStudio and quantile normalized HapMap data.</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="center">
								<p>Platform</p>
							</c>
							<c ca="center">
								<p>HapMap samples</p>
							</c>
							<c ca="center">
								<p>Mean Theta<sub>AB </sub>&#177; mean SD BeadStudio*</p>
							</c>
							<c ca="center">
								<p>Mean Theta<sub>AB </sub>&#177; mean SD QN*</p>
							</c>
							<c ca="center">
								<p>Paired p-value theta SD<sub>QN </sub>&lt; SD<sub>BeadStudio </sub>**</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>300 k v1</p>
							</c>
							<c ca="center">
								<p>111</p>
							</c>
							<c ca="center">
								<p>0.581 &#177; 0.095</p>
							</c>
							<c ca="center">
								<p>0.454 &#177; 0.087</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>300 k v2</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>0.595 &#177; 0.097</p>
							</c>
							<c ca="center">
								<p>0.457 &#177; 0.086</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>370 k</p>
							</c>
							<c ca="center">
								<p>123</p>
							</c>
							<c ca="center">
								<p>0.594 &#177; 0.097</p>
							</c>
							<c ca="center">
								<p>0.460 &#177; 0.082</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>550 k</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>0.608 &#177; 0.099</p>
							</c>
							<c ca="center">
								<p>0.451 &#177; 0.084</p>
							</c>
							<c ca="center">
								<p>2.2*10<sup>-16</sup>
								</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*: The mean and SD of theta were calculated for each sample. The mean of these values for all samples is shown.</p>
						<p>**: Paired two-sided t-test. H0: &#916; = SD<sub>QN </sub>- SD<sub>BeadStudio </sub>=0</p>
					</tblfn>
				</tbl>
			</sec>
			<sec>
				<st>
					<p>The intensity transformation introduced by QN can negatively affect allelic intensity ratio estimates</p>
				</st>
				<p>The deviation from theta = 0.5 for heterozygous SNPs in HapMap samples indicates that an imbalance in the X and Y intensity distributions remains after QN (Table <tblr tid="T2">2</tblr>). The imbalance in theta affects BAF estimates through the calibration of theta into BAF using the HapMap reference genotype clusters. Part of the imbalance can be explained by an uncorrected curvature between X and Y intensities that prior to QN is present for both tumor samples (Figure <figr fid="F1">1e</figr>) and HapMap samples (Figure <figr fid="F2">2a</figr>). To investigate the relationship between allelic intensity ratios and overall intensity we created MR plots where M = log<sub>2</sub>(Y/X) and R = log<sub>10</sub>(X + Y) similar to conventional MA plots <abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp>. Consequently, in MR plots heterozygous SNPs should have an M value of 0. As expected from figure <figr fid="F2">2a</figr>, curvature is present prior to QN in the MR plot of HapMap sample NA06985 for the AB, BB and AA SNP populations (Figure <figr fid="F2">2b</figr>). The curvature is highlighted by the superimposed lowess curve for each genotype population and the slope of a fitted linear regression line through each population. After QN there is less curvature, although it is not fully removed (Figure <figr fid="F2">2c</figr>).</p>
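				<p>The MR coordinates defined above are straightforward to compute per SNP. The following sketch uses an illustrative function name and requires both intensities to be positive, which is an assumption added here so that the logarithms are defined.</p>

```python
import math


def mr_coordinates(x, y):
    """Return (M, R) with M = log2(Y/X) and R = log10(X + Y)."""
    if not (x > 0 and y > 0):
        raise ValueError("both intensities must be positive")
    m = math.log2(y / x)
    r = math.log10(x + y)
    return m, r


# A balanced SNP (X equal to Y) and a Y-dominated SNP, toy values.
m_balanced, r_balanced = mr_coordinates(0.5, 0.5)
m_skewed, r_skewed = mr_coordinates(2.0, 8.0)
```

				<p>Equal intensities in both channels give M = 0 regardless of R, so any trend of M with R among heterozygous SNPs points to an intensity-dependent bias between the channels.</p>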
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Intensity transformations of X and Y by quantile normalization</p>
					</caption>
					<text>
						<p>
							<b>Intensity transformations of X and Y by quantile normalization.</b> HapMap sample NA06985 hybridized on an Infinium 370 k BeadChip is shown. SNPs have been colored based on individual genotype calls: AA (green), AB (yellow), and BB (red). SNPs without genotype call are excluded. (a) Scatter plot of BeadStudio allele intensities X and Y. A lowess regression line for heterozygous SNPs is superimposed (solid) together with the expected X = Y line (dashed) illustrating that the dye intensity bias affects heterozygous SNPs. (b) MR plot of BeadStudio allele intensities for chromosome 8 with superimposed lowess regression lines (solid) for each genotype population and locally fitted linear regression lines (dashed blue). The mean M value for each genotype population is indicated by horizontally dashed black lines. (c) MR plot of quantile normalized allele intensities for chromosome 8 with superimposed lowess regression lines (solid black) and locally fitted linear regression lines (dashed blue) for each genotype population, separately. (d) Scatter plot of the intensity transformation X<sub>QN</sub>/X vs X from quantile normalization. SNPs are colored by genotype. SNPs with low X intensity values (predominantly genotyped as BB) are increased significantly in intensity by QN. (e) Scatter plot of the intensity transformation Y<sub>QN</sub>/Y vs Y from quantile normalization. SNPs are colored by genotype. (f) Histogram of BeadStudio X intensities. (g) Histogram of BeadStudio Y intensities.</p>
					</text>
					<graphic file="1471-2105-9-409-2"/>
				</fig>
				<p>To address how to improve QN, we investigated how QN transforms the X and Y intensities for HapMap sample NA06985 (Figure <figr fid="F2">2d</figr> and <figr fid="F2">2e</figr>). Low values of X are increased with relatively large factors in intensity, while Y values are generally decreased and scaled with smaller factors. SNPs with a low value of X are predominantly genotyped as BB, and the number of SNPs affected by the increase in X is large as seen by comparing the transformation (Figure <figr fid="F2">2d</figr>) with an X intensity histogram (Figure <figr fid="F2">2f</figr>). The same pattern is not observed for the Y intensity, for which the large majority of SNPs are transformed to a lower intensity (Figures <figr fid="F2">2e</figr> and <figr fid="F2">2g</figr>). Thus, QN introduces a transformation that results in a large increase for low X values, which affects a large number of SNPs.</p>
				<p>The transformation imbalance does not appear to affect HapMap CN estimates, for which the standard deviation is decreased in three of four reference data sets (Table <tblr tid="T1">1</tblr>). For CN estimates an increase of a low X value is not critical since the corresponding Y intensity is large and dominates the additive R value. However, an increase of low X values will cause more variation of the allelic ratios for SNPs with high values of Y (predominantly genotyped as BB). An increase in the variation of allelic ratios for SNPs with low values of X will have the largest effect on regions with loss of allele A (thus dominated by Y with theta and BAF values close to 1). The impact of the transformation imbalance is further increased if the copy number loss is present in the absolute majority of investigated cells and not dampened by contaminating normal cells. To exemplify the effect of the transformation imbalance, the hemizygous loss of chromosome 9 in the urothelial carcinoma UC456_R is shown for both BeadStudio data (Figure <figr fid="F3">3a</figr>) and QN data (Figure <figr fid="F3">3b</figr>). While QN results in a reduced variation for SNPs with BAF values close to 0 (which have large X values), this improvement is counteracted by a large increase in the variation for SNPs with BAF values close to 1 (which have small X values). Furthermore, the transformation imbalance also appears to affect the correction of BAF asymmetry negatively.</p>
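				<p>A small worked example may help to see why raising a low X value matters for theta but much less for R. The intensities and the scaling factor below are invented for illustration and are not taken from the data; the helper name is hypothetical.</p>

```python
import math


def r_and_theta(x, y):
    """Return R = X + Y and theta = (2/pi) * arctan(Y/X)."""
    return x + y, 2.0 * math.atan2(y, x) / math.pi


# Invented intensities for a BB-like SNP: low X, high Y.
x, y = 0.05, 1.0
factor = 3.0  # hypothetical factor by which QN raises the low X value

r_before, theta_before = r_and_theta(x, y)
r_after, theta_after = r_and_theta(x * factor, y)

relative_change_r = (r_after - r_before) / r_before
shift_theta = theta_after - theta_before
```

				<p>With these toy numbers R rises by roughly 10%, whereas theta moves from about 0.97 to about 0.91, away from the value of 1 expected for a SNP dominated by Y.</p>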
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Effects of quantile normalization on allelic intensity ratios</p>
					</caption>
					<text>
						<p>
							<b>Effects of quantile normalization on allelic intensity ratios.</b> Two urothelial carcinomas, UC456_R and UC152_I, analyzed using Infinium 370 k BeadChips are shown. SNPs have been colored based on individual genotype calls: AA (green), AB (yellow), BB (red), CNV probes (blue) and no calls (gray). Horizontal dashed lines represent BAF 0.05, 0.1, 0.5, 0.9 and 0.95, respectively. (a) BeadStudio normalized B allele frequency profile for chromosome 9 of UC456_R. (b) QN normalized B allele frequency profile for chromosome 9 of UC456_R. Compared to BeadStudio (a), QN increases variation for SNPs close to 1 in BAF and decreases variation for SNPs close to 0 in BAF. (c) tQN normalized B allele frequency profile for chromosome 9 of UC456_R. Application of a threshold for the increase in intensity of X and Y by QN lowers the variation of SNPs close to 1 in BAF compared to QN alone (b), and creates BAF values that are more symmetrical around BAF = 0.5 compared to BeadStudio (a). (d) tQN normalized B allele frequency profile for chromosome 1 of UC152_I. The region 1q32.1 to qter discussed in the text is highlighted with a light blue background. CNV probes have been removed.</p>
					</text>
					<graphic file="1471-2105-9-409-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Incorporation of an intensity transformation threshold for QN improves allelic intensity ratio estimates</p>
				</st>
				<p>The negative effect of QN on allelic intensity ratios could potentially be circumvented by limiting the factor by which X intensity values are increased. Hence, we introduced a threshold for the QN intensity transformations to limit the increase of X and Y values before calculation of the allelic intensity ratio. In all our analyses, we used a threshold of 1.5 for the factor by which X and Y values could maximally be increased. While the threshold is applied identically to both X and Y transformations, it essentially only influences X values. A value of 1.5 appears reasonable as it incorporates the majority of SNPs with low X values (compare Figures <figr fid="F2">2d</figr> and <figr fid="F2">2e</figr>) without affecting the corresponding Y values, but the threshold may potentially be further improved by tuning. Using this QN modified with a threshold, tQN, we generated new tQN reference data sets. The application of the threshold markedly improved quantile normalized tumor BAF data by removing asymmetry and reducing variation (Figures <figr fid="F3">3c</figr> and <figr fid="F3">3d</figr>). Additionally, the removed asymmetry for allelic intensity ratios may provide a higher probability for SNPs to be genotyped, e.g., as AA for chromosome 9 of urothelial carcinoma UC456_R (Figure <figr fid="F3">3c</figr> compared to <figr fid="F3">3a</figr>) or as AA on 1q32.1 to qter for urothelial carcinoma UC152_I (Figure <figr fid="F3">3d</figr> compared to <figr fid="F1">1c</figr>). Consequently, tQN of Infinium II data could increase the genotype call rate, a variable commonly used to assess sample quality. An increased call rate for tumor specimens may also be beneficial for downstream LOH analysis software relying on genotype calls such as dChipSNP <abbrgrp>
						<abbr bid="B21">21</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Systematic investigation of BAF asymmetry in tumor samples before and after tQN</p>
				</st>
				<p>To investigate BAF asymmetry before and after tQN more comprehensively, we divided 35 whole-genome tumor BAF profiles into an upper and a lower part along the BAF = 0.5 axis. BAF values for each part were converted to mBAF, similar to figure <figr fid="F1">1b</figr>. Next, each part was separately segmented to find regions of consistent allelic proportion <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>. If no asymmetry is present for a defined genomic region, the difference between segmented mBAF values for the upper and lower part of the BAF profile should be zero. We found that tQN results in significantly lower asymmetry for regions of apparent allelic imbalance in both paired and unpaired tumor samples across different Infinium II platforms (Figure <figr fid="F4">4</figr>). Essentially identical results were obtained irrespective of which part of the BAF profile was used to define the investigated regions. As expected from the upward shift of heterozygous theta positions (Table <tblr tid="T2">2</tblr>), the BeadStudio asymmetry is predominantly the result of higher mBAF values for the upper part of the BAF profile than for the lower part. This asymmetry is consistent with an upward shift of the entire BAF plot, as also observed in figures <figr fid="F1">1a</figr> and <figr fid="F3">3a</figr>. tQN showed the same positive effect on allelic intensity ratios for HapMap samples as shown for QN in table <tblr tid="T2">2</tblr>. For heterozygous SNPs, standard deviations were essentially identical for tQN and QN, whereas theta positions were marginally more centralized for tQN (data not shown).</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Comparison of BAF asymmetry for regions of allelic imbalance before and after tQN across different Infinium II platforms</p>
					</caption>
					<text>
						<p>
							<b>Comparison of BAF asymmetry for regions of allelic imbalance before and after tQN across different Infinium II platforms.</b> BAF profiles for 35 tumor samples were divided into an upper (BAF > 0.5) and lower (BAF &lt; 0.5) part, transformed to mBAF and separately segmented. For a defined genomic region, the average difference in segmented mBAF between the upper and lower part is expected to be zero if no asymmetry is present. Genomic regions were based on segmentation breakpoints of the upper BAF part. Only regions > 30 SNPs and with a segmented mBAF value > 0.6 in the upper and/or lower part were used in the comparisons. Black squares correspond to BeadStudio data and red triangles correspond to tQN data. Error bars for each sample and normalization method show the interquartile range (IQR). (a) BAF asymmetry for 14 matched tumor-normal samples. The black bar denotes 11 paired urothelial tumors from data set 4 and the white bar denotes the paired tumor samples from data set 8. tQN data systematically show less difference between the upper and lower BAF part compared to BeadStudio for the 14 matched tumors. (b) BAF asymmetry for 21 unmatched urothelial, breast and CLL tumor samples. The black bar denotes the 5 unpaired urothelial tumors from data set 4, the blue bar denotes CLL tumors from data set 7 and the red bar denotes breast tumors from data set 6. tQN data systematically show less difference between the upper and lower BAF part compared to BeadStudio for the 21 unmatched tumors.</p>
					</text>
					<graphic file="1471-2105-9-409-4"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Effects of tQN on copy number estimates for tumor and normal samples</p>
				</st>
				<p>Having established that tQN corrects for asymmetry in allelic intensity ratio estimates, we investigated the effects of tQN on CN estimates compared to BeadStudio. To this end, we applied tQN to Infinium II data sets containing both blood and tumor samples and performed three comparisons. First, we investigated whether tQN increases or decreases the response in log R ratio to CNAs. Second, we investigated whether tQN decreases variation in CN estimates. Finally, we applied a CNV calling algorithm to tQN normalized HapMap data to investigate the overlap of identified regions compared to BeadStudio data.</p>
				<p>To investigate whether tQN increases or decreases the response in log R ratios to CNAs compared to BeadStudio, we applied segmentation to both tQN and BeadStudio tumor data. For each sample we calculated the difference in segmented log R ratios between BeadStudio data and tQN data. For genomic regions with log R ratio > 0 and &lt; 0, respectively, the differences were calculated separately such that a positive difference for both types of regions corresponds to a better log R ratio response to CNAs for BeadStudio normalization compared to tQN. We observed small differences for all four data sets (Figure <figr fid="F5">5a</figr>). For the urothelial tumors, BeadStudio showed a better response for segments with gains, while tQN showed a better response for segments with losses. Such opposing findings indicate that the two methods result in different centering of the data rather than in different response to CNAs. Thus, tQN does not appear to alter the log R ratio response to CNAs compared to BeadStudio.</p>
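				<p>The sign convention used for this comparison can be sketched in Python as follows. This is an illustration only and not the implementation used in the study: the function name and the example values are hypothetical, and the sketch assumes that a segment counts as a gain or a loss according to the sign of its BeadStudio value.</p>

```python
import numpy as np

def response_difference(bs_seg, tqn_seg):
    """Per-segment response differences; positive values favor BeadStudio.

    bs_seg, tqn_seg: segmented log R ratios for the same genomic regions.
    Assumption: gain or loss status is taken from the sign of the
    BeadStudio value (hypothetical choice for this sketch).
    """
    bs = np.asarray(bs_seg, dtype=float)
    tq = np.asarray(tqn_seg, dtype=float)
    gains = bs > 0
    losses = 0 > bs
    gain_diff = bs[gains] - tq[gains]      # BeadStudio minus tQN
    loss_diff = tq[losses] - bs[losses]    # tQN minus BeadStudio
    return gain_diff, loss_diff

# Made-up segment values: one gain, one loss, one neutral segment.
gain_diff, loss_diff = response_difference([0.4, -0.5, 0.0], [0.3, -0.6, 0.1])
```

				<p>In this made-up example the gain is larger with BeadStudio (positive difference) and the loss is deeper with tQN (negative difference), the opposing pattern that points to a difference in centering.</p>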
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Effects of tQN on copy number estimates across different Infinium platforms</p>
					</caption>
					<text>
						<p>
							<b>Effects of tQN on copy number estimates across different Infinium platforms.</b> (a) Effect of tQN on log R ratio response to CNAs compared to BeadStudio data for 36 tumor samples. For each sample the mean difference in segmented log R ratio between BeadStudio and tQN data is plotted. For segments with log R ratio > 0 (red) the difference is BeadStudio minus tQN. For segments with log R ratio &lt; 0 (green) the difference is tQN minus BeadStudio. A positive difference therefore corresponds to a better log R ratio response to CNAs for BeadStudio normalization compared to tQN for both types of segments. Error bars for each sample show the IQR of the difference. Horizontal bars denote the investigated data sets, urothelial tumors from data set 4 (black), breast/colon tumor samples from data set 8 (white), CLL samples from data set 7 (blue) and breast tumors from data set 6 (red). Only segments > 20 SNPs have been included. Segment definition was based on breakpoints from segmentation of BeadStudio copy number data. The small difference in segmented values between BeadStudio and tQN data indicates that tQN does not affect the log R ratio response to CNAs. (<b>b</b>) Boxplots of sample adaptive thresholds for BeadStudio normalized data (white) and tQN data (red) for 6 data sets. Top axis indicates the number of samples in each data set. tQN results in lower sample adaptive thresholds in four out of six data sets and equal thresholds in the remaining two. (<b>c</b>) tQN copy number estimates for chromosome 1 for urothelial tumor UC152_I with individual SNPs colored according to genotype calls: AA (green), AB (yellow), BB (red) and no calls (gray). CNV probes have been removed. tQN removes the asymmetry between AA and BB SNPs for regions of gain and loss observed in BeadStudio normalized data (compare to figure 1d).</p>
					</text>
					<graphic file="1471-2105-9-409-5"/>
				</fig>
				<p>To investigate the effect of tQN on variation in CN estimates, we computed sample adaptive noise thresholds (SATs) for tQN and BeadStudio data as previously described <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>. We obtained significantly lower SATs using tQN for four of six tested data sets, while SATs were essentially unchanged for the remaining two data sets (Figure <figr fid="F5">5b</figr>). The lack of effect of tQN on tumors hybridized on Infinium 300 k v1 BeadChips is in concordance with the reference data set (Table <tblr tid="T1">1</tblr>). The lack of improvement with tQN for tumors in the breast cancer data set is more difficult to explain. All tumors in this data set are either hyper-diploid or of high aneuploidy, resulting in highly unbalanced CN profiles. Unbalanced CN profiles may be problematic for the affine transformation in BeadStudio, which scales the data based on the assumption that homozygous SNPs should on average exist in two copies, and therefore may confound normalization and data interpretation for aneuploid tumors <abbrgrp>
						<abbr bid="B5">5</abbr>
					</abbrgrp>. A detailed investigation of this hypothesis is however not within the scope of the current study. The CLL data set is part of a comparison of four different array platforms for detection of CNAs and LOH <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. In that study, Gunnarsson et al. compared the average copy number ratio and standard deviation for the normal chromosome 1 in all samples between the different platforms. The Illumina platform showed the highest average standard deviation (0.26) of the four platforms. We found that the average standard deviation for chromosome 1 after tQN was 0.21, which is comparable to the results obtained by Gunnarsson et al. for the Agilent platform (0.20). Furthermore, when applying tQN the asymmetry in CN estimates observed in figure <figr fid="F1">1d</figr> was removed (Figure <figr fid="F5">5c</figr>). The effect of tQN on CN and BAF estimates for various tumor and normal samples is further illustrated in Additional file <supplr sid="S1">1</supplr>. In conclusion, we find that tQN of Infinium II data is beneficial for CN estimates as variation is reduced while the dynamic response in copy number ratios to CNAs remains unchanged. A decreased variation for CN estimates can be beneficial for downstream analysis and detection of CNAs.</p>
				<p>To investigate whether the reduced variation in copy number estimates by tQN affected CNV detection compared to BeadStudio we applied the PennCNV algorithm <abbrgrp>
						<abbr bid="B24">24</abbr>
					</abbrgrp> to the HapMap 550 k reference data set. The overlap of identified SNPs between BeadStudio and tQN data was on average 80% across the 120 HapMap samples for CNV regions larger than 8 SNPs. Importantly, the overlap percentage increased for larger CNV regions. Even though we cannot validate the correctness of CNV regions identified in either tQN or BeadStudio data, these findings indicate that tQN reduces noise without removing biologically relevant regions.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We have developed a normalization method that improves the quality of data obtained from Illumina Infinium II genotyping arrays. We show that both allelic intensity ratio and copy number estimates are improved by using a quantile normalization strategy with a threshold for the intensity transformations (tQN) for correction of intensity dye bias when Infinium II BeadChips are applied to cancer samples. This dye bias results in an asymmetric detection of the two alleles for each SNP, leading to asymmetry for both allelic intensity ratios and copy number estimates. Importantly, tQN not only removes such asymmetry but also reduces variation in copy number estimates. Essential for the improved result is the creation of reference data sets for calibration of B allele frequency and copy number estimates that are normalized with the same method that is applied to the investigated samples. The normalization strategy was successfully applied both to normal blood samples and to tumor specimens with varying tumor heterogeneity and normal cell contamination. Our strategy is applied on a sample-by-sample basis and we have not evaluated whether Infinium II data can be improved by using between-array normalization. Further optimization of the normalization approach for Infinium II data should include adjusting X and Y intensities on a sub bead-level instead of the currently used summarized bead level to address the initially unequal X and Y distributions. Such a correction would presumably alleviate the need for an additional normalization. Potentially, such improvements may also address the lower ratio response to CNAs and signal to noise observed with SNP-CGH compared to conventional aCGH <abbrgrp>
					<abbr bid="B23">23</abbr>
					<abbr bid="B25">25</abbr>
				</abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Data sets</p>
				</st>
				<p>We used 10 data sets for evaluation of the QN strategy. Data set 1 (HapMap 300 k v2) consists of 120 HapMap <abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp> samples hybridized on Illumina HumanHap300 version 2 Genotyping BeadChips (Courtesy of Illumina Inc., San Diego, CA). Data set 2 (HapMap 370 k) consists of 123 HapMap samples hybridized on Illumina HumanCNV370 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 3 (HapMap 550 k) consists of 120 HapMap samples hybridized on Illumina HumanHap550 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 4 (urothelial tumors 370 k) consists of 17 urothelial carcinomas hybridized on HumanCNV370 Genotyping BeadChips. Data set 5 (normal 370 k) consists of 17 normal samples hybridized on Illumina HumanCNV370 Genotyping BeadChips. Samples in data set 5 displayed call rates between 99.5 and 99.8%. Twelve of the samples in data sets 5 and 6 are paired tumor-normal samples from the same individual. Data set 6 (breast tumors 550 k) consists of six breast tumors hybridized on Illumina HumanHap550 Genotyping BeadChips. Data set 7 (leukemia 300 k v2) consists of ten CLL cases hybridized on Illumina HumanHap300 version 2 Genotyping BeadChips <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. Data set 8 (breast/colon 300 k v1) consists of six hybridizations on Illumina HumanHap300 version 1 Genotyping BeadChips representing two breast cancers and one colon cancer with matching normal samples (Courtesy of Illumina Inc.). Data set 9 (HapMap 300 k v1) consists of 111 HapMap samples hybridized on Illumina HumanHap300 version 1 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 10 (normal 550 k) consists of one normal sample hybridized 5 times at different DNA concentrations on Illumina HumanHap550 Genotyping BeadChips (obtained from the PennCNV website <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>).</p>
				<p>Chromosomes 1 to 22 were used in all comparisons. Data sets 4 and 5 were generated at the SCIBLU Genomics Centre at Lund University, Sweden <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp> and data sets 6 and 7 at the SNP technology platform in Uppsala, Sweden <abbrgrp>
						<abbr bid="B29">29</abbr>
					</abbrgrp> according to the manufacturers' instructions.</p>
			</sec>
			<sec>
				<st>
					<p>BeadStudio data preprocessing</p>
				</st>
				<p>Fluorescent signals were imported into the BeadStudio software version 3.1 (Illumina Inc) and normalized. For each sample, the normalized fluorescence signal intensities were compared with the signal intensities of a set of reference genotypes, and the log<sub>2</sub>-ratios between sample and reference signals were calculated on a SNP-by-SNP basis. In addition, the B allele frequency was estimated for each sample based on the reference genotype clusters <abbrgrp>
						<abbr bid="B5">5</abbr>
					</abbrgrp>. Normalized X and Y intensities were exported for further analysis. The manifests used were HumanHap300v2_A for 300 k version 2 BeadChips, BDCHP-1x10-HUMANHAP300v1-1_11219278_C for 300 k version 1 BeadChips, HumanCNV370v1_C for 370 k BeadChips and HumanHap550v3_A for 550 k BeadChips. Mirrored B allele frequencies (mBAF) were calculated as mBAF = abs(BAF - 0.5) + 0.5 <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>.</p>
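				<p>The mBAF transformation stated above can be written as a short Python function. The function name and example values are hypothetical and serve only to illustrate the formula.</p>

```python
import numpy as np

def mirror_baf(baf):
    """Mirror B allele frequencies around 0.5: mBAF = abs(BAF - 0.5) + 0.5."""
    baf = np.asarray(baf, dtype=float)
    return np.abs(baf - 0.5) + 0.5

# BAF values below 0.5 are reflected onto the range 0.5 to 1.
example_mbaf = mirror_baf([0.0, 0.2, 0.5, 0.8, 1.0])
```

				<p>After mirroring, BAF values of 0.2 and 0.8 both map to an mBAF of 0.8, so the upper and lower bands of an allelic imbalance collapse into a single band.</p>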
			</sec>
			<sec>
				<st>
					<p>Thresholded quantile normalization (tQN) of Infinium II data</p>
				</st>
				<p>tQN was performed individually for each sample using affine normalized intensities (X, Y) from BeadStudio and the R <abbrgrp>
						<abbr bid="B30">30</abbr>
					</abbrgrp> package limma <abbrgrp>
						<abbr bid="B31">31</abbr>
					</abbrgrp>. The combined SNP intensity, R, was calculated from tQN intensities. A threshold of 1.5 for the intensity transformations X<sub>QN</sub>/X and Y<sub>QN</sub>/Y was applied prior to calculation of theta: X<sub>QN </sub>intensities larger than 1.5 * X were set to 1.5 * X; Y<sub>QN </sub>intensities larger than 1.5 * Y were set to 1.5 * Y. Theta, B allele frequencies and copy number estimates were calculated from tQN normalized intensities and reference data sets as previously described <abbrgrp>
						<abbr bid="B5">5</abbr>
					</abbrgrp>. CNV probes in analyzed samples were excluded from normalization due to lack of genotype information. Instead, for these probes the BeadStudio BAF and log R ratio values were used.</p>
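				<p>The thresholding step can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the released Perl and R implementation: the quantile normalization below is a plain rank-based stand-in for the limma routine (ties are not averaged), all function names are hypothetical, and R = X + Y and theta = (2/pi) * arctan(Y/X) are assumed as the definitions of the combined intensity and the allelic intensity ratio.</p>

```python
import numpy as np

def quantile_normalize_xy(x, y):
    """Give the X and Y intensities of one sample a common distribution.

    Simplified stand-in for a two-channel quantile normalization: the target
    distribution is the mean of the sorted X and sorted Y values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order_x = np.argsort(x, kind="stable")
    order_y = np.argsort(y, kind="stable")
    target = (x[order_x] + y[order_y]) / 2.0
    x_qn = np.empty_like(x)
    y_qn = np.empty_like(y)
    x_qn[order_x] = target
    y_qn[order_y] = target
    return x_qn, y_qn

def apply_threshold(raw, qn, max_factor=1.5):
    """Cap normalized values so that qn / raw never exceeds max_factor."""
    return np.minimum(qn, max_factor * np.asarray(raw, dtype=float))

def tqn(x, y, max_factor=1.5):
    """Return thresholded X and Y plus R and theta (assumed definitions)."""
    x_qn, y_qn = quantile_normalize_xy(x, y)
    x_t = apply_threshold(x, x_qn, max_factor)
    y_t = apply_threshold(y, y_qn, max_factor)
    r = x_t + y_t                                 # assumed: R = X + Y
    theta = (2.0 / np.pi) * np.arctan2(y_t, x_t)  # assumed theta definition
    return x_t, y_t, r, theta

# Made-up intensities for three SNPs; the first has a very low X value.
x_raw = np.array([0.1, 1.0, 2.0])
y_raw = np.array([3.0, 0.5, 1.5])
x_t, y_t, r, theta = tqn(x_raw, y_raw)
```

				<p>In this made-up example only the first X value is capped (from 0.3 to 0.15), which mirrors the observation that the threshold essentially only influences low X values.</p>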
			</sec>
			<sec>
				<st>
					<p>Construction of tQN corrected reference data sets</p>
				</st>
				<p>Quantile normalized reference data sets were created from HapMap data sets using intensities (X, Y) normalized in BeadStudio as the starting point. For each sample and SNP, quantile normalized R and theta values were calculated as previously described <abbrgrp>
						<abbr bid="B5">5</abbr>
					</abbrgrp>. Cluster positions in theta and R were calculated for each SNP and genotype based on genotype information (AA, AB and BB) using the mean of all samples for the specific SNP and genotype. SNPs with no cluster positions (no genotype assignment across all HapMap samples) were excluded from the analysis. BAF and copy number estimates for SNPs with only one genotype across all HapMap samples were calculated using the value of the single cluster position. Theta values for SNPs with one heterozygous and only one homozygous cluster position (e.g. AB and AA) were imputed for the missing homozygous cluster position (e.g. BB) by the median of all theta values for the missing genotype. Corresponding R estimates for the missing genotype were set as missing values. For CNV probes the original BeadStudio cluster positions were kept.</p>
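				<p>The calculation of cluster positions can be sketched in Python as follows. The sketch covers only the per-SNP, per-genotype means; the data layout (SNPs in rows, samples in columns), the function name and the "NC" label for missing calls are assumptions made for illustration, and the imputation of missing homozygous clusters is not shown.</p>

```python
import numpy as np

GENOTYPES = ("AA", "AB", "BB")

def cluster_positions(theta, r, calls):
    """Mean theta and R per SNP and genotype.

    theta, r: arrays of shape (n_snps, n_samples).
    calls: array of the same shape holding "AA", "AB", "BB" or any other
           label (for example "NC") for samples without a genotype call.
    Returns a dict mapping genotype to (mean_theta, mean_r); entries are NaN
    for SNPs where no sample carries that genotype.
    """
    theta = np.asarray(theta, dtype=float)
    r = np.asarray(r, dtype=float)
    calls = np.asarray(calls)
    positions = {}
    for genotype in GENOTYPES:
        mask = calls == genotype
        counts = mask.sum(axis=1)
        # Division by a zero count yields NaN, marking a missing cluster.
        with np.errstate(invalid="ignore", divide="ignore"):
            mean_theta = np.where(mask, theta, 0.0).sum(axis=1) / counts
            mean_r = np.where(mask, r, 0.0).sum(axis=1) / counts
        positions[genotype] = (mean_theta, mean_r)
    return positions

# Made-up data: two SNPs, four samples.
theta_ref = [[0.0, 0.5, 1.0, 0.1], [0.5, 0.5, 0.5, 0.5]]
r_ref = [[1.0, 1.2, 0.9, 1.1], [1.0, 1.0, 1.0, 1.0]]
calls_ref = [["AA", "AB", "BB", "AA"], ["AB", "AB", "NC", "AB"]]
positions = cluster_positions(theta_ref, r_ref, calls_ref)
```

				<p>The second SNP in this example has only a heterozygous cluster, so its AA and BB positions are returned as NaN and would have to be handled by the rules described above.</p>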
			</sec>
			<sec>
				<st>
					<p>Segmentation of allelic ratios for investigation of BAF asymmetry</p>
				</st>
				<p>For matched tumor-normal samples, SNPs homozygous in both the tumor and its matched normal sample were first removed from the tumor BAF profile. Next, each tumor sample was split into an upper and lower data set, based on BAF values > 0.5 or &lt; 0.5. Both data sets were mirrored from BAF to mBAF (compare figure <figr fid="F1">1b</figr>) and segmented by CBS <abbrgrp>
						<abbr bid="B32">32</abbr>
					</abbrgrp> using default settings as recently described <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>. Segments from the lower and upper part of the BAF profile were cross-mapped and the difference in the average segmented mBAF values between the upper and lower part for each genomic segment was calculated. If no asymmetry is present, the difference between the upper and lower part of the BAF profile should be zero. Only segments larger than 30 SNPs in size and with a segmented mBAF value > 0.6 in the upper and/or lower part were used for evaluation of asymmetry. For tumor samples without a matched normal, SNPs with BAF > 0.97 or &lt; 0.03 were removed prior to splitting BAF profiles into an upper and lower part.</p>
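				<p>The asymmetry measure for a single genomic region of an unmatched tumor can be sketched in Python as follows. This is a simplified illustration: the plain mean within a predefined region stands in for the CBS segment value, and the function name and example values are hypothetical.</p>

```python
import numpy as np

def baf_asymmetry(baf, lo_cut=0.03, hi_cut=0.97):
    """Difference in mean mBAF between the upper and lower BAF part.

    baf: BAF values of the SNPs in one genomic region.
    SNPs outside [lo_cut, hi_cut] are dropped as presumed homozygous,
    following the rule for tumors without a matched normal.
    Assumption: the mean of each part replaces the CBS segment value.
    """
    baf = np.asarray(baf, dtype=float)
    keep = np.logical_and(baf >= lo_cut, hi_cut >= baf)
    baf = baf[keep]
    upper = baf[baf > 0.5]
    lower = baf[0.5 > baf]
    mbaf_upper = np.abs(upper - 0.5) + 0.5
    mbaf_lower = np.abs(lower - 0.5) + 0.5
    return mbaf_upper.mean() - mbaf_lower.mean()

# Made-up region with allelic imbalance and a slight upward shift.
shifted = baf_asymmetry([0.85, 0.90, 0.10, 0.20, 0.99, 0.01])
# Perfectly symmetric bands give a difference of zero.
symmetric = baf_asymmetry([0.8, 0.9, 0.2, 0.1])
```

				<p>A positive value indicates higher mBAF in the upper part than in the lower part, the direction of asymmetry reported for BeadStudio data.</p>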
			</sec>
			<sec>
				<st>
					<p>Copy number analysis</p>
				</st>
				<p>Segmentation was performed on normalized log R ratios for each sample, platform and method using CBS <abbrgrp>
						<abbr bid="B32">32</abbr>
					</abbrgrp>. The significance level for accepting change-points, &#945;, was set to 0.001 for all analyzed data sets and normalization methods. For comparisons between methods only segmented regions > 20 SNPs were used.</p>
			</sec>
			<sec>
				<st>
					<p>Sample adaptive thresholds</p>
				</st>
				<p>Sample adaptive thresholds for CN estimates were calculated as previously described <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>, using a smoothing window of 21 SNPs, the median of the SD distribution as cut-off, and a scaling factor of 2 for all analyzed data sets and normalization methods.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Availability and requirements</p>
			</st>
			<p>Project name: tQN</p>
			<p>Project home page: <url>http://baseplugins.thep.lu.se/wiki/se.lu.onk.IlluminaSNPNormalization</url>
			</p>
			<p>Operating system(s): Any operating system supporting Perl and R.</p>
			<p>Programming language: Perl and R.</p>
			<p>Other requirements: Perl modules File::Spec, Getopt::Long, IO::File and Pod::Usage. R package limma.</p>
			<p>License: GNU GPL</p>
			<p>Any restrictions to use by non-academics: None</p>
			<p>Data set 6 (breast tumors 550 k) is available through NCBI's Gene Expression Omnibus <abbrgrp>
					<abbr bid="B33">33</abbr>
				</abbrgrp> with accession GSE11977.</p>
		</sec>
		<sec>
			<st>
				<p>Abbreviations</p>
			</st>
			<p>aCGH: array-based CGH; BAF: B allele frequency; CBS: circular binary segmentation; CGH: comparative genomic hybridization; CLL: chronic lymphocytic leukemia; CN: copy number; CNA: copy number aberration; CNV: copy number variation; IQR: interquartile range; LOH: loss of heterozygosity; mBAF: mirrored B allele frequency; QN: quantile normalization; SAT: sample adaptive threshold; SD: standard deviation; SNP: single nucleotide polymorphism; tQN: thresholded quantile normalization; WGG: whole genome genotyping.</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>JS and MR conceived the study and developed the method. JS implemented the method and performed the analyses. JS and MR interpreted results and wrote the manuscript. JVC and DL contributed to discussions. GJ, RR, MH and &#197;B contributed samples. All authors approved the final manuscript.</p>
		</sec>
	</bdy>
   <bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>Financial support was provided by the Swedish Cancer Society, the Knut &amp; Alice Wallenberg Foundation, the Foundation for Strategic Research through the Lund Centre for Clinical Cancer Research (CREATE Health), the American Cancer Society and the IngaBritt and Arne Lundberg Foundation. The SCIBLU Genomics Centre is supported by governmental funding of clinical research within the National Health Services (ALF) and by Lund University.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Comparative genomic hybridization</p>
				</title>
				<aug>
					<au>
						<snm>Pinkel</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Albertson</snm>
						<fnm>DG</fnm>
					</au>
				</aug>
				<source>Annu Rev Genomics Hum Genet</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>331</fpage>
				<lpage>354</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.genom.6.080604.162140</pubid>
						<pubid idtype="pmpid" link="fulltext">16124865</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Aneuploidy and cancer</p>
				</title>
				<aug>
					<au>
						<snm>Rajagopalan</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lengauer</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2004</pubdate>
				<volume>432</volume>
				<fpage>338</fpage>
				<lpage>341</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature03099</pubid>
						<pubid idtype="pmpid" link="fulltext">15549096</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays</p>
				</title>
				<aug>
					<au>
						<snm>Matsuzaki</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Dong</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Loi</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Di</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Hubbell</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Law</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Berntsen</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Chadha</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hui</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Kennedy</snm>
						<fnm>GC</fnm>
					</au>
					<au>
						<snm>Webster</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Cawley</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>PS</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>KW</fnm>
					</au>
					<au>
						<snm>Fodor</snm>
						<fnm>SP</fnm>
					</au>
					<au>
						<snm>Mei</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Nat Methods</source>
				<pubdate>2004</pubdate>
				<volume>1</volume>
				<fpage>109</fpage>
				<lpage>111</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nmeth718</pubid>
						<pubid idtype="pmpid" link="fulltext">15782172</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>A genome-wide scalable SNP genotyping assay using microarray technology</p>
				</title>
				<aug>
					<au>
						<snm>Gunderson</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Steemers</snm>
						<fnm>FJ</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Mendoza</snm>
						<fnm>LG</fnm>
					</au>
					<au>
						<snm>Chee</snm>
						<fnm>MS</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2005</pubdate>
				<volume>37</volume>
				<fpage>549</fpage>
				<lpage>554</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng1547</pubid>
						<pubid idtype="pmpid" link="fulltext">15838508</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping</p>
				</title>
				<aug>
					<au>
						<snm>Peiffer</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Le</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Steemers</snm>
						<fnm>FJ</fnm>
					</au>
					<au>
						<snm>Chang</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Jenniges</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Garcia</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Haden</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Shaw</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Belmont</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Cheung</snm>
						<fnm>SW</fnm>
					</au>
					<au>
						<snm>Shen</snm>
						<fnm>RM</fnm>
					</au>
					<au>
						<snm>Barker</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Gunderson</snm>
						<fnm>KL</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2006</pubdate>
				<volume>16</volume>
				<fpage>1136</fpage>
				<lpage>1148</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1557768</pubid>
						<pubid idtype="pmpid" link="fulltext">16899659</pubid>
						<pubid idtype="doi">10.1101/gr.5402306</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Affymetrix</p>
				</title>
				<url>http://www.affymetrix.com</url>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Illumina</p>
				</title>
				<url>http://www.illumina.com</url>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Whole-genome genotyping with the single-base extension assay</p>
				</title>
				<aug>
					<au>
						<snm>Steemers</snm>
						<fnm>FJ</fnm>
					</au>
					<au>
						<snm>Chang</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Barker</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Shen</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gunderson</snm>
						<fnm>KL</fnm>
					</au>
				</aug>
				<source>Nat Methods</source>
				<pubdate>2006</pubdate>
				<volume>3</volume>
				<fpage>31</fpage>
				<lpage>33</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nmeth842</pubid>
						<pubid idtype="pmpid" link="fulltext">16369550</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>A comparison of normalization methods for high density oligonucleotide array data based on variance and bias</p>
				</title>
				<aug>
					<au>
						<snm>Bolstad</snm>
						<fnm>BM</fnm>
					</au>
					<au>
						<snm>Irizarry</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Astrand</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Speed</snm>
						<fnm>TP</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>185</fpage>
				<lpage>193</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/19.2.185</pubid>
						<pubid idtype="pmpid" link="fulltext">12538238</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms</p>
				</title>
				<aug>
					<au>
						<snm>Barnes</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Freudenberg</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Aronow</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Pavlidis</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>5914</fpage>
				<lpage>5923</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1258170</pubid>
						<pubid idtype="pmpid" link="fulltext">16237126</pubid>
						<pubid idtype="doi">10.1093/nar/gki890</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data</p>
				</title>
				<aug>
					<au>
						<snm>Carvalho</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bengtsson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Speed</snm>
						<fnm>TP</fnm>
					</au>
					<au>
						<snm>Irizarry</snm>
						<fnm>RA</fnm>
					</au>
				</aug>
				<source>Biostatistics</source>
				<pubdate>2007</pubdate>
				<volume>8</volume>
				<fpage>485</fpage>
				<lpage>499</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/biostatistics/kxl042</pubid>
						<pubid idtype="pmpid" link="fulltext">17189563</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Statistical issues in the analysis of Illumina data</p>
				</title>
				<aug>
					<au>
						<snm>Dunning</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Barbosa-Morais</snm>
						<fnm>NL</fnm>
					</au>
					<au>
						<snm>Lynch</snm>
						<fnm>AG</fnm>
					</au>
					<au>
						<snm>Tavar&#233;</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ritchie</snm>
						<fnm>ME</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2008</pubdate>
				<volume>9</volume>
				<fpage>85</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2291044</pubid>
						<pubid idtype="pmpid" link="fulltext">18254947</pubid>
						<pubid idtype="doi">10.1186/1471-2105-9-85</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays</p>
				</title>
				<aug>
					<au>
						<snm>Oosting</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lips</snm>
						<fnm>EH</fnm>
					</au>
					<au>
						<snm>van Eijk</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Eilers</snm>
						<fnm>PH</fnm>
					</au>
					<au>
						<snm>Szuhai</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Wijmenga</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Morreau</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>van Wezel</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2007</pubdate>
				<volume>17</volume>
				<fpage>368</fpage>
				<lpage>376</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1800928</pubid>
						<pubid idtype="pmpid" link="fulltext">17267813</pubid>
						<pubid idtype="doi">10.1101/gr.5686107</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>YH</fnm>
					</au>
					<au>
						<snm>Dudoit</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Luu</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Lin</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Peng</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ngai</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Speed</snm>
						<fnm>TP</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>e15</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">100354</pubid>
						<pubid idtype="pmpid" link="fulltext">11842121</pubid>
						<pubid idtype="doi">10.1093/nar/30.4.e15</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Microarray data normalization and transformation</p>
				</title>
				<aug>
					<au>
						<snm>Quackenbush</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2002</pubdate>
				<issue>32 Suppl</issue>
				<fpage>496</fpage>
				<lpage>501</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng1032</pubid>
						<pubid idtype="pmpid" link="fulltext">12454644</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Normalization of cDNA microarray data</p>
				</title>
				<aug>
					<au>
						<snm>Smyth</snm>
						<fnm>GK</fnm>
					</au>
					<au>
						<snm>Speed</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Methods</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>265</fpage>
				<lpage>273</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1046-2023(03)00155-5</pubid>
						<pubid idtype="pmpid" link="fulltext">14597310</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>A stepwise framework for the normalization of array CGH data</p>
				</title>
				<aug>
					<au>
						<snm>Khojasteh</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lam</snm>
						<fnm>WL</fnm>
					</au>
					<au>
						<snm>Ward</snm>
						<fnm>RK</fnm>
					</au>
					<au>
						<snm>MacAulay</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>274</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1310623</pubid>
						<pubid idtype="pmpid" link="fulltext">16297240</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-274</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Normalization of array-CGH data: influence of copy number imbalances</p>
				</title>
				<aug>
					<au>
						<snm>Staaf</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Jonsson</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Ringn&#233;r</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Vallon-Christersson</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>BMC Genomics</source>
				<pubdate>2007</pubdate>
				<volume>8</volume>
				<fpage>382</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2190775</pubid>
						<pubid idtype="pmpid" link="fulltext">17953745</pubid>
						<pubid idtype="doi">10.1186/1471-2164-8-382</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Spatial normalization of array-CGH data</p>
				</title>
				<aug>
					<au>
						<snm>Neuvial</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Hupe</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Brito</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Liva</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Manie</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Brennetot</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Radvanyi</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Aurias</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Barillot</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>7</volume>
				<fpage>264</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1523216</pubid>
						<pubid idtype="pmpid" link="fulltext">16716215</pubid>
						<pubid idtype="doi">10.1186/1471-2105-7-264</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>SNP arrays in heterogeneous tissue: highly accurate collection of both germline and somatic genetic information from unpaired single tumor samples</p>
				</title>
				<aug>
					<au>
						<snm>Assie</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>LaFramboise</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Platzer</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Bertherat</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Stratakis</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Eng</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Am J Hum Genet</source>
				<pubdate>2008</pubdate>
				<volume>82</volume>
				<fpage>903</fpage>
				<lpage>915</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.ajhg.2008.01.012</pubid>
						<pubid idtype="pmpid" link="fulltext">18355774</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data</p>
				</title>
				<aug>
					<au>
						<snm>Lin</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Wei</snm>
						<fnm>LJ</fnm>
					</au>
					<au>
						<snm>Sellers</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Lieberfarb</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Wong</snm>
						<fnm>WH</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>1233</fpage>
				<lpage>1240</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth069</pubid>
						<pubid idtype="pmpid" link="fulltext">14871870</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays</p>
				</title>
				<aug>
					<au>
						<snm>Staaf</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lindgren</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Vallon-Christersson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Isaksson</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Goransson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Juliusson</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Rosenquist</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>H&#246;glund</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Borg</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Ringn&#233;r</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2008</pubdate>
				<volume>9</volume>
				<fpage>R136</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/gb-2008-9-9-r136</pubid>
						<pubid idtype="pmpid" link="fulltext">18796136</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Screening for copy-number alterations and loss of heterozygosity in chronic lymphocytic leukemia - A comparative study of four differently designed, high resolution microarray platforms</p>
				</title>
				<aug>
					<au>
						<snm>Gunnarsson</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Staaf</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Jansson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ottesen</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Goransson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Liljedahl</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Ralfkiaer</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Mansouri</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Buhl</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Smedby</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Hjalgrim</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Syvanen</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Borg</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Isaksson</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Jurlander</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Juliusson</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Rosenquist</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Genes Chromosomes Cancer</source>
				<pubdate>2008</pubdate>
				<volume>47</volume>
				<fpage>697</fpage>
				<lpage>711</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/gcc.20575</pubid>
						<pubid idtype="pmpid" link="fulltext">18484635</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hadley</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Glessner</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Grant</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Hakonarson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Bucan</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2007</pubdate>
				<volume>17</volume>
				<fpage>1665</fpage>
				<lpage>1674</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2045149</pubid>
						<pubid idtype="pmpid" link="fulltext">17921354</pubid>
						<pubid idtype="doi">10.1101/gr.6861907</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>A comparison of DNA copy number profiling platforms</p>
				</title>
				<aug>
					<au>
						<snm>Greshock</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Feng</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Nogueira</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ivanova</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Perna</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Nathanson</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Protopopov</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Weber</snm>
						<fnm>BL</fnm>
					</au>
					<au>
						<snm>Chin</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Cancer Res</source>
				<pubdate>2007</pubdate>
				<volume>67</volume>
				<fpage>10173</fpage>
				<lpage>10180</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1158/0008-5472.CAN-07-2102</pubid>
						<pubid idtype="pmpid" link="fulltext">17968032</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>HapMap</p>
				</title>
				<url>http://www.hapmap.org</url>
			</bibl>
			<bibl id="B27">
				<title>
					<p>PennCNV</p>
				</title>
				<url>http://www.neurogenome.org/cnv/penncnv/</url>
			</bibl>
			<bibl id="B28">
				<title>
					<p>SCIBLU Genomics, Lund University, Sweden</p>
				</title>
				<url>http://www.lth.se/sciblu</url>
			</bibl>
			<bibl id="B29">
				<title>
					<p>SNP Technology Platform in Uppsala, Sweden</p>
				</title>
				<url>http://www.genotyping.se</url>
			</bibl>
			<bibl id="B30">
				<title>
					<p>The R project for statistical computing</p>
				</title>
				<url>http://www.r-project.org</url>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Bioconductor</p>
				</title>
				<url>http://www.bioconductor.org</url>
			</bibl>
			<bibl id="B32">
				<title>
					<p>A faster circular binary segmentation algorithm for the analysis of array CGH data</p>
				</title>
				<aug>
					<au>
						<snm>Venkatraman</snm>
						<fnm>ES</fnm>
					</au>
					<au>
						<snm>Olshen</snm>
						<fnm>AB</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2007</pubdate>
				<volume>23</volume>
				<fpage>657</fpage>
				<lpage>663</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btl646</pubid>
						<pubid idtype="pmpid" link="fulltext">17234643</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Gene Expression Omnibus</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/geo/</url>
			</bibl>
		</refgrp>
	</bm>
</art>
