<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-382</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Normalization of array-CGH data: influence of copy number imbalances</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Staaf</snm>
               <fnm>Johan</fnm>
               <insr iid="I1"/>
               <email>johan.staaf@med.lu.se</email>
            </au>
            <au id="A2">
               <snm>J&#246;nsson</snm>
               <fnm>G&#246;ran</fnm>
               <insr iid="I1"/>
               <email>goran_b.jonsson@med.lu.se</email>
            </au>
            <au id="A3">
               <snm>Ringn&#233;r</snm>
               <fnm>Markus</fnm>
               <insr iid="I1"/>
               <email>markus.ringner@med.lu.se</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Vallon-Christersson</snm>
               <fnm>Johan</fnm>
               <insr iid="I1"/>
               <email>johan.vallon-christersson@med.lu.se</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Division of Oncology, Department of Clinical Sciences, Lund University, 221 85 Lund, Sweden</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>382</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/382</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17953745</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-382</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>15</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>22</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>22</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Staaf et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>High-resolution microarray-based comparative genomic hybridization (CGH) techniques have been successfully applied to study copy number imbalances in a number of settings, such as the analysis of cancer genomes. For normalization of array-CGH data, methods initially developed for gene expression microarray analysis have generally been adopted directly. However, these methods are designed to work under assumptions that may not be valid for array-CGH data when copy number imbalances are present. We therefore investigated the effect of copy number imbalances on normalization.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we demonstrate that copy number imbalances correlate with intensity in array-CGH data, thereby causing problems for conventional normalization methods. We propose a strategy to circumvent these problems by taking copy number imbalances into account during normalization, and we test the proposed strategy using several data sets from the analysis of cancer genomes. In addition, we show how the strategy can be applied to conveniently define adaptive sample-specific boundaries between balanced copy number, losses, and gains to facilitate management of variation in tissue heterogeneity when calling copy number changes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We highlight the importance of considering copy number imbalances during normalization of array-CGH data, and show how failure to do so can deleteriously affect data and hamper interpretation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Microarray-based techniques for genome-wide investigation of copy number aberrations (CNAs) have recently gained much attention. The application, which initially employed arrays developed for gene expression analysis <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> or low-density arrays produced from large-insert genomic clones such as bacterial artificial chromosomes (BACs) <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, has evolved rapidly. Currently, specialized high-density arrays with oligonucleotide probes or probes derived from BAC clones are predominantly used. Two-channel array-based comparative genomic hybridization (aCGH) is a direct successor to conventional metaphase CGH <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In both cases, DNA from two samples is differentially labeled with fluorescent dyes and co-hybridized to immobilized genomic capture probes. Using aCGH, DNA derived from tumor tissue can be compared with reference DNA, e.g., normal whole blood DNA, and genomic imbalances can be investigated effectively. The main advantage of aCGH over conventional CGH is the increased resolution achieved by microarrays with a large number of individual probes, routinely up to hundreds of thousands, covering the entire genome <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The power of aCGH has been demonstrated in tumor studies <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> as well as in clinical genetics <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and the basis of the technique is reviewed elsewhere <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In essence, relative ratios of copy number between two DNA samples are obtained by comparing the two fluorescent signal intensities for each probe, under the assumption that the intensities reflect the amount of corresponding genomic DNA in the respective sample.</p>
         <p>As in gene expression microarray analysis, relative ratios must be normalized to account for systematic technical bias while retaining relevant biological changes <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Although much effort has been invested in developing methods for analysis of aCGH data, including break-point identification and segmentation <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, less attention has been devoted to normalization. For this purpose, methods originally developed for gene expression microarray data, such as global-median (Median) and intensity-based lowess (Lowess) normalization, have been adopted <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Recent reports have evaluated the performance of gene expression normalization strategies when applied to aCGH data and have proposed more specific approaches <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Although these reports raise valid concerns about directly adopting existing normalization techniques, the proposed strategies still rely on conventional methods, and the inherent properties of aCGH data have mainly been used for calibration and validation rather than being incorporated into the strategies themselves. Microarray data is frequently visualized using M-A plots in which the log ratio, referred to as M, is plotted as a function of the log mean intensity, referred to as A (Figure <figr fid="F1">1a</figr>) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. When normalizing data using Median, the median M value is identified and subtracted from all M values. This procedure centers the data such that the median M value becomes zero. Lowess normalization works in much the same way but uses a regression curve fitted locally along the full range of A to determine, for each intensity, the M value at which the data is centered. This intensity-based strategy has the added advantage over Median normalization of correcting for intensity-dependent bias in M. Such bias can introduce curvature in M across A (Figure <figr fid="F1">1a</figr>), which remains uncorrected after Median normalization.</p>
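         <p>The quantities M and A and the two conventional corrections described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study; it assumes base-2 logarithms, defines A as the average of the two log2 intensities, and uses a simplified lowess (tricube-weighted local linear fits without robustness iterations) with a hypothetical span parameter frac.</p>
         <p><![CDATA[
```python
import numpy as np


def ma_values(red, green):
    """Return M (log2 ratio) and A (mean of the two log2 intensities) for each probe."""
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    return log_r - log_g, 0.5 * (log_r + log_g)


def median_normalize(m):
    """Median: subtract the global median M so that the median M becomes zero."""
    m = np.asarray(m, dtype=float)
    return m - np.median(m)


def lowess_fit(x, y, frac=0.3):
    """Simplified lowess: tricube-weighted local linear fit at each x (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(len(x), max(int(np.ceil(frac * len(x))), 3))  # neighbours per local fit
    fitted = np.empty(len(x))
    for i, xi in enumerate(x):
        d = np.abs(x - xi)
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)  # distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights
        xm = np.sum(w * x) / np.sum(w)
        ym = np.sum(w * y) / np.sum(w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (xi - xm)
    return fitted


def lowess_normalize(m, a, frac=0.3):
    """Lowess: subtract the locally fitted M-versus-A curve from every M value."""
    m = np.asarray(m, dtype=float)
    return m - lowess_fit(a, m, frac)
```
]]></p>
         <p>Median removes a single constant, whereas the Lowess sketch removes a different amount at each intensity, which is what allows it to remove curvature in M across A.</p>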
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Genome plot and M-A plot representing two frequently used ways of visualizing aCGH data</p>
            </caption>
            <text>
               <p>Genome plot and M-A plot representing two frequently used ways of visualizing aCGH data. Plots show data from the L56Br-C1 xenograft [19] analyzed using a tiling 32 K BAC array and illustrate how copy number imbalances readily observed in genome plots can be difficult to discern as copy number populations in M-A plots. <b>(a) </b>M-A plot of un-normalized data. <b>(b) </b>Genome plot of un-normalized data.</p>
            </text>
            <graphic file="1471-2164-8-382-1"/>
         </fig>
         <p>Self-self comparisons, in which a sample is compared with itself, have shown that other forms of technical bias, e.g., spatial or plate bias, exist that can skew measured M values enough to invalidate the aforementioned normalization methods <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Both methods have therefore been implemented in ways that stratify M values into groups that are corrected individually. Stratification can be based on, e.g., spatial probe location or probe source <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The rationale is that stratification yields groups, i.e., populations, of data within which the assumptions of the normalization method hold. It has also been observed that the assumptions required for conventional normalization methods to work can fail as a result of a true biological distribution of M, e.g., in situations where the majority of probes measure true differences between the compared samples <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
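         <p>Stratified normalization amounts to applying the correction separately within each group. The minimal Python sketch below uses Median as the per-group correction; the group labels (for instance, a hypothetical print-tip block or source-plate index per probe) are assumed to be supplied by the user, and Lowess could be substituted in the same way.</p>
         <p><![CDATA[
```python
import numpy as np


def stratified_median_normalize(m, groups):
    """Apply Median normalization separately within each stratum (e.g., block or plate)."""
    m = np.asarray(m, dtype=float)
    groups = np.asarray(groups)
    normalized = np.empty_like(m)
    for g in np.unique(groups):
        in_group = groups == g
        # Center this stratum on its own median instead of the global median.
        normalized[in_group] = m[in_group] - np.median(m[in_group])
    return normalized


# Two hypothetical print-tip blocks with different technical offsets.
m = np.array([0.20, 0.25, 0.30, -0.10, -0.05, 0.00])
blocks = np.array(["tip1", "tip1", "tip1", "tip2", "tip2", "tip2"])
print(stratified_median_normalize(m, blocks))
```
]]></p>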
         <p>Here we highlight a well-known and commonly displayed property of tumor cells, namely the presence of biologically true CNAs. Figure <figr fid="F1">1b</figr> shows a genome plot of raw M values obtained by aCGH of a female breast cancer tumor xenograft <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> compared with male normal whole blood DNA. In the genome plot, M is plotted as a function of the genomic location of the probe sequence. In figure <figr fid="F1">1b</figr>, several genomic regions with different and discrete M can readily be observed. We sought to investigate the effect of this property of aCGH data on normalization. We show that this property causes substantial problems when conventional normalization methods are used, and we propose a strategy that incorporates any populations present in the data into the normalization.</p>
         <p>The proposed strategy can be integrated with any of several existing normalization methods and results in improved data quality. In addition, spatial effects that give rise to non-biological but relevant populations, which can bias normalization, are handled when corrections are calculated. We also note that part of the procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. The proposed normalization strategy, together with the adaptive sample-specific level scaling, provides powerful and convenient means for improved copy number analysis using aCGH.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>The study is outlined as follows, and results and discussion are presented in the same order. To investigate the influence of copy number imbalances on normalization, we first created a set of mimicked data representing an increasing fraction of genomic gain. Using the mimicked data, we demonstrate the effects of gain on normalization using Median and Lowess. We then evaluated an alternative normalization strategy in which data is stratified into separate populations representing gain and balanced copy number, respectively. Whereas mimicked data provide prior knowledge that facilitates stratification, most experiments lack this information. Therefore, we developed a method for stratification of data and evaluated it using previously characterized cases. By applying our procedure for stratification and normalization to tumor specimens analyzed on different aCGH platforms, we compare its performance with that of standard methods. We investigate the implications of technical spatial effects and propose a strategy for improved normalization. In addition, we evaluate the possibility of applying our method to assess noise levels in data and assign sample-specific thresholds for detection of copy number imbalances.</p>
         <sec>
            <st>
               <p>Normalization of aCGH data using Median</p>
            </st>
            <p>We hypothesized that aCGH data from samples with a substantial amount of imbalances could be erroneously corrected by Median normalization. This problem is not unexpected, and the corresponding effect is well known when gene expression microarray data is normalized <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. We investigated this issue using aCGH data derived from tiling BAC arrays comparing copy number between DNA from a normal female with karyotype 46, XX and a cell line with karyotype 47, XXX <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In this case, autosomes are expected to yield log ratio values of M = 0, and the X chromosome is expected to yield log ratio values of M = log2(3/2) &#8776; 0.58, corresponding to XXX/XX. By first removing Y chromosome values and then randomly omitting a varying number of autosome values, while retaining all X chromosome values, we could mimic cases with different percentages of gain. In this way we created mimicked data sets with 5, 10, 15, 20, 25, 30, 35, and 40 percent gain, respectively, where 5 percent gain corresponds to omitting no autosome values. Data sets were created from raw data and then subjected to Median normalization. After normalization we investigated ratios for autosomes and the X chromosome (Figure <figr fid="F2">2</figr>). As the fraction of gain increases, the median M for the X chromosome shifts from 0.42 to 0.30 (Figure <figr fid="F2">2a</figr>), supporting our hypothesis that normalization strategies for aCGH should account for the presence of different copy number populations. The observed shift is a direct result of the composition of aCGH data with respect to copy number populations and can also be observed for autosomes, for which the median M shifts from -0.01 to -0.13 (Figure <figr fid="F2">2b</figr>). The shift is clearly visible in genome plots of the normalized data: M = 0 lies between the two populations (Figure <figr fid="F2">2c</figr>).</p>
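            <p>The construction of the mimicked data sets, and the resulting shift under Median, can be sketched as below. The helper name is hypothetical, and the sketch assumes that the X chromosome values alone define the gain population: to reach a target gain fraction f with n_X X chromosome values, n_X(1 - f)/f autosomal values are kept at random and the rest are omitted. The demonstration uses synthetic Gaussian values (autosomes centered at 0, the X chromosome at log2(3/2)) rather than the experimental data, so it only illustrates the direction of the effect, not the reported numbers.</p>
            <p><![CDATA[
```python
import numpy as np


def mimic_gain_fraction(m, is_x, frac_gain, rng):
    """Keep all X values and a random subset of autosomal values so X makes up frac_gain."""
    m = np.asarray(m, dtype=float)
    is_x = np.asarray(is_x, dtype=bool)
    x_idx = np.flatnonzero(is_x)
    auto_idx = np.flatnonzero(~is_x)
    n_keep = int(round(len(x_idx) * (1.0 - frac_gain) / frac_gain))
    if n_keep > len(auto_idx):
        raise ValueError("target gain fraction is lower than in the full data set")
    kept_auto = rng.choice(auto_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([x_idx, kept_auto]))
    return m[keep], is_x[keep]


# Synthetic stand-in for XXX/XX data: 38000 autosomal and 2000 X values (5 percent gain).
rng = np.random.default_rng(0)
is_x = np.repeat([False, True], [38000, 2000])
m = np.where(is_x, np.log2(3 / 2), 0.0) + rng.normal(0.0, 0.1, is_x.size)

x_median_after_median_norm = {}
for frac in (0.05, 0.20, 0.40):
    m_sub, x_sub = mimic_gain_fraction(m, is_x, frac, rng)
    normalized = m_sub - np.median(m_sub)  # Median normalization
    x_median_after_median_norm[frac] = float(np.median(normalized[x_sub]))
print(x_median_after_median_norm)
```
]]></p>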
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Median and Lowess normalization of aCGH data</p>
               </caption>
               <text>
                  <p>Median and Lowess normalization of aCGH data. Data is from a normal female with the karyotype 46, XX and a cell line with 47, XXX. Data sets with mimicked fractions of probes with gain (5, 10, 15, 20, 25, 30, 35, or 40 percent) were constructed by randomly omitting a varying number of autosomal probes. Box-plots display M values after normalization for data sets with varying fraction of probes with gain. <b>(a) </b>M values for X-chromosome probes after median normalization. <b>(b) </b>M values for autosomal probes after median normalization. <b>(c) </b>Genome plot after median normalization for the data with 35 percent of probes with gain. <b>(d) </b>M values for X-chromosome probes after lowess normalization. <b>(e) </b>M values for autosomal probes after lowess normalization. <b>(f) </b>Genome plot after lowess normalization for the data with 35 percent of probes with gain.</p>
               </text>
               <graphic file="1471-2164-8-382-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Genomic imbalances correlate with intensity in aCGH data</p>
            </st>
            <p>Importantly, when creating the mimicked data sets we did not generate any simulated ratio values; rather, we formed different selections of values from real experimental data. We consider this use of real experimental data important for aCGH, because copy number levels, in contrast to expression levels, are restricted to a comparatively moderate dynamic range. Therefore, when a genomic region is subjected to gain or amplification, the relative increase of genomic material is substantial. We thus reasoned that probes for regions of gain would yield higher average intensities than those for regions of normal copy number, which in turn would result in a correlation between M and A: probes measuring ratios of gain will have higher average intensities. The opposite relationship would apply for probes measuring ratios of loss. Consequently, normalization strategies based on Lowess could correct for correlations between M and A that are related to genomic imbalances, resulting in loss of biologically relevant variation. To test this, we subjected the mimicked XXX/XX data sets to Lowess normalization. Once again, as the fraction of gain increases the median M for the X chromosome shifts, this time from 0.42 to 0.22. The shift can also be observed for autosomes, for which the median M shifts from -0.01 to -0.14 (Figures <figr fid="F2">2d</figr> and <figr fid="F2">2e</figr>). Notably, the variation in M for the X chromosome increases with the fraction of gain (Figure <figr fid="F2">2d</figr>). The interquartile range (IQR) of X chromosome M values increases from 0.21 to 0.24, indicating that Lowess normalization is less suitable when discrete copy number populations exist. Both the shift and the increased variation of the X chromosome are apparent in genome plots of the normalized data (Figure <figr fid="F2">2f</figr>). To illustrate how Median and Lowess differ in their failure to normalize the data, and to explain the introduced variation, we created M-A plots for the different data sets including correction lines for the two methods (Figure <figr fid="F3">3</figr>). As the fraction of gain increases, the median M value shifts, as seen in the correction line for Median (Figures <figr fid="F3">3a</figr> to d, yellow lines). The correction line for Lowess follows the same shift at lower intensities but diverges at higher intensities (Figures <figr fid="F3">3a</figr> to d, green lines). This divergence indicates that probes on the X chromosome yield higher average intensities and that an intensity bias in M is introduced as the percentage of gain increases. Importantly, this intensity bias is not of a technical nature; it represents biologically relevant changes and results from inherent properties of aCGH data. At low intensities the Lowess correction line is fitted to local means of M that predominantly reflect autosomes. As intensities increase, however, local means are at some point affected by the X chromosome and then reflect a mixed population of autosome and X chromosome M values, i.e., balanced copy number and gain, respectively. As intensities increase further, the locally fitted line is affected by increasing fractions of X chromosome M values, so that X chromosome M values at different intensities receive different corrections when normalization is applied. Thus, this normalization introduces variation. We concluded that Lowess normalization erroneously corrects for biological gain &#8211; as gain correlates with intensity in aCGH data &#8211; resulting in suppressed ratios and increased variation within copy number populations.</p>
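            <p>The mechanism can be reproduced qualitatively with synthetic data. The sketch below assumes Gaussian noise, base-2 logarithms, and that a gain in the test channel raises A by half the gain in M (because A averages the two log2 intensities); it uses a simplified lowess (tricube-weighted local linear fits without robustness iterations). Under these illustrative assumptions, global Lowess subtracts a larger correction from X chromosome values than Median does, and the IQR of X chromosome M values grows. The parameter values are hypothetical and the numbers do not correspond to the experimental results above.</p>
            <p><![CDATA[
```python
import numpy as np


def lowess_fit(x, y, frac=0.3):
    """Simplified lowess: tricube-weighted local linear fit at each x (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(len(x), max(int(np.ceil(frac * len(x))), 3))
    fitted = np.empty(len(x))
    for i, xi in enumerate(x):
        d = np.abs(x - xi)
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        xm = np.sum(w * x) / np.sum(w)
        ym = np.sum(w * y) / np.sum(w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (xi - xm)
    return fitted


def iqr(v):
    """Interquartile range."""
    q75, q25 = np.percentile(v, [75, 25])
    return q75 - q25


rng = np.random.default_rng(1)
n_auto, n_x = 1300, 700  # 35 percent of probes with gain
gain = np.log2(3 / 2)
is_x = np.repeat([False, True], [n_auto, n_x])
m = np.where(is_x, gain, 0.0) + rng.normal(0.0, 0.1, n_auto + n_x)
# Extra DNA in the test channel raises the mean log2 intensity of gained probes.
a = rng.normal(10.0, 0.3, n_auto + n_x) + np.where(is_x, gain / 2, 0.0)

m_median = m - np.median(m)        # global Median
m_lowess = m - lowess_fit(a, m)    # global Lowess over the mixed populations
print("X after Median:", np.median(m_median[is_x]), "IQR", iqr(m_median[is_x]))
print("X after Lowess:", np.median(m_lowess[is_x]), "IQR", iqr(m_lowess[is_x]))
```
]]></p>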
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Differences between Median and Lowess normalization</p>
               </caption>
               <text>
                  <p>Differences between Median and Lowess normalization. M-A plots of un-normalized log ratios with correction lines for Median (orange) and Lowess (green) normalization. The plots show data from figure 2 for data sets with a mimicked fraction of 5 percent <b>(a)</b>, 15 percent <b>(b)</b>, 25 percent <b>(c)</b>, or 35 percent <b>(d)</b> of probes with gain.</p>
               </text>
               <graphic file="1471-2164-8-382-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Normalization of aCGH data using population-based intensity-based lowess</p>
            </st>
            <p>We sought to develop a method that corrects for intensity dependence of M caused by technical bias while retaining intensity dependence of biological relevance. We reasoned that if we could stratify aCGH ratios from an experiment with respect to copy number populations, we could use this information to circumvent the drawbacks of Lowess. One way to do this would be to run Lowess on one selected population and then apply the resulting correction line to all M values. We refer to this general strategy of considering copy number populations when using Lowess as population-based intensity-based lowess (popLowess). Applying popLowess would serve two purposes. First, data would be centered at a copy number population rather than at a mean or median of a mixture of different and possibly diverse copy number levels. Second, correlations between M and A related to technical bias would be identified and corrected for without affecting the intensity dependence due to different copy numbers. To test this strategy, we subjected the mimicked XXX/XX data sets to popLowess. Since we had prior knowledge about this case, we could stratify values into copy number populations based on chromosome mapping: all autosome values were considered to comprise one population and all X chromosome values another.</p>
            <p>After stratification, raw M and A values for the largest population were used to create a Lowess correction curve. The correction curve was extended to cover the entire range of A and used to correct all values. Results are presented in figure <figr fid="F4">4</figr>. As expected, no apparent shift in median M or in variation for the X chromosome or autosomes can now be seen between the different percentages of gain (Figures <figr fid="F4">4a</figr> and <figr fid="F4">4b</figr>), demonstrating the effectiveness of popLowess. Notably, the correction line for popLowess exhibits a slight curvature (Figure <figr fid="F4">4c</figr>), indicating that some intensity dependence of M exists for autosomes, possibly of a technical nature. Albeit small in the presented case, the observed intensity dependence underlines the importance of being able to correct aCGH data for technical bias while retaining biological variation. Based on the results in figure <figr fid="F4">4</figr>, we argue that the strategy behind popLowess offers improved means for normalizing aCGH data. However, we utilized prior knowledge about copy number populations to guide data stratification. Equivalent information for tumor samples can be obtained by karyotyping using, e.g., G-banding, multicolor FISH (M-FISH) analysis, or SKY. This information can also be used to relate a ratio level to an absolute copy number. Verified copy numbers can guide the centering of data, ensuring that gains and losses are presented as appropriate relative changes. However, these experimental procedures are not trivial and therefore do not provide a practical solution in most cases.</p>
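            <p>The popLowess correction can be sketched as follows. The sketch assumes that population labels are given (here from chromosome mapping, as in the mimicked XXX/XX case), uses a simplified lowess (tricube-weighted local linear fits without robustness iterations), and extends the curve to the full range of A by linear interpolation with constant values beyond the reference population's range (numpy.interp). The extension rule and function names are our assumptions; the text only states that the curve was extended to the entire range of A.</p>
            <p><![CDATA[
```python
import numpy as np


def lowess_fit(x, y, frac=0.3):
    """Simplified lowess: tricube-weighted local linear fit at each x (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(len(x), max(int(np.ceil(frac * len(x))), 3))
    fitted = np.empty(len(x))
    for i, xi in enumerate(x):
        d = np.abs(x - xi)
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        xm = np.sum(w * x) / np.sum(w)
        ym = np.sum(w * y) / np.sum(w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (xi - xm)
    return fitted


def pop_lowess_normalize(m, a, population, frac=0.3):
    """popLowess sketch: fit lowess on the largest population only, then correct all values."""
    m = np.asarray(m, dtype=float)
    a = np.asarray(a, dtype=float)
    population = np.asarray(population)
    labels, counts = np.unique(population, return_counts=True)
    ref = population == labels[np.argmax(counts)]  # largest copy number population
    order = np.argsort(a[ref])
    a_ref = a[ref][order]
    fit_ref = lowess_fit(a_ref, m[ref][order], frac)
    # Extend the correction curve to every A value (flat beyond the reference range).
    correction = np.interp(a, a_ref, fit_ref)
    return m - correction


# Noise-free example: a linear technical bias in A affects both populations.
a_auto = np.linspace(8.0, 14.0, 300)
a_x = np.linspace(9.0, 13.0, 100)
a = np.concatenate([a_auto, a_x])
population = np.repeat(["autosome", "X"], [300, 100])
m = 0.2 * (a - 11.0) + np.where(population == "X", np.log2(3 / 2), 0.0)
corrected = pop_lowess_normalize(m, a, population)
```
]]></p>
            <p>Because the curve is fitted to the reference population only, gained or lost probes do not pull it away from the balanced level, while a technical trend shared by all populations is still removed.</p>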
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>PopLowess normalization of aCGH data</p>
               </caption>
               <text>
                  <p>PopLowess normalization of aCGH data. Data from figure 2 is normalized using popLowess. For normalization, data was stratified into populations based on genomic mapping of probes. Box-plots display M values after normalization for data sets with varying fraction of probes with gain. <b>(a) </b>M values for X-chromosome probes after popLowess normalization. <b>(b) </b>M values for autosomal probes after popLowess normalization. <b>(c) </b>M-A plot of un-normalized log ratios for the data with 35 percent of probes with gain. Red line corresponds to the popLowess correction line. <b>(d) </b>Genome plot after popLowess normalization for the data with 35 percent of probes with gain.</p>
               </text>
               <graphic file="1471-2164-8-382-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Stratification of M values into copy number populations</p>
            </st>
            <p>We aimed to develop a method for stratifying data into populations without prior knowledge of copy number, allowing us to perform popLowess. The method should identify populations in an automated fashion that requires minimal manual input and adapts to varying noise levels. To accomplish this, we took advantage of the simple structure of aCGH data, in which probes have a predetermined sequential genomic order, and created a procedure described schematically in figure <figr fid="F5">5</figr>, steps 1&#8211;5. By removing outlier data based on ratio similarity between adjacent probes, the proposed strategy enriches populations from genomic regions with similar copy number. That is, regions with high variation in M, e.g., breakpoints or high-level amplifications or deletions, are filtered out (Figure <figr fid="F5">5</figr>, steps 1&#8211;3). The enrichment of copy number populations can be observed in M-A plots and genome plots displaying data before and after the filter is applied (Figures <figr fid="F6">6a</figr> to <figr fid="F6">6d</figr>). We use a sample-adaptive cutoff for variation, inferred from the data, to account for varying noise levels between samples. The filtered data is subsequently segmented to further accentuate the underlying copy number populations and clustered into three distinct groups of values by k-means clustering (k = 3) (Figure <figr fid="F5">5</figr>, steps 4&#8211;5). The resulting clusters roughly correspond to a division of the data into three copy number populations. To address situations where fewer than three populations exist, a merge cluster criterion can be used to merge clusters with insufficient centre-to-centre distance in M.</p>
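            <p>The stratification steps can be sketched in Python as below. This is a hypothetical illustration under several assumptions not specified in the text: variation between adjacent probes is measured as the standard deviation of M over a window of three consecutive probes on the same chromosome, the cutoff is the median of that standard deviation distribution (the cutoff used in Figure 6), a running median stands in for the unspecified segmentation step, and one-dimensional k-means is initialized at the 10th, 50th, and 90th percentiles. Clusters whose centres are closer than the merge criterion are merged pairwise, starting with the closest pair.</p>
            <p><![CDATA[
```python
import numpy as np


def adjacent_sd(m, chrom, half_width=1):
    """SD of M over each probe and its neighbours on the same chromosome."""
    sd = np.full(len(m), np.nan)  # chromosome-end probes keep NaN and are dropped
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)
        for j in range(half_width, len(idx) - half_width):
            window = m[idx[j - half_width:j + half_width + 1]]
            sd[idx[j]] = np.std(window, ddof=1)
    return sd


def running_median(x, half_width=2):
    """Stand-in for segmentation: running median over neighbouring retained probes."""
    return np.array([np.median(x[max(0, i - half_width):i + half_width + 1])
                     for i in range(len(x))])


def kmeans_1d(x, k=3, n_iter=100):
    """One-dimensional k-means on M, initialised at spread-out quantiles."""
    centres = np.quantile(x, np.linspace(0.1, 0.9, k))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)


def merge_close_clusters(x, labels, min_dist):
    """Merge the closest pair of clusters until all centres are at least min_dist apart."""
    labels = labels.copy()
    while True:
        ids = np.unique(labels)
        centres = np.array([x[labels == j].mean() for j in ids])
        order = np.argsort(centres)
        gaps = np.diff(centres[order])
        if gaps.size == 0 or gaps.min() >= min_dist:
            break
        g = int(np.argmin(gaps))
        labels[labels == ids[order[g + 1]]] = ids[order[g]]
    # Relabel populations 0, 1, ... in order of increasing centre.
    rank = np.empty(ids.max() + 1, dtype=int)
    rank[ids[order]] = np.arange(len(ids))
    return rank[labels], centres[order]


def stratify(m, chrom, merge_dist=0.3, k=3):
    """Return indices of retained probes, their population labels, and population centres."""
    m = np.asarray(m, dtype=float)
    chrom = np.asarray(chrom)
    sd = adjacent_sd(m, chrom)
    cutoff = np.nanmedian(sd)  # sample-adaptive cutoff inferred from the data
    keep = np.flatnonzero(np.where(np.isnan(sd), np.inf, sd) <= cutoff)
    smoothed = np.empty(len(keep))
    for c in np.unique(chrom[keep]):
        sel = chrom[keep] == c
        smoothed[sel] = running_median(m[keep][sel])
    labels = kmeans_1d(smoothed, k)
    labels, centres = merge_close_clusters(smoothed, labels, merge_dist)
    return keep, labels, centres
```
]]></p>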
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Schematic overview of the proposed popLowess strategy</p>
               </caption>
               <text>
                  <p>Schematic overview of the proposed popLowess strategy.</p>
               </text>
               <graphic file="1471-2164-8-382-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Copy number population enrichment for the L56Br-C1 xenograft analyzed using a tiling 32 K BAC array</p>
               </caption>
               <text>
                  <p>Copy number population enrichment for the L56Br-C1 xenograft analyzed using a tiling 32 K BAC array. <b>(a) </b>Genome plot before enrichment. <b>(b) </b>Genome plot after enrichment using median of the standard deviation distribution as cut off. <b>(c) </b>M-A plot before copy number enrichment. <b>(d) </b>M-A plot after copy number enrichment using median of the standard deviation distribution as cut off. <b>(e) </b>M-A plot of all data values with popLowess correction curve superimposed in red.</p>
               </text>
               <graphic file="1471-2164-8-382-6"/>
            </fig>
            <p>To test the performance of our stratification procedure in identifying copy number populations, we used a sample set (data set 8) containing eight hyperdiploid childhood acute lymphoblastic leukemia (ALL) cases previously investigated with aCGH, G-banding, and M-FISH <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. All cases show multiple whole chromosome gains, and some cases also show minor chromosomal regions of gain. For each case, a population of genomic regions affected by copy number gain was identified based on available karyotyping data. The remaining regions were assigned to a diploid population. We performed steps 1&#8211;5 of the popLowess stratification procedure on each case using a merge cluster criterion of M = 0.3. In effect, two popLowess populations were obtained for each case, corresponding closely to the normal diploid population and the population of copy number gain defined by the karyotyping data. For both the gain and the diploid popLowess populations, the number of called probes was divided by the number of probes expected from the karyotyping data. Furthermore, we calculated the fraction of probes in the regions of gain defined by karyotyping that popLowess correctly called as gain. The results demonstrate that the procedure can effectively stratify data into enriched populations that represent discrete copy number levels (Table <tblr tid="T1">1</tblr>).</p>
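            <p>The three quantities reported per case in Table 1 can be expressed from probe-level indicators, as in the minimal sketch below. The function name and inputs are hypothetical; probes removed by the filter are assumed to be called neither gain nor diploid, and every probe is assumed to be assigned by karyotyping to either the gain or the diploid population.</p>
            <p><![CDATA[
```python
import numpy as np


def karyotype_agreement(called_gain, called_diploid, karyo_gain):
    """Per-case ratios as in Table 1, from boolean per-probe indicators."""
    called_gain = np.asarray(called_gain, dtype=bool)
    called_diploid = np.asarray(called_diploid, dtype=bool)
    karyo_gain = np.asarray(karyo_gain, dtype=bool)
    karyo_diploid = ~karyo_gain
    # Gain (called/karyotype): probes called gain per probe expected to be gained.
    gain_ratio = called_gain.sum() / karyo_gain.sum()
    # Diploid (called/karyotype): probes called diploid per probe expected to be diploid.
    diploid_ratio = called_diploid.sum() / karyo_diploid.sum()
    # Gain (fraction of karyotype called): karyotype-gain probes that were called gain.
    gain_fraction = (called_gain & karyo_gain).sum() / karyo_gain.sum()
    return float(gain_ratio), float(diploid_ratio), float(gain_fraction)


karyo_gain = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
called_gain = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
called_diploid = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0], dtype=bool)
print(karyotype_agreement(called_gain, called_diploid, karyo_gain))
```
]]></p>
            <p>Note that the first two ratios can exceed 1 when more probes are called than expected, whereas the third is bounded by 1.</p>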
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Comparison of popLowess enriched population assignment to karyotyping data for eight hyperdiploid cases [21]</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Case</p>
                     </c>
                     <c ca="center">
                        <p>Gain (called/karyotype)*</p>
                     </c>
                     <c ca="center">
                        <p>Diploid (called/karyotype)**</p>
                     </c>
                     <c ca="center">
                        <p>Gain (fraction of karyotype called)***</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>1.05</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>1.09</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>1.14</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>1.21</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>1.08</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>1.11</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>1.09</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.63</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Two populations, corresponding to a normal diploid population and a population of copy number gain, were created using steps 1&#8211;5 of the procedure in figure 5. *Ratio of the number of probes called as belonging to the gain population to the number of probes determined as gain from karyotype data. **The equivalent ratio for the diploid population. ***Fraction of probes within regions determined as gain by karyotype data that were correctly called as gain.</p>
               </tblfn>
            </tbl>
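            <p>The concordance measures in Table 1 can be illustrated with a short Python (NumPy) sketch. It assumes probe-level boolean inputs, one from the popLowess gain call and one from the karyotype-derived gain status; the function name karyotype_concordance and this input format are hypothetical and are not taken from the article.</p>

```python
import numpy as np


def karyotype_concordance(called_gain, karyo_gain):
    """Return the three Table 1 style measures for one case.

    called_gain: per-probe booleans, True if popLowess assigned the probe
                 to the gain population (hypothetical input format).
    karyo_gain:  per-probe booleans, True if karyotyping places the probe
                 in a region of gain.
    """
    called_gain = np.asarray(called_gain, dtype=bool)
    karyo_gain = np.asarray(karyo_gain, dtype=bool)

    # Probes called as gain relative to probes expected as gain from karyotype
    gain_ratio = called_gain.sum() / karyo_gain.sum()
    # Same ratio for the diploid population (probes not called / not expected as gain)
    diploid_ratio = (~called_gain).sum() / (~karyo_gain).sum()
    # Fraction of karyotype-defined gain probes that popLowess called as gain
    fraction_called = np.logical_and(called_gain, karyo_gain).sum() / karyo_gain.sum()
    return float(gain_ratio), float(diploid_ratio), float(fraction_called)


karyo = [True] * 4 + [False] * 6
called = [True, True, True, False, False, False, False, False, True, True]
print(karyotype_concordance(called, karyo))
```

            <p>Because the first two ratios compare totals only, probes misassigned in the two directions can partly cancel (in the toy example the gain ratio is 1.25 while only 75% of the karyotype-defined gain probes are called); the third measure, restricted to karyotype-defined regions of gain, is not affected by such cancellation.</p>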
         </sec>
         <sec>
            <st>
               <p>A procedure for normalization of aCGH data using popLowess</p>
            </st>
            <p>Once data are stratified into enriched copy number populations, one population, e.g., the largest, can be selected for Lowess normalization. The resulting correction curve must then be extended to cover the full range of A so that all M values can be corrected (Figure <figr fid="F6">6e</figr>). This ensures that the lowess-derived correction line tracks a single population and remains unaffected by adjacent ones. We refer to this approach as popLowess-o (where the letter <it>o </it>is a mnemonic for <it>one</it>), as it uses one population to derive a correction line for all data. The complete procedure of data stratification and popLowess normalization is shown in figure <figr fid="F5">5</figr>, steps 1&#8211;8. Once data are stratified, alternative ways of calculating normalization corrections are also conceivable. For example, one could fit lowess lines to each population and correct each population individually, or one could center each population individually and then use the combined data to derive a single lowess correction line. We refer to these alternatives as popLowess-i (where the letter <it>i </it>is a mnemonic for <it>individual</it>) and popLowess-c (where the letter <it>c </it>is a mnemonic for <it>common</it>), respectively. The latter has the added advantage of reducing the degree to which the correction line must be extrapolated to cover the full range of A. Both alternatives require an additional step to center a selected copy number population at M = 0. The variants popLowess-o and popLowess-c rely on the assumption that the intensity-based curvature in M-A space is reasonably shared between populations.</p>
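            <p>The three variants can be sketched in Python with NumPy as below. To stay self-contained, the sketch uses a simplified, hand-written lowess (a single pass of tricube-weighted local linear fits, without robustness iterations) instead of a library routine. The article does not state how the correction curve is extended over the full range of A; here the local linear fit is simply evaluated outside the range of the reference population, which is an assumption. How popLowess-i preserves copy number levels (subtracting each population's curve minus its median) is likewise our interpretation, and all function names and the frac value are hypothetical.</p>

```python
import numpy as np


def simple_lowess(x, y, x_eval, frac=0.3):
    """Single-pass local linear lowess with tricube weights.

    Simplified on purpose: no robustness iterations. For x_eval values outside
    the range of x, the local linear fit of the nearest points is evaluated,
    i.e. the curve is extrapolated linearly (an assumption, see text).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(x.size, max(3, int(np.ceil(frac * x.size))))
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]                  # k nearest neighbours in A
        h = dist[idx].max() or 1.0                  # bandwidth: k-th neighbour distance
        w = np.clip(1.0 - (dist[idx] / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(k), x[idx]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x0
    return fitted


def poplowess_o(A, M, labels, ref, frac=0.3):
    """popLowess-o: fit one curve to the reference population, apply it to all probes.
    The curve passes through the reference population, which thus ends up near M = 0."""
    sel = labels == ref
    return M - simple_lowess(A[sel], M[sel], A, frac)


def poplowess_c(A, M, labels, ref, frac=0.3):
    """popLowess-c: median-center each population, fit one common curve to the
    pooled centered data, subtract it, then center the reference population."""
    centered = M.copy()
    for p in np.unique(labels):
        sel = labels == p
        centered[sel] -= np.median(M[sel])
    out = M - simple_lowess(A, centered, A, frac)
    return out - np.median(out[labels == ref])


def poplowess_i(A, M, labels, ref, frac=0.3):
    """popLowess-i: one curve per population. Each curve's median is kept so that
    only curvature, not the copy number level, is removed (our interpretation)."""
    out = M.copy()
    for p in np.unique(labels):
        sel = labels == p
        curve = simple_lowess(A[sel], M[sel], A[sel], frac)
        out[sel] = M[sel] - (curve - np.median(curve))
    return out - np.median(out[labels == ref])


# Synthetic example: three populations sharing one quadratic curvature in A.
rng = np.random.default_rng(0)
n = 300
A = rng.uniform(8, 12, 3 * n)
labels = np.repeat([0, 1, 2], n)          # 0: reference, 1: gain, 2: loss
offsets = np.array([0.0, 0.58, -1.0])[labels]
M = offsets + 0.1 * (A - 10) ** 2 + rng.normal(0, 0.05, 3 * n)

for fn in (poplowess_o, poplowess_c, poplowess_i):
    Mn = fn(A, M, labels, ref=0)
    print(fn.__name__, [round(float(np.median(Mn[labels == p])), 2) for p in (0, 1, 2)])
```

            <p>With synthetic data sharing one curvature, all three variants should return populations close to their simulated offsets (0, 0.58 and -1.0), which is the situation assumed by popLowess-o and popLowess-c. If the curvature differed between populations, the variants would no longer agree.</p>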
         </sec>
         <sec>
            <st>
               <p>Selecting a population to represent intrinsic copy number</p>
            </st>
            <p>The normalization procedure presented here centers a population of unknown copy number at M = 0. The rationale for selecting this population can differ depending on the samples analyzed and the aim of a project. For instance, in cytogenetics, gains and losses in tumors are by convention described as net changes relative to the intrinsic balanced copy number, i.e., relative to ploidy. As the number of centromeres determines ploidy, a parallel rationale would be to express imbalances relative to the largest identified population and therefore center this population at M = 0. However, in some applications it might be more appropriate to relate imbalances to a normal diploid state. Thus, the population used for centering may be selected using prior knowledge about regions of known copy number or, if three populations are present, by taking the middle one. Irrespective of how data are best centered, the proposed popLowess procedure alleviates the normalization problems caused by mixed copy number populations. Importantly, when performing focused aCGH with specialized arrays that do not cover the entire genome, or that comprise probes with a disproportionate focus on specific genomic regions, even CNAs that affect a minor part of the genome can introduce a significant correlation between copy number and intensity, and can lead to misinterpretation of how a given ratio level relates to copy number.</p>
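            <p>The two selection rules mentioned above, centering on the largest population or on the middle of three populations, can be written as a small Python helper. This is a sketch under the assumption that population labels are already available from the stratification step; the function names and the rule keywords are hypothetical, and selection based on prior knowledge of regions with known copy number is not covered.</p>

```python
import numpy as np


def choose_reference(M, labels, rule="largest"):
    """Return the label of the population to center at M = 0.

    rule="largest": the population with the most probes.
    rule="middle":  the middle population by median M (requires exactly three).
    """
    pops = np.unique(labels)
    if rule == "largest":
        sizes = [np.sum(labels == p) for p in pops]
        return pops[int(np.argmax(sizes))]
    if rule == "middle":
        if pops.size != 3:
            raise ValueError("rule='middle' requires exactly three populations")
        medians = [np.median(M[labels == p]) for p in pops]
        return pops[int(np.argsort(medians)[1])]
    raise ValueError(f"unknown rule: {rule}")


def center_on(M, labels, ref):
    """Shift all M values so that population ref has median M = 0."""
    return M - np.median(M[labels == ref])


M = np.array([-0.5, -0.4, -0.6, -0.5, -0.45, -0.55, 0.1, 0.0, 0.2, 0.7, 0.8, 0.6, 0.7])
labels = np.array(["a"] * 6 + ["b"] * 3 + ["c"] * 4)
print(choose_reference(M, labels), choose_reference(M, labels, rule="middle"))  # a b
```

            <p>In this toy example the two rules disagree: population a is the largest, while population b lies between a and c, so the choice of rule determines which population ends up at M = 0.</p>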
         </sec>
         <sec>
            <st>
               <p>Application to tumor specimens on different aCGH platforms</p>
            </st>
            <p>We next tested the proposed popLowess strategy on tumor aCGH data that display a more complex pattern of genomic imbalances, and evaluated its performance on data derived from different array platforms. Figures <figr fid="F7">7a</figr> and <figr fid="F7">7c</figr> show genome and M-A plots of a primary <it>BRCA1 </it>mutation positive breast cancer analyzed on a tiling 32 K BAC array. The genomic profile (Figure <figr fid="F7">7a</figr>) shows clear regions of aberration; however, factors such as normal cell contamination and potential tumor heterogeneity have decreased the range in M for the sample-specific CNAs. In the M-A plot (Figure <figr fid="F7">7c</figr>) the copy number populations are less distinct than for the sample in figure <figr fid="F6">6</figr>, which likely makes them more difficult to identify. We used the proposed popLowess strategy to identify copy number populations and visualized the result in a contour plot. Results are shown in figure <figr fid="F7">7e</figr> together with correction lines for Median, Lowess and popLowess. As observed in figure <figr fid="F7">7e</figr>, neither the Lowess nor the Median correction curve accurately tracks a single copy number population. Figure <figr fid="F7">7g</figr> shows the normalized genomic profile after popLowess with the identified populations colored. The genomic profile is now correctly centered and agrees with a previous report describing detailed investigations of this tumor <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Figure <figr fid="F7">7</figr> (panels b, d, f and h) shows the same collection of plots for the same tumor profiled using Agilent 244 K CGH oligonucleotide arrays (DLR-value 0.196). The large number of probes and the considerably higher level of technical noise in the raw data make it virtually impossible to distinguish, in the 2D M-A plot (Figure <figr fid="F7">7d</figr>), the populations that are clearly seen in the genomic profile (Figure <figr fid="F7">7b</figr>). Applying the popLowess strategy enriches the copy number populations, as observed in the contour plot (Figure <figr fid="F7">7f</figr>). As for the BAC array, neither the Median nor the Lowess correction curve accurately tracks a single copy number population. Figure <figr fid="F7">7h</figr> shows the genomic profile after popLowess with the identified populations colored. To assess the effect of normalization on the variation of the data in Figure <figr fid="F7">7</figr>, we calculated the inter-quartile range (IQR) of M values for each identified population. For the BAC data, the IQR of the three identified populations increased by 0.0012 on average when Lowess normalization was applied, whereas after popLowess it decreased by 0.00029 on average. For the Agilent data, the corresponding changes were an average increase of 0.059 after Lowess compared with an average decrease of 0.0011 after popLowess. Again, we conclude that Lowess, by not tracking a single population, erroneously corrects for CNAs, resulting in increased variation within copy number populations.</p>
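            <p>The IQR comparison can be sketched in Python as follows, assuming the reported figure is the unweighted mean over identified populations of the IQR after normalization minus the IQR before normalization; the article does not spell out this formula, and the function names are hypothetical.</p>

```python
import numpy as np


def iqr(values):
    """Inter-quartile range: 75th minus 25th percentile."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25


def mean_iqr_change(M_before, M_after, labels):
    """Unweighted mean over populations of IQR(after) - IQR(before).

    Positive values mean that normalization increased within-population spread.
    """
    changes = [iqr(M_after[labels == p]) - iqr(M_before[labels == p])
               for p in np.unique(labels)]
    return float(np.mean(changes))


before = np.tile(np.arange(5.0), 3)       # each population: 0, 1, 2, 3, 4
labels = np.repeat([0, 1, 2], 5)
print(mean_iqr_change(before, 1.5 * before, labels))  # 1.0
```

            <p>The sign convention matches the text: an increase after Lowess is reported as a positive change and a decrease after popLowess as a negative one.</p>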
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p><it>BRCA1 </it>mutation positive breast cancer sample analyzed using a tiling 32 K BAC array and an Agilent 244 K oligonucleotide CGH array</p>
               </caption>
               <text>
                  <p><it>BRCA1 </it>mutation positive breast cancer sample analyzed using a tiling 32 K BAC array and an Agilent 244 K oligonucleotide CGH array. Correction lines for Median (orange), Lowess (green), and popLowess (red) normalization are superimposed in panels (e) and (f). Identified copy number populations are colored in panels (g) and (h) according to size: yellow corresponds to the largest identified copy number population, red to the second largest, and green to the smallest. Data in panels (g) and (h) are centered on the middle population. <b>(a) </b>Genome plot of un-normalized BAC data. <b>(b) </b>Genome plot of un-normalized Agilent data. <b>(c) </b>M-A plot of un-normalized BAC data. <b>(d) </b>M-A plot of un-normalized Agilent data. <b>(e) </b>Contour plot of copy number population enriched BAC data. <b>(f) </b>Contour plot of copy number population enriched Agilent data. <b>(g) </b>Genome plot of BAC data after popLowess. <b>(h) </b>Genome plot of Agilent data after popLowess.</p>
               </text>
               <graphic file="1471-2164-8-382-7"/>
            </fig>
            <p>To illustrate the differences between alternative popLowess strategies, we used the variants to derive correction lines (Figure <figr fid="F8">8</figr>). Figure <figr fid="F8">8a</figr> shows the correction lines for individual populations. The popLowess strategy (popLowess-o) used to produce the results in figure <figr fid="F7">7</figr> corresponds to normalizing data with one of the correction lines in figure <figr fid="F8">8a</figr>; for those results, the correction line for the largest population was selected (colored yellow in Figures <figr fid="F7">7</figr> and <figr fid="F8">8</figr>). Figure <figr fid="F8">8b</figr> shows the correction line derived from popLowess-c together with the individually median-centered populations. As mentioned, popLowess-o and popLowess-c rely on the assumption that the intensity-based curvature in M-A space is reasonably shared between populations. When inspecting the individual correction lines in figure <figr fid="F8">8a</figr>, the populations appear to display similar intensity-based curvature, although small differences are visible. These differences may partly result from extrapolating the correction curves at the ends. A thorough investigation of these differences, although outside the scope of this study, would be of interest.</p>
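            <p>One way to quantify how similar the per-population curvatures are is sketched below in Python: fit a lowess curve to each population, remove each curve's median so that only its shape remains, and report the largest vertical spread between curves. The comparison is restricted to the A range covered by all populations, so that extrapolation at the ends, mentioned above as a possible source of differences, does not enter. This diagnostic, its name and its parameters are our own illustration rather than part of the popLowess procedure, and the lowess is a simplified single-pass version without robustness iterations.</p>

```python
import numpy as np


def simple_lowess(x, y, x_eval, frac=0.3):
    """Single-pass local linear lowess with tricube weights (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(x.size, max(3, int(np.ceil(frac * x.size))))
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]                       # k nearest neighbours in A
        h = dist[idx].max() or 1.0                       # bandwidth
        sw = np.sqrt(np.clip(1.0 - (dist[idx] / h) ** 3, 0.0, None) ** 3)
        X = np.column_stack([np.ones(k), x[idx]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x0
    return fitted


def curvature_spread(A, M, labels, frac=0.3, n_grid=50):
    """Largest vertical spread between median-removed per-population lowess
    curves, evaluated only on the A range shared by all populations."""
    pops = np.unique(labels)
    lo = max(A[labels == p].min() for p in pops)
    hi = min(A[labels == p].max() for p in pops)
    grid = np.linspace(lo, hi, n_grid)
    curves = []
    for p in pops:
        sel = labels == p
        curve = simple_lowess(A[sel], M[sel], grid, frac)
        curves.append(curve - np.median(curve))   # keep shape, drop copy number level
    curves = np.array(curves)
    return float((curves.max(axis=0) - curves.min(axis=0)).max())


rng = np.random.default_rng(1)
n = 300
A = rng.uniform(8, 12, 3 * n)
labels = np.repeat([0, 1, 2], n)
offsets = np.array([0.0, 0.58, -1.0])[labels]
M_shared = offsets + 0.1 * (A - 10) ** 2 + rng.normal(0, 0.03, 3 * n)
# population 2 gets an extra linear trend in A, i.e. a different curvature
M_diff = M_shared + np.where(labels == 2, 0.3 * (A - 10), 0.0)
print(curvature_spread(A, M_shared, labels), curvature_spread(A, M_diff, labels))
```

            <p>A small spread supports the shared-curvature assumption behind popLowess-o and popLowess-c; a large spread, as in the second synthetic case, would argue for popLowess-i.</p>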
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>M-A plots of popLowess normalization variants for BAC data</p>
               </caption>
               <text>
                  <p>M-A plots of popLowess normalization variants for BAC data. Data from figure 7 is used. <b>(a) </b>M-A plot with lowess correction lines for each identified population superimposed. <b>(b) </b>M-A plot of median centered populations (popLowess-c) with lowess correction line based on all populations.</p>
               </text>
               <graphic file="1471-2164-8-382-8"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison of popLowess strategy to standard normalization methods</p>
            </st>
            <p>We set out to test whether the popLowess strategy could systematically reduce variation in M within copy number populations in different aCGH data sets. We hypothesized that a larger variation in M is obtained after normalization when correction curves cross, or do not accurately track, copy number populations, or when intensity-based curvature is not properly addressed. To this end, we compared the performance of the popLowess strategy with that of Median and Lowess normalization using seven different aCGH data sets (data sets 1&#8211;6, 8). The data sets cover three different types of aCGH platforms hybridized with a variety of cell line and tumor samples displaying a large variation of CNAs.</p>
            <p>We used the strategy in figure <figr fid="F5">5</figr> to identify copy number populations in each of the data sets. We then normalized each data set in parallel using popLowess, Lowess, and Median. After normalization, we calculated standard deviations of M for each identified population for each method and compared results.</p>
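            <p>The per-population comparison can be sketched in Python as below. The sketch assumes each method's normalized M values are stored in a dictionary keyed by method name and uses the sample standard deviation (ddof=1); the article does not state which estimator was used, and the function name is hypothetical.</p>

```python
import numpy as np


def population_sds(normalized, labels):
    """Per-population standard deviation of M for each normalization method.

    normalized: dict mapping a method name to its normalized M values.
    Returns {method: {population: sd}} using the sample SD (ddof=1).
    """
    pops = np.unique(labels)
    return {
        method: {p.item(): float(np.std(np.asarray(M)[labels == p], ddof=1)) for p in pops}
        for method, M in normalized.items()
    }


labels = np.array([0, 0, 0, 1, 1, 1])
normalized = {
    "popLowess": [0.0, 0.1, -0.1, 1.0, 1.1, 0.9],
    "Lowess": [0.0, 0.3, -0.3, 1.0, 1.3, 0.7],
}
sds = population_sds(normalized, labels)
for method, per_pop in sds.items():
    print(method, {p: round(sd, 3) for p, sd in per_pop.items()})
```

            <p>Repeating this per sample yields, for each population, one standard deviation per method and sample, which is the input needed for the significance test summarized in Table 2.</p>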
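            <p>Results from the comparison are displayed in table <tblr tid="T2">2</tblr>, showing that the popLowess strategy generated normalized copy number data with smaller standard deviations of M within identified populations for all comparisons and data sets. When all populations were considered together, P-values were at most 1.6e-4 against Lowess and below 1e-32 against Median for every data set, whereas for individual populations, particularly the smallest, several P-values exceeded 0.05. We repeated the test using the inter-quartile range (IQR) of M for each population instead of the standard deviation and obtained similar results (data not shown).</p>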
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Comparison of effect on population variance between different normalization strategies</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8" ca="center">
                        <p>P-values for data sets</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Data set</p>
                     </c>
                     <c ca="center">
                        <p>1 [23]</p>
                     </c>
                     <c ca="center">
                        <p>2 [25]</p>
                     </c>
                     <c ca="center">
                        <p>3 [20]</p>
                     </c>
                     <c ca="center">
                        <p>4 [8]</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>8 [21]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                     <c cspan="1">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Number of samples</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Platform</p>
                     </c>
                     <c ca="center">
                        <p>BAC 32 K</p>
                     </c>
                     <c ca="center">
                        <p>BAC 32 K</p>
                     </c>
                     <c ca="center">
                        <p>BAC 32 K</p>
                     </c>
                     <c ca="center">
                        <p>BAC 1 Mb</p>
                     </c>
                     <c ca="center">
                        <p>Agilent 244 K</p>
                     </c>
                     <c ca="center">
                        <p>Agilent 44 K</p>
                     </c>
                     <c ca="center">
                        <p>BAC 32 K</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>popLowess vs Lowess</p>
                     </c>
                     <c ca="right">
                        <p>All populations</p>
                     </c>
                     <c ca="center">
                        <p>1.1e-4</p>
                     </c>
                     <c ca="center">
                        <p>7.0e-12</p>
                     </c>
                     <c ca="center">
                        <p>5.6e-8</p>
                     </c>
                     <c ca="center">
                        <p>3.4e-28</p>
                     </c>
                     <c ca="center">
                        <p>2.5e-5</p>
                     </c>
                     <c ca="center">
                        <p>1.6e-4</p>
                     </c>
                     <c ca="center">
                        <p>7.2e-5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 1</p>
                     </c>
                     <c ca="center">
                        <p>7.8e-3</p>
                     </c>
                     <c ca="center">
                        <p>1.4e-5</p>
                     </c>
                     <c ca="center">
                        <p>9.8e-4</p>
                     </c>
                     <c ca="center">
                        <p>9.9e-32</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-3</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-3</p>
                     </c>
                     <c ca="center">
                        <p>3.9e-3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 2</p>
                     </c>
                     <c ca="center">
                        <p>7.8e-3</p>
                     </c>
                     <c ca="center">
                        <p>1.5e-6</p>
                     </c>
                     <c ca="center">
                        <p>9.8e-4</p>
                     </c>
                     <c ca="center">
                        <p>7.4e-4</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-3</p>
                     </c>
                     <c ca="center">
                        <p>0.09</p>
                     </c>
                     <c ca="center">
                        <p>3.5e-2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 3</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>6.3e-3</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-2</p>
                     </c>
                     <c ca="center">
                        <p>2.5e-7</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.09</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>popLowess vs Median</p>
                     </c>
                     <c ca="right">
                        <p>All populations</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 1e-32</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 1</p>
                     </c>
                     <c ca="center">
                        <p>7.8e-3</p>
                     </c>
                     <c ca="center">
                        <p>3.7e-9</p>
                     </c>
                     <c ca="center">
                        <p>9.8e-4</p>
                     </c>
                     <c ca="center">
                        <p>9.9e-32</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-3</p>
                     </c>
                     <c ca="center">
                        <p>2.0e-3</p>
                     </c>
                     <c ca="center">
                        <p>3.9e-3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 2</p>
                     </c>
                     <c ca="center">
                        <p>6.3e-2</p>
                     </c>
                     <c ca="center">
                        <p>1.4e-5</p>
                     </c>
                     <c ca="center">
                        <p>0.17</p>
                     </c>
                     <c ca="center">
                        <p>3.8e-2</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>0.09</p>
                     </c>
                     <c ca="center">
                        <p>0.14</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Population 3</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>9.0e-5</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>P-values are shown for each population and data set. The null hypothesis is that the lower standard deviations obtained with popLowess arise by chance. Population 1 always refers to the largest identified population (by number of probes), population 2 to the second largest, and population 3 to the smallest.</p>
               </tblfn>
            </tbl>
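            <p>The footnote to Table 2 does not name the statistical test. A one-sided sign test over samples, which asks how likely it is that popLowess gives the lower standard deviation in at least the observed number of samples if each outcome were a fair coin flip, is one test that matches the stated null hypothesis and is assumed in this Python sketch; the function name is hypothetical. For instance, 7 of 7 samples in favor of popLowess gives 1/128, about 7.8e-3, which equals the value listed for several data set 1 comparisons, although this agreement alone does not establish which test was used.</p>

```python
from math import comb


def sign_test_p(sd_poplowess, sd_other):
    """One-sided sign test over samples.

    Counts samples where popLowess has the lower SD and returns the probability
    of at least that many such samples if each were a fair coin flip.
    Tied samples are discarded, as is usual for the sign test.
    """
    pairs = list(zip(sd_poplowess, sd_other))
    wins = sum(1 for a, b in pairs if b > a)
    n = sum(1 for a, b in pairs if a != b)
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


print(sign_test_p([0.10] * 7, [0.12] * 7))   # 0.0078125, i.e. 1/128
```

            <p>A sign test only uses the direction of each per-sample difference, not its size, so it cannot return P-values smaller than 1/2 to the power of the number of samples; very small values such as those in the "All populations" rows would require pooling many comparisons or a different test.</p>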
            <p>Since we lack prior knowledge of CNAs for most of the cases, we cannot evaluate variation within confirmed genomic regions of similar copy number. Therefore, one could argue that the better performance of popLowess, i.e., lower variation within populations compared with conventional normalization, is biased by the fact that populations are inferred from the data. However, the data in table <tblr tid="T1">1</tblr> and the genome plots in figure <figr fid="F7">7</figr> (panels g and h) indicate that the identified populations reflect regions with discrete copy number levels. We therefore argue that decreased intra-population variation is beneficial to both interpretation and downstream analysis and reflects improved data quality.</p>
         </sec>
         <sec>
            <st>
               <p>Spatial effects</p>
            </st>
            <p>Technical artifacts that cause M to correlate with the spatial location of probes on the array are a well-known and previously described phenomenon in array data. We focused on two plausible consequences of such spatial effects in aCGH data. First, affected values can introduce populations that compromise normalization in the same way as copy number populations. Second, affected values will be incorrectly scaled compared with unaffected values.</p>
            <p>We reasoned that ratios biased by spatial artifacts are controlled for by our proposed popLowess strategy, as it filters outlier data guided by genomic mapping. Thus, when calculating an intensity-dependent correction for normalization, our strategy would not be compromised by spatial bias, as affected values are disregarded together with values from breakpoints, high-level amplifications, and homozygous deletions. On the other hand, popLowess does not correct for spatial effects, so affected values would remain incorrectly scaled after normalization even if the intensity bias is removed.</p>
            <p>As the proposed popLowess strategy does not correct for spatial effects, we reasoned that a pre-normalization step might be appropriate for data displaying spatially related bias in order to properly scale affected values. This could be accomplished by applying one of many available spatial correction methods <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, or variations thereof, prior to popLowess. However, since we have shown that genomic imbalances correlate with intensity, we are cautious about addressing spatial effects using pre-normalization algorithms that are intensity-based.</p>
            <p>To test our reasoning, we applied popLowess to data set 7. Samples in this set have few or no genomic alterations, but the data display variation in M-A curvature and spatial effects. Data set 7 was normalized using popLowess alone, block-based Median followed by popLowess, or block-based Lowess followed by popLowess. For popLowess, alone or in combination with a pre-normalization step, a merge cluster criterion of 0.3 in M was employed to account for the presence of only two copy number populations.</p>
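            <p>As a measure of spatial effects, we calculated the standard deviation of the medians of M from pin-tip blocks before and after normalization. Applied alone, popLowess left this measure largely unchanged (for example, 0.062 before and after normalization of the first XY vs XY hybridization), whereas a pre-normalization step preceding popLowess reduced it to 0.003&#8211;0.035 with block-based Median and to 0.003&#8211;0.005 with block-based Lowess (Table <tblr tid="T3">3</tblr>). Spatial bias may thus be corrected by a pre-normalization step preceding popLowess.</p>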
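            <p>The spatial-bias measure in Table 3 can be written as the standard deviation of the per-block medians of M. A minimal Python/NumPy sketch on simulated data follows; the simulated block offsets and noise levels are arbitrary, the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.</p>

```python
import numpy as np


def block_median_spread(m, block):
    """Standard deviation of the per-block medians of M (spatial-bias measure).

    Uses the sample standard deviation (ddof=1), which is an assumption.
    """
    m = np.asarray(m, dtype=float)
    block = np.asarray(block)
    medians = np.array([np.median(m[block == b]) for b in np.unique(block)])
    return float(np.std(medians, ddof=1))


rng = np.random.default_rng(0)
block = np.repeat(np.arange(16), 200)              # 16 pin-tip blocks, 200 spots each
offsets = rng.normal(0.0, 0.06, size=16)           # simulated spatial bias per block
m = offsets[block] + rng.normal(0.0, 0.15, size=block.size)

# Subtracting each block's median removes the spread between block medians
block_medians = np.array([np.median(m[block == b]) for b in range(16)])
m_pre = m - block_medians[block]

print(block_median_spread(m, block), block_median_spread(m_pre, block))
```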
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Effect of pre-normalization to correct spatial bias prior to applying popLowess</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>un-normalized*</p>
                     </c>
                     <c ca="center">
                        <p>popLowess**</p>
                     </c>
                     <c ca="center">
                        <p>pre-normalization by block-based Median***</p>
                     </c>
                     <c ca="center">
                        <p>pre-normalization by block-based Lowess****</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XY vs XY</p>
                     </c>
                     <c ca="center">
                        <p>0.062</p>
                     </c>
                     <c ca="center">
                        <p>0.062</p>
                     </c>
                     <c ca="center">
                        <p>0.009</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XY vs XY</p>
                     </c>
                     <c ca="center">
                        <p>0.067</p>
                     </c>
                     <c ca="center">
                        <p>0.067</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XX vs XX</p>
                     </c>
                     <c ca="center">
                        <p>0.097</p>
                     </c>
                     <c ca="center">
                        <p>0.069</p>
                     </c>
                     <c ca="center">
                        <p>0.035</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XX vs XX</p>
                     </c>
                     <c ca="center">
                        <p>0.125</p>
                     </c>
                     <c ca="center">
                        <p>0.127</p>
                     </c>
                     <c ca="center">
                        <p>0.031</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XXX vs XX</p>
                     </c>
                     <c ca="center">
                        <p>0.051</p>
                     </c>
                     <c ca="center">
                        <p>0.049</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XX vs XY</p>
                     </c>
                     <c ca="center">
                        <p>0.046</p>
                     </c>
                     <c ca="center">
                        <p>0.045</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XXXX vs XX</p>
                     </c>
                     <c ca="center">
                        <p>0.060</p>
                     </c>
                     <c ca="center">
                        <p>0.033</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XXX vs XY</p>
                     </c>
                     <c ca="center">
                        <p>0.058</p>
                     </c>
                     <c ca="center">
                        <p>0.058</p>
                     </c>
                     <c ca="center">
                        <p>0.007</p>
                     </c>
                     <c ca="center">
                        <p>0.005</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XXXX vs XY</p>
                     </c>
                     <c ca="center">
                        <p>0.060</p>
                     </c>
                     <c ca="center">
                        <p>0.058</p>
                     </c>
                     <c ca="center">
                        <p>0.008</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Standard deviation of the medians of M from pin-tip blocks. Each row is one hybridization in data set 7, labelled by the sex chromosome complements of the two hybridized samples. The standard deviation is calculated before normalization* and after normalization using popLowess alone** or popLowess preceded by a pre-normalization step, either block-based Median*** or block-based Lowess****.</p>
               </tblfn>
            </tbl>
            <p>We conclude that the proposed popLowess strategy is robust in the sense that it can handle the presence of otherwise deleterious populations without relying on them. We also conclude that, although spatial effects do not compromise the calculation of an intensity-dependent correction by popLowess, a pre-normalization step that corrects for spatial bias is warranted for data displaying such bias.</p>
         </sec>
         <sec>
            <st>
               <p>Adaptive sample-specific thresholds for calling copy number change</p>
            </st>
            <p>During development of the popLowess strategy, we recognized that the sample-specific cut-off value (Figure <figr fid="F5">5</figr>, step 3) could be used to assess the noise level in data and to assign thresholds for copy number imbalances on a sample-specific basis. Several reports <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B8">8</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp> have used global thresholds in M for calling CNAs as gains or losses. These thresholds are assigned by adding or subtracting a fixed value in M to or from a baseline, typically M = 0. Determining suitable thresholds may be problematic in large sets of samples of varying quality and heterogeneity, which is often the case in tumor studies <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and may force overly conservative thresholds for certain samples in order to avoid erroneous CNA calls. It is therefore relevant to derive, in an automated fashion, sample-specific thresholds that can be scaled to the desired stringency.</p>
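            <p>Calling with a fixed threshold around a baseline amounts to a simple comparison per probe or segment. The Python/NumPy sketch below assumes a baseline of M = 0; the function name <it>call_cna</it> and the example values are hypothetical (0.171 is borrowed from the Figure 9d threshold only for illustration).</p>

```python
import numpy as np


def call_cna(m, threshold, baseline=0.0):
    """Call gain (+1), loss (-1) or no change (0) by comparing M with
    baseline + threshold and baseline - threshold."""
    m = np.asarray(m, dtype=float)
    calls = np.zeros(m.shape, dtype=int)
    calls[m > baseline + threshold] = 1
    calls[baseline - threshold > m] = -1
    return calls


values = [0.05, 0.50, -0.40, 0.20]
# A global threshold applies the same value to every sample ...
print(call_cna(values, threshold=0.3))    # [ 0  1 -1  0]
# ... whereas a sample-specific threshold is simply passed per sample
print(call_cna(values, threshold=0.171))  # [ 0  1 -1  1]
```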
            <p>A parallel can be drawn to the derivative log ratio spread (DLR) value calculated by the Agilent CGH Analytics software. The DLR value can be used to assess hybridization quality and provides a sample-scalable threshold for calling CNAs using, e.g., the Z-scoring algorithm in the CGH Analytics software.</p>
            <p>We applied sample-specific thresholds derived from popLowess to aCGH data for a <it>BRCA1 </it>mutation-positive tumor analyzed on two array platforms (Figure <figr fid="F9">9</figr>). Figure <figr fid="F9">9a</figr> shows thresholds after popLowess normalization of the BAC array data, and Figure <figr fid="F9">9b</figr> after application of a 250 kBp smoothing window. Figure <figr fid="F9">9c</figr> shows the same tumor analyzed on the Agilent platform after popLowess, and Figure <figr fid="F9">9d</figr> after application of a 50 kBp smoothing window. As shown in Figure <figr fid="F9">9</figr>, thresholds adapt automatically to the data; for example, they are tighter after smoothing (&#177; 0.176 versus &#177; 0.372 for BAC data, and &#177; 0.171 versus &#177; 0.453 for Agilent data). We believe that the use of sample-specific adaptive thresholds will greatly facilitate the analysis of larger aCGH data sets that include samples of varying heterogeneity and quality.</p>
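            <p>The smoothing method itself is not detailed here. As one possible reading, a smoothing window of fixed genomic size (250 kBp or 50 kBp) can be implemented as a moving median over all probes lying within half a window on either side of each probe. The Python/NumPy sketch below assumes sorted positions in base pairs on a single chromosome; the use of the median rather than the mean and the function name <it>smooth_by_position</it> are assumptions.</p>

```python
import numpy as np


def smooth_by_position(pos, m, window_bp):
    """Moving median of M over probes within window_bp / 2 of each probe.

    pos: probe positions in bp on one chromosome, sorted ascending.
    """
    pos = np.asarray(pos, dtype=float)
    m = np.asarray(m, dtype=float)
    half = window_bp / 2.0
    # Index range [start, stop) of probes inside each probe's window
    start = np.searchsorted(pos, pos - half, side="left")
    stop = np.searchsorted(pos, pos + half, side="right")
    return np.array([np.median(m[a:b]) for a, b in zip(start, stop)])


pos = [0, 100_000, 200_000, 300_000, 1_000_000]
m = [0.10, 0.90, 0.20, 0.15, 0.60]
print(smooth_by_position(pos, m, window_bp=250_000))
```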
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Use of sample adaptive gain/loss thresholds</p>
               </caption>
               <text>
                  <p>Use of sample adaptive gain/loss thresholds. Thresholds are applied to data from figure 7. <b>(a) </b>Copy number profile derived from BAC data for chromosome 4 after popLowess normalization with adaptive thresholds superimposed (&#177; 0.372). <b>(b) </b>Copy number profile for chromosome 4 for data from panel (a) after smoothing (250 kBp) and new threshold estimate (&#177; 0.176). <b>(c) </b>Copy number profile of Agilent data for chromosome 4 after popLowess normalization with adaptive thresholds superimposed (&#177; 0.453). <b>(d) </b>Copy number profile for chromosome 4 for data from panel (c) after smoothing (50 kBp) and new threshold estimate (&#177; 0.171).</p>
               </text>
               <graphic file="1471-2164-8-382-9"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Normalization affects downstream analysis</p>
            </st>
            <p>To exemplify how normalization can affect downstream analysis and interpretation, we used data generated from the Agilent array presented in Figure <figr fid="F7">7</figr>. We normalized the raw data shown in Figure <figr fid="F7">7</figr> (panels b and d) with either Lowess or popLowess. Correction lines for both normalization methods are shown in Figure <figr fid="F7">7f</figr>. We then smoothed the data (50 kBp window) and performed segmentation using the CGHplotter algorithm <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Results for chromosome 4 are shown in Figure <figr fid="F10">10</figr>. In this example, segmentation after Lowess and after popLowess (Figures <figr fid="F10">10a</figr> and <figr fid="F10">10b</figr>, respectively) identifies broadly the same break points. However, after Lowess the data are not centered on any of the identified segments, because the correction line does not track a specific population in the raw data (Figure <figr fid="F10">10a</figr>). In contrast, after popLowess the data are centered on a specific segment level (Figure <figr fid="F10">10b</figr>; e.g., blue arrow). Shifting data to center them on a specific population can be done after any conventional normalization method; Figure <figr fid="F10">10c</figr> shows data centered after Lowess. The centering point can be determined, for example, by stratifying data into populations using the method presented herein or the method proposed by Lipson et al. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Importantly, centering data after Lowess does not alleviate the aforementioned problems of introduced variation and inappropriate correction of biological gains and losses. 
As a result, in this example the dynamic range between segments of gain (Figure <figr fid="F10">10</figr>, red arrows) and loss (Figure <figr fid="F10">10</figr>, green arrows) is smaller after Lowess than after popLowess, 1.38 versus 1.58 in M, and remains 1.38 after centering (Figure <figr fid="F10">10c</figr>). After Lowess, the range between levels is reduced in both directions relative to the baseline level (Figure <figr fid="F10">10</figr>). Reduced dynamic range or inappropriate centering of aCGH data can lead to misinterpretation of genomic copy number profiles.</p>
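            <p>The dynamic ranges quoted above can be checked directly from the segment levels listed in the Figure 10 legend: the range is the difference between the gain (red) and loss (green) segments, and centering applies one additive shift to all levels, so it cannot restore a compressed range. The Python sketch below uses only those legend values; the shift of -0.21 for Figure 10c is inferred from the blue segment moving from 0.16 to -0.05 (the red and green segments move by the same amount).</p>

```python
# Segment levels in M from the Figure 10 legend (blue = baseline, red = gain, green = loss)
lowess = {"blue": 0.16, "red": 1.02, "green": -0.36}
pop_lowess = {"blue": -0.07, "red": 0.94, "green": -0.64}


def dynamic_range(levels):
    """Difference in M between the gain (red) and loss (green) segments."""
    return levels["red"] - levels["green"]


def shift(levels, delta):
    """Apply the same additive shift to every segment level (centering)."""
    return {name: value + delta for name, value in levels.items()}


# Figure 10c: Lowess data centered on an individual population (a shift of -0.21)
lowess_centered = shift(lowess, -0.21)

print(round(dynamic_range(lowess), 2))           # 1.38
print(round(dynamic_range(pop_lowess), 2))       # 1.58
print(round(dynamic_range(lowess_centered), 2))  # 1.38, unchanged by centering
```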
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Example of segmentation of data after alternative normalization methods</p>
               </caption>
               <text>
                  <p>Example of segmentation of data after alternative normalization methods. Segmented copy number profile of chromosome 4 from smoothed (50 kBp) Agilent data for the sample from Figure 7, with superimposed adaptive thresholds (&#177; 0.171). Three segments are highlighted with colored arrows in each panel to exemplify regions with different copy number levels. <b>(a) </b>Segmentation applied after Lowess. M values for selected segments: blue arrow 0.16, red arrow 1.02, and green arrow -0.36. <b>(b) </b>Segmentation applied after popLowess. M values for selected segments: blue arrow -0.07, red arrow 0.94, and green arrow -0.64. <b>(c) </b>Segmentation applied after Lowess normalization followed by centering of data on the median M of an individual population. M values for selected segments: blue arrow -0.05, red arrow 0.81, and green arrow -0.57.</p>
               </text>
               <graphic file="1471-2164-8-382-10"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
          <p>We show that the presence of copy number populations in aCGH data deleteriously affects normalization using curve-generating algorithms such as intensity-based lowess and may cause erroneous centering of data. We demonstrate that genomic imbalances correlate with intensity in aCGH data and therefore must be accounted for during normalization, in order to correct for intensity dependence of M due to technical bias while retaining intensity dependence of biological relevance. Here we propose a population-based normalization strategy that accounts for the presence of copy number populations. We show that the benefits of a population-based normalization approach are clearly evident for data displaying numerous CNAs. We also demonstrate that the proposed procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. We recognize that the suggested strategy represents only one conceivable way of implementing population-based normalization, and that any implementation that effectively discerns copy number populations in aCGH data, whether using prior knowledge about the samples or inference from the data itself, could be used. In addition, once copy number populations are identified, this information can be used in a variety of ways to circumvent the problems highlighted here for conventional normalization of aCGH data. Taken together, we demonstrate that copy number populations in aCGH data should be accounted for during normalization, and that the proposed normalization strategy, together with the adaptive sample-specific thresholds, provides a powerful and convenient means for improved copy number analysis using aCGH.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
            <p>We used eight data sets derived from BAC arrays and from Agilent 244 K oligonucleotide CGH arrays to evaluate normalization methods. Data set 1 consists of seven breast cancer cell lines analyzed using tiling 32 K BAC arrays <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Data set 2 consists of 28 lung cancer cell lines analyzed using tiling 32 K BAC arrays <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Data set 3 consists of ten breast cancer cell lines analyzed using tiling 32 K BAC arrays <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Data set 4 consists of 52 breast tumors analyzed in dye-swaps on 1 Mb BAC arrays <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Data set 5 consists of eight breast tumors and one dye-swap analyzed using Agilent 244 K oligonucleotide CGH arrays <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. These tumors displayed DLR values between 0.196 and 0.364 when analyzed with Agilent CGH Analytics software version 3.4.27 <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Data set 6 was created from data set 5 by matching the oligonucleotide probe IDs from the 244 K arrays to the Agilent 44B probe IDs available through Agilent eArray <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, thus creating a virtual 44 K oligonucleotide CGH array. Of 42,447 genome-mapped probe IDs on the 44B array, 41,599 (98%) were found on the 244 K arrays. Data set 7 consists of nine hybridizations combining chromosome X-aberrant cell lines (karyotypes 47,XXX and 48,XXXX) with male (46,XY) and female (46,XX) samples in various combinations <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Samples in data set 7 are expected to display a normal karyotype for chromosomes 1&#8211;22. Data set 8 consists of eight hyperdiploid childhood ALL cases analyzed using tiling 32 K BAC arrays <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
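            <p>Creating the virtual 44 K array amounts to restricting the 244 K data to the probe IDs shared with the 44B design. A minimal Python sketch with hypothetical toy probe IDs and a hypothetical function name; the matching in this study used the 44B probe lists available through eArray.</p>

```python
def virtual_array(probes_244k, probes_44b):
    """Keep the 44B probe IDs that are also present on the 244 K arrays,
    preserving the 44B order."""
    present = set(probes_244k)
    return [probe for probe in probes_44b if probe in present]


# Toy IDs (hypothetical), not real Agilent probe names
print(virtual_array(["P1", "P2", "P3", "P4"], ["P2", "P9", "P4"]))  # ['P2', 'P4']

# Coverage reported in the text: 41,599 of 42,447 genome-mapped 44B probe IDs
print(f"{41599 / 42447:.1%}")  # 98.0%
```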
         </sec>
         <sec>
            <st>
               <p>Pre-filtering and conventional normalization of aCGH data</p>
            </st>
            <p>All data sets were loaded into BioArray Software Environment (BASE) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> for analysis. Positive and non-saturated spots were background corrected by subtracting the median background from the median foreground signal intensity in each channel, and log ratios (M) were calculated from the background-corrected intensities. In all analyses we used M = log<sub>2</sub>(int1/int2) and A = log<sub>10</sub>(sqrt(int1*int2)), where int1 and int2 are the background-corrected intensities from the investigated sample and the reference, respectively. Data sets 1&#8211;4, 7, and 8 were filtered on the signal-to-noise ratio of each spot in both channels as described in the corresponding publications, and the remaining data sets (5 and 6) were filtered for signal-to-noise ratio > 5 in both channels, before the different normalization strategies were applied as BASE software plug-ins. A lowess smoothing factor of 0.33, delta of 0.1, and four iterations were used for standard Lowess, popLowess, and block-based lowess normalization. Block group size was set to 1 for all block-based normalizations.</p>
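            <p>The M and A definitions above, together with median background subtraction and a signal-to-noise filter, can be sketched in Python/NumPy. The sketch assumes that per-spot signal-to-noise values are supplied by the image analysis software, interprets "positive" spots as those with positive background-corrected intensity in both channels, and omits saturation filtering; the function name is hypothetical.</p>

```python
import numpy as np


def compute_m_a(fg1, bg1, fg2, bg2, snr1=None, snr2=None, min_snr=5.0):
    """Background-correct two channels and compute M and A per spot.

    int = median foreground - median background; spots with non-positive
    corrected intensity in either channel (or SNR not above min_snr, when SNR
    values are given) are flagged and get M = A = NaN.
    """
    int1 = np.asarray(fg1, dtype=float) - np.asarray(bg1, dtype=float)
    int2 = np.asarray(fg2, dtype=float) - np.asarray(bg2, dtype=float)
    keep = np.logical_and(int1 > 0, int2 > 0)
    if snr1 is not None and snr2 is not None:
        snr_ok = np.logical_and(np.asarray(snr1) > min_snr, np.asarray(snr2) > min_snr)
        keep = np.logical_and(keep, snr_ok)
    m = np.full(int1.shape, np.nan)
    a = np.full(int1.shape, np.nan)
    m[keep] = np.log2(int1[keep] / int2[keep])             # M = log2(int1/int2)
    a[keep] = np.log10(np.sqrt(int1[keep] * int2[keep]))   # A = log10(sqrt(int1*int2))
    return m, a, keep


m, a, keep = compute_m_a(fg1=[1100, 600, 50], bg1=[100, 100, 100],
                         fg2=[600, 1100, 400], bg2=[100, 100, 100])
print(m, a, keep)
```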
         </sec>
         <sec>
            <st>
               <p>Population-based intensity-based lowess</p>
            </st>
            <p>A schematic overview of the proposed popLowess normalization strategy is shown in Figure <figr fid="F5">5</figr>. The approach is applied on a per-sample basis, starting with the genomic mapping and raw intensities (int1 and int2) for N probe IDs (step 1, Figure <figr fid="F5">5</figr>). The probes are sorted according to genomic position, and M and A are calculated for each probe (step 2, Figure <figr fid="F5">5</figr>). Next, the standard deviation of M is calculated for each probe within a sliding window of user-defined size along the genome. The resulting distribution of N standard deviations is subjected to a cut-off criterion, yielding K probes with standard deviation &lt; cut-off for continued population analysis (step 3, Figure <figr fid="F5">5</figr>). A moving window of 11 probes was used, and the median of the standard deviation distribution was used as the cut-off value. This selection criterion is sample-adaptive, avoiding the problems associated with a global cut-off criterion. The K selected probes are then segmented on a per-chromosome basis using, e.g., the CGHplotter algorithm <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> or the faster circular binary segmentation (CBS) algorithm <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> (step 4, Figure <figr fid="F5">5</figr>). Herein, the segmentation algorithm proposed by Autio et al. was used, with the constant for computing the number of changes (c-parameter) set to 10 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The segmented values are used to cluster the K probes into three distinct clusters by robust k-means clustering (step 5, Figure <figr fid="F5">5</figr>). After clustering, clusters whose centers are close to each other can optionally be merged. Merging is typically useful for samples that do not display three populations, e.g., samples with one or two copy number populations. When indicated, a merge cluster criterion of 0.2 or 0.3 in M was used. 
The result is 1&#8211;3 distinct populations, each holding the genomic mapping, M, and A of its probes. The largest population is selected for lowess normalization <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, generating a population-specific correction curve (step 6, Figure <figr fid="F5">5</figr>). The correction curve is then extrapolated to the entire range of A and used to correct M for all N reporters, as in Lowess (step 7, Figure <figr fid="F5">5</figr>). The extrapolation is conservative at the ends of the A range: the first and last points of the population-specific correction curve are used to level out the global correction curve horizontally in the M-A plot, thereby moderating the impact of extreme points or missing values. After lowess correction, one population is selected as the center population, and all data are shifted so that this population has median M equal to 0. The center population can be selected based on different assumptions. Finally, the normalized int1 and int2 intensities are returned (step 8, Figure <figr fid="F5">5</figr>). Segmenting only the K selected probes rather than all observations, and setting the crucial segmentation parameters for detecting breakpoints at the lower end of their scale, gains speed while retaining robustness, as long as the standard deviation cut-off is not set too low. The purpose of segmentation is to refine large regions with identical copy number, not to detect small, complex copy number alterations.</p>
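            <p>Steps 5 (cluster merging) to 8 (centering) can be sketched as follows, taking the population labels produced by filtering, segmentation and clustering as given. This Python/NumPy sketch is an illustration under stated assumptions and not the published popLowess R code: the merge rule (chaining neighboring centers closer than the criterion), the simple tricube local linear smoother standing in for R's lowess (no robustness iterations, no delta shortcut), and centering on the largest population by default are all assumptions, and the function names are hypothetical.</p>

```python
import numpy as np


def merge_close_clusters(centers, labels, merge_criterion=0.3):
    """Merge clusters whose centers in M lie within merge_criterion of the
    next lower center (assumed chaining rule); returns new labels."""
    centers = np.asarray(centers, dtype=float)
    group_of, group, prev = {}, 0, None
    for c in np.argsort(centers):
        if prev is not None and centers[c] - centers[prev] > merge_criterion:
            group += 1
        group_of[int(c)] = group
        prev = c
    return np.array([group_of[int(lab)] for lab in labels])


def local_linear_smooth(x, y, frac=0.33):
    """Tricube-weighted local linear fit at every x (lowess without the
    robustness iterations). Returns sorted x and the fitted values."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    k = min(len(xs), max(int(np.ceil(frac * len(xs))), 3))
    fitted = np.empty(len(xs))
    for i, x0 in enumerate(xs):
        d = np.abs(xs - x0)
        h = max(np.sort(d)[k - 1], 1e-12)          # distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        xm, ym = np.average(xs, weights=w), np.average(ys, weights=w)
        sxx = np.sum(w * (xs - xm) ** 2)
        slope = np.sum(w * (xs - xm) * (ys - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x0 - xm)
    return xs, fitted


def population_lowess(m, a, labels, center_label=None, frac=0.33):
    """Correct M with a curve fitted to the largest population only.

    labels: population index per probe, or -1 for probes removed by the
    standard-deviation filter (corrected, but not used for fitting).
    """
    m = np.asarray(m, dtype=float)
    a = np.asarray(a, dtype=float)
    labels = np.asarray(labels)
    pops, counts = np.unique(labels[labels >= 0], return_counts=True)
    largest = pops[np.argmax(counts)]
    in_pop = labels == largest
    xs, fitted = local_linear_smooth(a[in_pop], m[in_pop], frac)
    # np.interp holds the curve flat beyond the first/last fitted point,
    # i.e. the conservative horizontal extrapolation described in the text
    corrected = m - np.interp(a, xs, fitted)
    center = largest if center_label is None else center_label
    return corrected - np.median(corrected[labels == center])
```

            <p>A call such as <it>population_lowess(m, a, labels)</it> returns centered M values for all probes, including those labelled -1 that were excluded from the fit.</p>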
         </sec>
         <sec>
            <st>
               <p>Comparison of normalization methods</p>
            </st>
            <p>For comparisons, lowess-normalized data were generated using the lowess function implemented in R. For each identified population (steps 1&#8211;5, Figure <figr fid="F5">5</figr>) in every sample in data sets 1&#8211;6 and 8, the standard deviation of M for the reporters in the population was calculated separately after Lowess, after popLowess, and with no normalization (equivalent to Median, since a global shift in M does not change the standard deviation). For each data set, we counted the populations for which popLowess yielded a lower standard deviation than the competing method. To evaluate whether popLowess resulted in a significant number of populations with lower standard deviations, one-sided p-values were calculated from the binomial distribution with p = 0.5. This binomial test corresponds to the null hypothesis that lower standard deviations for popLowess are obtained by chance. The comparison was performed both for all populations together and for each population individually.</p>
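            <p>The one-sided binomial p-value is the upper tail of a Binomial(n, 0.5) distribution, where n is the number of populations compared and k the number for which popLowess gave the lower standard deviation. A standard-library Python sketch; the function name is hypothetical, and the counts in the example are hypothetical rather than results from this study.</p>

```python
from math import comb


def one_sided_binomial_p(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that popLowess gives the
    lower standard deviation in at least k of n populations under the null."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: popLowess lower in 8 of 10 populations
print(one_sided_binomial_p(8, 10))  # 0.0546875
```

            <p>With SciPy installed, <it>scipy.stats.binomtest(8, 10, 0.5, alternative="greater").pvalue</it> should return the same value.</p>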
         </sec>
         <sec>
            <st>
               <p>Sample adaptive gain/loss thresholds</p>
            </st>
            <p>Sample-adaptive thresholds for calling gain or loss can be generated by performing steps 1&#8211;3 in Figure <figr fid="F5">5</figr>, using the same form of data input and the same standard deviation cut-off criterion. The resulting standard deviation cut-off value can be scaled by multiplicative factors to generate sample-specific gain/loss thresholds of the desired stringency for downstream applications, e.g., calling CNAs after segmentation. Before sample-adaptive thresholds were created, data were pre-filtered and normalized using the popLowess strategy. Sample-adaptive thresholds for the Ca13928 breast tumor were created before and after smoothing with a window of 250 kBp for 32 K BAC data and 50 kBp for Agilent 244 K data. Thresholds were estimated using a moving window of size 1% of the total number of probes on each chromosome, applied to each chromosome separately, and the standard deviation cut-off value was selected as the median of the standard deviation distribution. The cut-off value was scaled by a factor of 2 to create the &#177; thresholds in M displayed in Figure <figr fid="F9">9</figr>.</p>
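            <p>The threshold estimate can be sketched in Python/NumPy as follows. The 1% window, the median cut-off, and the scaling factor of 2 follow the description above; treating each window as a run of consecutive probes within a chromosome, pooling the window standard deviations of all chromosomes before taking the median, the minimum window of three probes, and the sample (n - 1) standard deviation are assumptions, and the function name is hypothetical.</p>

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_threshold(m, chrom, window_frac=0.01, scale=2.0, min_window=3):
    """Sample-adaptive +/- threshold in M.

    For each chromosome, the standard deviation of M is computed in every run of
    consecutive probes spanning window_frac of that chromosome's probes; the
    median of all these standard deviations is the cut-off, multiplied by scale.
    Probes are assumed to be sorted by genomic position within each chromosome.
    """
    m = np.asarray(m, dtype=float)
    chrom = np.asarray(chrom)
    sds = []
    for c in np.unique(chrom):
        mc = m[chrom == c]
        w = max(int(round(window_frac * mc.size)), min_window)
        if w > mc.size:
            continue  # chromosome too short for a single window
        sds.append(sliding_window_view(mc, w).std(axis=1, ddof=1))
    return scale * float(np.median(np.concatenate(sds)))


rng = np.random.default_rng(1)
chrom = np.repeat([1, 2, 3], 2000)
m = rng.normal(0.0, 0.1, chrom.size)
print(adaptive_threshold(m, chrom))      # roughly 2 x 0.1
print(adaptive_threshold(3 * m, chrom))  # three times the value above
```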
         </sec>
         <sec>
            <st>
               <p>Availability and requirements</p>
            </st>
            <p>An implementation of popLowess in R (<url>http://www.r-project.org</url>) is available both as a plugin to the BioArray Software Environment (BASE) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and as a stand-alone version.</p>
            <p>Project name: popLowess</p>
            <p>Project home page: <url>http://baseplugins.thep.lu.se/wiki/se.lu.onk.popLowess</url></p>
            <p>Operating system(s): Platform independent</p>
            <p>Programming language: R</p>
            <p>License: GNU GPL</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations</p>
         </st>
         <p>aCGH: array-based CGH</p>
         <p>ALL: acute lymphoblastic leukemia</p>
         <p>BAC: bacterial artificial chromosome</p>
         <p>BASE: BioArray Software Environment</p>
         <p>CGH: comparative genomic hybridization</p>
         <p>CNA: copy number aberration</p>
         <p>CNV: copy number variation</p>
          <p>FISH: fluorescence in situ hybridization</p>
          <p>IQR: interquartile range</p>
          <p>Lowess: global intensity-based lowess normalization</p>
          <p>Median: global median normalization</p>
          <p>popLowess: population-based intensity-based lowess normalization</p>
          <p>SKY: spectral karyotyping</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
          <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors participated in the development of the model. JS implemented and developed the methods. JS and MR performed the statistical tests. JVC conceived the study. JS and JVC drafted the manuscript. All authors participated in the design of the study and in completing the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank Patrik Ed&#233;n and Mattias H&#246;glund for helpful comments on the manuscript. This work was supported by the Knut and Alice Wallenberg Foundation via the SWEGENE program (JS and JVC), the Swedish Cancer Society (GJ), the American Cancer Society (GJ and JVC), John och Augusta Perssons stiftelse (GJ and JVC), and the Swedish Foundation for Strategic Research through CREATE Health &#8211; the Lund Strategic Centre for Clinical Cancer Research (MR).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Genome-wide analysis of DNA copy-number changes using cDNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Pollack</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Alizadeh</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Pergamenschikov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1999</pubdate>
            <volume>23</volume>
            <issue>1</issue>
            <fpage>41</fpage>
            <lpage>46</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/12640</pubid>
                  <pubid idtype="pmpid" link="fulltext">10471496</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Segraves</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sudar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Poole</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kowbel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Dairkee</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Ljung</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Albertson</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1998</pubdate>
            <volume>20</volume>
            <issue>2</issue>
            <fpage>207</fpage>
            <lpage>211</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/2524</pubid>
                  <pubid idtype="pmpid" link="fulltext">9771718</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors</p>
            </title>
            <aug>
               <au>
                  <snm>Kallioniemi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kallioniemi</snm>
                  <fnm>OP</fnm>
               </au>
               <au>
                  <snm>Sudar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rutovitz</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Waldman</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1992</pubdate>
            <volume>258</volume>
            <issue>5083</issue>
            <fpage>818</fpage>
            <lpage>821</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1359641</pubid>
                  <pubid idtype="pmpid" link="fulltext">1359641</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH)</p>
            </title>
            <aug>
               <au>
                  <snm>Ylstra</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van den Ijssel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Carvalho</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Brakenhoff</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Meijer</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>445</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356528</pubid>
                  <pubid idtype="pmpid" link="fulltext">16439806</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Novel patterns of genome rearrangement and their association with survival in breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Hicks</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Krasnitz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lakshmi</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Navin</snm>
                  <fnm>NE</fnm>
               </au>
               <au>
                  <snm>Riggs</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Leibu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Esposito</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Alexander</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Troge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grubor</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Yoon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wigler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Borresen-Dale</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Naume</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schlichting</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Norton</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hagerstrom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Skoog</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Auer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Maner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lundin</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zetterberg</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>12</issue>
            <fpage>1465</fpage>
            <lpage>1479</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1665631</pubid>
                  <pubid idtype="pmpid" link="fulltext">17142309</pubid>
                  <pubid idtype="doi">10.1101/gr.5460106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Breast tumor copy number aberration phenotypes and genomic instability</p>
            </title>
            <aug>
               <au>
                  <snm>Fridlyand</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Snijders</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Ylstra</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Olshen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Segraves</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dairkee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tokuyasu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ljung</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Jain</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>McLennan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ziegler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Devries</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Feiler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Waldman</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Albertson</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>BMC Cancer</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>96</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1459181</pubid>
                  <pubid idtype="pmpid" link="fulltext">16620391</pubid>
                  <pubid idtype="doi">10.1186/1471-2407-6-96</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Genomic and transcriptional aberrations linked to breast cancer pathophysiologies</p>
            </title>
            <aug>
               <au>
                  <snm>Chin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>DeVries</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fridlyand</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Roydasgupta</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Lapuk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Neve</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Qian</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Feiler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tokuyasu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kingsley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dairkee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Meng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chew</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jain</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ljung</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Esserman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Albertson</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Waldman</snm>
                  <fnm>FM</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Cancer Cell</source>
            <pubdate>2006</pubdate>
            <volume>10</volume>
            <issue>6</issue>
            <fpage>529</fpage>
            <lpage>541</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ccr.2006.10.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">17157792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Distinct genomic profiles in hereditary breast tumors identified by array-based comparative genomic hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Naylor</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Vallon-Christersson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Staaf</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Greshock</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Luts</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Olsson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rahman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Stratton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ringner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>BL</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2005</pubdate>
            <volume>65</volume>
            <issue>17</issue>
            <fpage>7612</fpage>
            <lpage>7621</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16140926</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Identification of disease genes by whole genome CGH arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Vissers</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Veltman</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>van Kessel</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brunner</snm>
                  <fnm>HG</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <issue>Spec No 2</issue>
            <fpage>R215</fpage>
            <lpage>223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddi268</pubid>
                  <pubid idtype="pmpid" link="fulltext">16244320</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Comparative genomic hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Albertson</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>331</fpage>
            <lpage>354</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.6.080604.162140</pubid>
                  <pubid idtype="pmpid" link="fulltext">16124865</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Microarray data normalization and transformation</p>
            </title>
            <aug>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>32</volume>
            <issue>Suppl</issue>
            <fpage>496</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1032</pubid>
                  <pubid idtype="pmpid" link="fulltext">12454644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Breakpoint identification and smoothing of array comparative genomic hybridization data</p>
            </title>
            <aug>
               <au>
                  <snm>Jong</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Marchiori</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Meijer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vaart</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Ylstra</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>18</issue>
            <fpage>3636</fpage>
            <lpage>3637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth355</pubid>
                  <pubid idtype="pmpid" link="fulltext">15201182</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Circular binary segmentation for the analysis of array-based DNA copy number data</p>
            </title>
            <aug>
               <au>
                  <snm>Olshen</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Venkatraman</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Lucito</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wigler</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Biostatistics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>4</issue>
            <fpage>557</fpage>
            <lpage>572</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/biostatistics/kxh008</pubid>
                  <pubid idtype="pmpid" link="fulltext">15475419</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>CGH-Plotter: MATLAB toolbox for CGH-data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Autio</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hautaniemi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kauraniemi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yli-Harja</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Astola</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kallioniemi</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>13</issue>
            <fpage>1714</fpage>
            <lpage>1715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg230</pubid>
                  <pubid idtype="pmpid" link="fulltext">15593402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Spatial normalization of array-CGH data</p>
            </title>
            <aug>
               <au>
                  <snm>Neuvial</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hupe</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brito</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Liva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Manie</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Brennetot</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Radvanyi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Aurias</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barillot</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>264</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1523216</pubid>
                  <pubid idtype="pmpid" link="fulltext">16716215</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-264</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A stepwise framework for the normalization of array CGH data</p>
            </title>
            <aug>
               <au>
                  <snm>Khojasteh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>MacAulay</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>274</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1310623</pubid>
                  <pubid idtype="pmpid" link="fulltext">16297240</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-274</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Normalization of cDNA microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Smyth</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Methods</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>4</issue>
            <fpage>265</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1046-2023(03)00155-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597310</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes</p>
            </title>
            <aug>
               <au>
                  <snm>Oshlack</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Emslie</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Corcoran</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Smyth</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>1</issue>
            <fpage>R2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1839120</pubid>
                  <pubid idtype="pmpid" link="fulltext">17204140</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-1-r2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Characterization of a novel breast carcinoma xenograft and cell line derived from a BRCA1 germ-line mutation carrier</p>
            </title>
            <aug>
               <au>
                  <snm>Johannsson</snm>
                  <fnm>OT</fnm>
               </au>
               <au>
                  <snm>Staff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vallon-Christersson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kytola</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gudjonsson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rennstam</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hedenfalk</snm>
                  <fnm>IA</fnm>
               </au>
               <au>
                  <snm>Adeyinka</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kjellen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wennerberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baldetorp</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Petersen</snm>
                  <fnm>OW</fnm>
               </au>
               <au>
                  <snm>Olsson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Oredsson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Isola</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Lab Invest</source>
            <pubdate>2003</pubdate>
            <volume>83</volume>
            <issue>3</issue>
            <fpage>387</fpage>
            <lpage>396</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12649339</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Staaf</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Olsson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Heidenblad</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vallon-Christersson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Osoegawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Oredsson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ringner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hoglund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genes Chromosomes Cancer</source>
            <pubdate>2007</pubdate>
            <volume>46</volume>
            <issue>6</issue>
            <fpage>543</fpage>
            <lpage>558</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/gcc.20438</pubid>
                  <pubid idtype="pmpid" link="fulltext">17334996</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Identification of cryptic aberrations and characterization of translocation breakpoints using array CGH in high hyperdiploid childhood acute lymphoblastic leukemia</p>
            </title>
            <aug>
               <au>
                  <snm>Paulsson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidenblad</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Morse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fioretos</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Johansson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Leukemia</source>
            <pubdate>2006</pubdate>
            <volume>20</volume>
            <issue>11</issue>
            <fpage>2002</fpage>
            <lpage>2007</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.leu.2404372</pubid>
                  <pubid idtype="pmpid" link="fulltext">16990785</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes</p>
            </title>
            <aug>
               <au>
                  <snm>de Leeuw</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Rosenwald</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bebb</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gascoyne</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Dyer</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Staudt</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Martinez-Climent</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <issue>17</issue>
            <fpage>1827</fpage>
            <lpage>1837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddh195</pubid>
                  <pubid idtype="pmpid" link="fulltext">15229187</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Comprehensive copy number profiles of breast cancer cell model genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Shadeo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Breast Cancer Res</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <issue>1</issue>
            <fpage>R9</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1413994</pubid>
                  <pubid idtype="pmpid" link="fulltext">16417655</pubid>
                  <pubid idtype="doi">10.1186/bcr1370</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Determining the center of array-CGH data</p>
            </title>
            <aug>
               <au>
                  <snm>Lipson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ben-Dor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yakhini</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Computational aspects of DNA copy number measurement</source>
            <publisher>Technion &#8211; Israel Institute of Technology, Computer Science Department</publisher>
            <pubdate>2007</pubdate>
            <fpage>105</fpage>
            <lpage>110</lpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH</p>
            </title>
            <aug>
               <au>
                  <snm>Garnis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lockwood</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Vucic</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Girard</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Minna</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gazdar</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Macaulay</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Int J Cancer</source>
            <pubdate>2006</pubdate>
            <volume>118</volume>
            <issue>6</issue>
            <fpage>1556</fpage>
            <lpage>1564</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/ijc.21491</pubid>
                  <pubid idtype="pmpid" link="fulltext">16187286</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Agilent Technologies</p>
            </title>
            <url>http://www.agilent.com</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Agilent eArray</p>
            </title>
            <url>http://earray.chem.agilent.com/earray</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Saal</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Troein</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vallon-Christersson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gruvberger</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>8</issue>
            <fpage>SOFTWARE0003</fpage>
            <lpage/>
            <note/>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2002-3-8-software0003</pubid>
                  <pubid idtype="pmpid" link="fulltext">12186655</pubid>
                  <pubid idtype="pmcid">139402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Ruan</snm>
                  <fnm>QG</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Eckenrode</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McIndoe</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>She</snm>
                  <fnm>JX</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>45</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">11595791</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
