<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-485</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>CrossHybDetector: detection of cross-hybridization events in DNA microarray experiments</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Uva</snm>
               <fnm>Paolo</fnm>
               <insr iid="I1"/>
               <email>paolo_uva@merck.com</email>
            </au>
            <au id="A2" ca="yes">
               <snm>de Rinaldis</snm>
               <fnm>Emanuele</fnm>
               <insr iid="I1"/>
               <email>emanuele_derinaldis@merck.com</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Istituto di Ricerche di Biologia Molecolare, Merck Research Laboratories, 00040 Pomezia, Rome, Italy</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>485</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/485</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19014642</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-485</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>26</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>17</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>17</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Uva and de Rinaldis; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>DNA microarrays carry thousands of different probe sequences on their surface. These probes are designed to minimize potential cross-hybridization with non-target sequences. However, given the large number of probes, cross-hybridization events cannot be excluded. Such events can severely degrade data quality and cause false-positive or false-negative results.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p><it>CrossHybDetector </it>is a software package for identifying cross-hybridization events that occur during individual array hybridizations, using the probe sequences and the array intensity values. As output, the software provides a list of potentially 'corrupted' array spots together with their p-values, calculated by Monte Carlo simulations. It also generates plots that give a visual, global overview of the quality of the microarray experiment with respect to cross-hybridization.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p><it>CrossHybDetector </it>is implemented as a package for the statistical computing environment <it>R </it>and is freely available under the LGPL license within the CRAN project.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Expression microarrays are used in a wide range of applications to simultaneously monitor the relative abundance of thousands of target sequences. A basic requirement of array probes is low mutual sequence similarity, which reduces the likelihood of cross-hybridization. On the other hand, the permissible sequence 'dissimilarity' between probes is constrained by the requirement that all probes share similar melting temperatures (Tm). This guarantees that all target sequences hybridize correctly with their probes under the same experimental conditions. Optimal probe design is therefore a trade-off between Tm similarity and sequence 'dissimilarity' among probes. As a consequence, the greater the number of probes spotted on the array, the greater the chance of cross-hybridization events. When such an event occurs, the signal intensity measured at a spot carrying a given probe is inflated by the nonspecific binding of an off-target sequence similar to the intended target. If undetected, this effect can produce a number of false-positive signals on the array. Even with optimal probe design, suboptimal experimental conditions may favor cross-hybridization over specific binding <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. This issue is particularly relevant for customized microarray designs. Hybridization protocols for standard commercial platforms are optimized, and rigorous quality controls are carried out before the platform is deployed <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. By contrast, customized arrays are more prone to cross-hybridization and to other issues related to novel probe designs. Setting up ad hoc quality control procedures is therefore a crucial prerequisite for improving data quality. 
<it>CrossHybDetector </it>identifies highly similar probes and checks for 'suspicious' spot intensity patterns in the outcome of a single microarray experiment. For each probe, a p-value expressing the likelihood that the pattern occurred by chance is calculated using Monte Carlo simulations. In addition, the software reports a global 'cross-hybridization quality control' parameter and generates plots that give a visual overview of the cross-hybridization events in the experiment. Here we present the <it>CrossHybDetector </it>software and, as a proof of concept, illustrate two case studies in which the software and the underlying methodology were successfully applied to detect cross-hybridization events.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p><it>CrossHybDetector </it>is implemented as a package within the statistical computing environment R <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. It relies internally on functions from the <it>marray </it>and <it>methods </it>R packages, which must be installed for <it>CrossHybDetector </it>to work.</p>
         <sec>
            <st>
               <p>Data formats</p>
            </st>
            <p>The <it>CrossHybDetector </it>algorithm takes three inputs: i) the array probe sequences, ii) the spot intensities and array layout, and iii) the spot type information (i.e. whether each spot is a "standard probe", "negative control" or "spike-in"). Each input is supplied as a separate text file. An example analysis and its input files are provided as supplementary material (Additional files <supplr sid="S1">1</supplr> and <supplr sid="S2">2</supplr>).</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Application of <it>CrossHybDetector </it>to a microarray experiment.</b> PDF file containing step-by-step instructions on how to run a complete analysis using the <it>CrossHybDetector </it>package. The document also includes all the plots produced by the package. Input files used in this example are available in Additional file <supplr sid="S2">2</supplr>.</p>
               </text>
               <file name="1471-2105-9-485-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Input files used for the <it>CrossHybDetector </it>example analysis.</b> This ZIP archive includes three files containing i) the array probe sequences, ii) the array spot intensities and array layout, and iii) the spot type information (i.e. whether each spot is a "probe", "negative control" or "spike-in").</p>
               </text>
               <file name="1471-2105-9-485-S2.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Algorithm</p>
            </st>
            <p>A cross-hybridization event takes place when a target sequence not only hybridizes to its related spot(s) on the chip, but also 'corrupts' spots carrying probes with similar sequences. The magnitude of the cross-hybridization effect is proportional to the sequence similarity between probes and to the abundance of the off-target sequence relative to the target sequence. As a consequence, abundant target sequences can generate high signal intensities on their related spots and also 'push up' the intensity values of spots carrying similar probes. The algorithm implemented in <it>CrossHybDetector </it>works as follows:</p>
            <p>1. The probes with the highest intensities are selected in two ways: probes whose intensities reach the saturation value (default = 65535), OR probes with intensities above a z-score threshold (default = 3). Of the two resulting probe lists, the larger one is retained. These probes are the most likely to cause detectable cross-hybridization events on probes with similar sequences. In principle, low-abundance targets could also cross-hybridize to nonspecific probes on the chip, but this would have a minor impact on the final readout. For this reason, lower-intensity probes are not considered by default. However, the user can modify the default thresholds to extend the analysis to probes with lower intensities.</p>
            <p>2. Each of the selected probes is aligned against all the others. For each pairwise alignment the similarity between sequences is expressed as the Smith-Waterman (SW) score <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
            <p>Among the available measures of pairwise sequence similarity (e.g. percent identity, longest common stretch) <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, we adopted the SW score. This metric can be computed rapidly and is reported to be highly correlated (r = 0.98, p &lt; 10<sup>-165</sup>) with the best univariate predictor of cross-hybridization ("most contiguous base pairs between probe and target sequences") <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
            <p>Where the composition of the probes spotted on a particular array calls for it, alternative methods for computing pairwise sequence similarity can easily be plugged in.</p>
            <p>3. All the probes similar in sequence to probe <it>i </it>are identified by selecting all the alignments with a &#916;SW score below a user-defined threshold. &#916;SW is computed as:</p>
            <p>
               <display-formula>&#916;SW<sub>i,k </sub>= SW<sub>i,i </sub>- SW<sub>i,k</sub></display-formula>
            </p>
            <p>where SW<sub>i,i </sub>is the score of the alignment of probe<sub>i </sub>vs. probe<sub>i </sub>('perfect' pairing) and SW<sub>i,k </sub>is the score of the alignment of probe<sub>i </sub>vs. probe<sub>k </sub>('imperfect' pairing).</p>
            <p>4. Monte Carlo simulations: the sum of intensities of each probe subset selected in step 3 is compared against the distribution of sums obtained by randomly sampling an equal number of probes from the array (10,000 samplings by default). A p-value is then calculated for each subset as the fraction of random samplings whose sum of intensities exceeds the observed value, and corrected for multiple testing using the FDR procedure <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Probe subsets with a p-value below a user-defined threshold (0.01 by default) are considered to consist of probes affected by nonspecific binding. These probes are flagged as <it>corrupted</it>. The probe whose target sequence causes the cross-hybridization (probe<sub>i</sub>) is instead flagged as <it>corruptor</it>. A <it>corruptor/corrupted </it>probe pair therefore consists of a <it>corruptor </it>probe and one of its <it>corrupted </it>probes.</p>
            <p>The total number of <it>corruptor/corrupted </it>probe pairs reflects the amount of cross-hybridization in the microarray experiment.</p>
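            <p>The four steps above can be illustrated with a minimal Python sketch. It is not the <it>CrossHybDetector </it>R implementation: all function names are hypothetical, and several details the text leaves open are assumptions, namely the Smith-Waterman scoring scheme (match +2, mismatch -1, linear gap -2), z-scores computed on the intensities as supplied, each selected probe being compared with every other probe on the array, and the Benjamini-Hochberg procedure standing in for the FDR correction.</p>
```python
import random
import statistics

# Hypothetical sketch of the four steps; not the CrossHybDetector R API.


def select_high_intensity(intensities, saturation=65535, z_threshold=3.0):
    """Step 1: keep the larger of the 'saturated' and 'high z-score' lists."""
    values = list(intensities.values())
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    # Saturated spots sit at the scanner ceiling, so >= is used here.
    saturated = [p for p, v in intensities.items() if v >= saturation]
    # Assumption: z-scores are computed on the intensities as supplied.
    high_z = [p for p, v in intensities.items()
              if sd > 0 and (v - mean) / sd > z_threshold]
    return saturated if len(saturated) > len(high_z) else high_z


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Step 2: Smith-Waterman local alignment score, linear gap penalty (assumed scheme)."""
    prev, best = [0] * (len(b) + 1), 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


def similar_probes(i, sequences, delta_threshold):
    """Step 3: probes k whose delta-SW = SW(i,i) - SW(i,k) is below the threshold."""
    self_score = sw_score(sequences[i], sequences[i])
    return [k for k, seq in sequences.items()
            if k != i and delta_threshold > self_score - sw_score(sequences[i], seq)]


def monte_carlo_pvalue(subset, intensities, n_samples, rng):
    """Step 4: fraction of random equal-size probe sets with a larger intensity sum."""
    observed = sum(intensities[k] for k in subset)
    values = list(intensities.values())
    hits = sum(1 for _ in range(n_samples)
               if sum(rng.sample(values, len(subset))) > observed)
    return hits / n_samples


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR variant)."""
    m, adjusted, running_min = len(pvalues), [0.0] * len(pvalues), 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for pos, idx in enumerate(sorted(range(m), key=lambda t: pvalues[t], reverse=True)):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def detect_cross_hybridization(sequences, intensities, delta_threshold,
                               p_threshold=0.01, n_samples=10000, seed=0):
    """Return (corruptor, corrupted, adjusted p-value) triples."""
    rng = random.Random(seed)
    subsets, raw_p = {}, []
    for i in select_high_intensity(intensities):
        subset = similar_probes(i, sequences, delta_threshold)
        if subset:
            subsets[i] = subset
            raw_p.append(monte_carlo_pvalue(subset, intensities, n_samples, rng))
    pairs = []
    for (i, subset), p in zip(subsets.items(), benjamini_hochberg(raw_p)):
        if p_threshold > p:
            pairs.extend((i, k, p) for k in subset)
    return pairs
```
            <p>The pure-Python alignment loop is quadratic in probe length and only serves to make the &#916;SW rule concrete; an analysis of thousands of probes would rely on an optimized aligner. The <it>delta_threshold</it> argument plays the role of the user-defined &#916;SW threshold, for which the text gives no default.</p>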
         </sec>
         <sec>
            <st>
               <p>Output</p>
            </st>
            <p>The analysis produces several types of output (see also Additional file <supplr sid="S1">1</supplr>):</p>
            <p>1) A plot showing for each analyzed probe the p-value resulting from the Monte Carlo simulation</p>
            <p>2) A list of the probes identified as <it>corruptors </it>and their related p-values</p>
            <p>3) A list of the probes identified as <it>corrupted </it>and their related p-values</p>
            <p>4) A plot showing the spatial distribution of <it>corruptors </it>and <it>corrupted </it>probes on the array</p>
            <p>5) A plot showing the ratio versus average intensity values (<it>MA plot</it>) with <it>corruptors </it>and <it>corrupted </it>probes highlighted in colors (Figure <figr fid="F1">1</figr>).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Examples of cross-hybridization in two independent data sets</p>
               </caption>
               <text>
                  <p><b>Examples of cross-hybridization in two independent data sets</b>. Log ratio versus average log intensity (MA) plots of two arrays from the "Phage TAG array" (A) and "Yeast TAG array" (B) data sets. The x-axis represents the average log2 intensity of the two channels, and the y-axis represents the log2 ratio of channel1/channel2. Probes identified as <it>corruptors </it>or <it>corrupted </it>in the first (R), the second (G) or both (RG) channels are highlighted with the respective labels. Horizontal lines indicate a 2-fold change (log2 ratio = -1 and +1).</p>
               </text>
               <graphic file="1471-2105-9-485-1"/>
            </fig>
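            <p>The MA coordinates described in the caption of Figure <figr fid="F1">1</figr> can be computed per spot as in the following Python sketch. The function names are hypothetical, and the sketch assumes that channel1 and channel2 are positive (e.g. background-corrected) intensities; the package may normalize the data in its own way before plotting.</p>
```python
import math


def ma_coordinates(channel1, channel2):
    """Return (M, A) for one spot: M = log2(ch1/ch2), A = mean of log2(ch1) and log2(ch2)."""
    log_ch1, log_ch2 = math.log2(channel1), math.log2(channel2)
    return log_ch1 - log_ch2, (log_ch1 + log_ch2) / 2


def outside_two_fold(m_value):
    """True when the spot lies beyond the horizontal lines at M = -1 and M = +1."""
    return abs(m_value) > 1
```
            <p>For a spot with channel intensities 4000 and 1000, M = 2 and A = log2(2000), about 10.97, so the spot lies above the 2-fold line.</p>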
         </sec>
         <sec>
            <st>
               <p>Performance</p>
            </st>
            <p>With default parameters, the analysis of a two-channel Agilent 22K array takes about 8 min on a 3 GHz Pentium 4 with 1 GB of RAM.</p>
            <p>A step-by-step description of how to run an example microarray analysis, together with the resulting output figures, is provided in the supplementary material (Additional file <supplr sid="S1">1</supplr>).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>To validate the software and the related methodology, we used <it>CrossHybDetector </it>to analyze independent microarray data sets obtained from two different array layouts, referred to here as "Phage TAG array" and "Yeast TAG array". In both cases the oligonucleotide probes spotted on the arrays represented artificially designed "DNA barcodes". Arrays of this type are widely used in a variety of applications to monitor the relative abundances of synthetic DNA sequences ("barcodes") present in different samples <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>The "Phage TAG array" data set is composed of 76 previously reported microarray experiments carried out in our laboratory <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The array layout includes probes complementary to a repertoire of 20,736 synthetic DNA "barcode" sequences and was designed to analyze collections of phage clones tagged with short synthetic DNA sequences <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The "Yeast TAG array" data set contains 135 microarray experiments performed on in-house synthesized oligonucleotide arrays <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. The array layout includes probes complementary to a collection of 11,986 different DNA barcodes and was designed to profile the relative abundances of yeast strains from the Yeast Knockout (YKO) strain collection <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, tagged with short synthetic DNA sequences.</p>
         <p>Table <tblr tid="T1">1</tblr> presents the results obtained by applying <it>CrossHybDetector </it>to each hybridization experiment of the two data sets (Monte Carlo p-value threshold = 0.01). <it>Corruptor </it>and <it>corrupted </it>probes were identified in both data sets. For the "Phage TAG array" data set, after more stringent hybridization and washing conditions were adopted, the number of hybridization experiments affected by cross-hybridization decreased from 22 (out of an initial 152) to 5 (data not shown).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Summary results of <it>CrossHybDetector</it></p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>Phage TAG Array</it>
                        </b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>Yeast TAG Array</it>
                        </b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Oligonucleotide probes spotted on the array</p>
                  </c>
                  <c ca="center">
                     <p>20736</p>
                  </c>
                  <c ca="center">
                     <p>11986</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Arrays analyzed</p>
                  </c>
                  <c ca="center">
                     <p>76</p>
                  </c>
                  <c ca="center">
                     <p>135</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Hybridization experiments analyzed</p>
                  </c>
                  <c ca="center">
                     <p>152</p>
                  </c>
                  <c ca="center">
                     <p>270</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Hybridization experiments affected by cross-hybridization events</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>35</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Probes identified as <it>corrupted </it>in more than 10% of hybridization experiments</p>
                  </c>
                  <c ca="center">
                     <p>277</p>
                  </c>
                  <c ca="center">
                     <p>940</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>The output plots in Figure <figr fid="F1">1</figr> (panels A and B) show the results of the <it>CrossHybDetector </it>analysis for one experiment affected by cross-hybridization events in each of the two data sets. Panel A shows that many high-intensity spots are identified as <it>corruptors</it>, and that a large fraction of the probes with a fold change greater than two are consequently flagged as <it>corrupted</it>.</p>
         <p>The complete set of output results can be generated by following the software guidelines provided as supplemental materials (Additional file <supplr sid="S1">1</supplr>).</p>
         <p>To further validate the results, we ran a separate analysis on the probes identified as <it>corrupted </it>in more than 10% of the experiments of each data set.</p>
         <p>For each <it>corrupted </it>probe, its correlation with each of the <it>corruptors </it>identified for it in the different experiments was calculated across the entire data set.</p>
         <p>The rationale for this analysis is that any statistically significant correlation between the two probes of a <it>corruptor/corrupted </it>pair can only be ascribed to cross-hybridization, as no functional relationship exists between the synthetic sequences monitored in either experimental setting. Results are shown in Table <tblr tid="T2">2</tblr>. In both data sets, the average Pearson's correlation over the whole set of <it>corruptor/corrupted </it>pairs was significantly higher than the average correlation obtained for an equal number of randomly selected probe pairs (10,000 Monte Carlo samplings, p &lt; 0.0001). As an example, Figure <figr fid="F2">2</figr> shows the correlation between the probes of two <it>corruptor/corrupted </it>pairs, one identified in each data set.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Plots of the intensity of <it>corruptor </it>versus <it>corrupted </it>probes</p>
            </caption>
            <text>
               <p><b>Plots of the intensity of <it>corruptor </it>versus <it>corrupted </it>probes</b>. Correlation between the probe intensity values of two <it>corruptor/corrupted </it>pairs, respectively identified in each data set. Each point represents a different array hybridization experiment. The Pearson's correlation and the corresponding p-values are indicated.</p>
            </text>
            <graphic file="1471-2105-9-485-2"/>
         </fig>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Summary results of the probe-pair correlation analysis</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>Phage TAG Array</it>
                        </b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>Yeast TAG Array</it>
                        </b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><it>Corruptor/corrupted </it>probe pairs</p>
                  </c>
                  <c ca="center">
                     <p>3034</p>
                  </c>
                  <c ca="center">
                     <p>1048</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Average Pearson's correlation</p>
                  </c>
                  <c ca="center">
                     <p>0.75</p>
                  </c>
                  <c ca="center">
                     <p>0.24</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Average Pearson's correlation of random probe pairs (Monte Carlo, 10,000 samplings)</p>
                  </c>
                  <c ca="center">
                     <p>0.36</p>
                  </c>
                  <c ca="center">
                     <p>0.18</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Monte Carlo p-value</p>
                  </c>
                  <c ca="center">
                     <p>&lt;0.0001</p>
                  </c>
                  <c ca="center">
                     <p>&lt;0.0001</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>These results represent independent confirmation that the probes identified as <it>corrupted </it>in individual experiments by the <it>CrossHybDetector </it>were affected by artifactual cross-hybridization effects.</p>
      </sec>
      <sec>
         <st>
            <p>Discussion and conclusion</p>
         </st>
         <p>Other methods addressing cross-hybridization in DNA microarrays have been reported. Flikka et al. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> developed a web tool that assesses the reliability of hybridization signals in different array designs by comparing probe sequences against human, mouse and rat transcript collections. Candidate genes for cross-hybridization are selected on the basis of sequence similarity calculated with the BLAST algorithm <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Unlike <it>CrossHybDetector</it>, this tool was not conceived for quality control of array hybridization experiments and does not take into account the hybridization signals from specific experiments. Casneuf and collaborators <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> used a different approach, identifying probe sets affected by off-target hybridization from positive correlations between sequence similarity and expression across a series of microarray experiments. This approach is similar to the correlation analysis we carried out to validate the results of <it>CrossHybDetector</it>. The most important difference between the two methodologies is that Casneuf's approach uses an entire data series to identify potentially cross-hybridizing probes, whereas <it>CrossHybDetector </it>focuses on individual array hybridization experiments.</p>
         <p>In this respect, <it>CrossHybDetector </it>is primarily quality control software for single hybridization experiments, conceptually similar to other published tools that monitor different quality parameters, such as spatial ('geographical') bias, spot replicate concordance and two-channel correlation <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Other methods aim to correct spot intensities with a model-based approach <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, a purpose different from experimental quality control.</p>
         <p><it>CrossHybDetector </it>uses both the probe intensity values and the probe sequences to identify potentially 'corrupted' spots. As a consequence, cross-hybridization events that do not increase the intensity of spots carrying similar probe sequences cannot be detected. This is an intrinsic limitation of the 'in silico' nature of the method and does not affect its general utility.</p>
         <p><it>CrossHybDetector </it>can be applied to any array that provides an intensity signal for each individual probe. As a consequence, it cannot be applied to Affymetrix chips, where intensity values are associated with multiple probes (PM and MM probe sets). We expect <it>CrossHybDetector </it>to be particularly useful for the quality control of experiments performed on customized microarrays, which are more prone to issues related to suboptimal probe design and/or experimental conditions.</p>
         <p>Moreover, because it is developed as a standard R package, <it>CrossHybDetector </it>is well suited to integration into more complex quality control platforms and automated analysis workflows.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p><b>Project name: </b>CrossHybDetector</p>
         <p>
            <b>Project home page: </b>
            <url>http://cran.r-project.org/</url>
         </p>
         <p><b>Operating systems: </b>The CrossHybDetector package can be installed on all platforms supporting R, including a wide variety of UNIX platforms, Windows and MacOS.</p>
         <p><b>Programming language: </b>R</p>
         <p><b>License: </b>LGPL</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>PU conceived and implemented the method, wrote the code and drafted the manuscript. EDR contributed to the conceptualization of the method and wrote the final version of the manuscript. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank Paolo Monaci and Armin Lahm for useful discussions and Janet Clench for the linguistic revision of the text. We also thank the anonymous reviewers for their insightful suggestions. This work was supported in part by a grant from Ministero dell'Istruzione, dell'Universit&#224; e della Ricerca.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Use of hybridization kinetics for differentiating specific from non-specific binding to oligonucleotide microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stepaniants</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ziman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>e86</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">134259</pubid>
                  <pubid idtype="pmpid" link="fulltext">12177314</pubid>
                  <pubid idtype="doi">10.1093/nar/gnf085</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements</p>
            </title>
            <aug>
               <au>
                  <snm>Shi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Reid</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>WD</fnm>
               </au>
               <au>
                  <snm>Shippy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Warrington</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>de Longueville</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>KY</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>1151</fpage>
            <lpage>1161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1239</pubid>
                  <pubid idtype="pmpid" link="fulltext">16964229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Microarray data quality &#8211; review of current developments</p>
            </title>
            <aug>
               <au>
                  <snm>Wilkes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Laux</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Foy</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Omics</source>
            <pubdate>2007</pubdate>
            <volume>11</volume>
            <fpage>1</fpage>
            <lpage>13</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/omi.2006.0001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17411392</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The R Project for Statistical Computing</p>
            </title>
            <url>http://www.r-project.org</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Identification of common molecular subsequences</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>147</volume>
            <fpage>195</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(81)90087-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">7265238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>QD</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>i222</fpage>
            <lpage>i231</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti1045</pubid>
                  <pubid idtype="pmpid" link="fulltext">15961461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A multivariate prediction model for microarray cross-hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>YA</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Slate</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Peck</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Voit</snm>
                  <fnm>EO</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>101</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1409802</pubid>
                  <pubid idtype="pmpid" link="fulltext">16509965</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Controlling the false discovery rate: a practical and powerful approach to multiple testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Royal Stat Soc B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Signature-tagged mutagenesis: barcoding mutants for genome-wide screens</p>
            </title>
            <aug>
               <au>
                  <snm>Mazurkiewicz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Boone</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>929</fpage>
            <lpage>939</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1984</pubid>
                  <pubid idtype="pmpid" link="fulltext">17139324</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A unique and universal molecular barcode array</p>
            </title>
            <aug>
               <au>
                  <snm>Pierce</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Fung</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Jaramillo</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Nislow</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Giaever</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2006</pubdate>
            <volume>3</volume>
            <fpage>601</fpage>
            <lpage>603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth905</pubid>
                  <pubid idtype="pmpid" link="fulltext">16862133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Differential screening of phage-ab libraries by oligonucleotide microarray technology</p>
            </title>
            <aug>
               <au>
                  <snm>Monaci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Luzzago</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Santini</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>De Pra</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Arcuri</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Magistri</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bellini</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ansuini</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ambrosio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ammendola</snm>
                  <fnm>V</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2008</pubdate>
            <volume>3</volume>
            <fpage>e1508</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2204054</pubid>
                  <pubid idtype="pmpid" link="fulltext">18231595</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0001508</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Improved microarray methods for profiling the Yeast Knockout strain collection</p>
            </title>
            <aug>
               <au>
                  <snm>Yuan</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Peyser</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Spencer</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Boeke</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>e103</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1169235</pubid>
                  <pubid idtype="pmpid" link="fulltext">15994458</pubid>
                  <pubid idtype="doi">10.1093/nar/gni105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy</p>
            </title>
            <aug>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Lashkari</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mittmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1996</pubdate>
            <volume>14</volume>
            <fpage>450</fpage>
            <lpage>456</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1296-450</pubid>
                  <pubid idtype="pmpid" link="fulltext">8944025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>XHM: a system for detection of potential cross hybridizations in DNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Flikka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yadetie</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Laegreid</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>117</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">517492</pubid>
                  <pubid idtype="pmpid" link="fulltext">15333145</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation</p>
            </title>
            <aug>
               <au>
                  <snm>Casneuf</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Peer</snm>
                  <mnm>Van de</mnm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>461</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2213692</pubid>
                  <pubid idtype="pmpid" link="fulltext">18039370</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>arrayMagic: two-colour cDNA microarray quality control and preprocessing</p>
            </title>
            <aug>
               <au>
                  <snm>Buness</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Steiner</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sultmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Poustka</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>554</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti052</pubid>
                  <pubid idtype="pmpid" link="fulltext">15454413</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Quality assessment of microarrays: visualization of spatial artifacts and quantitation of regional biases</p>
            </title>
            <aug>
               <au>
                  <snm>Reimers</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>JN</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>166</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1189079</pubid>
                  <pubid idtype="pmpid" link="fulltext">15992406</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
