<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-461</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Casneuf</snm>
               <fnm>Tineke</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>tineke.casneuf@psb.ugent.be</email>
            </au>
            <au id="A2">
               <snm>Van de Peer</snm>
               <fnm>Yves</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>yves.vandepeer@psb.ugent.be</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Huber</snm>
               <fnm>Wolfgang</fnm>
               <insr iid="I3"/>
               <email>huber@ebi.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium</p>
            </ins>
            <ins id="I2">
               <p>Department of Molecular Genetics, Ghent University, B-9052 Ghent, Belgium</p>
            </ins>
            <ins id="I3">
               <p>EMBL &#8211; European Bioinformatics Institute, Cambridge CB10 1SD, UK</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>461</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/461</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18039370</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-461</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>05</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>26</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>26</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Casneuf et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Microarray co-expression signatures are an important tool for studying gene function and relations between genes. In addition to genuine biological co-expression, correlated signals can result from technical deficiencies like hybridization of reporters with off-target transcripts. An approach that is able to distinguish these factors permits the detection of more biologically relevant co-expression signatures.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We demonstrate a positive relation between off-target reporter alignment strength and expression correlation in data from oligonucleotide genechips. Furthermore, we describe a method that allows the identification, from their expression data, of individual probe sets affected by off-target hybridization.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The effects of off-target hybridization on expression correlation coefficients can be substantial, and can be alleviated by more accurate mapping between microarray reporters and the target transcriptome. We recommend attention to the mapping for any microarray analysis of gene expression patterns.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Microarrays are a valuable tool in functional genomics research. The breadth of their applications is reflected by the myriad of computational methods that have been developed for their analysis in the last decade. One popular practice is to compare expression patterns of genes by calculating correlation coefficients on expression level estimates across a set of conditions. Many downstream analysis tools are based on the presence or absence of correlation in the expression profiles of genes, such as the inference of co-expression <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, gene regulatory <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Bayesian networks <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> and the study of gene family evolution <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. From a biological point of view, these approaches are useful and informative, but here we show that incorrect conclusions can be drawn if care is not taken in how these correlations are calculated and how the reporters for each transcript are selected.</p>
         <p>A gene is represented on a microarray by one or more reporters, i.e. nucleotide sequences that are designed to uniquely match its transcript, or transcripts if different splice variants exist <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Affymetrix GeneChips are the most widely used microarray platform, and a wealth of data measured on these arrays is publicly available. Affymetrix reporters are 25-mer oligonucleotides whose sequence is complementary to the intended target. Each target is represented by a set of reporters, called a <it>composite sequence </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> or <it>probe set </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Probe set size varies between 11 and 20 reporters, depending on the type of array, but is the same for the majority of the probe sets within one array. The signals of the individual reporters are combined into one expression value for the probe set in a step called <it>summarization </it><abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>The composition of the probe sets and the identifier of their gene transcript is contained in what is referred to as a CDF, a chip description file. Affymetrix, as array manufacturer, provides this information, and thanks to the openness of their technology specification, users can also construct their own custom-made CDFs. For Affymetrix' CDFs, probe set compositions are considered static and probe set annotation dynamic: with an updated annotation of a genome, the assignment of a probe set to a particular target gene can change, but never the content of its reporters <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. For custom-made CDFs, this restriction is not necessary, as reporters can be arbitrarily assigned to targets.</p>
         <p>Microarray technology confronts researchers with various challenges. Our understanding of transcriptomes is incomplete, and our estimates of which transcripts exist in a genome are constantly evolving. Therefore, for the analysis of microarray data it is important to ascertain that a reporter does in fact measure the transcript it was intended to target when the array was designed. Another concern is cross-hybridization, where transcripts other than the ones intended hybridize to a reporter. The signal that is obtained for such a reporter will be that of a combination of multiple different transcripts.</p>
         <p>The widespread use of expression arrays encouraged different research groups to study the extent and effect of hybridization of cDNA molecules to reporters with mismatches in more detail. The cardinal importance of reporter annotation was underscored by observations made and evaluation tools developed by several research groups <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Dai et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> conducted a comparative analysis of GeneChip data with original and redefined probe set definitions and reported a 30 to 50% discrepancy in the lists of genes identified by various analyses. These authors provide up-to-date reporter mapping files for various types of GeneChips that match individual reporters to transcripts. Based on the same observation of problematic reporter annotation, Zhang et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> conducted an in-depth analysis of the reporter assignment on specific microarrays and pinpointed consistent but inaccurate signals across multiple experiments resulting from problematic reporters that are either non-specific or miss their target. They concluded that up to around 10% of the reporters on widely used arrays are non-specific in that they target multiple transcripts, and that another 10% miss their target.</p>
         <p>Different efforts have also aimed to model hybridization strength and extent of cross-hybridization to improve the design of high affinity reporters that are less prone to cross-hybridization <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. In addition, tools have been developed to infer the extent of cross-hybridization of individual reporter sets subsequent to data analysis <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
         <p>The technical aspect of the microarray technology has also been tackled: Eklund et al. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> reported that replacing cRNA with cDNA hybridization targets substantially reduces cross-hybridization. Alternative technologies to detect cross-hybridization on microarrays have also been suggested <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
         <p>Wren et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> described a positive relationship between the observed signal and the number of contiguous hydrogen bonds involved in duplex formation during reporter-transcript binding. Okoniewski and Miller <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> conducted a large-scale analysis to map all interactions between reporters, probe sets and transcripts on the HG-U133A array. First, a set of basic motifs was defined to identify families of interacting probe sets, as in some cases a reporter can bind more than one transcript, or a transcript can bind more than one reporter. The motifs were then used to build a bipartite graph of interactions with the probe sets and transcripts as nodes and matches as edges. The authors were able to identify several hub probe sets, whose expression combines the signals of many available transcripts. A detailed investigation of the expression signals revealed that reporters targeting multiple transcripts had a higher absolute expression signal than those targeting a unique transcript, and that probe sets that contain reporters with multiple matches had increased expression correlation between them.</p>
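         <p>The graph construction summarized above can be illustrated with a minimal sketch. The probe set and transcript identifiers, the list of matches and the degree threshold used to call a hub are hypothetical and are not taken from Okoniewski and Miller; the code only shows how hub probe sets appear as high-degree nodes in a bipartite graph of matches.</p>
         <p><![CDATA[
```python
from collections import defaultdict

# Hypothetical (probe set, transcript) matches; identifiers are illustrative only.
matches = [
    ("ps_A", "tx_1"),
    ("ps_A", "tx_2"),
    ("ps_A", "tx_3"),
    ("ps_B", "tx_3"),
    ("ps_C", "tx_4"),
]


def build_bipartite(edges):
    """Return adjacency maps for the two node classes of the bipartite graph."""
    ps_to_tx = defaultdict(set)
    tx_to_ps = defaultdict(set)
    for probe_set, transcript in edges:
        ps_to_tx[probe_set].add(transcript)
        tx_to_ps[transcript].add(probe_set)
    return ps_to_tx, tx_to_ps


def hub_probe_sets(ps_to_tx, min_degree=3):
    """Probe sets linked to at least min_degree transcripts (assumed threshold)."""
    return sorted(ps for ps, txs in ps_to_tx.items() if len(txs) >= min_degree)


ps_to_tx, tx_to_ps = build_bipartite(matches)
hubs = hub_probe_sets(ps_to_tx)
# Transcripts matched by more than one probe set link those probe sets together.
shared = sorted(tx for tx, pss in tx_to_ps.items() if len(pss) > 1)
print(hubs, shared)
```
]]></p>
         <p>In this toy input, ps_A is the only hub, and tx_3 is the only transcript shared between two probe sets (ps_A and ps_B), which is the kind of link that could raise the correlation between their signals.</p>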
         <p>A different approach <it>in situ </it>was taken by Wu et al. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> for the construction of a free energy model for cross-hybridization. These authors observed a clear relationship between the known concentrations of spiked-in transcripts in different experiments and the measured signals of reporters not designed to target these specific transcripts. Based on the sequences of these affected reporters, the authors constructed a free energy model to assess the sequence dependence of cross-hybridization which can be used to refine the algorithms used in reporter design.</p>
         <p>These different studies clearly show that cross-hybridization is a critical concern for microarray analysis. It is clear that a reporter can bind different transcripts or that a transcript can bind to different reporters if stable, partial binding occurs or if hairpin structures are formed <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. As a result, the signals of the reporters a transcript binds will be similar and correlation coefficients, calculated on these signals during downstream analysis, will be artifactual. The <it>in situ </it>effect of sequence similarity on expression correlation is, however, not known.</p>
         <p>For this study we worked with the ATH1 Affymetrix GeneChip that was designed for the analysis of gene expression in <it>Arabidopsis thaliana</it>. <it>Arabidopsis </it>is the most commonly studied model plant organism and a wealth of high quality data has been generated with this GeneChip. We investigated the relationship between reporter-to-transcript sequence similarity and correlation of expression signals. We assessed the extent to which inclusion of off-target reporters in probe sets, i.e. reporters that are highly alignable to a transcript other than the intended one, influences this correlation. The conventional probe set design, as defined by the manufacturer of the microarray, was evaluated with respect to cross-hybridization and compared to our custom-made probe set composition.</p>
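         <p>How shared reporter signals can produce an artifactual correlation can be illustrated with a toy simulation. This is a sketch under stated assumptions and not the simulation performed in this study: the two genes are given independent Gaussian expression profiles, a chosen number of reporters in the probe set for gene X are assumed to measure gene Y's transcript instead, and probe sets are summarized by a plain mean, which is simpler than the summarization methods used in practice. All function and parameter names are hypothetical.</p>
         <p><![CDATA[
```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_off_target, n_reporters=11, n_arrays=200, noise=0.3, seed=1):
    """Correlation between the summarized signals of probe sets X and Y.

    The true profiles of X and Y are independent. The first n_off_target
    reporters of probe set X are assumed to pick up Y's transcript.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_arrays)]
    y = [rng.gauss(0, 1) for _ in range(n_arrays)]
    summary_x, summary_y = [], []
    for i in range(n_arrays):
        reps_x = [
            (y[i] if j < n_off_target else x[i]) + rng.gauss(0, noise)
            for j in range(n_reporters)
        ]
        reps_y = [y[i] + rng.gauss(0, noise) for _ in range(n_reporters)]
        # Mean summarization: an assumption made for simplicity.
        summary_x.append(sum(reps_x) / n_reporters)
        summary_y.append(sum(reps_y) / n_reporters)
    return pearson(summary_x, summary_y)


for k in (0, 2, 5):
    print(k, round(simulate(k), 2))
```
]]></p>
         <p>With no off-target reporters the correlation stays close to zero, and it grows as more reporters of probe set X respond to Y's transcript, even though the two underlying profiles are independent.</p>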
         <p>We show that numerous probe sets on a widely used commercial array contain off-target reporters, and that inclusion of these reporters in a probe set gives rise to a signal pattern that is highly similar to that of the unintended probe set. We illustrate our findings with examples and demonstrate the effect of individual reporters through simulation. Furthermore, we put forward a novel method to detect unreliable probe set to transcript hybridization events. Our results show that excluding reporters that align well to another transcript diminishes this effect to a substantial extent and provides a method to pinpoint the occurrence of cross-hybridization in existing microarray datasets. We conclude from this study that reporter-to-transcript sequence alignment strength can be a source of error in studies of correlation of expression signals and that proper probe set composition is effective in minimizing the effect of cross-hybridization.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Two definitions of probe set annotation</p>
            </st>
            <p>The ATH1 is an Affymetrix GeneChip for the analysis of gene expression in the premier plant model organism <it>Arabidopsis thaliana</it>. A wealth of high quality data measured with this array is publicly available and has been widely used for various applications, such as the inference of gene co-expression networks and the study of functional aspects of the evolution of gene families <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> (reviewed in <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>).</p>
            <p>For the Affymetrix CDF of the ATH1, a probe set was assigned to a gene if nine or more of its reporters had perfect sequence identity with the gene's transcript consensus sequence. If this condition was fulfilled for multiple genes, the probe set was assigned to all of them. In this way, 22,810 probe sets were assigned to more than 24,000 genes. A probe set can thus contain up to eight reporters that align perfectly to another gene's transcript without being assigned to it <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
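            <p>The assignment rule described above can be written as a short sketch. The function name, the gene identifiers and the counts are hypothetical; only the threshold of nine perfectly matching reporters comes from the text.</p>
            <p><![CDATA[
```python
def assign_probe_set(perfect_hits, min_reporters=9):
    """Genes to which a probe set is assigned under the nine-reporter rule.

    perfect_hits maps a gene identifier to the number of reporters in the
    probe set with perfect identity to that gene's transcript consensus.
    """
    return sorted(g for g, n in perfect_hits.items() if n >= min_reporters)


# Hypothetical probe set of 11 reporters.
hits = {"gene_X": 11, "gene_Y": 8, "gene_Z": 9}
assigned = assign_probe_set(hits)
print(assigned)
```
]]></p>
            <p>Here the probe set is assigned to gene_X and gene_Z. Eight of its reporters match gene_Y perfectly, yet gene_Y is not listed as a target, which is the situation that can go unnoticed in downstream correlation analyses.</p>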
            <p>We built a custom-made CDF with alternative probe set definitions and annotations. We aligned each 25-mer reporter sequence to the predicted transcripts of <it>Arabidopsis thaliana </it>(see Methods for details). A reporter was assigned to a gene if it had perfect sequence identity with its transcript(s) and did not align to any other gene's transcript with zero or one mismatches. We removed reporters that had multiple hits in the genome, and reporters that had hits in the reverse complementary direction. Probe sets were defined as eight or more reporters all assigned to a particular gene's transcript(s). This resulted in 19,937 probe sets with unique assignments to 19,937 target genes. Table <tblr tid="T1">1</tblr> shows some statistics on the probe set definitions. The approach we took is highly similar to the one introduced by Dai et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
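            <p>The filtering rules of the custom-made CDF can be sketched as follows. This is not the code used to build the CDF: the Reporter record, its field names and the example identifiers are hypothetical, and the alignment results are assumed to have been computed beforehand.</p>
            <p><![CDATA[
```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Reporter:
    name: str
    perfect_genes: set  # genes whose transcript(s) the reporter matches perfectly
    near_genes: set = field(default_factory=set)  # genes hit with one mismatch
    genome_hits: int = 1  # number of hits in the genome
    reverse_hit: bool = False  # hit in the reverse complementary direction


def assigned_gene(rep):
    """Return the single gene a reporter is assigned to, or None if it is dropped."""
    if rep.genome_hits > 1 or rep.reverse_hit:
        return None
    if len(rep.perfect_genes) != 1:
        return None
    (gene,) = rep.perfect_genes
    if rep.near_genes - {gene}:
        return None  # aligns to another gene with a single mismatch
    return gene


def build_probe_sets(reporters, min_size=8):
    """Group reporters by assigned gene and keep groups of at least min_size."""
    by_gene = defaultdict(list)
    for rep in reporters:
        gene = assigned_gene(rep)
        if gene is not None:
            by_gene[gene].append(rep.name)
    return {g: names for g, names in by_gene.items() if len(names) >= min_size}


reporters = (
    [Reporter(f"x{i}", {"gene_X"}) for i in range(9)]
    + [Reporter("x9", {"gene_X"}, near_genes={"gene_Y"})]
    + [Reporter(f"y{i}", {"gene_Y"}) for i in range(7)]
)
probe_sets = build_probe_sets(reporters)
print(probe_sets)
```
]]></p>
            <p>In this example gene_X keeps a probe set of nine reporters, reporter x9 is dropped because it aligns to gene_Y with one mismatch, and gene_Y receives no probe set because only seven reporters remain.</p>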
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Statistics of probe set definitions. The first two rows contain the number of probe sets and reporters in the Affymetrix and the custom-made CDF. Multiplying the number of reporters by the number of predicted transcripts, given in the bottom row, yields the total number of reporter-to-transcript alignment scores (see also Figure 1).</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CDF Affymetrix</p>
                     </c>
                     <c ca="left">
                        <p>Custom-made CDF</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of probe sets:</p>
                     </c>
                     <c ca="left">
                        <p>22,810</p>
                     </c>
                     <c ca="left">
                        <p>19,937</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of reporters:</p>
                     </c>
                     <c ca="left">
                        <p>251,078</p>
                     </c>
                     <c ca="left">
                        <p>217,811</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of alignment scores:</p>
                     </c>
                     <c ca="left">
                        <p>6,926,739,864</p>
                     </c>
                     <c ca="left">
                        <p>6,008,969,868</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="1" ca="center">
                        <p>Total number of transcripts in TAIR6:</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>27,588</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
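            <p>The alignment score counts in Table 1 follow directly from the reporter and transcript counts, as the following short check illustrates. The variable names are chosen for illustration; the numbers are those of the table.</p>
            <p><![CDATA[
```python
N_TRANSCRIPTS = 27_588  # predicted transcripts in TAIR6 (Table 1, bottom row)
reporters = {"Affymetrix CDF": 251_078, "Custom-made CDF": 217_811}

# One alignment score per reporter-transcript pair.
scores = {cdf: n * N_TRANSCRIPTS for cdf, n in reporters.items()}
for cdf, total in scores.items():
    print(f"{cdf}: {total:,}")
```
]]></p>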
            <p>In those cases where their probe set annotations are based on the UniGene database, Dai and colleagues require perfect hits to UniGene clusters and unique hits of a reporter to a genomic location. For their CDFs that are based on databases other than UniGene, the rule of one transcript assignment per reporter does not apply <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, so reporters can be assigned to multiple transcripts. As this is currently the case for the ATH1 array, for which the CDF of Dai et al. is based on the TAIR annotation, we computed a custom CDF that requires uniqueness. Hence, we expect that our results can be generalized to other arrays for which Dai et al. have computed CDFs with 1:1 reporter-target mapping. Once their ATH1 CDF is changed to a unique 1:1 mapping (personal communication), it could be used instead of our custom CDF.</p>
         </sec>
         <sec>
            <st>
               <p>Off-target alignments</p>
            </st>
            <p>Our aim was to investigate the relationship between correlation coefficients of microarray gene expression profiles and potential off-target sensitivity of reporters and probe sets. Figures <figr fid="F1">1A</figr> and <figr fid="F1">1B</figr> explain our procedure for calculating the score for off-target sensitivity. For a probe set with <it>n </it>reporters designed to target gene <it>X</it>, and another gene <it>Y</it>, we computed the alignment scores {<it>a</it><sub>1</sub>,...,<it>a</it><sub><it>n</it></sub>} of <it>X</it>'s reporters to <it>Y</it>'s transcript sequence(s) with <it>Needle </it><abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, a Needleman-Wunsch alignment <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> program. A global alignment algorithm was used to align the full length of the reporter to the target while allowing for gaps and hairpin-forming. Furthermore, we used an exact algorithm to ensure that the optimal alignment was reached. <it>Needle </it>rewards an identical match with a score of 5 and penalizes a mismatch with a score of -4. The gap open penalty was set to -50 and the gap extension penalty to -0.5. The reporters have a length of 25, so a perfectly matching reporter will have a score of 125. Some interesting scores are shown in Table <tblr tid="T2">2</tblr>.</p>
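            <p>The scoring scheme can be illustrated with a small sketch. It is a simplification and not the Needleman-Wunsch algorithm implemented in Needle: it only scores ungapped placements of a reporter that lie fully within the transcript, on the assumption that unaligned end positions carry no penalty, which is what the entries of Table 2 with fewer than 25 scored positions suggest. Because a single gap opening (-50) cancels ten matches, the highest-scoring alignments are expected to be ungapped. The function names and the example sequences are hypothetical.</p>
            <p><![CDATA[
```python
MATCH, MISMATCH = 5, -4  # scores stated in the text


def ungapped_score(perfect, mismatches):
    """Score of an ungapped alignment with the given match and mismatch counts."""
    return MATCH * perfect + MISMATCH * mismatches


def best_ungapped_score(reporter, transcript):
    """Best score over all ungapped placements of the reporter in the transcript.

    Simplification: no gaps and no overhanging ends are considered.
    """
    k = len(reporter)
    best = None
    for start in range(len(transcript) - k + 1):
        window = transcript[start:start + k]
        perfect = sum(a == b for a, b in zip(reporter, window))
        score = ungapped_score(perfect, k - perfect)
        if best is None or score > best:
            best = score
    return best


reporter = "ACGT" * 6 + "A"  # hypothetical 25-mer
on_target = "TT" + reporter + "GG"
# Same site with one substituted base, mimicking a close off-target transcript.
off_target = "TT" + reporter[:12] + "C" + reporter[13:] + "GG"

print(best_ungapped_score(reporter, on_target))
print(best_ungapped_score(reporter, off_target))
```
]]></p>
            <p>The perfect 25-mer placement scores 125 and the placement with a single mismatch scores 116, in agreement with the corresponding entries of Table 2.</p>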
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Some of the highest Needleman-Wunsch scores. P and M stand for the number of perfect matches and mismatches in the alignment, respectively. Gap openings and extensions in the alignment were penalized with -50 and -0.5, respectively.</p>
               </caption>
               <tblbdy cols="12">
                  <r>
                     <c cspan="3" ca="left">
                        <p>Matches</p>
                     </c>
                     <c cspan="3" ca="left">
                        <p>Matches</p>
                     </c>
                     <c cspan="3" ca="left">
                        <p>Matches</p>
                     </c>
                     <c cspan="3" ca="left">
                        <p>Matches</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>Score</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>Score</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>Score</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>Score</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>125</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>102</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>101</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>116</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>115</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>111</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>110</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>107</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>106</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Setup of our study</p>
               </caption>
               <text>
                  <p><b>Setup of our study</b>. Illustration of our approach: A) for a given probe set <it>x</it>, assigned to measure the expression of gene <it>X </it>and the transcript of a given gene <it>Y</it>, two variables <inline-formula><m:math name="1471-2105-8-461-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mi>p</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabdchaWbaaaaa@3108@</m:annotation></m:semantics></m:math></inline-formula> and <it>&#961;</it><sub><it>XY </it></sub>were calculated. B) <inline-formula><m:math name="1471-2105-8-461-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mi>p</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabdchaWbaaaaa@3108@</m:annotation></m:semantics></m:math></inline-formula> is a summary statistic (e.g. <it>p </it>= 75 for the 75th percentile) of the alignment scores of the reporters of <it>X </it>to the transcript of <it>Y</it>. C) <it>&#961;</it><sub><it>XY </it></sub>is the correlation coefficient of the expression signals of genes <it>X </it>and <it>Y</it>. This procedure was repeated for each probe set against every other transcript of the <it>Arabidopsis </it>transcriptome.</p>
               </text>
               <graphic file="1471-2105-8-461-1"/>
            </fig>
            <p>To quantify the potential off-target affinity of a probe set, different percentiles <inline-formula><m:math name="1471-2105-8-461-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mi>p</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabdchaWbaaaaa@3108@</m:annotation></m:semantics></m:math></inline-formula> were calculated from the reporter alignment scores {<it>a</it><sub>1</sub>,...,<it>a</it><sub><it>n</it></sub>}, where <it>p </it>&#8712; [0, 100] is the percentile, <it>X </it>is the intended target gene of the probe set, and <it>Y </it>is the potential off-target. For the results presented in this paper, we used <it>p </it>= 75, but qualitatively equivalent results were obtained with other values of <it>p</it>.</p>
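            <p>The percentile summary described above can be illustrated with a minimal Python sketch. The function name <it>off_target_score</it>, the example scores and the choice of an order-statistic percentile (rank rounded upwards, no interpolation) are assumptions made for illustration only; this convention was chosen because, for 11 reporters and <it>p </it>= 75, it returns the third strongest reporter score. It is not necessarily the quantile definition used in the original analysis.</p>
            <p><![CDATA[
```python
import math


def off_target_score(alignment_scores, p=75):
    """Return the p-th percentile of one probe set's reporter alignment scores.

    Assumption: the percentile is an order statistic (no interpolation),
    with the zero-based rank rounded upwards.
    """
    if not alignment_scores:
        raise ValueError("need at least one reporter alignment score")
    if not 0 <= p <= 100:
        raise ValueError("p must lie in [0, 100]")
    ordered = sorted(alignment_scores)
    # Zero-based rank of the requested percentile, rounded up to a whole reporter.
    rank = math.ceil(p / 100 * (len(ordered) - 1))
    return ordered[rank]


# Hypothetical alignment scores of the 11 reporters of probe set X
# to the transcript of gene Y (illustrative values, not taken from the data).
scores = [125, 107, 89, 70, 64, 60, 58, 55, 52, 50, 48]
q75 = off_target_score(scores, p=75)
print(q75)
```
]]></p>
            <p>With these illustrative scores, the returned value is the third highest score in the list, so a high value indicates that at least three reporters align well to the off-target transcript.</p>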
            <p>This analysis was carried out for each probe set against every sequence of the <it>Arabidopsis </it>transcriptome (as found in the TAIR6 sequence database <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>), resulting in a total of 6,926,739,864 alignments for the Affymetrix CDF and 6,008,969,868 for the custom-made CDF (see Table <tblr tid="T1">1</tblr>). Additional File <supplr sid="S1">1</supplr> shows a histogram of the highest alignment scores of the pairs of the two CDFs.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Off-target scores of Custom-made versus Affymetrix CDF. Barplot of the off-target sensitivity scores <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> of all probe set pairs in the Affymetrix (in pink) and the custom-made CDF (in light blue). This figure only shows pairs with an <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 80.</p>
               </text>
               <file name="1471-2105-8-461-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Correlation of microarray expression profiles</p>
            </st>
            <p>Pearson correlation coefficients, <it>&#961;</it><sub><it>XY</it></sub>, were calculated for every pair of probe sets <it>X </it>and <it>Y </it>on two different ATH1 microarray datasets. One dataset contains expression data from 14 different plant tissues; the other covers nine stress conditions and consists of 60 data points (see Methods). Both datasets were generated by the AtGenExpress project <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Probe set off-target sensitivity and expression correlation</p>
            </st>
            <p>The relation between expression correlation, <it>&#961;</it><sub><it>XY </it></sub>and off-target sensitivity, <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> is shown in Figure <figr fid="F2">2</figr>. Figure <figr fid="F2">2A</figr> shows the results we obtained with all probe set pairs of the Affymetrix CDF and Figure <figr fid="F2">2C</figr> shows those of the custom-made CDF. These boxplots reveal a positive relation between the two variables: a gene whose expression is measured by reporters that align well to a different gene's transcript tends to have an expression signal that is correlated with that of the other gene.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Probe set off-target sensitivity and expression correlation</p>
               </caption>
               <text>
                  <p><b>Probe set off-target sensitivity and expression correlation</b>. Boxplots depicting the expression correlation coefficients, <it>&#961;</it><sub><it>XY </it></sub>stratified by off-target sensitivity score, <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula>. Figures A and C show the data for all probe set pairs; for Figures B and D gene pairs with a BLAST hit in at least one direction with an E-value smaller than 10<sup>-10 </sup>were omitted. A-B) Results obtained with Affymetrix' CDF. C-D) Results obtained with the custom-made CDF. The widths of the boxes are proportional to the number of observations in each group. <it>&#961;</it><sub><it>XY </it></sub>was calculated on the tissue microarray dataset. The plots show results for all pairs with <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 55.</p>
               </text>
               <graphic file="1471-2105-8-461-2"/>
            </fig>
            <p>Because a positive trend between (reporter) alignment strength and expression correlation is not unexpected for functionally related genes like paralogous genes or genes that share protein domains, we defined a filtering criterion to set aside gene pairs that aligned to each other with BLAST <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> in at least one direction with an E-value smaller than 10<sup>-10 </sup>(see Methods). Figure <figr fid="F2">2B</figr> and Figure <figr fid="F2">2D</figr> show the data for the remaining probe set pairs of the Affymetrix and the custom-made CDF, respectively. For both, we see that for <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> values of up to around 70, the distribution of signal correlations of the probe set pairs is centered around zero. Pairs with higher <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> values are however accompanied by elevated signal correlation, even though for the gene pairs no functional relation is suggested by their sequence comparison. For a probe set with 11 reporters, the <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> summary statistic with <it>p </it>= 75 corresponds to the third strongest off-target reporter. A reporter alignment score larger than 70 results from 15 or more perfect matches (cf. Table <tblr tid="T2">2</tblr>). Hence, our results imply that three or more well-aligning off-target reporters in a probe set are associated with elevated expression correlation. Figures <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr> also reveal that some probe sets in the Affymetrix CDF contain three or more reporters with perfect sequence identity to an off-target gene. These probe sets are in the rightmost boxes of these figures, corresponding to the score interval (112, 125]. The custom-made CDF does not contain such reporters, since all reporters uniquely map to their target gene's transcript and have at least two mismatches with any other sequence. As a result, the rightmost score interval in Figures <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr> does not contain any probe sets, and the second-highest interval (100, 112] contains only a few. A slight trend, however, remains. The results shown in Figure <figr fid="F2">2</figr> were calculated on the tissue dataset; similar results were obtained for the stress dataset. Different factors can give rise to the trend we observe here. First of all, genes with partially similar sequences can show biologically relevant expression correlation. Even though many such pairs will have been removed by the above filtering criterion, some may still remain in our dataset. Second, the trend can be due to cross-hybridization, where the cDNA of a gene's transcript binds both to the reporters of its own probe set and to those of other genes' probe sets. Both effects, functional relatedness and cross-hybridization, can act at the same time.</p>
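            <p>The filtering criterion used for Figures 2B and 2D, which sets aside a gene pair if BLAST reports a hit in at least one direction with an E-value smaller than 10<sup>-10</sup>, can be sketched in Python as follows. The gene names, the dictionary layout of the BLAST results and the function names are hypothetical and serve only to illustrate the "at least one direction" logic; a pair without a reported hit is treated as having an infinite E-value, which is an assumption of this sketch.</p>
            <p><![CDATA[
```python
E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def is_sequence_related(x, y, blast_evalues, cutoff=E_VALUE_CUTOFF):
    """True if X hits Y or Y hits X with an E-value below the cutoff."""
    # A missing entry means no hit was reported in that direction (assumption).
    forward = blast_evalues.get((x, y), float("inf"))
    reverse = blast_evalues.get((y, x), float("inf"))
    return min(forward, reverse) < cutoff


def filter_pairs(pairs, blast_evalues, cutoff=E_VALUE_CUTOFF):
    """Keep only the pairs for which neither direction passes the cutoff."""
    return [
        (x, y)
        for x, y in pairs
        if not is_sequence_related(x, y, blast_evalues, cutoff)
    ]


# Hypothetical E-values keyed by (query gene, subject gene).
blast_evalues = {
    ("geneA", "geneB"): 1e-30,  # strong hit, forward direction
    ("geneC", "geneD"): 0.5,    # weak hit, stays in the dataset
    ("geneF", "geneE"): 1e-12,  # strong hit, reverse direction only
}
pairs = [
    ("geneA", "geneB"),
    ("geneC", "geneD"),
    ("geneE", "geneF"),
    ("geneG", "geneH"),
]
kept = filter_pairs(pairs, blast_evalues)
print(kept)
```
]]></p>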
         </sec>
         <sec>
            <st>
               <p>Reporter off-target sensitivity and expression correlation</p>
            </st>
            <p>To distinguish cross-hybridization from functional relatedness and to identify instances of unreliable reporter-to-transcript hybridization, we designed a method that examines the off-target sensitivity and signal correlation of the individual reporters within a probe set. For a probe set <it>X </it>and an off-target gene <it>Y</it>, we calculated the metacorrelation cor(<inline-formula><m:math name="1471-2105-8-461-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#961;</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8xWdi3aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@31CD@</m:annotation></m:semantics></m:math></inline-formula>, <it>a</it><sub><it>i</it></sub>) between the alignment scores <it>a</it><sub><it>i </it></sub>of <it>X</it>'s reporters to <it>Y</it>'s transcript sequence and the Pearson correlation coefficients of the reporters' signal patterns to the expression pattern of <it>Y</it>. We reasoned that if cross-hybridization occurs, a positive trend between reporter to off-target correlation and the alignment score <it>a</it><sub><it>i </it></sub>can be detected. Conversely, lack of such a trend may indicate that cross-hybridization is negligible.</p>
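            <p>The metacorrelation can be illustrated with the following Python sketch, which first correlates each reporter's signal profile with the expression profile of <it>Y </it>and then correlates these coefficients with the reporters' alignment scores. The function names and all numerical values are hypothetical; only the two-step structure follows the description above.</p>
            <p><![CDATA[
```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_u * var_v)


def metacorrelation(reporter_signals, alignment_scores, y_profile):
    """Correlate per-reporter correlations to Y with the alignment scores."""
    # Step 1: correlation of each reporter's signal profile with Y's profile.
    rho = [pearson(signal, y_profile) for signal in reporter_signals]
    # Step 2: correlation between those coefficients and the alignment scores.
    return pearson(rho, alignment_scores)


# Hypothetical expression profile of off-target gene Y over five samples.
y_profile = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical signal profiles of three reporters of probe set X.
reporter_signals = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # follows Y closely
    [1.0, 3.0, 2.0, 5.0, 4.0],   # follows Y loosely
    [5.0, 4.0, 3.0, 2.0, 1.0],   # runs opposite to Y
]
# Hypothetical alignment scores of the same reporters to Y's transcript.
alignment_scores = [125, 100, 50]

meta = metacorrelation(reporter_signals, alignment_scores, y_profile)
print(round(meta, 2))
```
]]></p>
            <p>In this toy example the reporters that align best to <it>Y </it>are also the ones whose signals follow <it>Y </it>most closely, so the metacorrelation is strongly positive, which is the pattern expected under cross-hybridization.</p>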
            <p>Figure <figr fid="F3">3</figr> depicts this metacorrelation coefficient for all probe set pairs with <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 55 of the Affymetrix CDF stratified by their off-target sensitivity score <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula>. The results for the custom-made CDF are similar, except for the highest score interval (112, 125], which does not occur with the custom-made CDF. The distribution of the metacorrelations of most probe set pairs corresponds to a random distribution centered around zero. However, for those strata with high off-target sensitivity scores, the distribution is shifted upwards. This means that within these probe sets some reporters do not correlate with the off-target, while others do, depending on their alignment score.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Reporter off-target sensitivity and expression correlation</p>
               </caption>
               <text>
                  <p><b>Reporter off-target sensitivity and expression correlation</b>. A boxplot showing the metacorrelation coefficients cor(<inline-formula><m:math name="1471-2105-8-461-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#961;</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8xWdi3aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@31CD@</m:annotation></m:semantics></m:math></inline-formula>, <it>a</it><sub><it>i</it></sub>) of all probe set pairs of the Affymetrix CDF, stratified by their off-target sensitivity score <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula>. Only pairs with <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 55 are included. The correlation coefficients were calculated on the intensities measured in the tissue dataset.</p>
               </text>
               <graphic file="1471-2105-8-461-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Examples</p>
            </st>
            <p>The metacorrelation method we developed was used to search for examples that illustrate our findings. Three examples are discussed in detail, each of which is presented in a row of Figure <figr fid="F4">4</figr>. The plots in the first column of this figure contain the summarized expression values of a probe set <it>X </it>(in blue) and an off-target gene <it>Y </it>(in orange) in the tissue dataset. The plots in the second column show the background-corrected, normalized signal profiles of <it>X</it>'s reporters. The color used to plot such a profile corresponds to the alignment score of that reporter to <it>Y</it>'s transcript and is explained in the legend in Figure <figr fid="F4">4B</figr>. In the third column, for each reporter, <inline-formula><m:math name="1471-2105-8-461-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#961;</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8xWdi3aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@31CD@</m:annotation></m:semantics></m:math></inline-formula>, the Pearson correlation coefficient calculated between its signal profile and that of <it>Y </it>(orange in A-D-G), is plotted as a function of its alignment score <inline-formula><m:math name="1471-2105-8-461-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>a</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyyae2aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@3151@</m:annotation></m:semantics></m:math></inline-formula>. The colors are identical to those used in the second column.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Examples</p>
               </caption>
               <text>
                  <p><b>Examples</b>. Each of the three rows presents an example of cross-hybridization. Each time, the first of the plots (A-D-G) shows the summarized expression values of probe set <it>X </it>(in blue) and probe set <it>Y </it>(in orange) in 14 different plant tissues. The plots in the second column (B-E-H) present the background corrected, normalized expression patterns of <it>X</it>'s reporters. The signal profile of the reporter is plotted in a color that corresponds to its alignment score to <it>Y </it>and is explained in the legend of plot B. In the third column (C-F-I) for each of <it>X</it>'s reporters, <inline-formula><m:math name="1471-2105-8-461-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#961;</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8xWdi3aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@31CD@</m:annotation></m:semantics></m:math></inline-formula>, calculated between its signal profile and that of <it>Y</it>, is plotted against its alignment score, <inline-formula><m:math name="1471-2105-8-461-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>a</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyyae2aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@3151@</m:annotation></m:semantics></m:math></inline-formula>. Colors correspond to those used in the plot in the second column.</p>
               </text>
               <graphic file="1471-2105-8-461-4"/>
            </fig>
            <p>Probe set <it>X </it>in our first example is <it>245875_at</it>, which was designed to target gene <it>AT1G26240</it>, an extensin-like family protein. As shown in Figure <figr fid="F4">4A</figr>, the expression profile of this gene resembles that of <it>AT3G28550</it>, a protein that belongs to a zinc finger family. The Pearson correlation coefficient of these expression patterns is 0.63 in the tissue and 0.62 in the stress dataset. Figures <figr fid="F4">4B</figr> and <figr fid="F4">4C</figr> show that six of <it>X</it>'s reporters with <inline-formula><m:math name="1471-2105-8-461-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>a</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyyae2aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@3151@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 80 have a signal profile that is highly correlated with that of <it>AT3G28550</it>. The remaining five have lower off-target sensitivity values and have a signal profile that is correlated less well with it. The <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> value of <it>245875_at </it>to <it>AT3G28550 </it>is 89, and the metacorrelation coefficient of the reporters of <it>245875_at </it>is 0.89.</p>
            <p>The second example is of probe set <it>250857_at</it>, which was designed for <it>AT5G04790</it>, and gene <it>AT1G75180</it>. The function of both genes is unknown. Their <it>&#961;</it><sub><it>XY </it></sub>is 0.70 and 0.89 in the tissue (in Figure <figr fid="F4">4D</figr>) and stress dataset respectively. Figures <figr fid="F4">4E</figr> and <figr fid="F4">4F</figr> reveal a positive relationship between off-target sensitivity and signal correlation. Interestingly, four reporters of probe set <it>250857_at </it>have 25 identical matches to <it>AT1G75180 </it>and show an expression profile with <it>&#961; </it>> 0.8. Two other reporters, with lower sensitivity to this off-target (107 and 89) also show high signal correlation to it. The <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> value of probe set <it>250857_at </it>to gene <it>AT1G75180 </it>is 125, and the metacorrelation coefficient of the reporters of <it>250857_at </it>is 0.62.</p>
            <p>Figure <figr fid="F4">4G</figr> shows the expression patterns of probe set <it>258508_at </it>and <it>AT3G06650</it>. <it>258508_at </it>was designed to target <it>AT3G06640</it>, a protein kinase family protein. <it>AT3G06650 </it>is a gene that encodes a subunit of the trimeric enzyme ATP citrate lyase. <it>AT3G06650 </it>and <it>AT3G06640 </it>are neighboring genes that align for a stretch of about 50 base pairs with sequence similarity of >90%. The Pearson correlation coefficients of their expression profiles in the tissue and stress dataset are 0.30 and 0.16, respectively. Three reporters of <it>258508_at </it>have an off-target sensitivity to <it>AT3G06650 </it>of 107 (Figure <figr fid="F4">4H</figr> and <figr fid="F4">4I</figr>). Two of them have a <inline-formula><m:math name="1471-2105-8-461-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#961;</m:mi><m:mrow><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mi>Y</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8xWdi3aaSbaaSqaaiabdIfaynaaBaaameaacqWGPbqAaeqaaSGaemywaKfabeaaaaa@31CD@</m:annotation></m:semantics></m:math></inline-formula> &#8805; 0.6, but the mean intensity of all three is higher than that of the other reporters. The <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> value of this gene pair is 102.5, and the metacorrelation coefficient of the reporters of probe set <it>258508_at </it>is 0.55. The examples presented here show that the reporters that align best to the off-target <it>Y </it>have the signal most correlated with it, and that the number of well-aligning reporters plays an important role in the effect of cross-hybridization. For example, the <it>X </it>probe set in our second example has several reporters whose signal profiles are highly correlated with that of the off-target gene: the four reporters that have perfect sequence similarity with it, as well as two others with alignment scores of 107 and 89. The Pearson correlation coefficient of the summarized expression pattern of this probe set pair is high in both expression datasets (0.70 and 0.89). In the first example five reporters show relatively high signal correlation to the off-target gene. The correlation coefficients of the summarized probe set values are 0.63 and 0.62. In contrast to these two, the probe set pair in our third example has a comparable <inline-formula><m:math name="1471-2105-8-461-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Q</m:mi><m:mrow><m:mi>X</m:mi><m:mi>Y</m:mi></m:mrow><m:mrow><m:mn>75</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyuae1aa0baaSqaaiabdIfayjabdMfazbqaaiabiEda3iabiwda1aaaaaa@3193@</m:annotation></m:semantics></m:math></inline-formula> value, but only two reporters show high signal correlation to gene <it>Y</it>. The correlation coefficient of this pair's expression pattern is much lower (0.30 and 0.16).</p>
         </sec>
         <sec>
            <st>
               <p>Effect of individual reporters on probe set summaries</p>
            </st>
            <p>It may come as a surprise that a few reporters out of 11 can affect the summarized expression profile of a probe set to the extent that their inclusion coerces it to resemble that of another gene. To better understand how this can happen, consider the following simulated data example. Assume that a gene <it>A </it>has a sinusoidal expression pattern over the course of 14 time points in an experiment. Figure <figr fid="F5">5A</figr> shows the signal profiles of the 11 reporters of this gene's probe set, with data simulated using an established error model for microarray data <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The 11 reporters of a probe set <it>B </it>in Figure <figr fid="F5">5B</figr> show random signals without any underlying trend. Nine of the reporters of probe set <it>C </it>have signals identical to those of nine reporters of probe set <it>B</it>, while the remaining two reporters cross-hybridize with the transcript of gene <it>A </it>(Figure <figr fid="F5">5C</figr>). The summarized expression values obtained by applying the median polish method <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> are shown in Figure <figr fid="F5">5D</figr>. Interestingly, the Pearson correlation between probe set <it>A </it>and <it>B </it>is -0.07, while the correlation between <it>A </it>and <it>C </it>is 0.73. What is the explanation for this? The RMA method <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp> exploits the fact that sensitivity to target abundance is strongly reporter-dependent and repeatable across arrays. RMA fits a model that explains the measured intensities as the product of a reporter effect and the target abundance. It estimates the model parameters, and hence the target abundance, with an outlier-resistant method called <it>median polish</it>. These estimates can, however, be susceptible to subtle changes in the data, especially when the data from the reporters disagree, as they do in our simulation <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
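            <p>The median polish step described above can be sketched as follows. This is a minimal illustration in Python and not the RMA implementation used in this study: the function name <it>median_polish</it>, the fixed number of iterations and the input layout (one row per reporter, one column per array, log-scale intensities so that the multiplicative model becomes additive) are assumptions made for the example.</p>

```python
from statistics import median


def median_polish(matrix, n_iter=10):
    """Fit value = overall + reporter effect + array effect + residual
    by alternately sweeping out row and column medians.

    matrix: list of rows (one per reporter), each a list of log-scale
    intensities (one per array). Returns one summary value per array
    (overall + array effect). Layout and names are illustrative only.
    """
    resid = [list(row) for row in matrix]  # work on a copy
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row (reporter) medians out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column (array) medians out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]
```

            <p>Because medians are used in both sweeps, a single aberrant intensity is absorbed into the residuals. When a group of reporters disagrees consistently with the rest across arrays, as in the simulation, the column medians can shift, and the summaries can change accordingly.</p>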
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Effect of individual reporters on probe set summaries</p>
               </caption>
               <text>
                  <p><b>Effect of individual reporters on probe set summaries</b>. A) The expression profiles of the reporters of a probe set <it>A </it>that binds the transcript of a target gene with a sinusoidal expression pattern. Each reporter is drawn in a different color. B) The expression profiles of eleven reporters of a probe set <it>B </it>that show random signals without any underlying trend. Each reporter is drawn in a different color. C) Nine of the reporters of a probe set <it>C </it>have the same expression values as nine of those of probe set <it>B</it>. Two other reporters of this probe set cross-hybridize with the transcript of gene <it>A </it>and thus have an expression pattern that is highly similar to that of the reporters of probe set <it>A</it>. The expression values of these two reporters are colored red. The other nine have the same colors as the corresponding reporters of probe set <it>B </it>in Figure 5B. D) The expression patterns of these three probe sets after summarization with <it>median polish </it>[15,39,40].</p>
               </text>
               <graphic file="1471-2105-8-461-5"/>
            </fig>
            <p>We also explored other summarization methods. With dChip <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B42">42</abbr></abbrgrp>, for example, the effect of the two contaminating reporters is even stronger: the correlation between <it>A </it>and <it>B </it>is 0.30, while it is 0.95 between <it>A </it>and <it>C</it>. The statistical model that dChip uses is similar to that of RMA; however, there are differences in the variance assumptions and the robust estimation algorithm. Affymetrix' MAS 5 software uses an algorithm called one-step Tukey's Biweight <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. This algorithm appears to be less influenced by the two off-target reporters: the correlation between probe set <it>A </it>and <it>B </it>is -0.22, while it is -0.19 between <it>A </it>and <it>C</it>.</p>
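            <p>For comparison, a one-step Tukey biweight estimate of location can be sketched as below. This is an illustrative Python version and not the MAS 5 code: the function name <it>tukey_biweight</it>, the tuning constant <it>c</it> and the small offset <it>eps</it> are assumptions chosen for the example, and the input is taken to be the log-scale values of the reporters of one probe set on one array.</p>

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean in which the weight of
    each value decreases with its distance from the median, measured in
    units of the median absolute deviation (MAD). Values further than
    c MADs from the median receive weight zero.
    c and eps are illustrative defaults (assumptions of this sketch).
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)
    total_w = 0.0
    total_wx = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)  # scaled distance from the median
        w = (1.0 - u * u) ** 2 if 1.0 > abs(u) else 0.0
        total_w += w
        total_wx += w * v
    return total_wx / total_w
```

            <p>Each array is summarized on its own in this scheme, so no reporter effect is estimated across arrays, which is one way in which it differs from the model-based summaries discussed above.</p>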
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Microarrays are an important source of functional data. Many inferential tools are based on the presence or absence of correlation in the expression profiles of genes, for example when inferring co-expression networks <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, in the study of the evolution of gene duplicates or families <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> and in the inference of gene regulatory networks <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> or Bayesian networks <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>Different research groups have pinpointed the critical concern of cross-hybridization for microarray analysis <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Dai et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and Zhang et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> highlighted problematic reporter annotation and underscored the importance of up-to-date reporter mappings. Zhang et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> showed that about 10% of the reporters on widely used arrays are non-specific in that they target multiple transcripts, and that approximately another 10% miss their target. Okoniewski and Miller <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> constructed a network of different levels of interactions between reporters and transcripts, as some reporters are able to bind more than one transcript, and some transcripts can bind more than one reporter. In this network they identified several hub probe sets whose reporters are targeted by multiple transcripts; these show a higher absolute expression signal than probe sets that target a unique transcript, because they combine the signals of many available transcripts. Moreover, their analysis revealed that probe sets whose reporters have multiple matches also show higher expression correlation with each other. Wu et al. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> described a linear relationship between spiked-in concentrations and the measured signals of reporters that were not designed to target these particular transcripts.</p>
         <p>We described a positive relationship between the correlation of microarray gene expression profiles and the off-target sensitivity of microarray probe sets, as estimated by sequence alignment of microarray reporters to off-target genes. Probe sets that contain reporters that align well to off-target genes show intensity values that are correlated with those of these other genes (Figure <figr fid="F2">2A</figr> and <figr fid="F2">2C</figr>).</p>
         <p>In many cases, this positive relationship is likely not due to functional relatedness of the genes, but to a cross-hybridization artifact. Three lines of argument support this statement: first, the positive trend is present even between gene pairs that do not share longer stretches of sequence similarity and where the reporter to off-target alignment is only based on short near-matches (Figures <figr fid="F2">2A</figr> versus <figr fid="F2">2B</figr> and <figr fid="F2">2C</figr> versus <figr fid="F2">2D</figr>). Second, this effect can be observed within probe sets (Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr>). Third, omitting reporters liable to cross-hybridization results in decreased artifactual correlation coefficients between probe sets (Figures <figr fid="F2">2B</figr> versus <figr fid="F2">2D</figr>).</p>
         <p>Different summarization methods perform differently when dealing with cross-hybridizing reporters: methods that do majority weighting of reporters, such as RMA <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, can become unstable when there are two disagreeing groups of reporters that are close to balancing each other and when small changes can flip the majority from one side to the other. Examples of this are shown in Figure <figr fid="F4">4</figr> and by simulation. Simpler methods that are based on averages or trimmed averages, such as MAS <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, appear to be less affected by this problem; however, such methods suffer from the serious disadvantage of an overall smaller sensitivity <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B44">44</abbr></abbrgrp>. They thus cannot be regarded as a solution for the cross-hybridization problem.</p>
         <p>The standard probe set definition, as made available by the manufacturer of the array, Affymetrix, was compared to a custom-made one. In Affymetrix' definition, a probe set is a fixed set of reporters that is annotated to those genes to which a particular number of its reporters align perfectly. Probe sets can contain up to a certain number of reporters with perfect sequence identity to an off-target gene. In the custom-made CDF, a probe set is a set of reporters that align perfectly and uniquely to one gene's transcript. The use of more stringent probe set mapping and annotation results in decreased artifactual correlation coefficients. This will improve the quality of downstream analysis results. Our probe set definition is highly similar to the one used by Dai et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Our results support and provide further evidence for the beneficial effect of probe set reorganization they and others <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> reported.</p>
         <p>In conclusion, off-target sensitivity is a factor that should be taken into account when performing correlation analysis on microarray data. High-quality assignment of reporters to target genes is essential for inferring genuine biological expression correlations. The correlation coefficient calculated between alignment strength and expression correlation coefficients, the metacorrelation coefficient, is a novel measure for identifying instances of unreliable reporter behavior.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>All analyses, except for the alignments, were done with development versions of R 2.6.0 <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> and Bioconductor 2.1 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> packages. An R package, <it>XhybCasneuf</it>, containing a reproducible compendium of the datasets and scripts used for this study, is made available and is distributed through Bioconductor <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Two Chip Description Files</p>
            </st>
            <p>This analysis was carried out on the GeneChip <it>Arabidopsis </it>ATH1 genome array of Affymetrix <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. For Affymetrix' annotation of the probe sets, a file was downloaded from the Affymetrix website <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> on August 12th, 2007. Affymetrix requires a 100% match of a reporter's sequence to a consensus gene sequence and assigns a probe set to a particular locus if nine or more of the reporters in the probe set match it. We filtered out probe sets that Affymetrix assigned to multiple transcripts, as well as those assigned to a gene model that is not present in the TAIR6 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> sequence database.</p>
            <p>For the custom-made chip description file, <it>Exonerate </it><abbrgrp><abbr bid="B50">50</abbr></abbrgrp> was used to map reporters onto the genome and transcripts. The target sequences were the predicted transcripts from the TAIR6 release, including mitochondrial and chloroplast-encoded genes. These sequences include UTRs but not introns. The fasta file was downloaded from TAIR <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> on August 10th, 2007. We selected reporters that have perfect sequence identity with a single target gene's transcript. Reporters that hybridize with one mismatch to another gene's transcript were filtered out. We also filtered out reverse complementary matching reporters, and reporters that hybridize multiple times on the genomic sequence. The latter was done in order to remove reporters that match unannotated sequences. We included probe sets in this study if they consisted of at least eight reporters, which resulted in 19,937 unique probe sets. The custom-made CDF is also available and distributed through Bioconductor (<abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, <it>tinesath1cdf</it>).</p>
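            <p>The selection rules listed above can be summarized in a short sketch. The Python below is an illustration only and is not the code used to build the custom-made CDF: the record layout (the keys <it>perfect_genes</it>, <it>one_mismatch_genes</it>, <it>reverse_complement_hit</it> and <it>genomic_hits</it>) and the function name <it>build_probe_sets</it> are hypothetical and stand in for whatever the mapping step actually produces.</p>

```python
from collections import defaultdict

MIN_REPORTERS = 8  # probe sets with fewer reporters are not retained


def build_probe_sets(reporters):
    """Group reporters into probe sets following the rules in the text.

    reporters: list of dicts with hypothetical keys
      'id'                     reporter identifier
      'perfect_genes'          set of genes whose transcript matches perfectly
      'one_mismatch_genes'     set of genes matched with exactly one mismatch
      'reverse_complement_hit' True if the reverse complement matches
      'genomic_hits'           number of matches on the genomic sequence
    Returns a dict mapping each gene to the list of retained reporter ids.
    """
    probe_sets = defaultdict(list)
    for r in reporters:
        if len(r["perfect_genes"]) != 1:
            continue  # must match exactly one gene's transcript perfectly
        (gene,) = r["perfect_genes"]
        if r["one_mismatch_genes"] - {gene}:
            continue  # near-match to another gene's transcript
        if r["reverse_complement_hit"]:
            continue  # reverse complementary match
        if r["genomic_hits"] > 1:
            continue  # several matches on the genome
        probe_sets[gene].append(r["id"])
    return {g: ids for g, ids in probe_sets.items()
            if len(ids) >= MIN_REPORTERS}
```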
         </sec>
         <sec>
            <st>
               <p>Reporter-to-transcript alignments</p>
            </st>
            <p>Reporter-to-transcript alignment scores were obtained with <it>Needle</it>, a global Needleman-Wunsch <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> alignment tool <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The analysis was carried out on the TAIR6 release of the <it>Arabidopsis </it>genome. The target sequences were the predicted transcripts, including mitochondrial and chloroplast-encoded genes; they include UTRs but not introns. These cDNA sequences were downloaded from TAIR <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> on November 9, 2006. We ran the alignment analysis twice, with gap penalties of -10 and -50. The same conclusions were reached, but our findings were stronger when this penalty was set to -50. This means that higher correlation coefficients can be observed for reporter-to-transcript alignments without gaps.</p>
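            <p>To make the alignment score concrete, the following Python sketch computes a Needleman-Wunsch style score of a short reporter against a longer transcript. It is a simplified illustration and not a reimplementation of <it>Needle</it>: the match score of 5 and mismatch score of -4, the linear (rather than affine) gap penalty, and the choice not to penalize unaligned transcript ends are all assumptions of this example, as is the function name <it>alignment_score</it>.</p>

```python
def alignment_score(reporter, transcript, match=5, mismatch=-4, gap=-10):
    """Best score for aligning the whole reporter to some stretch of the
    transcript by dynamic programming. Transcript bases before and after
    the aligned stretch cost nothing; gaps inside the alignment cost
    `gap` each. All scoring parameters are illustrative assumptions.
    """
    n = len(transcript)
    # prev[j]: best score for the reporter prefix processed so far,
    # with the alignment ending at transcript position j.
    prev = [0] * (n + 1)  # leading transcript bases are free
    for i in range(1, len(reporter) + 1):
        cur = [prev[0] + gap] + [0] * n
        for j in range(1, n + 1):
            s = match if reporter[i - 1] == transcript[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align the two bases
                         prev[j] + gap,     # reporter base against a gap
                         cur[j - 1] + gap)  # transcript base against a gap
        prev = cur
    return max(prev)  # trailing transcript bases are free
```

            <p>Under these assumed scores, a reporter of 25 bases with 25 identical matches scores 125, 23 matches with two mismatches give 107, and 21 matches with four mismatches give 89, which would be consistent with the alignment scores quoted in the Results.</p>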
         </sec>
         <sec>
            <st>
               <p>Microarray data</p>
            </st>
            <p>The microarray data we used were generated within the framework of the AtGenExpress project <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The first set is a subset of the development dataset <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> and contains the expression data of genes in 14 plant tissues. The second contains expression data of plants under nine different abiotic stress conditions <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, measured over six different time points. Both datasets were normalized using RMA <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>, summarized using a median polish algorithm and averaged over replicates.</p>
         </sec>
         <sec>
            <st>
               <p>Identification of gene pairs with long stretches of sequence similarity</p>
            </st>
            <p>To identify possibly functionally related gene pairs, we carried out a within-genome, all-against-all BLASTP <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Gene pairs with an E-value smaller than 10<sup>-10 </sup>in at least one direction were set aside during different parts of this study.</p>
         </sec>
         <sec>
            <st>
               <p>Metacorrelation</p>
            </st>
            <p>The metacorrelation was obtained as follows: for a probe set pair <it>X </it>and <it>Y</it>, the Pearson correlation coefficient was calculated between the alignment scores of <it>X</it>'s reporters to the transcript sequence of <it>Y </it>and the (Pearson) signal correlation coefficient of these reporters to the expression pattern of <it>Y</it>. We used the non-parametric measure for this metacorrelation because of the limited number of datapoints for each observation.</p>
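            <p>The calculation can be sketched as follows. The Python below is a minimal illustration, not the code distributed with this study: the function names and the input layout are assumptions. Because the paragraph above mentions both a Pearson coefficient and a non-parametric measure, the sketch offers the plain Pearson coefficient and, through the hypothetical <it>rank_based</it> switch, a rank-based (Spearman-type) variant.</p>

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while len(values) > i:
        j = i
        while len(values) > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def metacorrelation(alignment_scores, reporter_signals, y_profile,
                    rank_based=False):
    """Correlate, across the reporters of probe set X, the alignment score
    of each reporter to Y with the correlation of that reporter's signal
    to the expression profile of Y.

    alignment_scores: one score per reporter of X
    reporter_signals: one signal profile (list) per reporter of X
    y_profile:        summarized expression profile of Y
    """
    signal_corr = [pearson(sig, y_profile) for sig in reporter_signals]
    if rank_based:
        return pearson(ranks(alignment_scores), ranks(signal_corr))
    return pearson(alignment_scores, signal_corr)
```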
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>TC designed the study, analyzed data, and wrote the paper. YVdP wrote the paper. WH designed the study, supervised the project, and wrote the paper. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by a grant from the Fund for Scientific Research, Flanders (3G031805) and by the European Commission through a Marie Curie Host Fellowship program (MEST-CT-2004-513973). WH acknowledges support from the European Commission through the Integrated Project Heart Repair (LSHM-CT-2005-018630). Grateful acknowledgements are made to Richard Bourgon, J&#246;rn T&#246;dling and Stefanie De Bodt for fruitful discussions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Gutierrez</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Lejay</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Dean</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Shasha</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Coruzzi</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1839130</pubid>
                  <pubid idtype="pmpid" link="fulltext">17217541</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-1-r7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Wille</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zimmermann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vranova</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Furholz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Laule</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bleuler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hennig</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Prelic</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>von Rohr</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thiele</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zitzler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gruissem</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Buhlmann</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>11</issue>
            <fpage>R92</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545783</pubid>
                  <pubid idtype="pmpid" link="fulltext">15535868</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-11-r92</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Transcriptional coordination of the metabolic network in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Wei</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Persson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Srinivasasainagendra</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Somerville</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Loraine</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Plant Physiology</source>
            <pubdate>2006</pubdate>
            <volume>142</volume>
            <issue>2</issue>
            <fpage>762</fpage>
            <lpage>774</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1586052</pubid>
                  <pubid idtype="pmpid" link="fulltext">16920875</pubid>
                  <pubid idtype="doi">10.1104/pp.106.080358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A gene expression map of the Arabidopsis root</p>
            </title>
            <aug>
               <au>
                  <snm>Birnbaum</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shasha</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Jung</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Lambert</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Galbraith</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Benfey</snm>
                  <fnm>PN</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5652</issue>
            <fpage>1956</fpage>
            <lpage>1960</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090022</pubid>
                  <pubid idtype="pmpid" link="fulltext">14671301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Coexpression of Neighboring Genes in the Genome of Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Williams</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Bowles</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>6</issue>
            <fpage>1060</fpage>
            <lpage>1067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419784</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173112</pubid>
                  <pubid idtype="doi">10.1101/gr.2131104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Clustering of genes into regulons using integrated modeling-COGRIM</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>CJJ</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R4</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1839128</pubid>
                  <pubid idtype="pmpid" link="fulltext">17204163</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-1-r4</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Using Bayesian networks to analyze expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Linial</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nachman</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pe'er</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <issue>3&#8211;4</issue>
            <fpage>601</fpage>
            <lpage>620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652700750050961</pubid>
                  <pubid idtype="pmpid" link="fulltext">11108481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Reverse engineering of genetic networks with Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Husmeier</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Biochem Soc Trans</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>1516</fpage>
            <lpage>1518</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14641102</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Werhli</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Grzegorczyk</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Husmeier</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>2523</fpage>
            <lpage>2531</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl391</pubid>
                  <pubid idtype="pmpid" link="fulltext">16844710</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>An empirical Bayes approach to inferring large-scale gene association networks</p>
            </title>
            <aug>
               <au>
                  <snm>Schafer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>754</fpage>
            <lpage>764</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti062</pubid>
                  <pubid idtype="pmpid" link="fulltext">15479708</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Casneuf</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>De Bodt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Raes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Maere</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>2</issue>
            <fpage>R13</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1431724</pubid>
                  <pubid idtype="pmpid" link="fulltext">16507168</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-2-r13</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Blanc</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <issue>7</issue>
            <fpage>1679</fpage>
            <lpage>1691</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514153</pubid>
                  <pubid idtype="pmpid" link="fulltext">15208398</pubid>
                  <pubid idtype="doi">10.1105/tpc.021410</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Minimum information about a microarray experiment (MIAME)-toward standards for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hingamp</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Aach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ansorge</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Causton</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Glenisson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Markowitz</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sarkans</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Schulze-Kremer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vilo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>4</issue>
            <fpage>365</fpage>
            <lpage>371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1201-365</pubid>
                  <pubid idtype="pmpid" link="fulltext">11726920</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <source>GeneChip&#174; Expression Analysis Data Analysis Fundamentals</source>
            <pubdate>2006</pubdate>
            <url>http://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Exploration, normalization, and summaries of high density oligonucleotide array probe level data</p>
            </title>
            <aug>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hobbs</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Collin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Beazer-Barclay</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Antonellis</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Scherf</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Biostatistics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>2</issue>
            <fpage>249</fpage>
            <lpage>264</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/biostatistics/4.2.249</pubid>
                  <pubid idtype="pmpid" link="fulltext">12925520</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>31</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.011404098</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Transcript Assignment for NetAffx Annotations</p>
            </title>
            <url>http://www.affymetrix.com/support/technical/manual/alignments_psl_manual.affx</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>ProbeLynx: a tool for updating the association of microarray probes to genes</p>
            </title>
            <aug>
               <au>
                  <snm>Roche</snm>
                  <fnm>FM</fnm>
               </au>
               <au>
                  <snm>Hokamp</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Acab</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Babiuk</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Hancock</snm>
                  <fnm>REW</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FSL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Web Server issue</issue>
            <fpage>471</fpage>
            <lpage>474</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/nar/gkh452</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A novel design of whole-genome microarray probes for Saccharomyces cerevisiae which minimizes cross-hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Talla</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tekaia</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Brino</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dujon</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>38</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">239980</pubid>
                  <pubid idtype="pmpid" link="fulltext">14499002</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-4-38</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Finney</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Clifford</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Derr</snm>
                  <fnm>LK</fnm>
               </au>
               <au>
                  <snm>Buetow</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>85</volume>
            <issue>3</issue>
            <fpage>297</fpage>
            <lpage>308</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2004.11.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15718097</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data</p>
            </title>
            <aug>
               <au>
                  <snm>Dai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Boyd</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Kostov</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Athey</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Bunney</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Akil</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Watson</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Meng</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>20</issue>
            <fpage>e175</fpage>
            <lpage>e175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1283542</pubid>
                  <pubid idtype="pmpid" link="fulltext">16284200</pubid>
                  <pubid idtype="doi">10.1093/nar/gni179</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Probe selection for high-density oligonucleotide arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Mei</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hubbell</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bekiranov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mittmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Christians</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kaplan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Webster</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>20</issue>
            <fpage>11237</fpage>
            <lpage>11242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">208741</pubid>
                  <pubid idtype="pmpid" link="fulltext">14500916</pubid>
                  <pubid idtype="doi">10.1073/pnas.1534744100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Sequence dependence of cross-hybridization on short oligo microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Carta</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>9</issue>
            <fpage>e84</fpage>
            <note>[Evaluation Studies]</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1140085</pubid>
                  <pubid idtype="pmpid" link="fulltext">15914663</pubid>
                  <pubid idtype="doi">10.1093/nar/gni082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>QD</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>222</fpage>
            <lpage>231</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/bioinformatics/bti1045</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A multivariate prediction model for microarray cross-hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>YA</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Slate</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Peck</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Voit</snm>
                  <fnm>EO</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>101</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1409802</pubid>
                  <pubid idtype="pmpid" link="fulltext">16509965</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>XHM: a system for detection of potential cross hybridizations in DNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Flikka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yadetie</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Laegreid</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>117</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">517492</pubid>
                  <pubid idtype="pmpid" link="fulltext">15333145</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Replacing cRNA targets with cDNA reduces microarray cross-hybridization</p>
            </title>
            <aug>
               <au>
                  <snm>Eklund</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>LR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>deFeo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kopf-Sill</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Szallasi</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <issue>9</issue>
            <fpage>1071</fpage>
            <lpage>1073</lpage>
            <note>[Letter]</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt0906-1071</pubid>
                  <pubid idtype="pmpid" link="fulltext">16964210</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A direct glimpse of cross-hybridization: background-passified microarrays that allow mass-spectrometric detection of captured oligonucleotides</p>
            </title>
            <aug>
               <au>
                  <snm>Plutowski</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Richert</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Angew Chem Int Ed Engl</source>
            <pubdate>2005</pubdate>
            <volume>44</volume>
            <issue>4</issue>
            <fpage>621</fpage>
            <lpage>625</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/anie.200461212</pubid>
                  <pubid idtype="pmpid" link="fulltext">15597393</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Cross-hybridization on PCR-spotted microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Wren</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Kulkarni</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Joslin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Butow</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Garner</snm>
                  <fnm>HR</fnm>
               </au>
            </aug>
            <source>IEEE Eng Med Biol Mag</source>
            <pubdate>2002</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>71</fpage>
            <lpage>75</lpage>
            <note>[Comparative Study]</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1109/MEMB.2002.1046118</pubid>
                  <pubid idtype="pmpid">12012609</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations</p>
            </title>
            <aug>
               <au>
                  <snm>Okoniewski</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>276</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1513401</pubid>
                  <pubid idtype="pmpid" link="fulltext">16749918</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Thermodynamics of competitive surface adsorption on DNA-microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Binder</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Journal of Physics: Condensed Matter</source>
            <pubdate>2006</pubdate>
            <volume>18</volume>
            <fpage>491</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1088/0953-8984/18/18/S02</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Approaches for extracting practical information from gene co-expression networks in plant biology</p>
            </title>
            <aug>
               <au>
                  <snm>Aoki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shibata</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Plant Cell Physiol</source>
            <pubdate>2007</pubdate>
            <volume>48</volume>
            <issue>3</issue>
            <fpage>381</fpage>
            <lpage>390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/pcp/pcm013</pubid>
                  <pubid idtype="pmpid" link="fulltext">17251202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The European Molecular Biology Open Source Suite</p>
            </title>
            <aug>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Longden</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bleasby</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends in Genetics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>6</issue>
            <fpage>276</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10827456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>A general method applicable to the search for similarities in the amino acid sequences of two proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Needleman</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Wunsch</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1970</pubdate>
            <volume>48</volume>
            <fpage>443</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(70)90057-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">5420325</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The Arabidopsis Information Resource</p>
            </title>
            <url>http://www.arabidopsis.org</url>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A gene expression map of Arabidopsis thaliana development</p>
            </title>
            <aug>
               <au>
                  <snm>Schmid</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Davison</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Henz</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Pape</snm>
                  <fnm>UJ</fnm>
               </au>
               <au>
                  <snm>Demar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sch&#246;lkopf</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Weigel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lohmann</snm>
                  <fnm>JU</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2005</pubdate>
            <volume>37</volume>
            <issue>5</issue>
            <fpage>501</fpage>
            <lpage>506</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1543</pubid>
                  <pubid idtype="pmpid" link="fulltext">15806101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A Model for Measurement Error for Gene Expression Arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <issue>6</issue>
            <fpage>557</fpage>
            <lpage>569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652701753307485</pubid>
                  <pubid idtype="pmpid" link="fulltext">11747612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Summaries of Affymetrix GeneChip probe level data</p>
            </title>
            <aug>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Bolstad</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Collin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cope</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Hobbs</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>4</issue>
            <fpage>e15</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">150247</pubid>
                  <pubid idtype="pmpid" link="fulltext">12582260</pubid>
                  <pubid idtype="doi">10.1093/nar/gng015</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>A comparison of normalization methods for high density oligonucleotide array data based on variance and bias</p>
            </title>
            <aug>
               <au>
                  <snm>Bolstad</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>&#197;strand</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>2</issue>
            <fpage>185</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/19.2.185</pubid>
                  <pubid idtype="pmpid" link="fulltext">12538238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <source>Guide to Probe Logarithmic Intensity Error (PLIER) estimation</source>
            <pubdate>2005</pubdate>
            <url>http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <issue>8</issue>
            <fpage>RESEARCH0032</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55329</pubid>
                  <pubid idtype="pmpid" link="fulltext">11532216</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <source>Statistical Algorithms Description Document</source>
            <pubdate>2002</pubdate>
            <url>http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>A benchmark for Affymetrix GeneChip expression measures</p>
            </title>
            <aug>
               <au>
                  <snm>Cope</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Jaffee</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>3</issue>
            <fpage>323</fpage>
            <lpage>331</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg410</pubid>
                  <pubid idtype="pmpid" link="fulltext">14960458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R: A Language and Environment for Statistical Computing</source>
            <publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher>
            <pubdate>2006</pubdate>
            <url>http://www.R-project.org</url>
            <note>[ISBN 3-900051-07-0]</note>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Bioconductor: open software development for computational biology and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Carey</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Bates</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Bolstad</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dettling</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gautier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gentry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hornik</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hothorn</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Iacus</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Leisch</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Maechler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rossini</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Sawitzki</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Smyth</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tierney</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>JYH</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>10</issue>
            <fpage>R80</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545600</pubid>
                  <pubid idtype="pmpid" link="fulltext">15461798</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-10-r80</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Bioconductor</p>
            </title>
            <url>http://www.bioconductor.org</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Affymetrix ATH1 GeneChip</p>
            </title>
            <url>http://www.affymetrix.com/products/arrays/specific/arab.affx</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Affymetrix ATH1 GeneChip annotation file</p>
            </title>
            <url>https://www.affymetrix.com/Auth/analysis/downloads/na21/ivt/ATH1-121501.na21.annot.csv.zip</url>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Automated generation of heuristics for biological sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Slater</snm>
                  <fnm>GSC</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>31</fpage>
            <url>http://www.ebi.ac.uk/~guy/exonerate/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">553969</pubid>
                  <pubid idtype="pmpid" link="fulltext">15713233</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>TAIR6 sequences repository</p>
            </title>
            <url>ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/blast_datasets/</url>
         </bibl>
         <bibl id="B52">
            <title>
               <p>TAIR6 cDNA sequence file</p>
            </title>
            <url>ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR6_genome_release/TAIR6_cdna_20060907</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>AtGenExpress development dataset</p>
            </title>
            <url>http://www.weigelworld.org/resources/microarray/AtGenExpress/AtGE_dev_samples.pdf</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>AtGenExpress tissue dataset</p>
            </title>
            <url>http://www.weigelworld.org/resources/microarray/AtGenExpress/Sample%20list%20%28Abiotic%20stress%29</url>
         </bibl>
      </refgrp>
   </bm>
</art>
