<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-250</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>MASQOT: a method for cDNA microarray spot quality control</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Bylesj&#246;</snm>
               <fnm>Max</fnm>
               <insr iid="I1"/>
               <email>max.bylesjo@chem.umu.se</email>
            </au>
            <au id="A2">
               <snm>Eriksson</snm>
               <fnm>Daniel</fnm>
               <insr iid="I2"/>
               <email>daniel.eriksson@genfys.slu.se</email>
            </au>
            <au id="A3">
               <snm>Sj&#246;din</snm>
               <fnm>Andreas</fnm>
               <insr iid="I3"/>
               <email>andreas.sjodin@plantphys.umu.se</email>
            </au>
            <au id="A4">
               <snm>Sj&#246;str&#246;m</snm>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>michael.sjostrom@chem.umu.se</email>
            </au>
            <au id="A5">
               <snm>Jansson</snm>
               <fnm>Stefan</fnm>
               <insr iid="I3"/>
               <email>stefan.jansson@plantphys.umu.se</email>
            </au>
            <au id="A6">
               <snm>Antti</snm>
               <fnm>Henrik</fnm>
               <insr iid="I1"/>
               <email>henrik.antti@chem.umu.se</email>
            </au>
            <au id="A7">
               <snm>Trygg</snm>
               <fnm>Johan</fnm>
               <insr iid="I1"/>
               <email>johan.trygg@chem.umu.se</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Research group for Chemometrics, Department of Chemistry, Ume&#229; University, SE-901 87 Ume&#229;, Sweden</p>
            </ins>
            <ins id="I2">
               <p>Ume&#229; Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 83 Ume&#229;, Sweden</p>
            </ins>
            <ins id="I3">
               <p>Ume&#229; Plant Science Centre, Department of Plant Physiology, Ume&#229; University, SE-901 87 Ume&#229;, Sweden</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>250</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/250</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16223442</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-250</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>30</day>
               <month>6</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>13</day>
               <month>10</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>13</day>
               <month>10</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Bylesj&#246; et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>cDNA microarray technology has emerged as a major player in the parallel detection of biomolecules, but it still suffers from fundamental technical problems. Identifying and removing unreliable data is crucial to avoid misleading analysis results. Visual assessment of spot quality is still a common procedure, despite the time-consuming work of manually inspecting hundreds of thousands of spots or more.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>A novel methodology for cDNA microarray spot quality control is outlined. Multivariate discriminant analysis was used to assess spot quality based on existing and novel descriptors. The presented methodology displays high reproducibility and outperformed the other evaluated methodologies in identifying unreliable data.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
                <p>The proposed methodology for cDNA microarray spot quality control generates continuous values of spot quality, which can be used as weights in subsequent analysis procedures or to discard spots of undesired quality using the suggested threshold values. The MASQOT approach provides a consistent assessment of spot quality and can be considered an alternative to the labor-intensive manual quality assessment process.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>DNA microarray technology allows simultaneous monitoring of the expression levels of thousands of genes. The technique produces large and complex datasets that are relatively easy to generate but non-trivial to analyze and extract information from. So far, most data mining efforts have focused on statistical analysis (see, for instance <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>) rather than on acquiring high-quality data from image analysis. Image analysis, the process of extracting information from the scanned microarray images, is an important step because microarray analysis is sequential in nature <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Consequently, problems in the initial steps have a large impact on the interpretation of the final results of the experiment.</p>
         <p>A number of technical issues during microarray preparation potentially affect the spot quality.</p>
          <p>&#8226; <it>Low signal intensity </it>is perhaps the most generally acknowledged property that affects spot quality, because signal is hard to distinguish from noise for spots with weak signals. Weak signal intensities may result from physiologically low expression levels but might also be related to surface properties of the slide, signal bleaching, scanner problems, or incomplete or irregular hybridization.</p>
         <p>&#8226; <it>Intensity distribution issues </it>appear as regions of pixels containing signals that clearly deviate from the average signal, typically as distinct sub-areas within the foreground region. As the signal of any given spot on a microarray slide is expected to be uniform over the entire spot foreground area, intensity distribution issues are usually a consequence of non-specific binding or irregular distribution of the printed DNA on the slide.</p>
         <p>&#8226; <it>Morphological issues </it>refer to unexpected shape-related variations of the spot foreground region. This includes very small or very large spot sizes, low spot circularities or spot mixing. Size aberrations might be a consequence of precipitates or impurities in the printing solution or needle clogging during printing. Spots are expected to be roughly circular in shape, but manufacturing issues might result in deviation from the circularity norm. Furthermore, imperfections on the slide or washing problems might cause the dye from several spots to mix, referred to as <it>bleeding</it>, making the separation of these signals difficult or even impossible.</p>
          <p>&#8226; <it>Background issues </it>appear as intensity fluctuations in the local background region immediately surrounding the foreground region. An increase in local background intensity or variance compared to the global slide background typically results from dye contaminants due to non-specific binding or incomplete washing.</p>
          <p>Microarray spot quality control is essentially the identification and removal of spots with properties that make the subsequent interpretation of their signal unreliable or misleading. Conversely, it is the recognition of characteristics that enable dependable interpretations and conclusions from the generated data. The field of microarray spot quality control has been largely neglected in the past but has recently become an area of interest. Existing documented semi-automatic methodologies include Bayesian networks <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> as well as linear combinations of quality parameters assumed to be related to the quality of the spot <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Interestingly, manual evaluation is still a common quality assessment procedure, despite the labor-intensive nature of visually inspecting hundreds of thousands of spots. The increasing availability and use of microarrays in transcriptomics has led to experiments involving an escalating number of slides, which has exposed manual quality assessment as the primary bottleneck.</p>
          <p>Partial least squares (PLS) is a generalized regression method that aims to maximize the covariance between the <b>X </b>(descriptor) and <b>Y </b>(response) matrices. PLS can handle large data sets of multicollinear and noisy data with moderate amounts of missing data in both <b>X </b>and <b>Y</b>. PLS discriminant analysis (PLS-DA) can be seen as a special case of PLS in which the response matrix <b>Y </b>is categorical (numerically represented as 0 or 1) and encodes the class membership of the observations. PLS-DA has been widely applied in microarray analysis (see, for instance <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>) as well as in other areas of life science (see, for instance <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>). For a more detailed description of the properties of PLS, please consult <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> and references therein.</p>
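          <p>As an illustration of the PLS-DA idea, the following is a minimal sketch in Python with NumPy, not the implementation used in this study. It assumes mean-centred descriptors in <b>X</b> (in practice descriptors are commonly also scaled to unit variance, which is left out here) and a dummy-coded 0/1 response matrix <b>Y</b>. The function names <it>plsda_fit</it> and <it>plsda_predict</it> are hypothetical. Each weight vector is taken as the dominant left singular vector of the cross-product of the deflated <b>X</b> and <b>Y</b> matrices, which is the direction the NIPALS iteration converges to for a multi-column <b>Y</b>.</p>

```python
import numpy as np


def plsda_fit(X, Y, n_components):
    """Fit a PLS2 model of a 0/1 class matrix Y on a descriptor matrix X.

    Returns (x_mean, y_mean, B), where B holds the regression coefficients.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean          # mean-centred working copies
    W, P, C = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of E'F.
        u, _, _ = np.linalg.svd(E.T @ F, full_matrices=False)
        w = u[:, 0]
        t = E @ w                           # X scores
        tt = t @ t
        p = E.T @ t / tt                    # X loadings
        c = F.T @ t / tt                    # Y loadings
        E = E - np.outer(t, p)              # deflate X
        F = F - np.outer(t, c)              # deflate Y
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.solve(P.T @ W, C.T)   # B = W (P'W)^-1 C'
    return x_mean, y_mean, B


def plsda_predict(model, X):
    """Return predicted class conformity values, one column per class."""
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


# Usage: two synthetic classes described by four descriptors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
Y = np.eye(2)[np.repeat([0, 1], 50)]        # dummy-coded class matrix
cc = plsda_predict(plsda_fit(X, Y, n_components=2), X)
print(np.allclose(cc.sum(axis=1), 1.0))     # True: conformities sum to 1
```

          <p>Because every row of a dummy-coded <b>Y</b> sums to 1, the centred responses have zero row sums, and the predicted class conformity values of each observation therefore also sum to 1.</p>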
          <p>Here, we propose the <ul>m</ul>icro<ul>a</ul>rray <ul>s</ul>pot <ul>q</ul>uality c<ul>o</ul>n<ul>t</ul>rol (MASQOT) methodology for assessment of cDNA microarray spot quality, outlined in figure <figr fid="F1">1</figr>. A set of existing and novel spot descriptors was identified to characterize spot quality in terms of the physical attributes of the spot. Prior to the extraction of descriptors, three experienced microarray users independently assessed the quality of roughly eighty thousand spots manually, in order to provide a sufficiently large data set of known quality.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Flowchart of the classification procedure</p>
            </caption>
            <text>
               <p><b>Flowchart of the classification procedure</b>. The classification process involves an 8-bit image, optimized for segmentation, as well as a 32-bit image, used for information extraction. During the training phase, visual classification results are required while this is not necessary for external data.</p>
            </text>
            <graphic file="1471-2105-6-250-1"/>
         </fig>
          <p>Spot descriptors were subsequently subjected to multivariate discriminant analysis by means of PLS-DA, with the aim of discriminating spots of low quality from spots of high quality by treating them as separate classes. The descriptors captured foreground and background irregularity, spot morphology and foreground density attributes that were potentially useful for discriminating between reliable (not bad) and unreliable (bad) spots. For instance, the <it>circularity </it>measure is a descriptor ranging from 0% to 100%. As the circularity approaches its minimum value, the predicted class membership should typically be higher (closer to 1) for the bad class than for the not bad class. The <it>coefficient of variation </it>for the foreground and background regions is an example of a descriptor with the reverse characteristics: higher values should typically give greater conformity with the bad class than with the not bad class. However, all descriptors contribute jointly to the regression model, and consequently to the final class assignment, to varying degrees depending on the properties of the spot.</p>
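          <p>The two descriptors mentioned above can be sketched in Python as follows. The definitions are assumptions for illustration: circularity is taken as 100 times 4&#960; times the spot area divided by the squared perimeter (100 for a perfect disc), and the coefficient of variation as the sample standard deviation of a region's pixel intensities divided by their mean. The definitions actually used in MASQOT are given in Additional file 1 and may differ; the function names are hypothetical.</p>

```python
import math

import numpy as np


def circularity_percent(area, perimeter):
    """Isoperimetric circularity in percent: 100 for a perfect disc.

    Perimeters estimated from pixelated masks can push the raw ratio
    slightly above 100, so the result is capped at 100.
    """
    return min(100.0, 100.0 * 4.0 * math.pi * area / perimeter ** 2)


def coefficient_of_variation(pixels):
    """Sample standard deviation divided by the mean of pixel intensities."""
    pixels = np.asarray(pixels, dtype=float)
    return pixels.std(ddof=1) / pixels.mean()


# A disc of radius 5 versus a 10 x 10 square.
print(circularity_percent(math.pi * 25, 10 * math.pi))  # 100.0
print(round(circularity_percent(100, 40), 2))           # 78.54
```

          <p>Under these definitions, a low circularity or a high coefficient of variation points towards the bad class, in line with the descriptor behaviour described above.</p>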
         <p>The MASQOT approach aims to provide a consistent assessment of spot quality, applicable to various types of microarray data, thus avoiding the labor-intensive manual quality assessment process. The methodology generates continuous values of spot quality which can be utilized to discard spots of undesired quality or used as weights in subsequent analysis procedures.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
          <p>Five cDNA microarray slides using the <it>Populus </it>second generation microarray slide layout (POP2), with samples originating from a previous investigation of leaf development (Sj&#246;din <it>et al</it>, in preparation), were used for classification training. Five additional POP2 slides, not included in the training set, were employed for external validation. Segmentation of raw images was performed using an implementation of the Seeded Region Growing (SRG) algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The properties of each spot were subsequently characterized using a large set of descriptors presumed to be linked to spot quality. These include foreground and background variability measures, spot morphology and foreground intensity distribution measures. Please consult <supplr sid="S1">Additional file 1</supplr> for a complete list of the descriptors used.</p>
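          <p>Seeded region growing assigns pixels to regions grown from seed points by repeatedly absorbing the unlabelled neighbouring pixel whose intensity is closest to the mean of an adjacent region. A minimal Python sketch follows, assuming one foreground seed and one background seed per spot, 4-connectivity, and priorities computed once, when a pixel is queued, from the region mean at that moment. It is not the SRG implementation used in this study, and the function name is hypothetical.</p>

```python
import heapq

import numpy as np


def seeded_region_growing(image, seeds):
    """Grow labelled regions from seed pixels over a 2-D intensity image.

    seeds maps a non-zero integer label to a list of (row, col) seed pixels.
    Returns an integer label image; pixels not reachable from a seed stay 0.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)
    sums, counts = {}, {}
    heap = []  # entries: (distance to region mean, row, col, label)

    def push_neighbours(r, c, lab):
        mean = sums[lab] / counts[lab]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            inside = rr >= 0 and h > rr and cc >= 0 and w > cc
            if inside and labels[rr, cc] == 0:
                heapq.heappush(heap, (abs(image[rr, cc] - mean), rr, cc, lab))

    # Initialise each region with its seed pixels.
    for lab, points in seeds.items():
        for r, c in points:
            labels[r, c] = lab
            sums[lab] = sums.get(lab, 0.0) + image[r, c]
            counts[lab] = counts.get(lab, 0) + 1
    for lab, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, lab)

    # Repeatedly absorb the queued pixel closest to its region's mean.
    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] != 0:
            continue  # already claimed by an earlier, closer entry
        labels[r, c] = lab
        sums[lab] += image[r, c]
        counts[lab] += 1
        push_neighbours(r, c, lab)
    return labels


# Usage: bright disc (radius 3) on a uniform background, one seed in each.
yy, xx = np.mgrid[0:11, 0:11]
disc = 9 >= (yy - 5) ** 2 + (xx - 5) ** 2
image = np.where(disc, 200.0, 10.0)
labels = seeded_region_growing(image, {1: [(5, 5)], 2: [(0, 0)]})
print(np.array_equal(labels == 1, disc))  # True
```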
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p>Definition of the employed spot descriptors. Provides a definition of the employed spot descriptors used to assess the quality of each spot.</p>
            </text>
            <file name="1471-2105-6-250-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
          <p>Following segmentation, spots were inspected by three experienced microarray users and independently assigned to one of two quality categories {bad, not bad}. The bad category consisted of all spots classified as bad by at least one of the experienced users, while the remaining spots were categorized as not bad. For classification and evaluation purposes, the spots in the bad category were subsequently partitioned into sub-classes based on visual properties, as described in table <tblr tid="T1">1</tblr>. This can be seen as characterizing each spot as exhibiting</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>The different sub-classes of bad spots.</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>
                        <b>Class</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Description</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>not bad</p>
                  </c>
                  <c ca="left">
                     <p>No issue. Contains all spots with no apparent problems according to the classification by the three experienced users.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI</p>
                  </c>
                  <c ca="left">
                      <p>High-Intensity Foreground Issue. Typically intensity distribution issues, such as dye debris in the foreground region or donut-shaped spots, with very distinct characteristics.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI</p>
                  </c>
                  <c ca="left">
                     <p>Low-Intensity Foreground Issue. Weak intensity distribution issues in the foreground region or morphological issues.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIBI</p>
                  </c>
                  <c ca="left">
                      <p>High-Intensity Background Issue. Typically intensity distribution issues, such as dye debris in the background region, with very distinct characteristics.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIBI</p>
                  </c>
                  <c ca="left">
                     <p>Low-Intensity Background Issue. Weak intensity distribution issues or faint increases in noise level in the background region.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/HIBI</p>
                  </c>
                  <c ca="left">
                     <p>A combination of HIFI and HIBI.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIBI</p>
                  </c>
                  <c ca="left">
                     <p>A combination of HIFI and LIBI.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI/HIBI</p>
                  </c>
                  <c ca="left">
                     <p>A combination of LIFI and HIBI.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI/LIBI</p>
                  </c>
                  <c ca="left">
                     <p>A combination of LIFI and LIBI.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIFI</p>
                  </c>
                  <c ca="left">
                     <p>A combination of HIFI and LIFI.</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIFI/HIBI</p>
                  </c>
                  <c ca="left">
                      <p>A combination of HIFI, LIFI and HIBI.</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>1. No issues (not bad); or</p>
         <p>2. Foreground issues (FI); or</p>
         <p>3. Background issues (BI); or</p>
         <p>4. Any combination of 2 and 3</p>
          <p>To avoid confounding of properties, only spots displaying pure issues (entries 1&#8211;3) were used in the classification training, although all spots were used in the model evaluation. The three-class problem was subjected to multivariate analysis by means of PLS <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> regression coupled with discriminant analysis (PLS-DA), with the aim of discriminating not bad spots from FI spots and BI spots. The PLS-DA model yields a predicted class conformity (CC) value for each of the classes: not bad (CC<sub>nb</sub>), foreground issues (CC<sub>FI</sub>) and background issues (CC<sub>BI</sub>), subject to the restriction CC<sub>nb </sub>= 1 - (CC<sub>FI </sub>+ CC<sub>BI</sub>). Because of this restriction, only the conformity value of the not bad class (CC<sub>nb</sub>) is interpreted in the following sections. A CC<sub>nb </sub>value approaching 1 denotes high compliance with the not bad class and can be interpreted as a quality measure of the spot. Spots visually categorized as bad should thus exhibit CC<sub>nb </sub>values close to 0, whereas spots categorized as not bad should exhibit CC<sub>nb </sub>values close to 1.</p>
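          <p>The labelling scheme can be sketched in Python: a spot is bad if at least one user flags it, the sub-classes of table 1 are mapped to the three training classes, spots with combined foreground and background issues are left out of training, and the retained classes are dummy-coded as 0/1 columns. All names are hypothetical, and treating HIFI/LIFI as a pure foreground issue is an assumption.</p>

```python
import numpy as np

CLASSES = ("not bad", "FI", "BI")

# Hypothetical mapping from table 1 sub-classes to training classes.
# Sub-classes missing from the mapping (mixed foreground and background
# issues) are excluded from training. Treating HIFI/LIFI as FI is an assumption.
TRAINING_CLASS = {
    "not bad": "not bad",
    "HIFI": "FI",
    "LIFI": "FI",
    "HIFI/LIFI": "FI",
    "HIBI": "BI",
    "LIBI": "BI",
}


def consensus_is_bad(user_calls):
    """A spot is bad if at least one of the users called it bad."""
    return any(call == "bad" for call in user_calls)


def training_matrix(sub_classes):
    """Return the indices of pure-issue spots and their dummy-coded Y matrix."""
    keep, rows = [], []
    for i, sub in enumerate(sub_classes):
        cls = TRAINING_CLASS.get(sub)
        if cls is None:
            continue  # combined foreground/background issue: not used
        keep.append(i)
        rows.append([1.0 if cls == name else 0.0 for name in CLASSES])
    return np.array(keep, dtype=int), np.array(rows)


keep, Y = training_matrix(["not bad", "HIFI", "LIBI", "HIFI/HIBI"])
print(keep)  # [0 1 2]
```

          <p>Each row of the resulting response matrix contains a single 1, which is why the three predicted conformity values are tied by CC<sub>nb</sub> = 1 - (CC<sub>FI</sub> + CC<sub>BI</sub>).</p>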
          <p>Receiver Operating Characteristics (ROC) plots of the classification of the POP2 training set and the POP2 test set are shown in figure <figr fid="F2">2</figr>. A density plot of the CC<sub>nb </sub>values for the bad and not bad spots in the POP2 training set is shown in figure <figr fid="F3">3</figr>, revealing a partial overlap between the two classes. Because of this overlap, discrimination accuracy depends on a threshold value <it>t </it>denoting the separation point between the two classes.</p>
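          <p>An ROC curve for CC<sub>nb</sub> can be traced by sweeping the threshold <it>t</it>, treating bad spots as positives and calling a spot bad when its CC<sub>nb</sub> value falls below <it>t</it>. The Python sketch below assumes that the curve plots rates, that is, the fractions of bad and of not bad spots called bad; the function name is hypothetical.</p>

```python
import numpy as np


def roc_points(cc_nb, is_bad, thresholds):
    """False and true positive rates for the rule: bad if CC_nb is below t.

    Bad spots are the positives. Returns (fpr, tpr) arrays, one entry per t.
    """
    cc_nb = np.asarray(cc_nb, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    fpr, tpr = [], []
    for t in thresholds:
        called_bad = t > cc_nb
        tpr.append(called_bad[is_bad].mean())    # bad spots called bad
        fpr.append(called_bad[~is_bad].mean())   # not bad spots called bad
    return np.array(fpr), np.array(tpr)


# Usage on a toy set of five spots, sweeping t over [0, 1].
fpr, tpr = roc_points([0.1, 0.3, 0.6, 0.8, 0.9],
                      [True, True, True, False, False],
                      np.linspace(0.0, 1.0, 101))
```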
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Receiver Operating Characteristics (ROC) plot</p>
            </caption>
            <text>
               <p><b>Receiver Operating Characteristics (ROC) plot</b>. The relation between true positives (bad spots classified as bad) and false positives (not bad spots classified as bad) for the training and test data. The solid line denotes training data whereas the dashed line denotes test data.</p>
            </text>
            <graphic file="1471-2105-6-250-2"/>
         </fig>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Density plot of the predicted class conformity of the not bad class</p>
            </caption>
            <text>
               <p><b>Density plot of the predicted class conformity of the not bad class</b>. A class conformity value of 1 signifies perfect class conformity while a value of 0 signifies no class conformity. The dashed line illustrates the density for the prediction of the bad spots in the POP2 training set whereas the solid line illustrates the density of the prediction of the not bad spots in the POP2 training set.</p>
            </text>
            <graphic file="1471-2105-6-250-3"/>
         </fig>
          <p>Spots with a predicted CC<sub>nb </sub>value below <it>t </it>were classified as bad, whereas the remaining spots were classified as not bad. The threshold can be set more or less stringently depending on the quality filtering requirements. This is illustrated in figures <figr fid="F4">4a&#8211;b</figr>, which show two views of the classification accuracy of the POP2 training set and the POP2 test set. The threshold values were set either to maximize the overall classification accuracy or to maximize the class-wise classification accuracy. The overall classification accuracies for the POP2 training set (38 627 spots) and the POP2 test set (39 421 spots) were calculated using equation 1 for all <it>t </it>in the interval (0,1), using the CC<sub>nb </sub>values of the {bad, not bad} spots. The overall classification accuracy peaks at approximately 98% for <it>t </it>= 0.4 (see figure <figr fid="F4">4a</figr>). Class-wise accuracies were calculated by applying equation 1 separately to the CC<sub>nb </sub>values of the bad spots and of the not bad spots. The class-wise accuracies intersect at approximately 95% for <it>t </it>= 0.5 (see figure <figr fid="F4">4b</figr>). Exact classification accuracies per sub-class, based on the intersection threshold value <it>t </it>= 0.5 illustrated in figure <figr fid="F4">4b</figr>, are given in table <tblr tid="T2">2</tblr>.</p>
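          <p>The threshold sweep can be sketched in Python, assuming that equation 1 computes the fraction of correctly classified spots, as the caption of figure 4 describes for panel a. The overall accuracy uses all spots, the class-wise accuracies use the bad and the not bad spots separately, and the intersection is approximated as the grid threshold at which the two class-wise curves are closest. The function names are hypothetical.</p>

```python
import numpy as np


def accuracy_curves(cc_nb, is_bad, thresholds):
    """Overall and class-wise accuracy of the rule 'bad if CC_nb is below t'."""
    cc_nb = np.asarray(cc_nb, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    overall, acc_bad, acc_not_bad = [], [], []
    for t in thresholds:
        called_bad = t > cc_nb
        correct = called_bad == is_bad
        overall.append(correct.mean())               # all spots
        acc_bad.append(correct[is_bad].mean())       # bad spots only
        acc_not_bad.append(correct[~is_bad].mean())  # not bad spots only
    return np.array(overall), np.array(acc_bad), np.array(acc_not_bad)


def best_and_intersection(thresholds, overall, acc_bad, acc_not_bad):
    """Threshold maximizing overall accuracy, and the first threshold at which
    the two class-wise accuracy curves are closest (approximate intersection)."""
    thresholds = np.asarray(thresholds)
    t_max = thresholds[np.argmax(overall)]
    t_cross = thresholds[np.argmin(np.abs(acc_bad - acc_not_bad))]
    return t_max, t_cross


# Usage: sweep t over a grid inside the interval (0, 1).
ts = np.linspace(0.01, 0.99, 99)
cc = [0.125, 0.335, 0.645, 0.365, 0.725, 0.955]
bad = [True, True, True, False, False, False]
curves = accuracy_curves(cc, bad, ts)
t_max, t_cross = best_and_intersection(ts, *curves)
```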
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Relationship between classification accuracy and threshold value for the POP2 data</p>
            </caption>
            <text>
               <p><b>Relationship between classification accuracy and threshold value for the POP2 data. </b>The threshold value <it>t </it>defines the boundary between bad and not bad spots for the POP2 training set (38 627 spots) and the POP2 test set (39 421 spots). Spots with a predicted class conformity value for the not bad class (CC<sub>nb</sub>) below the threshold value <it>t </it>are classified as bad while the remaining spots are classified as not bad. <b>a) </b>Overall classification accuracy <it>vs</it>. threshold value calculated as the fraction of correctly classified spots in the data set for a given threshold value. The solid line represents the POP2 training set whereas the dashed line represents the POP2 test set. The dotted vertical line at threshold value <it>t </it>= 0.4 illustrates an approximate maximum. <b>b) </b>Classification accuracy of the bad and not bad spots <it>vs</it>. threshold value. For the POP2 training set, the solid line represents the classification accuracy of the not bad spots and the dashed line represents the classification accuracy of the bad spots. For the POP2 test set, the dot-dashed line represents the classification accuracy of the not bad spots and the long-dashed line represents the classification accuracy of the bad spots. The dotted vertical line at threshold value <it>t </it>= 0.5 denotes the intersection point.</p>
            </text>
            <graphic file="1471-2105-6-250-4"/>
         </fig>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Classification accuracy of the POP2 training data. The classification accuracy for each sub-class as calculated using threshold value <it>t </it>= 0.5.</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>
                        <b>Class</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Number of spots</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Classification accuracy (%)</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>not bad</p>
                  </c>
                  <c ca="left">
                     <p>35983</p>
                  </c>
                  <c ca="left">
                     <p>94.7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI</p>
                  </c>
                  <c ca="left">
                     <p>942</p>
                  </c>
                  <c ca="left">
                     <p>98.7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI</p>
                  </c>
                  <c ca="left">
                     <p>76</p>
                  </c>
                  <c ca="left">
                     <p>86.8</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIBI</p>
                  </c>
                  <c ca="left">
                     <p>987</p>
                  </c>
                  <c ca="left">
                     <p>96.5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIBI</p>
                  </c>
                  <c ca="left">
                     <p>284</p>
                  </c>
                  <c ca="left">
                     <p>85.9</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/HIBI</p>
                  </c>
                  <c ca="left">
                     <p>81</p>
                  </c>
                  <c ca="left">
                     <p>97.5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIBI</p>
                  </c>
                  <c ca="left">
                     <p>69</p>
                  </c>
                  <c ca="left">
                     <p>98.6</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI/HIBI</p>
                  </c>
                  <c ca="left">
                     <p>66</p>
                  </c>
                  <c ca="left">
                     <p>98.5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LIFI/LIBI</p>
                  </c>
                  <c ca="left">
                     <p>44</p>
                  </c>
                  <c ca="left">
                     <p>77.3</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIFI</p>
                  </c>
                  <c ca="left">
                     <p>29</p>
                  </c>
                  <c ca="left">
                     <p>89.7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HIFI/LIFI/HIBI</p>
                  </c>
                  <c ca="left">
                     <p>62</p>
                  </c>
                  <c ca="left">
                     <p>100.0</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
          <p>The presented MASQOT approach was compared to three existing quality control methods: the composite quality score q<sub>com </sub>proposed by Wang <it>et al </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, the mean-median correlation factor mm<sub>corr </sub>evaluated by Tran <it>et al </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and the coefficient of variation (CV) parameter CV<sub>spot </sub>evaluated by Sauer <it>et al </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Threshold values for all quality control parameters were set to achieve maximum overall classification accuracy. The result, shown in table <tblr tid="T3">3</tblr>, shows that the MASQOT approach achieves higher classification accuracy on the POP2 test set than the other evaluated quality control methods.</p>
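          <p>Because the compared parameters are not all oriented the same way (a low CC<sub>nb</sub> indicates a bad spot, whereas a high coefficient of variation does), a threshold search has to know on which side of <it>t</it> the bad spots lie. In the minimal Python sketch below, the function name is hypothetical and the orientation of each parameter is an assumption supplied by the user.</p>

```python
import numpy as np


def best_threshold(scores, is_bad, thresholds, bad_when_high=False):
    """Threshold that maximizes overall accuracy for one quality parameter.

    For CC_nb-like scores, low values indicate bad spots; for dispersion
    measures such as a coefficient of variation, set bad_when_high=True.
    Returns (threshold, accuracy); ties keep the first threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    best_t, best_acc = None, -1.0
    for t in thresholds:
        called_bad = scores > t if bad_when_high else t > scores
        acc = float(np.mean(called_bad == is_bad))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Usage: a CV-like parameter, where bad spots have high values.
t, acc = best_threshold([0.9, 0.8, 0.2, 0.1], [True, True, False, False],
                        [0.5], bad_when_high=True)
print(t, acc)  # 0.5 1.0
```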
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Comparison to other quality control methods. The presented quality control parameter CC<sub>nb </sub>was compared to the composite quality score q<sub>com</sub>, the mean-median correlation factor mm<sub>corr </sub>and the CV<sub>spot </sub>value. Threshold values for all quality control parameters were set to maximize overall classification accuracy. Classification accuracy was determined on the POP2 test set.</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>
                        <b>Quality control parameter</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Threshold</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Classification accuracy</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>CC<sub>nb</sub></p>
                  </c>
                  <c ca="left">
                     <p>0.40</p>
                  </c>
                  <c ca="left">
                     <p>98.1%</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>q<sub>com</sub></p>
                  </c>
                  <c ca="left">
                     <p>0.32</p>
                  </c>
                  <c ca="left">
                     <p>94.5%</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mm<sub>corr</sub></p>
                  </c>
                  <c ca="left">
                     <p>0.65</p>
                  </c>
                  <c ca="left">
                     <p>94.3%</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>CV<sub>spot</sub></p>
                  </c>
                  <c ca="left">
                     <p>1.05</p>
                  </c>
                  <c ca="left">
                     <p>95.0%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>cDNA microarray spot quality control is, in many respects, a complex problem. Naturally, the automatic assessment of the quality of each spot relies heavily on how well the spot has been characterized. Incorrect approximations of the spatial location will affect the properties of the segmented foreground region, which in turn will influence the values of the quality control descriptors. In such a sequential process, where each step depends on the preceding steps, errors propagate downstream and accumulate. However, the most striking difficulty is perhaps the visual assessment on which this computer-based classification is founded, since even experienced microarray users tend to disagree. As the results presented here show, the spots with non-unanimous visual quality assessments are the most difficult to reproduce accurately. This disagreement stems from the more fundamental issue of defining 'quality' and of understanding the basic factors that affect it.</p>
         <p>The approach described here aims to assess the technical precision of each spot, which is believed to be linked to its biological accuracy (see <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> for a discussion of precision and accuracy in the microarray field). It should therefore be stated that lack of precision in a microarray spot measurement does not necessarily imply lack of accuracy. However, it is arguably reasonable to treat spots of questionable precision with particular care during analysis, to support the final biological interpretation.</p>
         <p>Spot quality control is commonly treated as a discrete problem (essentially, separating 'bad' spots from 'good' spots), but spot quality varies on a continuous scale, from very bad to very good. Instead of discarding spots, one might weight them according to the quality assessment. The concept of relative spot weights has previously been acknowledged in microarray normalization techniques (see, for instance, <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B20">20</abbr></abbrgrp>) and might also prove valuable in further analysis steps. However, evaluating the use and validity of spot weights based on the quality assessments provided here is left for a future paper.</p>
         <p>The prediction accuracies for the true positives (bad spots predicted as bad) and the true negatives (not bad spots predicted as not bad) are reported separately, since the two are not always equally important. For instance, depending on the user and the question at hand, it might be more important to avoid removing spots of decent quality than to eliminate all of the bad spots from the data set. The overall classification accuracy alone could be misleading, since the number of bad spots in a typical data set is much lower than the number of not bad spots; a classifier that labels every spot as not bad would still reach a high overall accuracy.</p>
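         <p>To illustrate why the two accuracies are reported separately, the following minimal Python sketch computes the accuracy within the bad class, within the not bad class and overall, calling a spot bad when its predicted class conformity value falls below a threshold (the same rule as in equation 1). The function name class_accuracies and the example data are hypothetical and are not taken from the POP2 data.</p>

```python
def class_accuracies(labels, scores, threshold):
    """Return (bad accuracy, not bad accuracy, overall accuracy) in percent.

    labels: visual classes, each "bad" or "not bad" (both classes must occur).
    scores: predicted class conformity values; a spot is predicted bad when
            its score is below the threshold, as in equation 1.
    """
    n_bad = n_bad_correct = 0
    n_good = n_good_correct = 0
    for label, score in zip(labels, scores):
        predicted_bad = threshold > score
        if label == "bad":
            n_bad += 1
            n_bad_correct += int(predicted_bad)        # true positive
        else:
            n_good += 1
            n_good_correct += int(not predicted_bad)   # true negative
    bad_acc = 100.0 * n_bad_correct / n_bad
    good_acc = 100.0 * n_good_correct / n_good
    overall = 100.0 * (n_bad_correct + n_good_correct) / (n_bad + n_good)
    return bad_acc, good_acc, overall


# Hypothetical imbalanced set: 5 bad spots and 95 not bad spots, scored by a
# classifier that gives every spot a high score (it calls everything not bad).
labels = ["bad"] * 5 + ["not bad"] * 95
scores = [0.9] * 100
print(class_accuracies(labels, scores, threshold=0.4))  # (0.0, 100.0, 95.0)
```

         <p>In this toy case the overall accuracy is 95% even though none of the bad spots is detected.</p>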
         <p>The methodology presented here is based on the scaled sum of the intensities from both channels but can, with minor adjustments, also be based on single-channel intensities. Using the scaled channel intensities avoids the risk that the information in one channel is drowned out by the other, in particular when there is a large difference in intensity level between the channels. In addition, the combined intensities closely resemble the visual representations of the spots, which by design makes it easier to find correlations between the visual quality assessment and the spot descriptors. Furthermore, a per-spot quality measure is more practical for the average user than a per-channel quality measure, since it avoids the question of what to do when only one channel is of low or moderate quality.</p>
         <p>Recent advances in spot quality control have clearly shown that the training data can be explained well using several methodologies of varying complexity. However, very few efforts have been made to evaluate further aspects of quality (for instance, more refined descriptors) and, most importantly, the reproducibility of the classification on external data. External reproducibility has been the main aim here, partly at the expense of internal reproducibility on the training data; this is reflected in the close agreement in accuracy between the POP2 training set and the independent POP2 test set.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The presented MASQOT technique provides a robust methodology for semi-automated cDNA microarray spot quality control, with high classification accuracy on both training data and external data and better class discrimination than the other evaluated methods. The MASQOT methodology generates continuous values of spot quality, which can be used as weights in subsequent analysis procedures or to discard spots of undesired quality using the proposed threshold values.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Microarray preparation</p>
            </st>
            <p>Samples for the microarray slides used in this paper originate from an experiment on <it>Populus tremula </it>leaves investigating the regulation of leaf development (Sj&#246;din <it>et al</it>, in preparation). The microarray layout used, referred to as POP2, consists of 25 278 single-spotted cDNA clones from a recent assembly of more than 100 000 expressed sequence tags (ESTs) from the <it>Populus </it>genus <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. All sequence information is available in the online sequence resource PopulusDB <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, and a full array layout is available for download from the online microarray resource UPSC-BASE <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
            <p>Ten of the 28 POP2 slides were randomly chosen for classification and grouped into two equally large sets of five slides each: the POP2 training set and the POP2 external test set. See <supplr sid="S2">additional file 2</supplr> for a complete list of the POP2 microarray slides used here. All POP2 slides were printed using a QArray arrayer (Genetix, Hampshire, U.K.). The preparation, labeling and hybridization of cDNA clones and mRNA samples were carried out according to the protocol described by Smith <it>et al </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The arrays were scanned on a ScanArray 4000 (Perkin-Elmer, Wellesley, MA) at 5 &#956;m resolution to obtain raw image files for the red-fluorescent dye Cy5 and the green-fluorescent dye Cy3. All POP2 raw image files are available for download from the UPSC-BASE microarray database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> under experiment number 0013.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>A list of the employed POP2 slides. Provides a list of the employed POP2 slides.</p>
               </text>
               <file name="1471-2105-6-250-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Image analysis</p>
            </st>
            <p>The workflow from scanned cDNA images to computer-based classification was separated into seven sub-procedures, outlined below and illustrated in figure <figr fid="F1">1</figr>.</p>
            <p><it>1. Image merging </it>generated a combined image from the intensity measurements of both the red-fluorescent dye Cy5 and the green-fluorescent dye Cy3, which was used in the subsequent gridding and segmentation steps.</p>
            <p><it>2. Gridding </it>attempted to identify the precise spatial center of the spots on the scanned microarray images.</p>
            <p><it>3. Segmentation </it>classified the pixels as either representing the cDNA expression level (foreground pixels) or an estimation of the local noise level (background pixels). In addition, a thin strip of pixels in the boundary region between the two segments (border pixels) was identified.</p>
            <p><it>4. Information extraction </it>refers to the characterization of the foreground and background regions from the segmentation process. In general terms, information extraction should provide a description of each region that is relevant in some sense (for instance, the spatial location of the foreground region or the foreground intensity level). The focus here was on features that capture the overall quality of the spot.</p>
            <p><it>5. Manual classification </it>provided a measure of the spot quality by means of visual inspection carried out by three experienced microarray users.</p>
            <p><it>6. Computer-based classification </it>of spot quality (the <it>training phase</it>) generated a model for the differences between the spot quality classes using discriminant analysis based on the PLS regression method (PLS-DA).</p>
            <p><it>7. Verification </it>of the computer-based classification (the <it>test phase</it>) validated the predictive ability of the model using processed data not included in the training phase.</p>
         </sec>
         <sec>
            <st>
               <p>Image merging</p>
            </st>
            <p>Both gridding and segmentation were based on a combined eight-bit image constructed from the intensity measurements of the red-fluorescent dye Cy5 and the green-fluorescent dye Cy3. The merged eight-bit image lacks some of the detail of the original images but is computationally efficient, in particular with respect to memory requirements. The damping and scaling procedures used are described in detail by Yang <it>et al </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and briefly outlined below.</p>
            <p>&#8226; The intensity levels in both images were square-root transformed. The square-root transform acts as a damping function, reducing the relative impact of high-intensity pixels during gridding and segmentation.</p>
            <p>&#8226; Median intensity values were computed from the transformed images.</p>
            <p>&#8226; A joint intensity value was calculated as the sum of the square-root transformed intensities from both channels, each scaled according to its respective median value.</p>
            <p>&#8226; Intensity values greater than 255 were truncated.</p>
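            <p>The merging steps above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the implementation of Yang <it>et al</it>: each square-root transformed channel is assumed to be divided by its own median, and the sum is multiplied by a hypothetical factor (scale) so that a pixel at the median of both channels maps to about 128 before truncation at 255.</p>

```python
import numpy as np


def merge_channels(cy5, cy3, scale=64.0):
    """Combine raw Cy5 and Cy3 images into one eight-bit image (sketch).

    Assumptions: each square-root image is divided by its own median, and the
    sum is multiplied by the hypothetical factor `scale`, so a pixel at the
    median of both channels maps to 2 * scale (128 by default).
    """
    r = np.sqrt(np.asarray(cy5, dtype=float))  # square root damps bright pixels
    g = np.sqrt(np.asarray(cy3, dtype=float))
    # Guard against an all-zero channel; otherwise the median is used as is.
    med_r = max(float(np.median(r)), 1e-12)
    med_g = max(float(np.median(g)), 1e-12)
    joint = scale * (r / med_r + g / med_g)    # median-scaled sum of both channels
    joint = np.minimum(joint, 255.0)           # truncate values above 255
    return joint.astype(np.uint8)


rng = np.random.default_rng(0)
cy5 = rng.integers(0, 60000, size=(64, 64))
cy3 = rng.integers(0, 60000, size=(64, 64))
merged = merge_channels(cy5, cy3)
print(merged.dtype, merged.shape)  # uint8 (64, 64)
```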
         </sec>
         <sec>
            <st>
               <p>Gridding</p>
            </st>
            <p>Approximate spatial centers of the spots, referred to as the <it>grid points</it>, were located manually using an in-house Java application. This is the only step in the classification process that requires user intervention. A more precise midpoint of the foreground region was found using a square pixel mask, sized to the expected spot diameter (100 &#956;m for the POP2 data), surrounding the initial grid point. The pixel mask was shifted in all directions, deviating at most 30 &#956;m from the initial grid point, and the center position of the mask containing the highest total sum of intensities was selected as the seed point.</p>
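            <p>The seed-point search can be sketched as below, assuming the 5 &#956;m scanning resolution given above, so that the 100 &#956;m spot diameter corresponds to a 20-pixel mask and the maximum deviation of 30 &#956;m to 6 pixels. The function find_seed_point is hypothetical; it assumes a square search window (deviation measured per axis) and a grid point far enough from the image edge that every mask lies fully inside the image.</p>

```python
import numpy as np


def find_seed_point(image, grid_point, spot_diameter_px=20, max_shift_px=6):
    """Refine a manually placed grid point into a seed point (sketch).

    A square mask with side spot_diameter_px is centred on every position
    within max_shift_px of grid_point along each axis; the centre whose mask
    contains the largest intensity sum is returned. The defaults assume 5 um
    pixels (100 um spots, at most 30 um deviation). The grid point must lie
    far enough from the image edge for every mask to fit inside the image.
    """
    row0, col0 = grid_point
    half = spot_diameter_px // 2
    best_sum, best_center = -np.inf, (row0, col0)
    for dr in range(-max_shift_px, max_shift_px + 1):
        for dc in range(-max_shift_px, max_shift_px + 1):
            r, c = row0 + dr, col0 + dc
            top, left = r - half, c - half
            window = image[top:top + spot_diameter_px, left:left + spot_diameter_px]
            total = float(window.sum())
            if total > best_sum:
                best_sum, best_center = total, (r, c)
    return best_center


# Synthetic spot: a 20 x 20 bright square centred at (52, 47).
image = np.zeros((100, 100))
image[42:62, 37:57] = 1.0
print(find_seed_point(image, (50, 50)))  # (52, 47)
```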
         </sec>
         <sec>
            <st>
               <p>Segmentation</p>
            </st>
            <p>The segmentation method employed was an implementation of the seeded region growing (SRG) algorithm, originally proposed by Adams and Bischof <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The SRG method has previously been used for microarray spot segmentation by Yang <it>et al </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Implementation details are available in <supplr sid="S3">additional file 3</supplr>. The segmentation process produced a pixel mask assigning each pixel to one of the four groups {foreground, border, background, unassigned}. Each spot thus consisted of a distinct foreground region with the following characteristics:</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>Implementation details of the segmentation process. Provides in-depth information regarding the implementation of the seeded region growing (SRG) algorithm.</p>
               </text>
               <file name="1471-2105-6-250-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>&#8226; All pixels within the foreground region were spatially connected.</p>
            <p>&#8226; No pixels overlapped with the foreground region of another spot.</p>
            <p>&#8226; Minor fluctuations in intensity level within the region were accepted.</p>
            <p>&#8226; The maximum Euclidean distance between any two pixels in the foreground region was restricted.</p>
            <p>&#8226; Spot circularity was <it>not </it>assumed.</p>
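            <p>A minimal Python sketch of seeded region growing in the spirit of Adams and Bischof is given below. It is a simplified illustration, not the implementation described in additional file 3: it grows any number of seeded regions over 4-connected pixels and does not create border pixels or enforce the non-overlap and maximum-distance constraints listed above. Pixel priorities are computed when a pixel is queued and are not updated as region means change, which is a common simplification. The function name and the synthetic example are hypothetical.</p>

```python
import heapq

import numpy as np


def seeded_region_growing(image, seeds):
    """Simplified seeded region growing after Adams and Bischof (sketch).

    image: 2-D array of intensities.
    seeds: dict mapping a positive integer region label to a list of
           (row, col) seed pixels.
    Returns an integer label array; 0 marks pixels no region reached.
    """
    rows, cols = image.shape
    labels = np.zeros((rows, cols), dtype=int)
    sums, counts = {}, {}
    queue = []  # min-heap of (|intensity - region mean|, row, col, label)

    def push_neighbours(r, c, lab):
        mean = sums[lab] / counts[lab]
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nr in range(rows) and nc in range(cols) and labels[nr, nc] == 0:
                delta = abs(float(image[nr, nc]) - mean)
                heapq.heappush(queue, (delta, nr, nc, lab))

    # Initialise every region with its seed pixels.
    for lab, pixels in seeds.items():
        for r, c in pixels:
            labels[r, c] = lab
            sums[lab] = sums.get(lab, 0.0) + float(image[r, c])
            counts[lab] = counts.get(lab, 0) + 1
    for lab, pixels in seeds.items():
        for r, c in pixels:
            push_neighbours(r, c, lab)

    # Repeatedly assign the queued pixel that best matches its region's mean.
    while queue:
        _, r, c, lab = heapq.heappop(queue)
        if labels[r, c] != 0:
            continue  # already claimed by an earlier queue entry
        labels[r, c] = lab
        sums[lab] += float(image[r, c])
        counts[lab] += 1
        push_neighbours(r, c, lab)
    return labels


image = np.zeros((9, 9))
image[3:6, 3:6] = 100.0  # bright 3 x 3 "spot"
labels = seeded_region_growing(image, {1: [(4, 4)], 2: [(0, 0)]})
print((labels == 1).sum(), (labels == 2).sum())  # 9 72
```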
         </sec>
         <sec>
            <st>
               <p>Information extraction</p>
            </st>
            <p>The data used in the information extraction consist of the sum of the raw intensities of the Cy5 and Cy3 channels, each scaled according to its median intensity value. The scaling was applied to reduce the dominance of the channel with the higher median intensity. A large set of features believed to be linked to spot quality was extracted and subsequently used in the computer-based classification. For repeatability and applicability to various types of microarray slides, all descriptors were corrected by an estimate of the slide background mean, computed as the mean intensity level of the local background regions of all spots on the slide.</p>
            <p>Furthermore, as suggested by Wang <it>et al </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, spots in which saturated pixels exceeded 10% of the total number of pixels in at least one of the channels were excluded from the classification.</p>
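            <p>The background estimate, the scaled channel sum and the saturation filter described above can be sketched as follows. The saturation level of 65535 assumes 16-bit scanner images, and dividing each channel by its median is one possible reading of "scaled according to the respective median intensity value"; both are assumptions, as are all function names. How the slide background estimate enters each descriptor is not specified here, so the sketch only computes it.</p>

```python
import numpy as np

SATURATION_LEVEL = 65535  # assumption: 16-bit scanner images


def slide_background_mean(local_background_means):
    """Estimate the slide background as the mean of all spots' local background levels."""
    return float(np.mean(local_background_means))


def scaled_sum(cy5_pixels, cy3_pixels, median_cy5, median_cy3):
    """Per-pixel sum of raw intensities, each channel divided by its median (assumed scaling)."""
    cy5 = np.asarray(cy5_pixels, dtype=float)
    cy3 = np.asarray(cy3_pixels, dtype=float)
    return cy5 / median_cy5 + cy3 / median_cy3


def is_saturated(cy5_pixels, cy3_pixels, max_fraction=0.10, level=SATURATION_LEVEL):
    """True if saturated pixels exceed max_fraction of the pixels in either channel."""
    for pixels in (np.asarray(cy5_pixels), np.asarray(cy3_pixels)):
        if np.mean(pixels >= level) > max_fraction:
            return True
    return False


# A spot with 2 of 10 pixels saturated in Cy5 (20%) is excluded;
# one with 1 of 10 (exactly 10%) is kept, since 10% is not exceeded.
print(is_saturated([65535] * 2 + [100] * 8, [100] * 10))  # True
print(is_saturated([65535] * 1 + [100] * 9, [100] * 10))  # False
```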
            <p>A complete table of all extracted descriptors including a description or definition is available in <supplr sid="S1">additional file 1</supplr>. The descriptors aimed to capture foreground and background variability properties, spot morphology and foreground intensity and density properties.</p>
         </sec>
         <sec>
            <st>
               <p>Manual classification</p>
            </st>
            <p>The spots from the ten POP2 slides were independently inspected by three experienced microarray users and assigned to one of two quality categories {bad, not bad}. The bad category consisted of all spots classified as bad by at least one of the users, while the remaining spots were categorized as not bad.</p>
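            <p>The consensus rule above amounts to a logical OR over the users' assessments, as in the short Python sketch below. Both function names are hypothetical, and the helper that reports whether the three users agreed is included only for convenience.</p>

```python
def consensus_label(assessments):
    """'bad' if at least one user classified the spot as bad, otherwise 'not bad'."""
    return "bad" if any(a == "bad" for a in assessments) else "not bad"


def is_unanimous(assessments):
    """Hypothetical helper: True if all users gave the spot the same class."""
    return len(set(assessments)) == 1


votes = ["not bad", "bad", "not bad"]  # one spot, three users
print(consensus_label(votes), is_unanimous(votes))  # bad False
```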
            <p>During the visual classification, the experienced users worked according to four basic rules of thumb related to the technical precision of each spot.</p>
            <p>&#8226; The signal within the foreground region should have low variability.</p>
            <p>&#8226; The foreground region should be circular.</p>
            <p>&#8226; The foreground region should be spatially located at the expected position.</p>
            <p>&#8226; The background region should have low variability and low intensity level compared to the global slide background.</p>
            <p>Determining the relation between these rules, that is, how much each factor was allowed to deviate, alone and in combination with the other factors, was left to the multivariate classification model. The data sets used, after segmentation, are available in <supplr sid="S4">additional file 4</supplr> (training set) and <supplr sid="S5">additional file 5</supplr> (test set). A summary of the manual classifications performed by the experienced users is available in <supplr sid="S6">additional file 6</supplr>.</p>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>The processed POP2 training data. Provides the processed POP2 training data set, which contains per-spot values of all descriptors employed here.</p>
               </text>
               <file name="1471-2105-6-250-S4.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>The processed POP2 test data. Provides the processed POP2 test data set, which contains per-spot values of all descriptors employed here.</p>
               </text>
               <file name="1471-2105-6-250-S5.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S6">
               <title>
                  <p>Additional File 6</p>
               </title>
               <text>
                  <p>Manual quality assessments. Provides a summary of the quality assessments as classified by the three experienced microarray users.</p>
               </text>
               <file name="1471-2105-6-250-S6.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The POP2 slides were randomly partitioned into two equally large sets of five slides each: the POP2 training set and the POP2 test set. For classification and evaluation purposes, the bad spots of the training set were then divided into sub-classes based on visual properties, as described in table <tblr tid="T1">1</tblr>. The HIFI and LIFI sub-classes were used as representatives of the pure foreground issues (FI) during classification training. Analogously, the HIBI and LIBI sub-classes were used as representatives of the pure background issues (BI) during classification training. Typical examples of the described sub-classes can be found in <supplr sid="S7">additional file 7</supplr>.</p>
            <suppl id="S7">
               <title>
                  <p>Additional File 7</p>
               </title>
               <text>
                  <p>Visual representations of the sub-classes of bad spots. Provides images of typical examples of the 4 main sub-classes of bad spots.</p>
               </text>
               <file name="1471-2105-6-250-S7.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Computer-based classification</p>
            </st>
            <p>The computer-based classification was performed using PLS-DA as implemented in SIMCA-P+ 10.0 (Umetrics AB, Ume&#229;, Sweden). Cross-validation <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> with seven groups was used to determine the number of latent variables. Prior to analysis, all descriptors were column-wise mean-centered and scaled to unit variance (UV) by dividing each descriptor by its standard deviation. Together, mean-centering and UV scaling give every descriptor zero mean and unit variance, so that all descriptors have the same prior weight in the model regardless of their original measurement scale. Results and model statistics from the PLS-DA training phase are described in <supplr sid="S8">additional file 8</supplr>.</p>
            <suppl id="S8">
               <title>
                  <p>Additional File 8</p>
               </title>
               <text>
                  <p>Details of the PLS-DA model. Provides details and statistics from the utilized PLS-DA model.</p>
               </text>
               <file name="1471-2105-6-250-S8.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
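            <p>Mean-centering and UV scaling can be sketched with NumPy as follows. The sketch assumes the sample standard deviation (n - 1 denominator) and, as is usual when predicting new observations, applies the means and standard deviations of the training set to the test set. Leaving constant descriptors unscaled is an assumption of this sketch, not a documented SIMCA-P+ behavior.</p>

```python
import numpy as np


def autoscale(X):
    """Mean-center each descriptor (column) and scale it to unit variance.

    Returns the scaled matrix together with the column means and standard
    deviations, so the same transformation can be applied to new data.
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    stds = X.std(axis=0, ddof=1)   # sample standard deviation (assumed)
    stds[stds == 0] = 1.0          # assumption: leave constant descriptors unscaled
    return (X - means) / stds, means, stds


def apply_scaling(X_new, means, stds):
    """Scale new observations (e.g. a test set) with the training-set parameters."""
    return (np.asarray(X_new, dtype=float) - means) / stds


train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z, means, stds = autoscale(train)
print(Z[:, 0])  # [-1.  0.  1.]
```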
            <p>Classification training was based on discriminant analysis of subsets of the foreground issues (FI) class, the background issues (BI) class and the not bad class. See table <tblr tid="T1">1</tblr> for a more detailed description of the available sub-classes. Prior to discriminant analysis, a representative subset of 355 spots was selected from each class using D-optimal design <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, to eliminate the large differences in size between the three classes. See <supplr sid="S9">additional file 9</supplr> for the designed data set and <supplr sid="S10">additional file 10</supplr> for details regarding the D-optimal design.</p>
            <suppl id="S9">
               <title>
                  <p>Additional File 9</p>
               </title>
               <text>
                  <p>The designed subset of the POP2 training data. Provides the processed and filtered POP2 training data set, containing only the spots from the not bad, FI and BI classes which were selected according to the D-optimal design.</p>
               </text>
               <file name="1471-2105-6-250-S9.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S10">
               <title>
                  <p>Additional File 10</p>
               </title>
               <text>
                  <p>Description of the utilized D-optimal design. Provides information regarding generation of the D-optimal design used to select subsets of the three classes.</p>
               </text>
               <file name="1471-2105-6-250-S10.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
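            <p>The D-optimal selection itself is described in additional file 10. As a rough illustration of the criterion only, the Python sketch below greedily picks k rows of a candidate matrix so as to approximately maximize det(Xs^T Xs) for the selected rows Xs. This greedy sequential procedure is a swapped-in technique and not necessarily the algorithm used by the authors; the choice of candidate matrix (for instance the autoscaled descriptors of one class), the ridge term and the function name are assumptions.</p>

```python
import numpy as np


def greedy_d_optimal(X, k, ridge=1e-4):
    """Greedily select k rows of X that approximately maximise det(Xs^T Xs).

    At each step the candidate row x with the largest x^T M^-1 x is added,
    where M = ridge * I + Xs^T Xs for the rows Xs chosen so far. By the matrix
    determinant lemma, det(M + x x^T) = det(M) * (1 + x^T M^-1 x), so this is
    the row that increases det(M) the most. M^-1 is updated with the
    Sherman-Morrison formula instead of being recomputed.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if k > n:
        raise ValueError("k exceeds the number of candidate rows")
    M_inv = np.eye(p) / ridge
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        scores = np.einsum("ij,jk,ik->i", X, M_inv, X)  # x^T M^-1 x per row
        scores[~available] = -np.inf
        best = int(np.argmax(scores))
        chosen.append(best)
        available[best] = False
        u = M_inv @ X[best]
        M_inv -= np.outer(u, u) / (1.0 + X[best] @ u)
    return chosen


candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.1, 0.1], [-1.0, 0.0], [0.0, -1.0]])
print(greedy_d_optimal(candidates, 2))  # [0, 1]
```

            <p>In the toy example, the near-duplicate point [0.1, 0.1] adds little information and is picked last, which is the behavior a D-optimal subset is meant to have.</p>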
            <p>Classification accuracies were calculated using equation 1, where <it>n </it>is the number of observations, <it>t </it>is a threshold value and <it>x </it>is the vector of predicted class conformity values for a given set of spots. Equation 1 uses the function <it>corr</it><sub><it>pred</it></sub><it>(i, y, t)</it>, which returns 1 if <it>i </it>&#8712; {bad} and y &lt; t, <it>or </it>if <it>i </it>&#8712; {not bad} and y &#8805; t, and 0 otherwise.</p>
            <p>
               <m:math name="1471-2105-6-250-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>p</m:mi>
                        <m:mi>r</m:mi>
                        <m:mi>e</m:mi>
                        <m:msub>
                           <m:mi>d</m:mi>
                           <m:mrow>
                              <m:mi>a</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>c</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>x</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>t</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mn>100</m:mn>
                           </m:mrow>
                           <m:mi>n</m:mi>
                        </m:mfrac>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>n</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mi>c</m:mi>
                              <m:mi>o</m:mi>
                              <m:mi>r</m:mi>
                              <m:msub>
                                 <m:mi>r</m:mi>
                                 <m:mrow>
                                    <m:mi>p</m:mi>
                                    <m:mi>r</m:mi>
                                    <m:mi>e</m:mi>
                                    <m:mi>d</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>i</m:mi>
                              <m:mo>,</m:mo>
                              <m:msub>
                                 <m:mi>x</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo>,</m:mo>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqabeGadaaakeaacqWGWbaCcqWGYbGCcqWGLbqzcqWGKbazdaWgaaWcbaGaemyyaeMaem4yamMaem4yamgabeaakiabcIcaOiabdIha4jabcYcaSiabdsha0jabcMcaPiabg2da9maalaaabaGaeGymaeJaeGimaaJaeGimaadabaGaemOBa4gaamaaqahabaGaem4yamMaem4Ba8MaemOCaiNaemOCai3aaSbaaSqaaiabdchaWjabdkhaYjabdwgaLjabdsgaKbqabaGccqGGOaakcqWGPbqAcqGGSaalcqWG4baEdaWgaaWcbaGaemyAaKgabeaakiabcYcaSiabdsha0jabcMcaPaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOBa4ganiabggHiLdGccaWLjaGaaCzcamaabmaabaGaeGymaedacaGLOaGaayzkaaaaaa@6051@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
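            <p>Equation 1 translates directly into code. The Python sketch below mirrors the <it>corr</it><sub><it>pred</it></sub> and <it>pred</it><sub><it>acc</it></sub> definitions and adds a hypothetical helper, best_threshold, showing one simple way to find the threshold that maximizes overall accuracy, as was done for the parameters in table 3: every observed score, plus one value above the maximum, is tried as a candidate. The label strings and example scores are illustrative.</p>

```python
def corr_pred(label, y, t):
    """Return 1 if the prediction y for a spot agrees with its visual class label, else 0."""
    if label == "bad":
        return 1 if t > y else 0   # bad spots must score below t
    return 1 if y >= t else 0      # not bad spots must score at or above t


def pred_acc(labels, x, t):
    """Equation 1: 100/n times the number of correctly classified spots."""
    n = len(labels)
    return 100.0 / n * sum(corr_pred(lab, xi, t) for lab, xi in zip(labels, x))


def best_threshold(labels, x):
    """Hypothetical helper: threshold maximising pred_acc over all distinct splits.

    Every observed score is a candidate, plus one value above the maximum
    (which predicts every spot as bad); ties go to the smallest threshold.
    """
    candidates = sorted(set(x)) + [max(x) + 1.0]
    return max(candidates, key=lambda t: pred_acc(labels, x, t))


labels = ["bad", "bad", "not bad", "not bad", "not bad"]
scores = [0.1, 0.5, 0.3, 0.7, 0.9]
t = best_threshold(labels, scores)
print(t, pred_acc(labels, scores, t))  # 0.3 80.0
```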
         </sec>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations used</p>
         </st>
         <p><b>PLS </b>Partial Least Squares</p>
         <p><b>PLS-DA </b>Partial Least Squares Discriminant Analysis</p>
         <p><b>SRG </b>Seeded Region Growing</p>
         <p><b>ROC </b>Receiver Operating Characteristics</p>
         <p><b>EST </b>Expressed Sequence Tag</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MB implemented the segmentation and gridding tools, performed visual classification, implemented the D-optimal design, conceived and generated the classification model and drafted the manuscript. DE generated the microarray slides, performed visual classification and helped to draft the manuscript. AS conceived the study, collected leaf samples, performed visual classification and helped to draft the manuscript. MS, SJ, HA and JT supervised the project. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by grants from</p>
            <p>&#8226; The Swedish Foundation for Strategic Research (MB, HA)</p>
            <p>&#8226; The Knut and Alice Wallenberg Foundation (JT)</p>
            <p>&#8226; The European Commission through the Directorate General Research within the Fifth Framework for Research &#8211; Quality of Life and Management of the Living Resources Programme, contract No. QLK5-CT-2002-00953 coordinated by the University of Southampton (AS, SJ).</p>
            <p>&#8226; The Swedish Research Council (AS, SJ, DE, MS)</p>
            <p>&#8226; EU-strategic funding (DE)</p>
            <p>&#8226; The Functional Genomics Initiative at Swedish University of Agricultural Sciences (DE)</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Analysis of variance for gene expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <issue>6</issue>
            <fpage>819</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050514954</pubid>
                  <pubid idtype="pmpid" link="fulltext">11382364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Normalization of cDNA microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Smyth</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Methods</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>4</issue>
            <fpage>265</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1046-2023(03)00155-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597310</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Assessing gene significance from cDNA microarray expression data via mixed models</p>
            </title>
            <aug>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hamadeh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bushel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Afshari</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <issue>6</issue>
            <fpage>625</fpage>
            <lpage>637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652701753307520</pubid>
                  <pubid idtype="pmpid" link="fulltext">11747616</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Empirical Bayes analysis of a microarray experiment</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Tusher</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>J Am Stat Assoc</source>
            <pubdate>2001</pubdate>
            <volume>96</volume>
            <issue>456</issue>
            <fpage>1151</fpage>
            <lpage>1160</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1198/016214501753382129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Comparison of methods for image analysis on cDNA microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Buckley</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>J Comput Graph Stat</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>1</issue>
            <fpage>108</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1198/106186002317375640</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A novel strategy for microarray quality control using Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Hautaniemi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Edgren</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Vesanen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jarvinen</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Yli-Harja</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Astola</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kallioniemi</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Monni</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>16</issue>
            <fpage>2031</fpage>
            <lpage>2038</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg275</pubid>
                  <pubid idtype="pmpid" link="fulltext">14594707</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Ratio statistics of gene expression levels and applications to microarray data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Kamat</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Bittner</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Meltzer</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Trent</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>9</issue>
            <fpage>1207</fpage>
            <lpage>1215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.9.1207</pubid>
                  <pubid idtype="pmpid" link="fulltext">12217912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Quantitative quality control in microarray image processing and data acquisition</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>15</issue>
            <fpage>e75</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55840</pubid>
                  <pubid idtype="pmpid" link="fulltext">11470890</pubid>
                  <pubid idtype="doi">10.1093/nar/29.15.e75</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals</p>
            </title>
            <aug>
               <au>
                  <snm>Tran</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Peiffer</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Meek</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Brody</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>12</issue>
            <fpage>e54</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">117296</pubid>
                  <pubid idtype="pmpid" link="fulltext">12060692</pubid>
                  <pubid idtype="doi">10.1093/nar/gnf053</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Quick and simple: quality control of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Sauer</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Preininger</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hany-Schmatzberger</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>8</issue>
            <fpage>1572</fpage>
            <lpage>1578</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti238</pubid>
                  <pubid idtype="pmpid" link="fulltext">15615693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Evaluating methods for classifying expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Man</snm>
                  <fnm>MZ</fnm>
               </au>
               <au>
                  <snm>Dyson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Biopharm Stat</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>4</issue>
            <fpage>1065</fpage>
            <lpage>1084</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1081/BIP-200035491</pubid>
                  <pubid idtype="pmpid">15587980</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach</p>
            </title>
            <aug>
               <au>
                  <snm>Perez-Enciso</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tenenhaus</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>2003</pubdate>
            <volume>112</volume>
            <issue>5-6</issue>
            <fpage>581</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12607117</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)</p>
            </title>
            <aug>
               <au>
                  <snm>Eriksson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Antti</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gottfries</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Johansson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lindgren</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lundstedt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Trygg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wold</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Anal Bioanal Chem</source>
            <pubdate>2004</pubdate>
            <volume>380</volume>
            <issue>3</issue>
            <fpage>419</fpage>
            <lpage>429</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00216-004-2783-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">15448969</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets</p>
            </title>
            <aug>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bruce</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Moritz</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Trygg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sj&#246;str&#246;m</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Plumb</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Granger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Maibaum</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nicholson</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Antti</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Analyst</source>
            <pubdate>2005</pubdate>
            <volume>130</volume>
            <issue>5</issue>
            <fpage>701</fpage>
            <lpage>707</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1039/b501890k</pubid>
                  <pubid idtype="pmpid" link="fulltext">15852140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Multivariate Calibration</p>
            </title>
            <aug>
               <au>
                  <snm>Martens</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Naes</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>Chichester, John Wiley &amp; Sons</publisher>
            <pubdate>1992</pubdate>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Latent variable multivariate regression modeling</p>
            </title>
            <aug>
               <au>
                  <snm>Burnham</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>MacGregor</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Viveros</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Chemom Intell Lab Syst</source>
            <pubdate>1999</pubdate>
            <volume>48</volume>
            <issue>2</issue>
            <fpage>167</fpage>
            <lpage>180</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0169-7439(99)00018-0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Multi- and Megavariate Data Analysis: Principles and Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Eriksson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Johansson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kettaneh-Wold</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wold</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <publisher>Ume&#229;, Umetrics Academy</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Seeded Region Growing</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bischof</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>IEEE Trans Pattern Anal Mach Intell</source>
            <pubdate>1994</pubdate>
            <volume>16</volume>
            <issue>6</issue>
            <fpage>641</fpage>
            <lpage>647</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/34.295913</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>In control: systematic assessment of microarray performance</p>
            </title>
            <aug>
               <au>
                  <snm>van Bakel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>FC</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>10</issue>
            <fpage>964</fpage>
            <lpage>969</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.embor.7400253</pubid>
                  <pubid idtype="pmpid" link="fulltext">15459748</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Linear models and empirical Bayes methods for assessing differential expression in microarray experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Smyth</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>Stat Appl Genet Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>1</issue>
            <fpage>Article 3</fpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A Populus EST resource for plant functional genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Sterky</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bhalerao</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Unneberg</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Segerman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brunner</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Charbonnel-Campaa</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lindvall</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Tandre</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Strauss</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Sundberg</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gustafsson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Uhlen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bhalerao</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lundeberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jansson</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>38</issue>
            <fpage>13951</fpage>
            <lpage>13956</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">518859</pubid>
                  <pubid idtype="pmpid" link="fulltext">15353603</pubid>
                  <pubid idtype="doi">10.1073/pnas.0401641101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>PopulusDB: A Populus EST resource for plant functional genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Sterky</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bhalerao</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Unneberg</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Segerman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brunner</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Charbonnel-Campaa</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lindvall</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Tandre</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Strauss</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Sundberg</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gustafsson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Uhlen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bhalerao</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lundeberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jansson</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <url>http://www.populus.db.umu.se/</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>UPSC-BASE: Populus transcriptomics online</p>
            </title>
            <aug>
               <au>
                  <snm>Sj&#246;din</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bylesj&#246;</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <url>http://www.upscbase.db.umu.se/</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The response of the poplar transcriptome to wounding and subsequent infection by a viral pathogen</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Buey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>New Phytol</source>
            <pubdate>2004</pubdate>
            <volume>164</volume>
            <issue>1</issue>
            <fpage>123</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/j.1469-8137.2004.01151.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models</p>
            </title>
            <aug>
               <au>
                  <snm>Wold</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Technometrics</source>
            <pubdate>1978</pubdate>
            <volume>20</volume>
            <fpage>397</fpage>
            <lpage>406</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>D-optimal designs</p>
            </title>
            <aug>
               <au>
                  <snm>de Aguiar</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Bourguignon</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Khots</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Massart</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>PhanThanLuu</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Chemom Intell Lab Syst</source>
            <pubdate>1995</pubdate>
            <volume>30</volume>
            <issue>2</issue>
            <fpage>199</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0169-7439(94)00076-X</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
