<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-11-15</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Celton</snm>
               <fnm>Magalie</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>celton@supagro.inra.fr</email>
            </au>
            <au id="A2">
               <snm>Malpertuy</snm>
               <fnm>Alain</fnm>
               <insr iid="I3"/>
               <email>alain.malpertuy@atragene.com</email>
            </au>
            <au id="A3">
               <snm>Lelandais</snm>
               <fnm>Ga&#235;lle</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <email>gaelle.lelandais@univ-paris-diderot.fr</email>
            </au>
            <au ca="yes" id="A4">
               <snm>de Brevern</snm>
               <mi>G</mi>
               <fnm>Alexandre</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <email>alexandre.debrevern@univ-paris-diderot.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>INSERM UMR-S 726, Equipe de Bioinformatique G&#233;nomique et Mol&#233;culaire (EBGM), DSIMB, Universit&#233; Paris Diderot - Paris 7, 2, place Jussieu, 75005, France</p>
            </ins>
            <ins id="I2">
               <p>UMR 1083 Sciences pour l'&#338;nologie INRA, 2 place Viala, 34060 Montpellier cedex 1, France</p>
            </ins>
            <ins id="I3">
               <p>Atragene Informatics, 33-35, Rue Ledru-Rollin 94200 Ivry-sur-Seine, France</p>
            </ins>
            <ins id="I4">
               <p>INSERM UMR-S 665, DSIMB, Universit&#233; Paris Diderot - Paris 7, Institut National de Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>1</issue>
         <fpage>15</fpage>
         <url>http://www.biomedcentral.com/1471-2164/11/15</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2164-11-15</pubid>
               <pubid idtype="pmpid">20056002</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>2</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>7</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>7</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Celton et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Microarray technologies produce large amounts of data. In a previous study, we showed the value of the <it>k-Nearest Neighbour </it>approach for restoring missing gene expression values, and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) in microarray data. In this study, we evaluated twelve usable methods and their influence on the quality of gene clustering. We used several datasets, covering both kinetic and non-kinetic experiments from yeast and human.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (<it>EM_array</it>). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that imputed MVs still have important effects on the stability of the gene clusters. The improvement in the clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and gene cluster stability. Even though comparing clustering algorithms is a complex task, we observed that the <it>k-means </it>approach conserves gene associations more efficiently.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>More than 6,000,000 independent simulations were used to assess the quality of 12 imputation methods on five very different biological datasets. Important improvements have been made since our last study. The <it>EM_array </it>approach is an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor of gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and therefore for dedicated benchmarks. A noteworthy point is the specific influence of some biological datasets.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Numerous genomes from species of the three kingdoms are now available <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. A major current aim of biological research is to characterize the function of genes, for instance their cellular regulation pathways and implications in pathology <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. High-throughput analyses (<it>e.g</it>., microarrays) combined with statistical and bioinformatics data analyses are necessary to decipher such complex biological processes <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Microarray technologies allow the characterization of whole-genome expression by measuring the relative transcript levels of thousands of genes in one experiment <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. For instance, their relevance has been demonstrated for the classification and identification of cancer subtypes or diseases <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>.</p>
         <p>However, technical limitations or hazards (dust, scratches) lead to corrupted spots on microarrays <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. During the image analysis phase, corrupted or suspicious spots are filtered out <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, generating missing data <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. These missing values (MVs) disturb the gene clustering obtained by classical clustering methods, <it>e.g</it>., hierarchical clustering <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, <it>k-means </it>clustering <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, Kohonen Maps <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> or projection methods, <it>e.g</it>., Principal Component Analysis <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In practice, three different options can be considered. The first option is to eliminate the genes that contain MVs, <it>i.e</it>., to accept a loss of information <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The eliminated genes may be numerous, and some of them may be essential for the analysis of the studied mechanism <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The second option is to replace the MVs by zero <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; this raises a different problem in the analysis, because real data close to 0 will be confused with the MVs. The third option limits the biases related to MVs: several methodologies have been developed that use the values present in the data file to replace the MVs by estimated values <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
         <p>The most classical method to estimate these values is the <it>k</it>-nearest neighbours approach (<it>kNN</it>), which computes the estimated value from the <it>k </it>closest expression profiles in the dataset <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. This approach was applied to DNA chips by Troyanskaya and collaborators <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and rapidly became one of the most popular methods. Since this pioneering study, more sophisticated approaches have been proposed, such as Sequential <it>kNN </it>(<it>SkNN</it>) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
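         <p>As an illustration of the <it>kNN </it>principle described above, the following minimal Python sketch imputes each missing entry from the <it>k </it>complete profiles closest to the incomplete one. It is not one of the implementations evaluated in this study: the function name <it>knn_impute</it>, the use of <it>None </it>to mark a missing value, the Euclidean distance restricted to the observed columns and the unweighted mean over the neighbours are all assumptions made for brevity.</p>
         <p><![CDATA[
```python
import math


def knn_impute(matrix, k=3):
    """Fill None entries of each row from its k nearest complete rows.

    Assumptions (illustrative only): Euclidean distance computed on the
    columns observed in the incomplete row, unweighted mean of neighbours.
    """
    complete = [row for row in matrix if None not in row]
    if not complete:
        raise ValueError("at least one complete profile is required")
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, value in enumerate(row) if value is None]
        if not missing:
            continue
        observed = [j for j, value in enumerate(row) if value is not None]

        def distance(other, row=row, observed=observed):
            # Distance restricted to the columns observed in the target row.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=distance)[:k]
        for j in missing:
            imputed[i][j] = sum(n[j] for n in neighbours) / len(neighbours)
    return imputed


# Toy expression profiles (rows = genes, columns = conditions).
profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, None],  # one missing value to impute
]
filled = knn_impute(profiles, k=2)
```
]]></p>
         <p>With <it>k </it>= 2 in this toy example, the missing value of the last profile is estimated from the two nearly identical complete profiles rather than from the distant third one.</p>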
         <p>Simple statistical methods have also been proposed, such as the <it>Row Mean </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>/<it>Row Average </it>method <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, or approaches based on the Expectation Maximisation (EM) algorithm, <it>e.g</it>., <it>EM_gene </it>and <it>EM_array </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The least squares (LS) principle has also been widely used, <it>e.g</it>., <it>LSI_gene</it>, <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Kim and co-workers have extended Least Square Imputation to Local Least Square Imputation (<it>LLSI</it>) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. However, this method relies only on the similarity of genes to estimate the missing data. Other, more sophisticated methods such as Bayesian Principal Component Analysis (<it>BPCA</it>) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> combine a principal component regression, a Bayesian estimation and a variational Bayes (VB) algorithm.</p>
         <p>MV replacement in microarray data is a recent research field, and numerous new and innovative methodologies are being developed. We can note the work of Bar-Joseph <it>et al</it>., who described a model-based spline fitting method for time-series data <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, and Schliep <it>et al</it>., who used hidden Markov models for imputation <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Tuikkala and co-workers have investigated the value of using GO annotation to increase the imputation accuracy of missing values <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, as have Kim <it>et al</it>. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Hu <it>et al</it>. and J&#246;rnsten <it>et al</it>. have incorporated information from multiple reference microarray datasets to improve the estimation <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>, while Gan and co-workers take into consideration the biological characteristics of the data <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Hua and Lai did not propose a new method, but assessed the quality of imputation through the concordance of gene prioritization and the estimation of true/false positives <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
         <p>In addition, we can list the following relevant methodologies applied to MV replacement for microarray analysis: <it>Support Vector Regression </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, <it>Factor Analysis Regression </it><abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, <it>Ordinary Least Square Impute </it><abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, <it>Gaussian Mixture Clustering </it><abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, <it>LinCmb </it><abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, <it>Collateral Missing Value Estimation </it><abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, <it>Linear based model imputation </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, <it>Dynamic Time Warping </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp> or <it>iterative kNN </it><abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>.</p>
         <p>In a previous study, we estimated the influence of MVs on hierarchical clustering results and evaluated the effectiveness of the <it>kNN </it>approach <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. We observed that even a low rate of missing data can have important effects on the clusters obtained by hierarchical clustering methods. Recently, this phenomenon was confirmed by Wong and co-workers for other clustering methods <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
         <p>Since that work, numerous replacement methods (see Table <tblr tid="T1">1</tblr> and previous paragraphs) have been developed to estimate MVs in microarray data. Most of the time, the new approaches are only compared to <it>kNN</it>. In this study, we decided to evaluate the quality of MV imputation with all usable methods, and their influence on the quality of gene clustering. The present paper undertakes a large benchmark of MV replacement methods to analyze the quality of MV estimation according to experimental type (kinetic or not), percentage of MVs, gene expression levels and data source (<it>Saccharomyces cerevisiae </it>and human).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Different missing values replacement methods.</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c ca="left">
                     <p>
                        <b>Methods</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Author</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Availability</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Language</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Used</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Year</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><it>K</it>-Nearest Neighbors (<it>kNN</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Troyanskaya O.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>C</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2001</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Bayesian Principal Component Analysis (<it>BPCA</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Oba S.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2003</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Row Mean</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>EM_gene</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>EM_array</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>LSI_gene</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>LSI_array</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>LSI_combined</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>LSI_adaptative</it>
                        <sup>1</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>B&#248; T.H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>JAVA</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Sequential kNN (<it>SkNN</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Kim K.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>R</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Local Least Square Impute<sup>2 </sup>(<it>LLSI</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Kim H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>MATLAB</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Row Average</it>
                        <sup>2</sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>Kim H.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>MATLAB</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Linear model based Imputation (<it>LinImp</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Scheel I</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>R</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Factor Analysis Regression (<it>FAR</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Feten</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Ordinary Least Square Impute (<it>OLSI</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Nguyen D.V.</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Support Vector Regression (<it>SVR</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Wang X.</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>C++</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2006</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Gaussian Mixture Clustering (<it>GMC</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Ouyang M.</p>
                  </c>
                  <c ca="center">
                     <p>On demand</p>
                  </c>
                  <c ca="center">
                     <p>MATLAB</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2004</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Singular Value Decomposition (<it>SVD</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Troyanskaya O.</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>C</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2001</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>ghmm</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>Schliep, A</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2003</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Collateral Missing Value Estimation (<it>CMVE</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Sehgal M.</p>
                  </c>
                  <c ca="center">
                     <p>On demand</p>
                  </c>
                  <c ca="center">
                     <p>MATLAB</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>GO-based imputation</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>Tuikkala</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>LinCmb</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>J&#246;rnsten, R</p>
                  </c>
                  <c ca="center">
                     <p>On demand</p>
                  </c>
                  <c ca="center">
                     <p>MATLAB</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2005</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Integrative Missing value Estimation <it>(iMISS)</it></p>
                  </c>
                  <c ca="center">
                     <p>Hu, J</p>
                  </c>
                  <c ca="center">
                     <p>Y</p>
                  </c>
                  <c ca="center">
                     <p>C++</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2006</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Projection Onto convex sets (<it>POCS</it>)</p>
                  </c>
                  <c ca="center">
                     <p>Gan, X</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2006</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Iterative kNN</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>Bras</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>N</p>
                  </c>
                  <c ca="center">
                     <p>2007</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>For each method, the table gives its name, its authors, its availability, whether we used it (Y) or not (N), and its publication year.</p>
               <p><sup>1 </sup><it>Package </it>B&#248; T.H.</p>
               <p><sup>2 </sup><it>Package </it>Kim H.</p>
            </tblfn>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>General principle</p>
            </st>
            <p>Figure <figr fid="F1">1</figr> shows the general principle of the analysis. From each initial gene expression dataset, the series of observations with missing values are eliminated to create a <it>Reference matrix</it>. <it>Simulated </it>missing values are then generated at a fixed rate &#964; and introduced into the <it>Reference matrix</it>. In a second step, these <it>simulated </it>missing values are imputed using the different available methods. The difference between the imputed values and the original true values is finally evaluated using the root mean square error (<it>RMSE</it>) (see Methods). In this work, we chose 5 microarray datasets that differ markedly from one another, <it>i.e</it>., they come from yeast and human cells, with or without kinetics (see Table <tblr tid="T2">2</tblr>). The idea was to cover the broadest possible range of expression data types [see Additional file <supplr sid="S1">1</supplr> for more details <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>].</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Principle of the method</p>
               </caption>
               <text>
                  <p><b>Principle of the method</b>. The initial data matrix is analyzed. Each gene associated with at least one missing value (<it>in pink</it>) is excluded, giving a <it>Reference matrix </it>without any missing value. Missing values are then simulated (<it>in red</it>) at a fixed rate &#964;. This rate &#964; ranges from 0.5% to 50% of missing values in steps of 0.5%. For each rate, 100 independent simulations are performed. Missing values are then imputed (<it>in blue</it>) for each simulation by the selected methods. <it>RMSE </it>is computed between the estimated values of the missing entries and their true values.</p>
               </text>
               <graphic file="1471-2164-11-15-1"/>
            </fig>
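            <p>The simulation loop summarized in Figure <figr fid="F1">1</figr> can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used in this study: the function names are hypothetical, a simple row-mean replacement stands in for the imputation methods, missing entries are drawn uniformly at random, and a plain (non-normalized) RMSE is used, which may differ from the definition given in Methods.</p>

```python
import numpy as np


def make_reference_matrix(data):
    """Keep only the rows (genes) that contain no missing value."""
    complete_rows = ~np.isnan(data).any(axis=1)
    return data[complete_rows]


def simulate_missing(reference, tau, rng):
    """Set a fraction tau of the entries to NaN (assumption: uniform random positions).

    Returns the corrupted copy and the boolean mask of simulated missing entries.
    """
    n_missing = int(round(tau * reference.size))
    flat_idx = rng.choice(reference.size, size=n_missing, replace=False)
    mask = np.zeros(reference.size, dtype=bool)
    mask[flat_idx] = True
    mask = mask.reshape(reference.shape)
    corrupted = reference.copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def row_mean_impute(corrupted):
    """Stand-in imputation: replace each NaN by the mean of the observed values of its row.

    A row whose entries are all missing would stay NaN; this sketch does not handle that case.
    """
    row_means = np.nanmean(corrupted, axis=1, keepdims=True)
    return np.where(np.isnan(corrupted), row_means, corrupted)


def rmse(imputed, reference, mask):
    """Plain root mean square error, computed on the simulated missing entries only."""
    diff = imputed[mask] - reference[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data (random numbers, not a real microarray dataset).
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 8))
raw[rng.choice(200, size=20, replace=False), 0] = np.nan  # 20 genes with one missing value

reference = make_reference_matrix(raw)          # genes with missing values removed
corrupted, mask = simulate_missing(reference, tau=0.05, rng=rng)
imputed = row_mean_impute(corrupted)
error = rmse(imputed, reference, mask)
```

            <p>Repeating the last three steps for each rate &#964; and for independent random draws gives, for a given method, an RMSE curve of the kind discussed in the following section.</p>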
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>The different datasets used</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Ogawa et <it>al</it>., 2000</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Gasch et <it>al</it>., 2000</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Bohen S.P et <it>al</it>., 2002</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Lelandais <it>et al</it>., 2005</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>
                              <it>Saccharomyces cerevisiae</it>
                           </b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>
                              <it>Saccharomyces cerevisiae</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>human</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Saccharomyces cerevisiae</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Initial gene number</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>6013</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>6153</p>
                     </c>
                     <c ca="center">
                        <p>16523</p>
                     </c>
                     <c ca="center">
                        <p>5261</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Initial number of conditions</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>8</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>178</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Missing values (%)</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>0.8</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>7.6</p>
                     </c>
                     <c ca="center">
                        <p>11.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Genes with missing values (%)</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>3.8</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>87.7</p>
                     </c>
                     <c ca="center">
                        <p>63.6</p>
                     </c>
                     <c ca="center">
                        <p>88.29</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Genes erased from the study</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>230</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>616</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Conditions erased from the study</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>0</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>136</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Ogawa_Complet (OC)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Ogawa_subset (OS)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gasch HEAT (GHeat)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gasch H2O2 (GH<sub>2</sub>O<sub>2</sub>)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Bohen (B)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Lelandais (L)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Kinetics</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>Y</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>Y</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Final gene numbers</p>
                     </c>
                     <c ca="center">
                        <p>5783</p>
                     </c>
                     <c ca="center">
                        <p>827</p>
                     </c>
                     <c ca="center">
                        <p>523</p>
                     </c>
                     <c ca="center">
                        <p>717</p>
                     </c>
                     <c ca="center">
                        <p>861</p>
                     </c>
                     <c ca="center">
                        <p>4645</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Final condition number</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Dataset details</b>.</p>
               </text>
               <file name="1471-2164-11-15-S1.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Our goals were also (i) to evaluate methods that experimental scientists could use without intervention, (ii) to select only published methods, and (iii) to analyse the influence on gene clustering. Indeed, some studies have compared numerous methods, <it>e.g</it>., <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, but did not extend to clustering, while less frequent studies extended to clustering but tested only a limited number of imputation methods, <it>e.g</it>., <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. We therefore searched for all kinds of published imputation methods with available dedicated software or code, whatever the operating system, language or software. From this search, we selected 12 available replacement methods that were compatible with high-throughput computation. Other methods were not used, either because the program was unavailable despite the indication in the corresponding papers, or because the source code could not be modified to use our microarray data.</p>
         </sec>
         <sec>
            <st>
               <p>Error rate for each replacement method</p>
            </st>
            <p>Figure <figr fid="F2">2</figr> shows the dispersion of predicted versus true values for three imputation methods. On the one hand, the <it>kNN </it>and <it>EM_gene </it>approaches exhibit a high dispersion between predicted and true values; the correlations <it>R </it>equal 0.33 and 0.32, respectively (see Figures <figr fid="F2">2a</figr> and <figr fid="F2">2b</figr>). On the other hand, the <it>EM_array </it>approach shows much better agreement, with an <it>R </it>value of 0.97 (see Figure <figr fid="F2">2c</figr>). Figure <figr fid="F3">3</figr> shows the evolution of RMSE values for &#964; ranging from 0.5 to 50% using the two datasets G<sub>Heat </sub>and OS. These two examples illustrate well the different behaviours observed with the different replacement methods. Some have high initial RMSE values and remain quite stable, while others have lower initial RMSE values but are very sensitive to an increased rate of missing values. Moreover, the performance of the different methods appeared to depend on the dataset used.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Example of three methods</p>
               </caption>
               <text>
                  <p><b>Example of three methods</b>. Distribution of predicted values (y-axis) with respect to true values (x-axis). The missing values were estimated by (a) the <it>kNN </it>approach, (b) <it>EM_gene </it>and (c) <it>EM_array</it>. The dataset used is the Bohen set, with &#964; values ranging from 0.5% to 50% of missing values in steps of 0.5%. Ten independent simulations were performed for each &#964; value.</p>
               </text>
               <graphic file="1471-2164-11-15-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Missing value imputation</p>
               </caption>
               <text>
                  <p><b>Missing value imputation</b>. RMSE values for (a) the G<sub>Heat </sub>subset and (b) OS, for rates of missing values ranging from 0.5% to 50% in steps of 0.5%. 100 independent simulations are performed at each level.</p>
               </text>
               <graphic file="1471-2164-11-15-3"/>
            </fig>
            <p indent="1">&#8226; <it><ul>EM_gene</ul></it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>: This method is always associated with very high RMSE values, which range from 0.6 to 0.7 for a rate &#964; between 0.5 and 3.0% (see Figure <figr fid="F3">3b</figr>) and then decrease to values between 0.30 and 0.40. Such a curved profile is observed for the datasets OS and GH<sub>2</sub>O<sub>2 </sub>(see Figure <figr fid="F3">3a</figr>). For the other datasets, RMSE increases as expected (see Figure <figr fid="F3">3a</figr>), but always remains high.</p>
            <p indent="1">&#8226; <it><ul>kNN</ul></it><abbrgrp><abbr bid="B27">27</abbr></abbrgrp>: Its RMSE values for all six data files always range between 0.3 and 0.4. Increasing &#964; only slightly affects the <it>kNN </it>approximation, by at most 0.05 for the datasets B and OS. This stability implies that the RMSE values remain acceptable even for high rates of missing data (more than 20%).</p>
            <p indent="1">&#8226; <it><ul>SkNN</ul></it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>: Although <it>SkNN </it>is an improvement of <it>kNN</it>, its RMSE values are surprisingly higher than those of <it>kNN </it>(by 0.01 to 0.08) for nearly all datasets. Only with the dataset B does <it>SkNN </it>perform slightly better than <it>kNN </it>(RMSE difference of 0.076).</p>
            <p indent="1">&#8226; <it><ul>LLSI</ul></it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp>: The average RMSE values of <it>LLSI </it>range mainly from 0.34 to 0.41 for most of the datasets. Its performance can be considered median, and its effectiveness is close to that of the <it>LSI_gene </it>method. Its RMSE values increase gradually with &#964;, <it>i.e</it>., by 0.1 as the rate of missing data goes from 0.5 to 50%. It is the least efficient of the methods based on least squares regression. However, for the dataset L, this method is the most powerful after the <it>LSI </it>methods (see below).</p>
            <p indent="1">&#8226; <it><ul>LSI_gene</ul></it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>: The effectiveness of <it>LSI_gene </it>is only slightly affected by the increase in the percentage of missing data. For each data file, the RMSE values range between 0.3 and 0.4. These results are close to those observed for the <it>LLSI </it>and <it>kNN </it>methods, <it>i.e</it>., methods giving intermediate results, between the best (<it>LSI_array</it>) and the least efficient (<it>EM_gene</it>) methods.</p>
            <p indent="1">&#8226; <it><ul>Row Mean</ul></it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and <it>Row Average </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp>: Low RMSE values are observed for the L (0.23) and B (0.28) datasets. The RMSE value is high (0.54) only for the GHeat dataset. Strikingly, this method gives results equivalent to or better than those of more elaborate approaches.</p>
            <p indent="1">&#8226; <it><ul>BPCA</ul></it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>: For the OC, OS and GH<sub>2</sub>O<sub>2 </sub>datasets, and for &#964; in the range 0.5 to 10-15% of missing data, <it>BPCA </it>has one of the lowest RMSE values [see Additional file <supplr sid="S2">2</supplr>], surpassed only by two other approaches. This method is powerful for low rates of missing values. However, it should be noted that the efficiency of <it>BPCA </it>is strongly reduced when the rate of missing data increases. This is particularly notable in the case of the GHeat dataset, where the RMSE increases from 0.2 to 1.1 (see Figure <figr fid="F3">3a</figr>). For a &#964; value higher than 30%, <it>BPCA </it>performs worse than most of the imputation methods. This effect is less striking for the other datasets: for the B and OS datasets, RMSE values increase by a maximum of 0.1 for &#964; increasing from 0.5 to 50%. This is a good illustration of how the quality of the imputation methods depends on the dataset.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>RMSE of OS with BPCA imputing method</b>. RMSE values for OS for rates of missing values ranging from 0.5% to 20% in steps of 0.5% with the L dataset.</p>
               </text>
               <file name="1471-2164-11-15-S2.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p indent="1">&#8226; <ul><it>LSI_array</it>, <it>LSI_combined</it>, <it>LSI_adaptative </it>and <it>EM_array</it></ul><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>: Their RMSE values are always lower than 0.1. Remarkably, this holds even for a rate of missing data of 50%. The average RMSE values of <it>EM_array </it>are slightly lower than those of the three other methods, which is particularly striking when the rate of missing data exceeds 20%. A pairwise comparison shows that <it>EM_array </it>is better than the three other methods; its approximation is better in two-thirds of the cases. If &#964; is higher than 33%, this method remains the best one in 80% of the cases (see Table <tblr tid="T3">3</tblr> for two examples).</p>
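            <p>The pairwise percentages discussed above can be computed as in the following minimal Python sketch. It assumes that each percentage is the share of simulations in which one method obtains a strictly lower RMSE than another, which is one plausible reading of the comparison rather than a documented definition; the function name and the toy RMSE values are hypothetical and are not results of this study.</p>

```python
import numpy as np


def pairwise_win_percentages(rmse_by_method):
    """Compare methods two by two on matched simulations.

    rmse_by_method maps a method name to a sequence of RMSE values, one per
    simulation, in the same simulation order for every method.
    Entry [i, j] of the returned matrix is the percentage of simulations in
    which method i has a strictly lower RMSE than method j (assumed definition).
    The diagonal is left as NaN.
    """
    names = list(rmse_by_method)
    scores = np.array([rmse_by_method[name] for name in names], dtype=float)
    n_methods = len(names)
    wins = np.full((n_methods, n_methods), np.nan)
    for i in range(n_methods):
        for j in range(n_methods):
            if i != j:
                # True where method j is worse (higher RMSE) than method i.
                wins[i, j] = 100.0 * np.mean(scores[j] > scores[i])
    return names, wins


# Toy values for two hypothetical methods over four simulations.
toy = {
    "method_A": [0.30, 0.32, 0.35, 0.31],
    "method_B": [0.10, 0.40, 0.12, 0.11],
}
names, wins = pairwise_win_percentages(toy)
```

            <p>With these toy values, the first method is better in one simulation out of four and the second in the remaining three; ties, if any, count for neither method.</p>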
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Pairwise comparison of imputation methods.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>(a)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>kNN</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>BPCA</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>Row Mean</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>EM_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>EM_array</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_array</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_combined</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_adaptative</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>SkNN</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>kNN</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>23.47</p>
                     </c>
                     <c ca="right">
                        <p>47.65</p>
                     </c>
                     <c ca="right">
                        <p>60.82</p>
                     </c>
                     <c ca="right">
                        <p>4.59</p>
                     </c>
                     <c ca="right">
                        <p>38.06</p>
                     </c>
                     <c ca="right">
                        <p>5.00</p>
                     </c>
                     <c ca="right">
                        <p>5.41</p>
                     </c>
                     <c ca="right">
                        <p>7.25</p>
                     </c>
                     <c ca="right">
                        <p>47.14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>BPCA</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>75.41</p>
                     </c>
                     <c ca="right">
                        <p>81.33</p>
                     </c>
                     <c ca="right">
                        <p>11.12</p>
                     </c>
                     <c ca="right">
                        <p>67.04</p>
                     </c>
                     <c ca="right">
                        <p>12.76</p>
                     </c>
                     <c ca="right">
                        <p>14.49</p>
                     </c>
                     <c ca="right">
                        <p>16.63</p>
                     </c>
                     <c ca="right">
                        <p>75.51</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Row Mean</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>64.69</p>
                     </c>
                     <c ca="right">
                        <p>4.49</p>
                     </c>
                     <c ca="right">
                        <p>40.82</p>
                     </c>
                     <c ca="right">
                        <p>5.10</p>
                     </c>
                     <c ca="right">
                        <p>5.71</p>
                     </c>
                     <c ca="right">
                        <p>6.12</p>
                     </c>
                     <c ca="right">
                        <p>52.45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EM_gene</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>3.67</p>
                     </c>
                     <c ca="right">
                        <p>29.49</p>
                     </c>
                     <c ca="right">
                        <p>4.08</p>
                     </c>
                     <c ca="right">
                        <p>4.39</p>
                     </c>
                     <c ca="right">
                        <p>5.31</p>
                     </c>
                     <c ca="right">
                        <p>37.04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EM_array</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>92.45</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>60.04</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>63.89</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>63.36</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>95.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_gene</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>7.86</p>
                     </c>
                     <c ca="right">
                        <p>7.65</p>
                     </c>
                     <c ca="right">
                        <p>7.45</p>
                     </c>
                     <c ca="right">
                        <p>61.53</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_array</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>37.24</p>
                     </c>
                     <c ca="right">
                        <p>38.27</p>
                     </c>
                     <c ca="right">
                        <p>94.79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_combined</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>44.39</p>
                     </c>
                     <c ca="right">
                        <p>93.78</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_adaptative</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>92.96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>SkNN</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(b)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>kNN</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>BPCA</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>Row Mean</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>EM_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>EM_array</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_array</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_combined</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>LSI_adaptative</it>
                           </b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>
                              <it>SkNN</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>kNN</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>42.59</p>
                     </c>
                     <c ca="right">
                        <p>44.02</p>
                     </c>
                     <c ca="right">
                        <p>55.90</p>
                     </c>
                     <c ca="right">
                        <p>6.32</p>
                     </c>
                     <c ca="right">
                        <p>45.45</p>
                     </c>
                     <c ca="right">
                        <p>18.74</p>
                     </c>
                     <c ca="right">
                        <p>18.74</p>
                     </c>
                     <c ca="right">
                        <p>18.74</p>
                     </c>
                     <c ca="right">
                        <p>50.09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>BPCA</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>52.02</p>
                     </c>
                     <c ca="right">
                        <p>63.49</p>
                     </c>
                     <c ca="right">
                        <p>7.84</p>
                     </c>
                     <c ca="right">
                        <p>53.04</p>
                     </c>
                     <c ca="right">
                        <p>23.37</p>
                     </c>
                     <c ca="right">
                        <p>23.37</p>
                     </c>
                     <c ca="right">
                        <p>23.37</p>
                     </c>
                     <c ca="right">
                        <p>58.03</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Row Mean</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>62.18</p>
                     </c>
                     <c ca="right">
                        <p>6.69</p>
                     </c>
                     <c ca="right">
                        <p>24.88</p>
                     </c>
                     <c ca="right">
                        <p>14.01</p>
                     </c>
                     <c ca="right">
                        <p>14.01</p>
                     </c>
                     <c ca="right">
                        <p>14.01</p>
                     </c>
                     <c ca="right">
                        <p>56.58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EM_gene</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>5.06</p>
                     </c>
                     <c ca="right">
                        <p>39.27</p>
                     </c>
                     <c ca="right">
                        <p>15.67</p>
                     </c>
                     <c ca="right">
                        <p>15.67</p>
                     </c>
                     <c ca="right">
                        <p>15.67</p>
                     </c>
                     <c ca="right">
                        <p>44.54</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EM_array</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>92.97</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>79.65</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>79.65</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>79.65</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>93.61</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_gene</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>14.85</p>
                     </c>
                     <c ca="right">
                        <p>14.85</p>
                     </c>
                     <c ca="right">
                        <p>14.85</p>
                     </c>
                     <c ca="right">
                        <p>55.52</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_array</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>39.24</p>
                     </c>
                     <c ca="right">
                        <p>43.29</p>
                     </c>
                     <c ca="right">
                        <p>81.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_combined</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>46.49</p>
                     </c>
                     <c ca="right">
                        <p>81.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>LSI_adaptative</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>81.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>SkNN</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                     <c ca="right">
                        <p>-----</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Percentage of better approximations obtained by one method versus another, for a rate of missing values &#964; equal to (a) 32% and (b) 48.5%, with the OS dataset. Each percentage is given with respect to the method listed on the left.</p>
               </tblfn>
            </tbl>
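            <p>A pairwise comparison of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "better approximation" means a strictly smaller absolute error on a given missing entry, and the function name, method labels and toy values are all hypothetical.</p>

```python
import numpy as np


def pairwise_better_percentage(true_values, imputed_by_method):
    """Return {(row, col): percentage of entries where row beats col}.

    Assumption: "better" means a strictly smaller absolute error on a given
    missing entry; ties count for neither method.
    """
    truth = np.asarray(true_values, dtype=float)
    names = list(imputed_by_method)
    abs_err = {
        name: np.abs(np.asarray(imputed_by_method[name], dtype=float) - truth)
        for name in names
    }
    table = {}
    for i, row in enumerate(names):
        # Only the upper triangle is filled, as in the table layout.
        for col in names[i + 1:]:
            row_wins = abs_err[col] > abs_err[row]
            table[(row, col)] = 100.0 * float(np.mean(row_wins))
    return table


# Toy values, not taken from the study.
truth = [0.0, 0.0, 0.0, 0.0]
toy = {"method_A": [0.1, 0.1, 0.1, 1.0], "method_B": [0.5, 0.5, 0.5, 0.5]}
result = pairwise_better_percentage(truth, toy)
print(result[("method_A", "method_B")])
```

            <p>Under this assumption each entry reads as "how often the row method is closer to the true value than the column method", which is why only the cells above the diagonal are filled.</p>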
         </sec>
         <sec>
            <st>
               <p>The different datasets influence the quality of the imputation</p>
            </st>
            <p>Table <tblr tid="T4">4</tblr> shows the average RMSE values for each imputation method. Each value is the average over all the simulations, with &#964; ranging from 0.5 to 50% (50,000 independent simulations per imputation method). This table highlights the differences observed between the datasets. Nonetheless, it allowed us to rank the methods in terms of efficiency. Roughly, we could identify three groups: (1) the first group comprised four methods (<it>EM_array</it>, <it>LSI_array</it>, <it>LSI_combined</it> and <it>LSI_adaptative</it>), for which small RMSE values were always observed (<it>EM_array</it> always exhibited the best performance); (2) the second group comprised four methods, <it>i.e.</it>, <it>BPCA</it>, <it>Row Mean</it>, <it>LSI_gene</it> and <it>LLSI</it>; (3) the third group, which ranked last, comprised three methods, <it>i.e.</it>, <it>kNN</it>, <it>SkNN</it> and <it>EM_gene</it>.</p>
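            <p>The averaging and ranking described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain, unnormalised root mean square error, and the helper names, method labels and toy values are hypothetical rather than taken from the study.</p>

```python
import numpy as np


def rmse(true_values, imputed_values):
    """Plain root mean square error between true and imputed entries."""
    diff = np.asarray(imputed_values, dtype=float) - np.asarray(true_values, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def rank_methods(rmse_per_simulation):
    """Average per-simulation RMSE values and sort methods, best (smallest) first."""
    means = {name: float(np.mean(values)) for name, values in rmse_per_simulation.items()}
    return sorted(means.items(), key=lambda item: item[1])


# Toy per-simulation RMSE values, not taken from the study.
toy_runs = {"method_A": [0.4, 0.6], "method_B": [0.1, 0.3]}
ranking = rank_methods(toy_runs)
print(ranking[0][0])
```

            <p>Sorting by the mean alone hides dataset-specific behaviour, so in practice the ranking would be checked per dataset as well as on the overall average.</p>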
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Mean RMSE values of each imputation method for the different datasets</p>
               </caption>
               <tblbdy cols="12">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="9" ca="center">
                        <p>
                           <b>methods</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>mean</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>EM_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>SkNN</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>kNN</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>LLSI</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>LSI_gene</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Row Mean</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>BPCA</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>LSI_array</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>EM_array</it>
                           </b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>datasets</p>
                     </c>
                     <c ca="center">
                        <p>B</p>
                     </c>
                     <c ca="center">
                        <p>0.334</p>
                     </c>
                     <c ca="center">
                        <p>0.390</p>
                     </c>
                     <c ca="center">
                        <p>0.455</p>
                     </c>
                     <c ca="center">
                        <p>0.344</p>
                     </c>
                     <c ca="center">
                        <p>0.320</p>
                     </c>
                     <c ca="center">
                        <p>0.283</p>
                     </c>
                     <c ca="center">
                        <p>0.194</p>
                     </c>
                     <c ca="center">
                        <p>0.098</p>
                     </c>
                     <c ca="center">
                        <p>0.053</p>
                     </c>
                     <c ca="center">
                        <p>0.275</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>GH<sub>2</sub>O<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>0.586</p>
                     </c>
                     <c ca="center">
                        <p>0.445</p>
                     </c>
                     <c ca="center">
                        <p>0.431</p>
                     </c>
                     <c ca="center">
                        <p>0.452</p>
                     </c>
                     <c ca="center">
                        <p>0.358</p>
                     </c>
                     <c ca="center">
                        <p>0.319</p>
                     </c>
                     <c ca="center">
                        <p>0.334</p>
                     </c>
                     <c ca="center">
                        <p>0.068</p>
                     </c>
                     <c ca="center">
                        <p>0.028</p>
                     </c>
                     <c ca="center">
                        <p>0.336</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>OS</p>
                     </c>
                     <c ca="center">
                        <p>0.444</p>
                     </c>
                     <c ca="center">
                        <p>0.369</p>
                     </c>
                     <c ca="center">
                        <p>0.383</p>
                     </c>
                     <c ca="center">
                        <p>0.379</p>
                     </c>
                     <c ca="center">
                        <p>0.377</p>
                     </c>
                     <c ca="center">
                        <p>0.263</p>
                     </c>
                     <c ca="center">
                        <p>0.257</p>
                     </c>
                     <c ca="center">
                        <p>0.077</p>
                     </c>
                     <c ca="center">
                        <p>0.036</p>
                     </c>
                     <c ca="center">
                        <p>0.287</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>0.388</p>
                     </c>
                     <c ca="center">
                        <p>0.292</p>
                     </c>
                     <c ca="center">
                        <p>0.300</p>
                     </c>
                     <c ca="center">
                        <p>0.078</p>
                     </c>
                     <c ca="center">
                        <p>0.261</p>
                     </c>
                     <c ca="center">
                        <p>0.215</p>
                     </c>
                     <c ca="center">
                        <p>0.250</p>
                     </c>
                     <c ca="center">
                        <p>0.028</p>
                     </c>
                     <c ca="center">
                        <p>0.020</p>
                     </c>
                     <c ca="center">
                        <p>0.204</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>GHeat</p>
                     </c>
                     <c ca="center">
                        <p>0.703</p>
                     </c>
                     <c ca="center">
                        <p>0.426</p>
                     </c>
                     <c ca="center">
                        <p>0.350</p>
                     </c>
                     <c ca="center">
                        <p>0.412</p>
                     </c>
                     <c ca="center">
                        <p>0.403</p>
                     </c>
                     <c ca="center">
                        <p>0.541</p>
                     </c>
                     <c ca="center">
                        <p>0.690</p>
                     </c>
                     <c ca="center">
                        <p>0.091</p>
                     </c>
                     <c ca="center">
                        <p>0.054</p>
                     </c>
                     <c ca="center">
                        <p>0.408</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>mean</p>
                     </c>
                     <c ca="center">
                        <p>0.491</p>
                     </c>
                     <c ca="center">
                        <p>0.384</p>
                     </c>
                     <c ca="center">
                        <p>0.384</p>
                     </c>
                     <c ca="center">
                        <p>0.333</p>
                     </c>
                     <c ca="center">
                        <p>0.344</p>
                     </c>
                     <c ca="center">
                        <p>0.324</p>
                     </c>
                     <c ca="center">
                        <p>0.345</p>
                     </c>
                     <c ca="center">
                        <p>0.072</p>
                     </c>
                     <c ca="center">
                        <p>0.038</p>
                     </c>
                     <c ca="center">
                        <p>0.302</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Notably, this order depends on the dataset, although the changes are often limited. For instance, <it>EM_gene </it>performs better than <it>kNN </it>and <it>SkNN </it>for the B dataset, but not better than the other methods. Larger changes are seen for OS, where <it>SkNN </it>performs better than <it>LLSI </it>and <it>LSI</it>_<it>gene</it>; this is mainly due to the poor quality of the estimates produced by these two methods on this dataset. For the L dataset, we observed that the <it>LLSI </it>method performs well and remains better than other <it>LSIs </it>and <it>EM_array </it>methods. The GHeat dataset, which is associated with the highest average RMSE values, has two strong particularities: (i) <it>kNN </it>performs better than <it>BPCA</it>, <it>Row Mean</it>, <it>LSI_gene </it>and <it>LLSI</it>, and (ii) <it>BPCA </it>and <it>Row Mean </it>perform poorly compared to the other methods, being only slightly better than <it>EM_gene</it>. Hence, GHeat appears to be a more difficult dataset to impute.</p>
         </sec>
         <sec>
            <st>
               <p>Extreme values</p>
            </st>
            <p>The same methodology was followed to analyze the extreme values, <it>i.e</it>., the 1% of the microarray measurements with the highest absolute values. They play a key biological role as they represent the largest variations with regard to the expression reference [see Additional file <supplr sid="S3">3</supplr>]. Figure <figr fid="F4">4</figr> presents examples similar to those of Figure <figr fid="F3">3</figr>, but this time only extreme values were used in the analysis. Thus, the percentage of missing values &#964; must be interpreted differently: &#964; = 10% corresponds to 10% of the extreme values being missing, <it>i.e</it>., 0.1% of the values of the dataset. With one exception, all the replacement methods are less effective at estimating the extreme values. The performance of the methods also depends greatly on the dataset used, especially in the case of the GHeat dataset, in agreement with the previous observation. A description of the behaviour of each method is presented in Additional file <supplr sid="S3">3</supplr>. <it>kNN </it><abbrgrp><abbr bid="B27">27</abbr></abbrgrp> is the least powerful method in most cases (see Figures <figr fid="F4">4a</figr> and <figr fid="F4">4b</figr>). Its average RMSE value is often 0.5 higher than that of the second poorest imputation method. Interestingly, in the case of the extreme values, <it>SkNN </it>improved greatly. <it>EM_gene </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp> remains one of the least powerful methods for the imputation of missing values. The effectiveness of the <it>LLSI </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp> method remains similar to that of the other methods of its group. <it>Row Mean </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and <it>Row Average </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp> have RMSE values that increase by 0.2 to 0.4 for the yeast dataset, which is acceptable relative to the other methods (see Figure <figr fid="F6">6</figr>). Their efficiency is intermediate compared to the other methods. <it>BPCA </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp> behaves acceptably but, unlike most of the other methods, it is very sensitive to the dataset. <it>LSI_gene </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp> has the lowest RMSE values observed after <it>EM_array</it>, <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative</it>. This result shows that <it>LSIs</it>, whatever the specificity of their implementations, are effective at imputing missing values.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Extreme values (representing 1% of the missing values)</p>
               </caption>
               <text>
                  <p><b>Extreme values (representing 1% of the missing values)</b>. Evolution of RMSE according to &#964; ranging (a) from 0.5% to 30% of the extreme values for the Bohen dataset and (b) from 0.5% to 50% of the extreme values for the Ogawa dataset.</p>
               </text>
               <graphic file="1471-2164-11-15-4"/>
            </fig>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Extreme values</b>. Distribution of the values observed in the OS dataset. The extreme values are highlighted on each side of the histogram.</p>
               </text>
               <file name="1471-2164-11-15-S3.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The <it>EM_array </it>method is again the best-performing method (see previous section). Its RMSE values are almost identical to the ones previously computed. <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative </it>are slightly less efficient than previously seen. Thus, the clustering we have proposed remains pertinent when only the extreme values are involved. <it>LSI</it>_<it>array</it>, <it>LSI_combined</it>, <it>LSI_adaptative </it>and <it>EM_array </it>are always good, whereas the least efficient methods are now associated with considerable RMSE values. Notably, the efficiency of <it>kNN </it>collapses and the influence of the dataset on the imputation quality is sharpened.</p>
         </sec>
         <sec>
            <st>
               <p>Clustering in question</p>
            </st>
            <p>A critical point in the analysis of DNA microarray data is the clustering of genes according to their expression values. Missing values have an important influence on the stability of the gene clusters <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B58">58</abbr></abbrgrp>. The datasets with imputed missing values were used to perform both hierarchical clustering (with seven different algorithms) and <it>k-means </it>clustering <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> (see Methods).</p>
            <p>Figure <figr fid="F5">5a</figr> shows the Cluster Pair Proportions (<it>CPP</it>, <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> see Methods section) of OS using hierarchical clustering with the <it>complete linkage</it>, <it>average linkage</it>, <it>McQuitty </it>and <it>Ward </it>algorithms. <it>CPP </it>values of <it>average linkage </it>range between 78 and 68%, those of <it>McQuitty </it>between 58 and 45%, those of <it>Ward </it>between 57 and 35% and finally those of <it>complete linkage </it>between 50 and 41%. For the 7 hierarchical clustering algorithms, we obtain the same behaviour as previously observed <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, ranging from high <it>CPP </it>values for <it>single linkage </it>to low <it>CPP </it>values for <it>Ward</it>. This observation can be explained by the topology produced by each algorithm, <it>e.g</it>., <it>Ward </it>gives well-balanced clusters whereas <it>single linkage </it>creates a few major clusters and numerous adjacent singletons.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p><it>CPP </it>of hierarchical clustering algorithms</p>
               </caption>
               <text>
                  <p><b><it>CPP </it>of hierarchical clustering algorithms</b>. (a) With the complete, average, Ward and McQuitty algorithms for OS with <it>kNN </it>and (b) with the Ward algorithm for the Ogawa dataset for the different imputation methods.</p>
               </text>
               <graphic file="1471-2164-11-15-5"/>
            </fig>
            <p>For each hierarchical clustering method the <it>CPP </it>values are different, but the general tendencies remain the same: (i) imputation of even a small rate &#964; of MVs always has a strong impact on the <it>CPP </it>values, and (ii) the <it>CPP </it>values decrease slowly as &#964; increases. Between 0.5 and 3% of MVs, the CPP values decrease by 1 to 3% per step of 0.5% of MVs. From &#964; = 3.5 to 20% of MVs, the values of <it>CPP </it>decrease overall by 10%. For higher rates of MVs, the decrease of CPP is slower. This loss of stability is present for the <it>k-means </it>method and for each type of hierarchical classification (except for <it>single linkage </it>and <it>centroid linkage</it>, owing to the way these methods build the clusters).</p>
            <p>Individual evaluation of the methods highlights the lack of efficiency of the <it>EM_gene </it>imputation method; it always obtains the lowest <it>CPP </it>values, <it>i.e</it>., 1.37 to 5.34% less than the other approaches. Conversely, <it>EM_array</it>, <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative </it>are associated with the highest <it>CPP </it>values. Methods with an intermediate efficiency, <it>e.g</it>., <it>Row_Mean</it>, have <it>CPP </it>values that are also intermediate compared to those of the other methods. Figure <figr fid="F5">5b</figr> shows the particular example of the OS dataset. <it>CPP </it>values of BPCA (average value of 42.6%) are close to those of the most powerful methods (42.8% for the four methods). Moreover, in the classical range of &#964; below 20%, it is the best. As seen in Table <tblr tid="T4">4</tblr>, BPCA is one of the best approaches for this dataset. Hence, common trends can be found between the quality of the imputation method and the gene cluster stability.</p>
            <p>In addition, evaluation of the imputation methods shows that the cluster quality depends on the dataset. For instance, with the OS dataset, imputation of missing values with the <it>kNN </it>method gives an average <it>CPP </it>value (for the <it>Ward </it>algorithm) of 42.9%, whereas the average <it>CPP </it>value for all the other methods is only 40.6%, even though the RMSE value of <it>kNN </it>is one of the highest (see Table <tblr tid="T4">4</tblr>). The <it>CPP </it>differences are mainly below 5%. These results show that an improvement has been obtained since our previous study. Nonetheless, no new approach has drastically improved the quality of the clustering. Interestingly, the <it>k-means </it>approach showed similar tendencies, underlining that this small improvement is not specific to hierarchical clustering.</p>
            <p>Another question is the comparison between hierarchical clustering algorithms and <it>k-means</it>. Comparing hierarchical clustering algorithms with one another is already a difficult task, and comparison with <it>k-means </it>is even more difficult. Indeed, using the same number of clusters to compare the hierarchical clustering algorithms with <it>k-means </it>can lead to a wrong conclusion: for an equivalent number of clusters, most of the <it>CPP </it>values of <it>k-means </it>are lower than the <it>CPP </it>values obtained with hierarchical clustering algorithms. However, this is only due to the dispersion of observations within the clusters obtained by the <it>k-means </it>approach. Thus, to have an unbiased comparison, the dispersion of genes within clusters must be computed for both <it>k-means </it>and the hierarchical clustering algorithms. This was done as previously described <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Following this approach, the <it>Ward </it>and <it>complete </it>linkages were identified as the best approaches for an unbiased comparison. Both have <it>CPP </it>values lower than the <it>k-means CPP </it>values. The differences were often higher than 5%, underlining the interest of the <it>k-means </it>approach for clustering gene expression profiles.</p>
         </sec>
         <sec>
            <st>
               <p>Distribution of the observations</p>
            </st>
            <p>When the <it>CPP </it>index is calculated, only one group is taken into account. To go further, we used another index, named <it>CPP<sub>f </sub></it>, which takes into account the five closest groups and checks which pairs of genes remain joint partners. The values of <it>CPP<sub>f </sub></it>are higher than those of the <it>CPP</it>, <it>e.g</it>., by 20% for Ward. Methods associated with high <it>CPP </it>values also have high <it>CPP<sub>f </sub></it>values, while methods with low <it>CPP </it>values also have lower <it>CPP<sub>f </sub></it>values. These weak variations show that a part of the observations no longer associated with the original cluster can often be found in its vicinity. These results are entirely in agreement with our previous results <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. They show that the novel imputation methods have not markedly improved the grouping of related genes.</p>
            <p>The analysis of associations can also take into account the non-associations. For this purpose, we used the Clustering Agreement Ratio (<it>CAR</it>, see Methods section), which considers both associated and non-associated genes. <it>CAR </it>values are higher than those of the <it>CPP </it>because the pairs of genes remaining dissociated are also counted. Indeed, given the number of genes treated and the number of groups generated, it is more probable that two genes are dissociated than associated. For the OS dataset, the highest values of the CAR index are obtained with the <it>Ward </it>classification and range between 88.2 and 91.2%. For the GHeat dataset, they range between 91.0 and 94.1%. <it>Complete linkage</it>, <it>average linkage </it>and <it>McQuitty </it>have lower CAR values (80%). For the <it>k-means </it>classification, the values are 1 to 2% higher than for the <it>Ward </it>classification, 10% higher than for <it>McQuitty </it>and <it>Complete linkage </it>and 13% higher than for <it>average linkage</it>. This result underlines that <it>k-means </it>provides a better stability of gene clusters.</p>
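            <p>To make the two kinds of pair counting concrete, the following Python sketch computes a <it>CPP</it>-like and a <it>CAR</it>-like value from two cluster assignments of the same genes. It is only an illustration under stated assumptions, not the implementation used in this study: the function name is hypothetical, <it>CPP </it>is taken here as the fraction of reference pairs that stay together, and <it>CAR </it>as the fraction of all pairs whose status (together or apart) is conserved; the exact definitions are those of the Methods section.</p>
            <p><![CDATA[
```python
from itertools import combinations


def pair_indices(reference, imputed):
    """Return (cpp, car) for two cluster-label sequences over the same genes.

    Assumed definitions (illustrative only):
      cpp = pairs together in both / pairs together in the reference
      car = (pairs together in both + pairs apart in both) / all pairs
    """
    together_ref = 0    # pairs sharing a cluster in the reference clustering
    together_both = 0   # ... that still share a cluster after imputation
    apart_both = 0      # pairs separated in both clusterings
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        total += 1
        same_ref = reference[i] == reference[j]
        same_imp = imputed[i] == imputed[j]
        if same_ref:
            together_ref += 1
            if same_imp:
                together_both += 1
        elif not same_imp:
            apart_both += 1
    cpp = together_both / together_ref if together_ref else float("nan")
    car = (together_both + apart_both) / total if total else float("nan")
    return cpp, car
```
]]></p>
            <p>Only pair co-membership is compared, so the cluster labels of the two assignments do not need to match. The loop is quadratic in the number of genes, which is acceptable for a sketch but would need a contingency-table formulation for datasets of several thousand genes.</p>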
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Imputation</p>
            </st>
            <p>Since our previous analysis <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, numerous new MV imputation methods have been proposed. Some appear to be true improvements with regard to the RMSE. In particular, <it>EM_array </it>is clearly the most efficient method we tested. For &#964; &lt; 35%, it is the best imputation method for 60% of the values, and for &#964; > 35%, for 80%. This feature was confirmed by the analysis of extreme values. <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative </it>follow closely the efficiency of <it>EM_array</it>. We tried, unsuccessfully, to combine these four methods to improve the RMSE values: no combination performs better than <it>EM_array</it>.</p>
            <p>We can underline four interesting points:</p>
            <p indent="1">i. As expected, the imputation quality is greatly affected by the rate of missing data, but surprisingly it is also related to the kind of data. <it>BPCA </it>is a perfect illustration. For the non-kinetic human dataset, MV estimates were correct, whereas for the GHeat dataset the error rate was higher.</p>
            <p indent="1">ii. The efficiency of <it>Row_Mean </it>(and <it>Row_Average</it>) is surprisingly good given the simplicity of the methodology used (with the exception of the GHeat dataset).</p>
            <p indent="1">iii. Although <it>kNN </it>is the most popular imputation method, it is one of the least efficient of the methods tested in this study. This is particularly striking when analyzing the extreme values. <it>SkNN </it>is an improvement of the <it>kNN </it>method, but we observed that the RMSE values of <it>SkNN </it>were not better than those of <it>kNN</it>. This could be due to the use of a non-optimal number of neighbours (<it>k</it>), as for <it>kNN</it>. Note that we used the <it>k</it><sub>opt </sub>defined by <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>; this choice has a direct impact on the imputed values.</p>
            <p indent="1">iv. Extreme values are the ones that are the most valuable for the experiments. The imputation of missing extreme values shows that, except for <it>EM_array</it>, the effectiveness of all the methods is affected.</p>
            <p>Our results are thus in good agreement with those of Brock and co-workers <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, who found that the methods from Bo and co-workers <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, Kim and co-workers <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> and Oba and co-workers <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> are highly competitive. However, they consider "that no method is uniformly superior in all datasets" <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Our results are simpler to summarize as we observe, thanks to our distance criteria, a grading of the effectiveness of the methods. <it>LLSI </it>of Kim and co-workers <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> behaves acceptably for all datasets, while <it>BPCA </it>of Oba and co-workers <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> is strongly dependent on the dataset. Conversely, the methods implemented by Bo and co-workers <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> remain the most efficient in all cases. Moreover, some of the methods implemented by Bo and co-workers <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> were not tested by <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, yet are the most efficient. All these results are reinforced by the analyses of extreme value imputations.</p>
            <p>An important point must not be forgotten: like other authors, <it>e.g</it>., <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, we used the entire dataset, <it>i.e</it>., no specific selection of genes with interesting profiles was made. This could affect the quality of the imputed values and, consequently, the clustering.</p>
         </sec>
         <sec>
            <st>
               <p>Clustering</p>
            </st>
            <p>A strong assumption of microarray data analysis is that genes with similar expression profiles are likely to be co-regulated and thus involved in the same or similar biological processes. Different types of clustering and classification methods have been applied to microarray data, <it>e.g</it>., some classical ones such as <it>k-means </it>clustering <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, self-organizing maps <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B59">59</abbr></abbrgrp>, hierarchical clustering <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B60">60</abbr></abbrgrp>, Self Organizing Tree Algorithm <abbrgrp><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr><abbr bid="B63">63</abbr></abbrgrp>, and some dedicated approaches such as DSF_Clust <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>, re-sampling based tight clustering <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>, cluster affinity search technique <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>, multivariate Gaussian mixtures <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, model-based clustering algorithms <abbrgrp><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr></abbrgrp>, clustering of change patterns using Fourier coefficients <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, Nearest Neighbor Networks <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>, Fuzzy clustering by local Approximation of membership <abbrgrp><abbr bid="B72">72</abbr></abbrgrp> or Multi-Dimensional Scaling <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>.</p>
            <p>Given one particular dataset, different clustering algorithms are very likely to generate different clusters <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>. This is true when large-scale gene expression data from microarrays are analyzed <abbrgrp><abbr bid="B58">58</abbr><abbr bid="B75">75</abbr><abbr bid="B76">76</abbr></abbrgrp>. Comparing different clusters, even when obtained with the same classification approach, is still a difficult task [see Additional file <supplr sid="S4">4</supplr><abbrgrp><abbr bid="B69">69</abbr><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr><abbr bid="B79">79</abbr></abbrgrp>]. Thus, to assess the relevance of missing value imputation methods, we observed the behaviour of different hierarchical clustering methods and of <it>k-means </it>clustering using <it>CPP</it>, <it>CPP</it><sub>f </sub><abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and the newly introduced <it>CAR </it>index. The results closely follow the observations made on the RMSE values (see previous section). Only one method seems ambiguous: <it>kNN</it>. Indeed, its <it>CPP </it>and <it>CPP</it><sub>f </sub>are higher than expected. This is mainly due to the selection of the genes in the different datasets. We decided at the outset not to discard any genes, <it>i.e</it>., we made absolutely no <it>a priori </it>assumption. Thus, very flat profiles have been kept, and they favour <it>kNN</it>, which tends to predict values closer to zero than the other methods do (see Figure <figr fid="F4">4</figr> of <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>). It generates clusters with many zeros, and these clusters are therefore stable. For the majority of the methods, the order of effectiveness for maintaining the stability of the groups is identical across the various classifications. The combination of the <it>CPP</it>, <it>CPP</it><sub>f </sub>and <it>CAR </it>indices underlines the interest of <it>k-means </it>clustering relative to hierarchical clustering methods. For comparable clusters, <it>k-means </it>gives better values.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>Comparing clustering algorithms</b>.</p>
               </text>
               <file name="1471-2164-11-15-S4.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Wang and co-workers did not find a strong difference in classification performance between the three imputation methods they used, <it>i.e., kNN</it>, <it>BPCA </it>and <it>LLS </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The only comparable extensive study was done by Tuikkala and co-workers <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>; interestingly, they focussed on the GO term classes and used <it>k-means</it>. They tested six different methods, with fewer simulations per missing value rate and fewer missing value rates. The important point, however, is that they did not test the methods found to be the most efficient by our approach. We also slightly disagree with their conclusion about the quality of BPCA <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. This is easily understood, as they tested only a very limited number of clusters (5 clusters); in our case, we supervised the choice of the number of clusters (see Methods section), leading to a higher number of clusters. This higher number is thus more sensitive to the quality of the clustering. Note that we used the Euclidean distance and not the Pearson correlation, mainly (i) to stay consistent with our previous research, and (ii) because, as we did not filter the data, the Pearson correlation could have aggregated very different profiles. As the computation time was considerable, it was not possible to test both possibilities.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>DNA microarrays generate a high volume of data. However, they have some technical biases. Microarray studies must take into account the important problem of missing values to ensure the validity of biological results. Numerous methods exist to replace them, but no systematic and stringent comparison had been performed before the present work. In this study, we performed more than 6,000,000 independent simulations to assess the quality of these imputation methods. Figure <figr fid="F6">6</figr> summarizes the results of our assessment. The methods <it>EM_array</it>, <it>LSI_array</it>, <it>LSI_combined </it>and <it>LSI_adaptative </it>are the best-performing methods. <it>BPCA </it>is very effective when the rate of missing values is lower than 15%, <it>i.e</it>., for classical experiments. The values estimated by <it>Row_Mean </it>are quite correct given the simplicity of the approach. <it>kNN </it>(and <it>SkNN</it>) does not give impressive results, which is an important conclusion for a method used by numerous scientists. The methods <it>LSI_gene </it>and <it>EM_gene </it>are not effective here, but they remain to be tested with datasets made up of few genes and a large number of experiments. These conclusions should be taken with care because the quality of the imputations depends on the datasets used.</p>
         <fig id="F6">
            <title>
               <p>Figure 6</p>
            </title>
            <caption>
               <p>Summary of the comparison</p>
            </caption>
            <text>
               <p><b>Summary of the comparison</b>.</p>
            </text>
            <graphic file="1471-2164-11-15-6"/>
         </fig>
         <p>A major disadvantage of numerous methods is their accessibility. We tested here only a part of the methods, as some are unavailable and others did not work properly. Some of the methods used here could not be used easily by a non-specialist. It would therefore be valuable to have implementations of all the different methods in a usable form, with standardized input and output file formats. As a second step, graphical interfaces for the methods could be helpful. These remarks are particularly relevant with regard to recent papers that proposed novel approaches such as SLLSimpute <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>, or interesting comparisons <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, that do not include the methods found to be the most efficient in this study.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>We used 5 datasets for the analysis [see Additional file <supplr sid="S1">1</supplr>]; they came mainly from the SMD database <abbrgrp><abbr bid="B81">81</abbr></abbrgrp>. The first one, named Ogawa set, was initially composed of <it>N </it>= 6013 genes and <it>n </it>= 8 experimental conditions about the phosphate accumulation and the polyphosphate metabolism of the yeast <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. The second one corresponds to various environmental stress responses in <it>S. cerevisiae </it><abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. This set, named Gasch set, contains <it>N </it>= 6153 genes and <it>n </it>= 178 experimental conditions. Due to the diversity of conditions in this set, we focused on two experimental subsets corresponding to heat shock and H<sub>2</sub>O<sub>2 </sub>osmotic shock respectively. Bohen and co-workers analyzed the patterns of gene expression in human follicular lymphomas and the interest of treatment by rituximab <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. This dataset is composed of <it>N </it>= 16,523 genes and <it>n </it>= 16 experimental conditions. The last dataset was obtained by Lucau-Danila, Lelandais and co-workers <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. To precisely describe the very early genomic response developed by yeast to accommodate a chemical stress, they performed a time-course analysis of the yeast gene expression following the addition of the antimitotic drug benomyl. The dataset is a kinetic series comprising <it>N </it>= 5,621 genes for <it>n </it>= 6 time points (30 seconds, 2, 4, 10, 20 and 40 minutes).</p>
         </sec>
         <sec>
            <st>
               <p>Datasets refinement: missing values enumeration</p>
            </st>
            <p>From the original datasets, we built complete datasets without MVs. All the genes containing at least one missing value were eliminated from the Ogawa set (noted OS). The resulting OS set contains <it>N </it>= 5783 genes and <it>n </it>= 8 experimental conditions. The second set without MVs was taken from Gasch et al. and called GS. The experimental conditions (columns) containing more than 80 MVs were removed. The resulting GS matrix contains <it>N </it>= 5843 genes and <it>n </it>= 42 experimental conditions. Two subsets were generated from GS and are noted GHeat and GH<sub>2</sub>O<sub>2</sub>. They correspond to specific stress conditions as described previously. GHeat and GH<sub>2</sub>O<sub>2 </sub>contain respectively <it>N </it>= 3643 genes with <it>n </it>= 8 experimental conditions and <it>N </it>= 5007 genes with <it>n </it>= 10 experimental conditions.</p>
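            <p>The refinement described above can be sketched in Python as follows. This is a minimal illustration, not the code used in this study: it assumes that the expression matrix is a NumPy array with genes as rows, experimental conditions as columns and MVs encoded as NaN, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
import numpy as np


def drop_sparse_conditions(data, max_missing):
    """Remove the columns (conditions) holding more than `max_missing` MVs."""
    missing_per_column = np.isnan(data).sum(axis=0)
    return data[:, missing_per_column <= max_missing]


def drop_incomplete_genes(data):
    """Remove every row (gene) that contains at least one MV."""
    has_missing = np.isnan(data).any(axis=1)
    return data[~has_missing]


def build_complete_set(data, max_missing=None):
    """Optionally filter conditions first, then keep only complete genes."""
    if max_missing is not None:
        data = drop_sparse_conditions(data, max_missing)
    return drop_incomplete_genes(data)
```
]]></p>
            <p>Under these assumptions, a call without <it>max_missing </it>mirrors the OS refinement, while a call with a threshold of 80 mirrors the column filtering applied to the Gasch set before the incomplete genes are discarded.</p>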
            <p>To test the influence of the matrix size, <it>i.e</it>., the number of genes, we built smaller sets corresponding to 1/7 of OS, GS, GHeat and GH<sub>2</sub>O<sub>2</sub>. Principles are described in <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. For the dataset of Bohen et <it>al</it>. (noted B), we have done the same protocol and used a subset representing 1/7 of B, <it>i.e</it>., N = 861 genes. For the dataset of Lucau-Danila <it>et al.: </it><abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, 11.4% of the genes have at least one missing values. The dataset with no missing values (noted L) was so composed of <it>N </it>= 4645 genes.</p>
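            <p>The refinement steps described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names, the use of <it>None</it> to mark a missing value, and the order of the two filters are assumptions made for the example.</p>
            <p><![CDATA[
```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows = genes, columns = conditions


def drop_sparse_columns(matrix: Matrix, max_missing: int) -> Matrix:
    """Remove every column holding more than max_missing missing values."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    keep = [
        j for j in range(n_cols)
        if sum(row[j] is None for row in matrix) <= max_missing
    ]
    return [[row[j] for j in keep] for row in matrix]


def drop_incomplete_genes(matrix: Matrix) -> Matrix:
    """Keep only the genes (rows) that contain no missing value."""
    return [row for row in matrix if all(x is not None for x in row)]


if __name__ == "__main__":
    toy = [
        [1.0, None, 3.0],
        [4.0, None, 6.0],
        [None, 8.0, 9.0],
    ]
    # Hypothetical threshold of 1; the text uses 80 for the Gasch set.
    filtered = drop_sparse_columns(toy, max_missing=1)
    print(drop_incomplete_genes(filtered))
```
]]></p>
            <p>For a set such as GS, the column filter would be applied with a threshold of 80 before the incomplete genes are removed; for OS, only the gene filter is needed.</p>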
         </sec>
         <sec>
            <st>
               <p>Missing values generation</p>
            </st>
            <p>From the sets without MVs, we introduced a rate &#964; of genes containing MVs (&#964; = 1 to 50.0%); these MVs are randomly drawn. Each random simulation is generated at least 100 times per experiment to ensure a correct sampling. It should be noted that, contrary to our previous work <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, each gene can contain more than one MV.</p>
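            <p>A minimal sketch of such a random MV generation is given below. It assumes that &#964; is read as the fraction of matrix entries to blank out, drawn uniformly without replacement, which is only one possible reading of the protocol; the function name, the <it>seed </it>parameter and the use of <it>None </it>for a missing value are illustrative choices.</p>
            <p><![CDATA[
```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def introduce_missing_values(matrix: Matrix, tau: float, seed=None) -> Matrix:
    """Return a copy of matrix with a fraction tau of its entries set to None.

    Assumption: tau applies to individual entries, so one gene (row)
    may receive several missing values.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie between 0 and 1")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_missing = round(tau * len(cells))
    masked = [list(row) for row in matrix]  # leave the input untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


if __name__ == "__main__":
    complete = [[float(i + j) for j in range(5)] for i in range(4)]
    # One simulation; the protocol repeats this at least 100 times per rate.
    simulated = introduce_missing_values(complete, tau=0.10, seed=1)
    print(sum(x is None for row in simulated for x in row))
```
]]></p>
            <p>Repeating the call with different seeds gives the independent simulations required for each value of &#964;.</p>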
         </sec>
         <sec>
            <st>
               <p>Replacement methods</p>
            </st>
            <p>The different packages were downloaded from the authors' websites (see Table <tblr tid="T1">1</tblr>). <it>kNN </it>was computed using the well-known <it>KNNimpute </it>developed by Troyanskaya and co-workers <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The selected <it>k</it><sub>opt </sub>value is the one associated with the minimal global error rate, as defined by Troyanskaya and co-workers <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. <it>BPCA </it>was used without its graphical interface <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, as was the Java package of Bo et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. For <it>LLSI </it>and <it>Row_Average</it>, we modified the original Matlab code to use our own microarray datasets <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. <it>SkNN </it>was run with the R software <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Hierarchical Clustering</p>
            </st>
            <p>The hierarchical clustering (HC) algorithm allows the construction of a dendrogram of nested clusters based on proximity information <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. HC was performed using the "hclust" function of the R software <abbrgrp><abbr bid="B82">82</abbr></abbrgrp>. Seven hierarchical clustering algorithms were tested: <it>average linkage</it>, <it>complete linkage</it>, <it>median linkage</it>, <it>McQuitty</it>, <it>centroid linkage</it>, <it>single linkage </it>and <it>Ward minimum variance </it><abbrgrp><abbr bid="B83">83</abbr></abbrgrp>.</p>
            <p>The distance matrix between all the vectors (<it>i.e</it>., genes) is calculated using an external module written in C. We used the normalized Euclidean distance d* to take the MVs into account:</p>
            <p>
               <display-formula id="M1">
                  <graphic file="1471-2164-11-15-i1.gif"/>
               </display-formula>
            </p>
            <p><it>v </it>and <it>w </it>are two distinct vectors and m is the number of positions at which at least one of the two vectors has a missing value. Thus, (<it>v</it><sub>i </sub>- <it>w</it><sub>i</sub>) is not computed if <it>v</it><sub>i </sub>and/or <it>w</it><sub>i </sub>is a missing value.</p>
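            <p>As an illustration, the following Python sketch computes a distance of this kind. The exact formula is only available as a graphic in this article, so the normalization used here (mean of the squared differences over the n - m usable positions, followed by a square root) is an assumption, as are the function name and the use of <it>None </it>for a missing value.</p>
            <p><![CDATA[
```python
import math
from typing import Optional, Sequence


def normalized_euclidean(v: Sequence[Optional[float]],
                         w: Sequence[Optional[float]]) -> Optional[float]:
    """Euclidean distance over the positions where both vectors are observed.

    The sum of squared differences is divided by the number of usable
    positions (n - m) so that pairs with many missing values are not
    artificially closer. Returns None when no position is usable.
    """
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    squared = [
        (a - b) ** 2
        for a, b in zip(v, w)
        if a is not None and b is not None  # skip a pair if either is missing
    ]
    if not squared:
        return None
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    gene_v = [1.0, None, 3.0, 5.0]
    gene_w = [1.0, 2.0, None, 2.0]
    # Two usable positions out of four (m = 2).
    print(normalized_euclidean(gene_v, gene_w))
```
]]></p>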
         </sec>
         <sec>
            <st>
               <p>An index for clustering results comparison: Conserved Pairs Proportion (<it>CPP</it>)</p>
            </st>
            <p>To assess the influence of missing data rates and of the different replacement methods on clustering results (see Figure <figr fid="F1">1</figr>), we analysed whether the co-associated genes of an original dataset (without MVs) remain co-associated in the corresponding set with MVs. A similar approach has been used by Meunier <it>et al</it>. on proteomic data <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>.</p>
            <p>Hence, in a first step, we clustered the data sets without MVs with each agglomerative clustering algorithm. The results obtained by these first analyses are denoted reference clustering (<it>RC</it>). In a second step, we generated MVs in the data. The MVs are replaced using the different replacement methods. Then we performed the hierarchical clustering for each new set. The results obtained by these second analyses are denoted generated clustering (<it>GC</it>). We compared the resulting clusters defined in <it>RC </it>and <it>GC </it>and assessed the divergence using an index named Conserved Pairs Proportion (<it>CPP</it>). The <it>CPP </it>is the maximal proportion of genes belonging to two clusters, one from the RC and the other one from the GC (cf. Figure <figr fid="F1">1</figr> of <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and Additional file <supplr sid="S5">5</supplr> for more details).</p>
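            <p>One possible reading of this index is sketched below in Python: each <it>RC </it>cluster is matched to the <it>GC </it>cluster with which it shares the most genes, and the shared genes are summed and divided by the total number of genes. The precise definition is given in Additional file 5, so this matching rule, the function name and the representation of a clustering as a list of cluster labels (one per gene, obtained by cutting the dendrogram) are assumptions.</p>
            <p><![CDATA[
```python
from collections import Counter
from typing import Hashable, Sequence


def conserved_pairs_proportion(rc_labels: Sequence[Hashable],
                               gc_labels: Sequence[Hashable]) -> float:
    """Proportion of genes kept together between two clusterings.

    rc_labels[i] and gc_labels[i] are the cluster labels of gene i in
    the reference and generated clusterings, respectively.
    """
    if len(rc_labels) != len(gc_labels):
        raise ValueError("both clusterings must label the same genes")
    if not rc_labels:
        raise ValueError("at least one gene is required")
    # Number of genes shared by each (RC cluster, GC cluster) pair.
    overlap = Counter(zip(rc_labels, gc_labels))
    best = {}
    for (rc_cluster, _gc_cluster), count in overlap.items():
        best[rc_cluster] = max(best.get(rc_cluster, 0), count)
    return sum(best.values()) / len(rc_labels)


if __name__ == "__main__":
    rc = [0, 0, 0, 1, 1, 1]
    gc = [0, 0, 1, 1, 1, 0]
    print(conserved_pairs_proportion(rc, gc))
```
]]></p>
            <p>With this reading, two identical clusterings give a value of 1, and the value decreases as the genes of a reference cluster are scattered over several generated clusters.</p>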
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>Details of <it>CPP </it>and <it>CPPf</it></b>.</p>
               </text>
               <file name="1471-2164-11-15-S5.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Clustering Agreement Ratio (CAR)</p>
            </st>
            <p>The Clustering Agreement Ratio (CAR) is a concordance index measuring the proportion of gene pairs that belong to a same cluster (resp. to different clusters) in the reference clustering (RC) and are found again in a same cluster (resp. in different clusters) in the clustering (GC) obtained without or after replacing the MVs.</p>
            <p>The index CAR is defined by the following equation:</p>
            <p>
               <display-formula id="M2">
                  <graphic file="1471-2164-11-15-i2.gif"/>
               </display-formula>
            </p>
            <p>where <inline-formula><graphic file="1471-2164-11-15-i3.gif"/></inline-formula> and <inline-formula><graphic file="1471-2164-11-15-i4.gif"/></inline-formula> specify the co-presence of two genes in a same cluster, i.e., they take the value 1 when the genes <it>i </it>and <it>j </it>belong to a same cluster in the clustering <it>RC </it>and <it>GC</it>, respectively. The number of pairs among <it>G </it>genes is <it>G</it>(<it>G </it>- 1)/2. The first term of the numerator corresponds to the co-presence of the pair (<it>i</it>, <it>j</it>) in a same cluster for both <it>RC </it>and <it>GC</it>, and the second term to the co-absence of this pair from a same cluster.</p>
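            <p>The description above can be illustrated with a short Python sketch. Since the equation is only available as a graphic, the code assumes that the numerator counts, over all pairs <it>i </it>&lt; <it>j</it>, the pairs that are together in both clusterings plus the pairs that are separated in both, and that the denominator is <it>G</it>(<it>G </it>- 1)/2. The function name and the label-list representation of a clustering are illustrative.</p>
            <p><![CDATA[
```python
from itertools import combinations
from typing import Hashable, Sequence


def clustering_agreement_ratio(rc_labels: Sequence[Hashable],
                               gc_labels: Sequence[Hashable]) -> float:
    """Fraction of gene pairs treated identically by the two clusterings."""
    g = len(rc_labels)
    if g != len(gc_labels):
        raise ValueError("both clusterings must label the same genes")
    if g < 2:
        raise ValueError("at least two genes are required")
    agreements = 0
    for i, j in combinations(range(g), 2):
        together_rc = rc_labels[i] == rc_labels[j]
        together_gc = gc_labels[i] == gc_labels[j]
        # Co-presence (both True) or co-absence (both False) counts as 1.
        if together_rc == together_gc:
            agreements += 1
    return agreements / (g * (g - 1) // 2)


if __name__ == "__main__":
    rc = [0, 0, 1, 1]
    gc = [0, 0, 1, 2]
    # Only the pair (2, 3) is split differently: 5 of 6 pairs agree.
    print(clustering_agreement_ratio(rc, gc))
```
]]></p>
            <p>The loop visits every pair once, so the cost grows quadratically with the number of genes; for several thousand genes a vectorized or contingency-table formulation would be preferable.</p>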
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MC performed all the computational and analysis work. AdB wrote the paper, conceived of the study and carried out the MVs generation. AM and GL participated in the design of the study and coordination. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank all the scientists who have deposited their experiments and made them freely available to the scientific community. In the same way, we would like to thank all the scientists who have developed and distributed missing value replacement methods. This work was supported by grants from the Minist&#232;re de la Recherche, from the French Institute for Health and Medical Research (INSERM), Universit&#233; Paris Diderot - Paris 7, Institut National de Transfusion Sanguine (INTS) and Genopole<sup>&#174;</sup>. The Clustering Agreement Ratio (CAR) was proposed by the late Prof. Serge Hazout.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide</p>
            </title>
            <aug>
               <au>
                  <snm>Liolios</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tavernarakis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D332</fpage>
            <lpage>334</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkj145</pubid>
                  <pubid idtype="pmcid">1347507</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381880</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Genomes OnLine Database (GOLD): a monitor of genome projects world-wide</p>
            </title>
            <aug>
               <au>
                  <snm>Bernal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ear</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>126</fpage>
            <lpage>127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/29.1.126</pubid>
                  <pubid idtype="pmcid">29859</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Identification of expressed genes linked to malignancy of human colorectal carcinoma by parametric clustering of quantitative expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Muro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Takemasa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Oba</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matoba</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ueno</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Maruyama</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sekimoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yamamoto</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamori</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>3</issue>
            <fpage>R21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2003-4-3-r21</pubid>
                  <pubid idtype="pmcid">153461</pubid>
                  <pubid idtype="pmpid" link="fulltext">12620106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Molecular portraits of human breast tumours</p>
            </title>
            <aug>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Sorlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Rijn</snm>
                  <mnm>van de</mnm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Pollack</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Akslen</snm>
                  <fnm>LA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <issue>6797</issue>
            <fpage>747</fpage>
            <lpage>752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35021093</pubid>
                  <pubid idtype="pmpid" link="fulltext">10963602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis</p>
            </title>
            <aug>
               <au>
                  <snm>Statnikov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Aliferis</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Tsamardinos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hardin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>5</issue>
            <fpage>631</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti033</pubid>
                  <pubid idtype="pmpid" link="fulltext">15374862</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Temporal change in mKIAA gene expression during the early stage of retinoic acid-induced neurite outgrowth</p>
            </title>
            <aug>
               <au>
                  <snm>Imai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nagase</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ohara</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Koga</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>364</volume>
            <fpage>114</fpage>
            <lpage>122</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.05.037</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169686</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Incorporating genome-scale tools for studying energy homeostasis</p>
            </title>
            <aug>
               <au>
                  <snm>Raab</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Nutr Metab (Lond)</source>
            <pubdate>2006</pubdate>
            <volume>3</volume>
            <fpage>40</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1743-7075-3-40</pubid>
                  <pubid idtype="pmcid">1636640</pubid>
                  <pubid idtype="pmpid" link="fulltext">17081308</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Systematic interpretation of microarray data using experiment annotations</p>
            </title>
            <aug>
               <au>
                  <snm>Fellenberg</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Busold</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Witt</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Beckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hauser</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Frohme</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Winter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dippon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hoheisel</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>319</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-7-319</pubid>
                  <pubid idtype="pmcid">1774576</pubid>
                  <pubid idtype="pmpid" link="fulltext">17181856</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Microarray technology: beyond transcript profiling and genotype analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Hoheisel</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>3</issue>
            <fpage>200</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1809</pubid>
                  <pubid idtype="pmpid" link="fulltext">16485019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Exploring the metabolic and genetic control of gene expression on a genomic scale</p>
            </title>
            <aug>
               <au>
                  <snm>DeRisi</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>VR</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <issue>5338</issue>
            <fpage>680</fpage>
            <lpage>686</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.278.5338.680</pubid>
                  <pubid idtype="pmpid" link="fulltext">9381177</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Microarray analysis of the transcriptome as a stepping stone towards understanding biological systems: practical considerations and perspectives</p>
            </title>
            <aug>
               <au>
                  <snm>Clarke</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2006</pubdate>
            <volume>45</volume>
            <issue>4</issue>
            <fpage>630</fpage>
            <lpage>650</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-313X.2006.02668.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16441353</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A method for predicting disease subtypes in presence of misclassification among training samples using gene expression: application to human breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rekaya</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bertrand</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>3</issue>
            <fpage>317</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti738</pubid>
                  <pubid idtype="pmpid" link="fulltext">16267079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Alizadeh</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lossos</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Rosenwald</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boldrick</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Sabet</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>X</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <issue>6769</issue>
            <fpage>503</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35000501</pubid>
                  <pubid idtype="pmpid" link="fulltext">10676951</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Analysis of microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Pham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Crane</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Current Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <issue>1</issue>
            <fpage>37</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2174/157489306775330642</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Gene expression profile classification: A review</p>
            </title>
            <aug>
               <au>
                  <snm>Asyali</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Colak</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Demirkaya</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Inan</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Current Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <issue>1</issue>
            <fpage>55</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2174/157489306775330615</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma</p>
            </title>
            <aug>
               <au>
                  <snm>Wei</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Greer</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Westermann</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Steinberg</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Son</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>QR</fnm>
               </au>
               <au>
                  <snm>Whiteford</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Bilke</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krasnoselsky</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Cenacchi</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2004</pubdate>
            <volume>64</volume>
            <issue>19</issue>
            <fpage>6883</fpage>
            <lpage>6891</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/0008-5472.CAN-04-0695</pubid>
                  <pubid idtype="pmcid">1298184</pubid>
                  <pubid idtype="pmpid" link="fulltext">15466177</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Interactive gene clustering - a case study of breast cancer microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Gruzdz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ihnatowicz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Slezak</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Inf Syst Front</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <fpage>21</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s10796-005-6100-x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Normalization strategies for cDNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Schuchhardt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Beule</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Malik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wolski</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eickhoff</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lehrach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Herzel</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <issue>10</issue>
            <fpage>E47</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/28.10.e47</pubid>
                  <pubid idtype="pmcid">105386</pubid>
                  <pubid idtype="pmpid" link="fulltext">10773095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Cluster Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Everitt</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <publisher>Heinemann Educ</publisher>
            <pubdate>1974</pubdate>
         </bibl>
         <bibl id="B20">
            <title>
               <p>k-means</p>
            </title>
            <aug>
               <au>
                  <snm>Hartigan</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Applied Statistics</source>
            <pubdate>1979</pubdate>
            <volume>28</volume>
            <fpage>100</fpage>
            <lpage>108</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2346830</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Self-organized formation of topologically correct feature maps</p>
            </title>
            <aug>
               <au>
                  <snm>Kohonen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Biol Cybern</source>
            <pubdate>1982</pubdate>
            <volume>43</volume>
            <fpage>59</fpage>
            <lpage>69</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF00337288</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Self-Organizing Maps</p>
            </title>
            <aug>
               <au>
                  <snm>Kohonen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>Springer</publisher>
            <edition>3</edition>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Multivariate Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Mardia</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bibby</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>Academic Press</publisher>
            <pubdate>1979</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lv</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>23</issue>
            <fpage>2883</fpage>
            <lpage>2889</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl339</pubid>
                  <pubid idtype="pmpid" link="fulltext">16809389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Gene Expression Clustering: Dealing with the Missing Values</p>
            </title>
            <aug>
               <au>
                  <snm>Gru&#380;d&#378;</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ihnatowicz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>&#346;l&#281;zak</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Intelligent Information Processing and Web Mining</source>
            <pubdate>2005</pubdate>
            <fpage>521</fpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Discriminatory analysis, nonparametric discrimination: Consistency properties</p>
            </title>
            <aug>
               <au>
                  <snm>Fix</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hodges</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Technical Report 4, USAF School of Aviation Medicine</source>
            <publisher>Randolph Field, Texas</publisher>
            <pubdate>1951</pubdate>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Missing value estimation methods for DNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cantor</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>6</issue>
            <fpage>520</fpage>
            <lpage>525</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.6.520</pubid>
                  <pubid idtype="pmpid" link="fulltext">11395428</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Reuse of imputed data in microarray analysis increases imputation efficiency</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Yi</snm>
                  <fnm>GS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>160</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-5-160</pubid>
                  <pubid idtype="pmcid">528735</pubid>
                  <pubid idtype="pmpid" link="fulltext">15504240</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>LSimpute: accurate estimation of missing values in microarray data with least squares methods</p>
            </title>
            <aug>
               <au>
                  <snm>B&#248;</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Dysvik</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>3</issue>
            <fpage>e34</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gnh026</pubid>
                  <pubid idtype="pmcid">374359</pubid>
                  <pubid idtype="pmpid" link="fulltext">14978222</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>A Bayesian missing value estimation method for gene expression profile data</p>
            </title>
            <aug>
               <au>
                  <snm>Oba</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Takemasa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Monden</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matsubara</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>16</issue>
            <fpage>2088</fpage>
            <lpage>2096</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg287</pubid>
                  <pubid idtype="pmpid" link="fulltext">14594714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Continuous representations of time-series gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Bar-Joseph</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gerber</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Gifford</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Jaakkola</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <issue>3-4</issue>
            <fpage>341</fpage>
            <lpage>356</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270360688057</pubid>
                  <pubid idtype="pmpid" link="fulltext">12935332</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Using hidden Markov models to analyze gene expression time course data</p>
            </title>
            <aug>
               <au>
                  <snm>Schliep</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sch&#246;nhuth</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Steinhoff</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 1</issue>
            <fpage>i255</fpage>
            <lpage>i263</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1036</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Improving missing value estimation in microarray data with gene ontology</p>
            </title>
            <aug>
               <au>
                  <snm>Tuikkala</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elo</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Nevalainen</snm>
                  <fnm>OS</fnm>
               </au>
               <au>
                  <snm>Aittokallio</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>5</issue>
            <fpage>566</fpage>
            <lpage>572</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btk019</pubid>
                  <pubid idtype="pmpid" link="fulltext">16377613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Towards clustering of incomplete microarray data without the use of imputation</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>1</issue>
            <fpage>107</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl555</pubid>
                  <pubid idtype="pmpid" link="fulltext">17077099</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Integrative missing value estimation for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>XJ</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>449</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-449</pubid>
                  <pubid idtype="pmcid">1622759</pubid>
                  <pubid idtype="pmpid" link="fulltext">17038176</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A meta-data based method for DNA microarray imputation</p>
            </title>
            <aug>
               <au>
                  <snm>J&#246;rnsten</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ouyang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>HY</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>109</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-109</pubid>
                  <pubid idtype="pmcid">1852325</pubid>
                  <pubid idtype="pmpid" link="fulltext">17394658</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Microarray missing data imputation based on a set theoretic framework and biological knowledge</p>
            </title>
            <aug>
               <au>
                  <snm>Gan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Liew</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>5</issue>
            <fpage>1608</fpage>
            <lpage>1619</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl047</pubid>
                  <pubid idtype="pmcid">1409680</pubid>
                  <pubid idtype="pmpid" link="fulltext">16549873</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>An ensemble approach to microarray data-based gene prioritization after missing value imputation</p>
            </title>
            <aug>
               <au>
                  <snm>Hua</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>6</issue>
            <fpage>747</fpage>
            <lpage>754</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm010</pubid>
                  <pubid idtype="pmpid" link="fulltext">17267438</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>32</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-32</pubid>
                  <pubid idtype="pmcid">1403803</pubid>
                  <pubid idtype="pmpid" link="fulltext">16426462</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Prediction of missing values in microarray and use of mixed models to evaluate the predictors</p>
            </title>
            <aug>
               <au>
                  <snm>Feten</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Alm&#248;y</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Aastveit</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>Stat Appl Genet Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>4</volume>
            <fpage>Article10</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16646827</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Evaluation of Missing Value Estimation for Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Carroll</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Journal of Data Science</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>347</fpage>
            <lpage>370</lpage>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Gaussian mixture clustering and imputation of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Ouyang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Welsh</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Georgopoulos</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>6</issue>
            <fpage>917</fpage>
            <lpage>923</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth007</pubid>
                  <pubid idtype="pmpid" link="fulltext">14751970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>DNA microarray data imputation and significance analysis of differential expression</p>
            </title>
            <aug>
               <au>
                  <snm>J&#246;rnsten</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>HY</fnm>
               </au>
               <au>
                  <snm>Welsh</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Ouyang</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>22</issue>
            <fpage>4155</fpage>
            <lpage>4161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti638</pubid>
                  <pubid idtype="pmpid" link="fulltext">16118262</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Sehgal</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Gondal</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dooley</snm>
                  <fnm>LS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>10</issue>
            <fpage>2417</fpage>
            <lpage>2423</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti345</pubid>
                  <pubid idtype="pmpid" link="fulltext">15731210</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The influence of missing value imputation on detection of differentially expressed genes from microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Scheel</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Aldrin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Glad</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>S&#248;rum</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lyng</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Frigessi</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>23</issue>
            <fpage>4272</fpage>
            <lpage>4279</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti708</pubid>
                  <pubid idtype="pmpid" link="fulltext">16216830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Two-pass imputation algorithm for missing value estimation in gene expression time series</p>
            </title>
            <aug>
               <au>
                  <snm>Tsiporkova</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Boeva</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>5</issue>
            <fpage>1005</fpage>
            <lpage>1022</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720007003053</pubid>
                  <pubid idtype="pmpid" link="fulltext">17933008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Dealing with gene expression missing data</p>
            </title>
            <aug>
               <au>
                  <snm>Br&#225;s</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Menezes</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Syst Biol (Stevenage)</source>
            <pubdate>2006</pubdate>
            <volume>153</volume>
            <issue>3</issue>
            <fpage>105</fpage>
            <lpage>119</lpage>
            <xrefbib>
               <pubid idtype="pmpid">16984085</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Improving cluster-based missing value estimation of DNA microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Br&#225;s</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Menezes</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Biomol Eng</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <issue>2</issue>
            <fpage>273</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.bioeng.2007.04.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">17493870</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering</p>
            </title>
            <aug>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Hazout</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Malpertuy</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-5-114</pubid>
                  <pubid idtype="pmcid">514701</pubid>
                  <pubid idtype="pmpid" link="fulltext">15324460</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>A multi-stage approach to clustering and imputation of gene expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Wong</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>FK</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>GR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>8</issue>
            <fpage>998</fpage>
            <lpage>1005</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm053</pubid>
                  <pubid idtype="pmpid" link="fulltext">17308340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>New components of a system for phosphate accumulation and polyphosphate metabolism in Saccharomyces cerevisiae revealed by genomic expression analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Ogawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>DeRisi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2000</pubdate>
            <volume>11</volume>
            <issue>12</issue>
            <fpage>4309</fpage>
            <lpage>4321</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15074</pubid>
                  <pubid idtype="pmpid" link="fulltext">11102525</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Genomic expression programs in the response of yeast cells to environmental changes</p>
            </title>
            <aug>
               <au>
                  <snm>Gasch</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Carmel-Harel</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Storz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2000</pubdate>
            <volume>11</volume>
            <issue>12</issue>
            <fpage>4241</fpage>
            <lpage>4257</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15070</pubid>
                  <pubid idtype="pmpid" link="fulltext">11102521</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Variation in gene expression patterns in follicular lymphoma and the response to rituximab</p>
            </title>
            <aug>
               <au>
                  <snm>Bohen</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
               <au>
                  <snm>Alter</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Warnke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>4</issue>
            <fpage>1926</fpage>
            <lpage>1930</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0437875100</pubid>
                  <pubid idtype="pmcid">149935</pubid>
                  <pubid idtype="pmpid" link="fulltext">12571354</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Early expression of yeast genes affected by chemical stress</p>
            </title>
            <aug>
               <au>
                  <snm>Lucau-Danila</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lelandais</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kozovska</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tanty</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Delaveau</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Devaux</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jacq</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2005</pubdate>
            <volume>25</volume>
            <issue>5</issue>
            <fpage>1860</fpage>
            <lpage>1868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/MCB.25.5.1860-1868.2005</pubid>
                  <pubid idtype="pmcid">549374</pubid>
                  <pubid idtype="pmpid" link="fulltext">15713640</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes</p>
            </title>
            <aug>
               <au>
                  <snm>Brock</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Shaffer</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Blakesley</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Lotz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>GC</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-12</pubid>
                  <pubid idtype="pmcid">2253514</pubid>
                  <pubid idtype="pmpid" link="fulltext">18186917</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Missing value imputation improves clustering and interpretation of gene expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Tuikkala</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elo</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Nevalainen</snm>
                  <fnm>OS</fnm>
               </au>
               <au>
                  <snm>Aittokallio</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>202</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-202</pubid>
                  <pubid idtype="pmcid">2386492</pubid>
                  <pubid idtype="pmpid" link="fulltext">18423022</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Missing value estimation for DNA microarray gene expression data: local least squares imputation</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>187</fpage>
            <lpage>198</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth499</pubid>
                  <pubid idtype="pmpid" link="fulltext">15333461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Integrating gene and protein expression data: pattern analysis and profile mining</p>
            </title>
            <aug>
               <au>
                  <snm>Cox</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kislinger</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Methods</source>
            <pubdate>2005</pubdate>
            <volume>35</volume>
            <issue>3</issue>
            <fpage>303</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ymeth.2004.08.021</pubid>
                  <pubid idtype="pmpid" link="fulltext">15722226</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kitareewan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dmitrovsky</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>6</issue>
            <fpage>2907</fpage>
            <lpage>2912</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.96.6.2907</pubid>
                  <pubid idtype="pmcid">15868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077610</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <issue>25</issue>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
                  <pubid idtype="pmcid">24541</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>A hierarchical unsupervised growing neural network for clustering gene expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Herrero</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>2</issue>
            <fpage>126</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.2.126</pubid>
                  <pubid idtype="pmpid" link="fulltext">11238068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree</p>
            </title>
            <aug>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carazo</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44</volume>
            <issue>2</issue>
            <fpage>226</fpage>
            <lpage>233</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006139</pubid>
                  <pubid idtype="pmpid" link="fulltext">9069183</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Clustering of gene expression data: performance and similarity analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Yin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Ni</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>Suppl 4</issue>
            <fpage>S19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-S4-S19</pubid>
                  <pubid idtype="pmcid">1780119</pubid>
                  <pubid idtype="pmpid" link="fulltext">17217511</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Finding dominant sets in microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Fu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Teng</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Front Biosci</source>
            <pubdate>2005</pubdate>
            <volume>10</volume>
            <fpage>3068</fpage>
            <lpage>3077</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2741/1763</pubid>
                  <pubid idtype="pmpid" link="fulltext">15970561</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Tight clustering: a resampling-based approach for identifying stable and tight patterns in data</p>
            </title>
            <aug>
               <au>
                  <snm>Tseng</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>2005</pubdate>
            <volume>61</volume>
            <issue>1</issue>
            <fpage>10</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.0006-341X.2005.031032.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15737073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Clustering gene expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Ben-Dor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yakhini</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1999</pubdate>
            <volume>6</volume>
            <issue>3-4</issue>
            <fpage>281</fpage>
            <lpage>297</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652799318274</pubid>
                  <pubid idtype="pmpid" link="fulltext">10582567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Supervised cluster analysis for microarray data based on multivariate Gaussian mixture</p>
            </title>
            <aug>
               <au>
                  <snm>Qu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>12</issue>
            <fpage>1905</fpage>
            <lpage>1913</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth177</pubid>
                  <pubid idtype="pmpid" link="fulltext">15044244</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Model-based clustering and data transformations for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Yeung</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Fraley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Murua</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Raftery</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>10</issue>
            <fpage>977</fpage>
            <lpage>987</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.10.977</pubid>
                  <pubid idtype="pmpid" link="fulltext">11673243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Validating clustering for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Yeung</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Haynor</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>4</issue>
            <fpage>309</fpage>
            <lpage>318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.4.309</pubid>
                  <pubid idtype="pmpid" link="fulltext">11301299</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Clustering of change patterns using Fourier coefficients</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Nearest Neighbor Networks: clustering expression data based on gene neighborhoods</p>
            </title>
            <aug>
               <au>
                  <snm>Huttenhower</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Flamholz</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Landis</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Sahi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Olszewski</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Hibbs</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Siemers</snm>
                  <fnm>NO</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>HA</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>250</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-250</pubid>
                  <pubid idtype="pmcid">1941745</pubid>
                  <pubid idtype="pmpid" link="fulltext">17626636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Fu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Medico</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>3</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-3</pubid>
                  <pubid idtype="pmcid">1774579</pubid>
                  <pubid idtype="pmpid" link="fulltext">17204155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Comparing gene expression networks in a multi-dimensional space to extract similarities and differences between organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Lelandais</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vincens</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Badel-Chagnon</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vialette</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jacq</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hazout</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>11</issue>
            <fpage>1359</fpage>
            <lpage>1366</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl087</pubid>
                  <pubid idtype="pmpid" link="fulltext">16527831</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Evaluation of clustering algorithms for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Datta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Datta</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>Suppl 4</issue>
            <fpage>S17</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-S4-S17</pubid>
                  <pubid idtype="pmcid">1780133</pubid>
                  <pubid idtype="pmpid" link="fulltext">17217509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Microarray data analysis: from disarray to consolidation and consensus</p>
            </title>
            <aug>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Sabripour</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>55</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1749</pubid>
                  <pubid idtype="pmpid" link="fulltext">16369572</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>Computational cluster validation in post-genomic data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Handl</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Knowles</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kell</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>15</issue>
            <fpage>3201</fpage>
            <lpage>3212</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti517</pubid>
                  <pubid idtype="pmpid" link="fulltext">15914541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Davierwala</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Altschuler</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <issue>3</issue>
            <fpage>255</fpage>
            <lpage>265</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng906</pubid>
                  <pubid idtype="pmpid" link="fulltext">12089522</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Evaluation and comparison of gene clustering methods in microarray analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Thalamuthu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mukhopadhyay</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>GC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>19</issue>
            <fpage>2405</fpage>
            <lpage>2412</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl406</pubid>
                  <pubid idtype="pmpid" link="fulltext">16882653</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Consensus clustering and functional interpretation of gene-expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Swift</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tucker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vinciotti</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Orengo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Kellam</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>11</issue>
            <fpage>R94</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2004-5-11-r94</pubid>
                  <pubid idtype="pmcid">545785</pubid>
                  <pubid idtype="pmpid" link="fulltext">15535870</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>Sequential local least squares imputation estimating missing value of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Comput Biol Med</source>
            <pubdate>2008</pubdate>
            <volume>38</volume>
            <fpage>1112</fpage>
            <lpage>1120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.compbiomed.2008.08.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">18828999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>The Stanford Microarray Database: data access and quality assessment tools</p>
            </title>
            <aug>
               <au>
                  <snm>Gollub</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Binkley</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Demeter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Finkelstein</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Hebert</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Hernandez-Boussard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kaloper</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>94</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkg078</pubid>
                  <pubid idtype="pmcid">165525</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519956</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>R: a language for data analysis and graphics</p>
            </title>
            <aug>
               <au>
                  <snm>Ihaka</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Comput Graph Stat</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <fpage>299</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/1390807</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Computational analysis of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <issue>6</issue>
            <fpage>418</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35076576</pubid>
                  <pubid idtype="pmpid" link="fulltext">11389458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>Assessment of hierarchical clustering methodologies for proteomic data mining</p>
            </title>
            <aug>
               <au>
                  <snm>Meunier</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dumas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Piec</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bechet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hebraud</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hocquette</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>J Proteome Res</source>
            <pubdate>2007</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>358</fpage>
            <lpage>366</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/pr060343h</pubid>
                  <pubid idtype="pmpid" link="fulltext">17203979</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
