<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-13-S7-S11</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>Pattern-driven neighborhood search for biclustering of microarray data</p>
         </title>
         <aug>
            <au id="A1"><snm>Ayadi</snm><fnm>Wassim</fnm><insr iid="I1"/><insr iid="I2"/><email>ayadi@info.univ-angers.fr</email></au>
            <au id="A2"><snm>Elloumi</snm><fnm>Mourad</fnm><insr iid="I2"/><email>mourad.elloumi@fsegt.rnu.tn</email></au>
            <au ca="yes" id="A3"><snm>Hao</snm><fnm>Jin-Kao</fnm><insr iid="I1"/><email>hao@info.univ-angers.fr</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>LERIA, Universit&#233; d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France</p></ins>
            <ins id="I2"><p>LaTICE, Higher School of Sciences and Technologies of Tunis, 5 Avenue Taha Hussein, B. P. : 56, Bab Menara, 1008 Tunis, University of Tunis, Tunisia</p></ins>
         </insg>
         <source>BMC Bioinformatics</source>
         
         
         <supplement><title><p>Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011)</p></title><editor>M Michael Gromiha and De-Shuang Huang</editor><sponsor><note>The conference and publication charges were partly funded by grants from the National Science Foundation of China Nos. 61133010, and 31071168.</note></sponsor><note>Proceedings</note><url>1471-2105-13-S7.pdf</url></supplement><conference><title><p>The 2011 International Conference on Intelligent Computing (ICIC 2011)</p></title><location>Zhengzhou, China</location><date-range>11-14 August 2011</date-range><url>http://www.ic-ic.org/2011/index.htm</url></conference><issn>1471-2105</issn>
         <pubdate>2012</pubdate>
         <volume>13</volume>
         <issue>Suppl 7</issue>
         <fpage>S11</fpage>
         <url>http://www.biomedcentral.com/1471-2105/13/S7/S11</url>
         <xrefbib><pubidlist><pubid idtype="pmpid">22594997</pubid><pubid idtype="doi">10.1186/1471-2105-13-S7-S11</pubid></pubidlist></xrefbib>
      </bibl>
      <history><pub><date><day>8</day><month>5</month><year>2012</year></date></pub></history>
      <cpyrt><year>2012</year><collab>Ayadi et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Biclustering aims at finding subgroups of genes that show highly correlated behaviors across a subgroup of conditions. Biclustering is a very useful tool for mining microarray data and has various practical applications. From a computational point of view, biclustering is a highly combinatorial search problem and can be solved with optimization methods.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>We describe a stochastic pattern-driven neighborhood search algorithm for the biclustering problem. Starting from an initial bicluster, the proposed method progressively improves the quality of the bicluster by adjusting some of its genes and conditions. The adjustments are based on the quality of each gene and condition with respect to the bicluster and the initial data matrix. The performance of the method was evaluated on two well-known microarray datasets (<it>Yeast cell cycle </it>and <it>Saccharomyces cerevisiae</it>), showing that it is able to obtain statistically and biologically significant biclusters. The proposed method was also compared with six reference methods from the literature.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
                <p>The proposed method is computationally fast and can be applied to discover significant biclusters. It can also be used to effectively improve the quality of existing biclusters provided by other biclustering methods.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>DNA microarray technology makes it possible to monitor and measure the expression levels of tens of thousands of genes simultaneously, in a cell mixture, in a single experiment and under diverse experimental conditions. DNA microarray data are typically represented by a large matrix in which each row contains the expression levels of one gene under specific conditions (columns). Since its invention, this technology has found many applications in biological and medical research. For instance, it is used in cancer studies to better understand the biological mechanisms underlying oncogenesis, to discover new targets and new drugs, and to develop predictors for tailoring individualized treatments <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>.</p>
          <p>Microarray data analysis is a critical step in practical applications and is often carried out with the help of data mining techniques <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. It can be performed according to at least two different and complementary approaches <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. The first approach is based on supervised classification (also called class prediction or class discrimination). It generally involves selecting predictive genes to build a classifier that can predict the outcome of new samples from their expression profiles. Various methods based on this approach have been proposed in the literature; examples can be found in <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
          <p>Another general approach for microarray data analysis relies on unsupervised classification (or clustering) methods. These cluster analysis methods try to identify groups of genes and/or groups of conditions (samples) that exhibit similar expression patterns <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. In the context of cluster analysis, biclustering is a particularly interesting approach that aims to simultaneously identify groups of genes and conditions (called biclusters) such that the genes of a bicluster show similar expression patterns across the selected conditions <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Formally, given a gene expression data matrix <it>M</it>(<it>I, J</it>) with gene index <it>i </it>&#8712; <it>I</it>={1, 2,..., <it>n</it>} and condition index <it>j </it>&#8712; <it>J</it>={1, 2,..., <it>m</it>} (<it>n </it>&#8811; <it>m</it>), a bicluster <it>M</it>(<it>I', J'</it>) is a group of genes associated with a group of conditions such that <it>I' </it>&#8838; <it>I </it>and <it>J' </it>&#8838;<it>J</it>. This paper focuses on finding meaningful biclusters in a given microarray dataset.</p>
          <p>From a computational point of view, the biclustering problem is a highly combinatorial search problem and is known to be NP-hard <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B24">24</abbr></abbrgrp>. A number of heuristic search algorithms have been proposed; recent reviews can be found in <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. Existing biclustering algorithms generally follow one of the following approaches.</p>
          <p>1. Greedy iterative search approach: Greedy biclustering algorithms build a solution by starting from the initial data matrix (or a transformed matrix) and iteratively removing bad genes/conditions according to a quality criterion. For instance, the algorithm presented in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> (called Maximum Similarity Biclusters) starts by constructing a similarity matrix based on a reference gene. A greedy strategy is then iteratively applied to remove genes/conditions such that maximum similarity is achieved in the remaining matrix (bicluster). Greedy algorithms can also proceed by greedily extending an initially empty bicluster. Examples of greedy biclustering algorithms can be found in <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. They differ essentially in the way genes/conditions are added or removed. Greedy algorithms are computationally fast, but the biclusters they find may be of mediocre quality.</p>
          <p>2. Bicluster enumeration approach: This approach tries to (implicitly) enumerate all the biclusters. The enumeration process is often represented by a search tree, in which some nodes are closed as soon as certain pruning conditions are fulfilled. For instance, in <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, the authors propose the CE-Tree algorithm, which builds its tree of biclusters by applying a special local breadth-first search within a global depth-first search strategy, combined with the exploration of Maximum Dimension Sets for each pair of conditions. Representative examples of algorithms adopting this enumeration approach are given in <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B29">29</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. This approach has the advantage of achieving high-quality solutions. However, algorithms using this approach are expensive in computing time and memory space.</p>
          <p>3. Stochastic search approach: This approach can be further divided into neighborhood search and evolutionary search. Neighborhood search begins with an initial candidate solution (bicluster) and iteratively improves its quality by replacing the bicluster with a neighboring bicluster, typically obtained by replacing a gene/condition with a better one. Cheng and Church <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> were probably the first to apply this approach to the biclustering problem. They employ the <it>Mean Squared Residue </it>(MSR) to measure the goodness of genes and conditions and to decide which genes/conditions are to be removed or added. Other biclustering algorithms based on local search are presented in <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Population-based evolutionary search generalizes neighborhood search by operating on a pool of candidate solutions, which are improved with operators such as crossover and mutation. Examples of evolutionary biclustering algorithms can be found in <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>.</p>
          <p>In this paper we introduce a stochastic neighborhood search algorithm called <it>Pattern-Driven Neighborhood Search </it>(PDNS) for the biclustering problem. PDNS relies on a solution representation encoded as a behavior matrix and on a dedicated neighborhood that takes pattern information into account. It also employs fast greedy algorithms to generate diverse initial biclusters of reasonable quality, together with a randomized perturbation strategy.</p>
      </sec>
      <sec>
         <st>
            <p>Method</p>
         </st>
         <sec>
            <st>
               <p>Preprocessing of gene expression matrix</p>
            </st>
             <p>Prior to the search, our method applies a preprocessing step that transforms the input data matrix <it>M </it>into a Behavior Matrix <it>M'</it>. This step aims to highlight the trajectory patterns of genes. Indeed, according to <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>, in microarray data analysis, genes are considered to be in the same cluster if their trajectory patterns of expression levels are similar across a set of conditions. In the transformed matrix <it>M'</it>, each row represents the trajectory pattern of a gene across all the combined conditions, while each column represents the trajectory pattern of all the genes under a particular pair of conditions of the data matrix <it>M</it>. The whole matrix <it>M' </it>thus provides useful information for identifying relevant biclusters and for defining a meaningful neighborhood for a local search algorithm.</p>
             <p>Formally, the behavior matrix <it>M' </it>is constructed by combining each pair of columns (conditions) of the input data matrix <it>M</it>. Since <it>M </it>has <it>n </it>rows and <it>m </it>columns, there are <it>m</it>(<it>m</it>&#8722;1)/2 distinct pairs of columns; we denote this set of pairs by <it>J''</it>. Hence <it>M' </it>has <it>n </it>rows and <it>m</it>(<it>m</it>&#8722;1)/2 columns, and is defined as follows:</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i1"><m:mrow>
   <m:msup>
      <m:mrow>
         <m:mi>M</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>&#8242;</m:mi>
      </m:mrow>
   </m:msup>
   <m:mrow>
      <m:mo class="MathClass-open">[</m:mo>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>l</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">]</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfenced close="" open="{" separators="">
      <m:mrow>
         <m:mtable class="gathered">
            <m:mtr>
               <m:mtd>
                  <m:mn>1</m:mn>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>i</m:mi>
                  <m:mi>f</m:mi>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>k</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
                  <m:mo class="MathClass-rel">&lt;</m:mo>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>q</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mtd>
            </m:mtr>
            <m:mtr>
               <m:mtd>
                  <m:mn>0</m:mn>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>i</m:mi>
                  <m:mi>f</m:mi>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>k</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
                  <m:mo class="MathClass-rel">=</m:mo>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>q</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mtd>
            </m:mtr>
            <m:mtr>
               <m:mtd>
                  <m:mo class="MathClass-bin">-</m:mo>
                  <m:mn>1</m:mn>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>i</m:mi>
                  <m:mi>f</m:mi>
                  <m:mspace class="tmspace" width="2.77695pt"/>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>k</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
                  <m:mo class="MathClass-rel">&gt;</m:mo>
                  <m:mi>M</m:mi>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo class="MathClass-punc">,</m:mo>
                        <m:mi>q</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mtd>
            </m:mtr>
         </m:mtable>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math>
               </display-formula>
            </p>
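             <p>where <it>i </it>&#8712; [1..<it>n</it>], <it>k </it>&#8712; [1..<it>m </it>&#8722; 1], <it>q </it>&#8712; [2..<it>m</it>] with <it>q </it>&#8805; <it>k </it>+ 1, and <it>l </it>&#8712; [1..|<it>J''</it>|] is the index of the column of <it>M' </it>associated with the pair (<it>k</it>, <it>q</it>).</p>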
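             <p>To make the construction concrete, the following Python sketch builds <it>M'</it> from a small expression matrix. It is an illustration under our own assumptions, not the authors' implementation: it uses NumPy and 0-based indices, enumerates the pairs (<it>k</it>, <it>q</it>) in lexicographic order (the text does not fix an ordering of <it>l</it>), and compares values with exact equality, whereas real data may call for a tolerance. The function name behavior_matrix is hypothetical.</p>

```python
from itertools import combinations

import numpy as np


def behavior_matrix(M: np.ndarray) -> tuple[np.ndarray, list[tuple[int, int]]]:
    """Build the behavior matrix M' from an n x m expression matrix M.

    Column l of M' corresponds to the l-th pair (k, q) with q >= k + 1
    (0-based here), with pairs enumerated in lexicographic order.
    """
    n, m = M.shape
    pairs = list(combinations(range(m), 2))  # the m(m-1)/2 pairs, i.e. J''
    Mp = np.empty((n, len(pairs)), dtype=int)
    for l, (k, q) in enumerate(pairs):
        # sign(M[i,q] - M[i,k]) is 1 when M[i,k] is below M[i,q] (up),
        # 0 when they are equal, and -1 when M[i,k] is above M[i,q] (down)
        Mp[:, l] = np.sign(M[:, q] - M[:, k])
    return Mp, pairs


M = np.array([[1.0, 3.0, 2.0],
              [2.0, 2.0, 5.0]])
Mp, pairs = behavior_matrix(M)
print(pairs)         # [(0, 1), (0, 2), (1, 2)]
print(Mp.tolist())   # [[1, 1, -1], [0, 1, 1]]
```

             <p>Because <it>M'</it> has one column per pair of conditions, its width grows quadratically with <it>m</it>; with 3 conditions, as above, it has 3 columns.</p>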
             <p>Figure <figr fid="F1">1</figr> shows an illustrative example. Each row of <it>M' </it>gives the trajectory (or behavior) pattern of a gene across all the combined conditions, i.e., up (1), down (&#8722;1) or no change (0) for each pair of conditions. Considering all pairs of conditions, and not only consecutive ones, provides useful information since a bicluster may be composed of a subset of non-contiguous conditions. Our PDNS algorithm uses <it>M' </it>to define its search space as well as its neighborhood, which is critical for the search process.</p>
            <fig id="F1"><title><p>Figure 1</p></title><caption><p>Construction of bicluster pattern</p></caption><text>
   <p><b>Construction of bicluster pattern</b>.</p>
</text><graphic file="1471-2105-13-S7-S11-1"/></fig>
         </sec>
         <sec>
            <st>
               <p>Pattern-driven neighborhood search for biclustering - general procedure</p>
            </st>
            <p>Our proposed PDNS method can be considered as an Iterated Local Search procedure <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. It alternates between two basic components: a descent-based improvement procedure and a perturbation operator. PDNS uses the descent procedure to discover locally optimal solutions and the perturbation operator to displace the search to a new starting point in an unexplored search region.</p>
             <p>The key originality of PDNS is its use of the bicluster pattern in the definition of both its search space and its neighborhood. The bicluster pattern is a characteristic representation of a bicluster and is used to evaluate the genes/conditions of the bicluster. It is defined by the behavior matrix of the bicluster, i.e., the trajectory patterns of its genes under all combined conditions of the bicluster. This representation is important because it is well recognized that, in microarray data, genes are considered to belong to the same cluster if they have similar trajectory patterns of expression levels <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B47">47</abbr></abbrgrp>.</p>
             <p>Starting from an initial bicluster (the current solution <it>s</it>), PDNS uses the descent strategy to explore the pattern-based neighborhood and moves to an improving neighboring solution at each iteration. Using the bicluster pattern, we define a set of rules that assess the goodness (or badness) of a gene or condition. With these rules (explained in the section "Neighborhood and its exploration"), PDNS iteratively replaces bad genes/conditions of the current bicluster with good ones, thus progressively improving the quality of the bicluster under consideration. This iterative improvement procedure stops when the current bicluster reaches a fixed quality threshold according to the ASR evaluation function (see next section) or when a fixed number <it>Y </it>of iterations is reached. At this point, PDNS triggers a perturbation phase by randomly replacing 10% of the genes and conditions of the best bicluster recorded so far. This perturbed bicluster is used as a new starting point for the next round of the descent search.</p>
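             <p>The perturbation phase can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: genes and conditions are represented as sets of 0-based indices, replacements are drawn uniformly from the elements outside the bicluster, and the number of replaced elements is rounded up with math.ceil (the rounding rule is our assumption; the text only specifies 10%). The helper names replace_fraction and perturb are illustrative.</p>

```python
import math
import random


def replace_fraction(members: set[int], universe_size: int, rate: float,
                     rng: random.Random) -> set[int]:
    """Swap ceil(rate * |members|) members for randomly chosen non-members."""
    outside = [x for x in range(universe_size) if x not in members]
    k = min(math.ceil(rate * len(members)), len(outside))
    dropped = set(rng.sample(sorted(members), k))
    added = set(rng.sample(outside, k))
    return (members - dropped) | added


def perturb(genes: set[int], conds: set[int], n_genes: int, n_conds: int,
            rng: random.Random, rate: float = 0.10) -> tuple[set[int], set[int]]:
    """Randomly replace a fraction `rate` of the genes and of the conditions."""
    return (replace_fraction(genes, n_genes, rate, rng),
            replace_fraction(conds, n_conds, rate, rng))


rng = random.Random(42)
genes, conds = set(range(10)), {0, 1, 2, 3, 4}
new_genes, new_conds = perturb(genes, conds, n_genes=100, n_conds=17, rng=rng)
print(len(new_genes), len(new_conds))                  # 10 5
print(len(new_genes - genes), len(new_conds - conds))  # 1 1
```

             <p>The bicluster size is preserved because every dropped element is matched by one added element; the perturbed bicluster would then be re-encoded into its behavior matrix before the next descent.</p>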
             <p>The whole PDNS algorithm stops when the best bicluster has not been updated for a fixed number <it>Z </it>of perturbations. The general PDNS procedure is described in Figure <figr fid="F2">2</figr>. The following sections describe the ingredients of PDNS.</p>
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>General PDNS procedure</p></caption><text>
   <p><b>General PDNS procedure</b>.</p>
</text><graphic file="1471-2105-13-S7-S11-2"/></fig>
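             <p>The overall control flow follows the Iterated Local Search pattern. The Python skeleton below is a generic sketch, not the authors' code: descent, perturb and quality are hypothetical callables standing for the descent procedure (which, in PDNS, stops at the ASR quality threshold or after <it>Y</it> iterations), the 10% perturbation and the ASR function, while max_no_improve plays the role of <it>Z</it>. As in the description above, each perturbation is applied to the best solution found so far.</p>

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")  # a candidate solution, e.g. an encoded bicluster


def iterated_local_search(
    initial: S,
    descent: Callable[[S], S],      # improvement phase, returns a local optimum
    perturb: Callable[[S], S],      # random displacement of a solution
    quality: Callable[[S], float],  # evaluation function, higher is better
    max_no_improve: int,            # plays the role of Z
) -> S:
    """Generic Iterated Local Search loop (illustrative sketch)."""
    best = descent(initial)
    best_q = quality(best)
    no_improve = 0
    # stop once Z consecutive perturbations fail to update the best solution
    while max_no_improve > no_improve:
        candidate = descent(perturb(best))  # restart from a perturbed best
        cand_q = quality(candidate)
        if cand_q > best_q:
            best, best_q = candidate, cand_q
            no_improve = 0
        else:
            no_improve += 1
    return best


# Toy usage: maximize f(x) = -(x - 7)^2 over the integers.
rng = random.Random(0)


def f(x: int) -> float:
    return -float((x - 7) ** 2)


def hill_climb(x: int) -> int:
    # move to the better of the two neighbors while that improves f
    while True:
        nxt = max((x - 1, x + 1), key=f)
        if f(nxt) > f(x):
            x = nxt
        else:
            return x


def shake(x: int) -> int:
    return x + rng.randint(-3, 3)


print(iterated_local_search(0, hill_climb, shake, f, max_no_improve=5))  # 7
```

             <p>The toy problem only shows the loop terminating; any bicluster-specific logic would live inside the three callables.</p>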
         </sec>
         <sec>
            <st>
               <p>The ASR evaluation function</p>
            </st>
             <p>Many functions exist for bicluster evaluation. One of the most popular is the <it>Mean Squared Residue </it>(MSR) <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, which has been used by several biclustering algorithms <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B38">38</abbr><abbr bid="B42">42</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>. However, MSR fails to correctly assess the quality of certain types of biclusters, such as multiplicative models <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B33">33</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>.</p>
             <p>In this paper, we use the Average Spearman's Rho (ASR) function, which avoids this drawback of MSR <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Let <it>(I', J') </it>be a bicluster in a data matrix <it>M(I, J)</it>. The ASR evaluation function is then defined by:</p>
            <p>
               <display-formula id="M2">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i2"><m:mrow>
   <m:mtext>ASR</m:mtext>
   <m:mo stretchy="false">(</m:mo>
   <m:msup>
      <m:mi>I</m:mi>
      <m:mo>&#8242;</m:mo>
   </m:msup>
   <m:mo>,</m:mo>
   <m:msup>
      <m:mi>J</m:mi>
      <m:mo>&#8242;</m:mo>
   </m:msup>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>=</m:mo>
   <m:mn>2</m:mn>
   <m:mtext>&#8201;</m:mtext>
   <m:mtext>max</m:mtext>
   <m:mrow>
      <m:mo>{</m:mo>
      <m:mrow>
         <m:mfrac>
            <m:mrow>
               <m:mstyle displaystyle="true">
                  <m:msub>
                     <m:mo>&#8721;</m:mo>
                     <m:mrow>
                        <m:mi>i</m:mi>
                         <m:mo>&#8712;</m:mo>
                        <m:msup>
                           <m:mi>I</m:mi>
                           <m:mo>&#8242;</m:mo>
                        </m:msup>
                     </m:mrow>
                  </m:msub>
                  <m:mrow>
                     <m:mstyle displaystyle="true">
                        <m:msub>
                           <m:mo>&#8721;</m:mo>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                           <m:mo>&#8712;</m:mo>
                                           <m:msup>
                                              <m:mi>I</m:mi>
                                             <m:mo>&#8242;</m:mo>
                                          </m:msup>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                          <m:mo>&#8805;</m:mo>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                        </m:msub>
                        <m:mrow>
                           <m:msub>
                              <m:mi>&#961;</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                        </m:mrow>
                     </m:mstyle>
                  </m:mrow>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mo>|</m:mo>
               <m:msup>
                  <m:mi>I</m:mi>
                  <m:mo>&#8242;</m:mo>
               </m:msup>
               <m:mo>|</m:mo>
               <m:mo stretchy="false">(</m:mo>
               <m:mo>|</m:mo>
               <m:msup>
                  <m:mi>I</m:mi>
                  <m:mo>&#8242;</m:mo>
               </m:msup>
               <m:mo>|</m:mo>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:mfrac>
         <m:mo>,</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mstyle displaystyle="true">
                  <m:msub>
                     <m:mo>&#8721;</m:mo>
                     <m:mrow>
                        <m:mi>k</m:mi>
                         <m:mo>&#8712;</m:mo>
                        <m:msup>
                           <m:mi>J</m:mi>
                           <m:mo>&#8242;</m:mo>
                        </m:msup>
                     </m:mrow>
                  </m:msub>
                  <m:mrow>
                     <m:mstyle displaystyle="true">
                        <m:msub>
                           <m:mo>&#8721;</m:mo>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>l</m:mi>
                                           <m:mo>&#8712;</m:mo>
                                          <m:msup>
                                             <m:mi>J</m:mi>
                                             <m:mo>&#8242;</m:mo>
                                          </m:msup>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>l</m:mi>
                                          <m:mo>&#8805;</m:mo>
                                          <m:mi>k</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                        </m:msub>
                        <m:mrow>
                           <m:msub>
                              <m:mi>&#961;</m:mi>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mi>l</m:mi>
                              </m:mrow>
                           </m:msub>
                        </m:mrow>
                     </m:mstyle>
                  </m:mrow>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mo>|</m:mo>
               <m:msup>
                  <m:mi>J</m:mi>
                  <m:mo>&#8242;</m:mo>
               </m:msup>
               <m:mo>|</m:mo>
               <m:mo stretchy="false">(</m:mo>
               <m:mo>|</m:mo>
               <m:msup>
                  <m:mi>J</m:mi>
                  <m:mo>&#8242;</m:mo>
               </m:msup>
               <m:mo>|</m:mo>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
      <m:mo>}</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
               </display-formula>
            </p>
             <p>where <it>&#961;<sub>ij </sub></it>(<it>i &#8800; j</it>) is Spearman's rank correlation <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> between rows <it>i </it>and <it>j </it>of the bicluster (<it>I', J'</it>), and <it>&#961;<sub>kl </sub></it>(<it>k &#8800; l</it>) is Spearman's rank correlation between columns <it>k </it>and <it>l </it>of the bicluster (<it>I', J'</it>). Each term inside the max is the average correlation over all pairs of rows (resp. columns), and each correlation lies in [&#8722;1, 1]; hence ASR(<it>I', J'</it>) &#8712; [&#8722;1, 1].</p>
            <p>A high (resp. low) ASR value, close to 1 (resp. close to -1), indicates that the genes/conditions of the bicluster are strongly (resp. weakly) correlated.</p>
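             <p>The ASR formula can be computed directly from a bicluster sub-matrix. The sketch below is our own NumPy illustration with hypothetical function names: Spearman's rho is obtained as Pearson's correlation of average ranks (tied values share their mean rank), and the double sum scaled by 2/(|<it>I'</it>|(|<it>I'</it>| &#8722; 1)) is computed as the mean over the upper triangle of the correlation matrix, which is the same quantity. Rows or columns with constant values make the correlation undefined (NaN); handling that case is left out.</p>

```python
import numpy as np


def average_ranks(v: np.ndarray) -> np.ndarray:
    """Ranks 1..len(v); tied values share the mean of their ranks."""
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v), dtype=float)
    i, n = 0, len(v)
    while n > i:
        j = i
        # extend j to the end of a run of tied values
        while n > j + 1 and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mean_pairwise_spearman(X: np.ndarray) -> float:
    """Average Spearman correlation over all pairs of rows of X."""
    R = np.apply_along_axis(average_ranks, 1, X)
    C = np.corrcoef(R)                      # Pearson on ranks = Spearman
    iu = np.triu_indices(X.shape[0], k=1)   # each unordered pair once
    # mean over pairs = 2 * (sum over pairs) / (p * (p - 1))
    return float(C[iu].mean())


def asr(B: np.ndarray) -> float:
    """ASR of a bicluster sub-matrix B (rows = genes, columns = conditions)."""
    return max(mean_pairwise_spearman(B), mean_pairwise_spearman(B.T))


B = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [10.0, 20.0, 30.0, 40.0]])
print(round(asr(B), 6))  # 1.0
```

             <p>In this example every gene increases across the four conditions, so all pairwise rank correlations equal 1 and the ASR reaches its maximum.</p>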
             <p>Existing evaluation functions can roughly be classified into two families: <it>numerical measures </it>and <it>qualitative measures</it>. <it>Numerical measures</it>, like <it>Pearson's correlation </it>or <it>Euclidean distance</it>, are easy to compute but are quite sensitive to outliers and noise. <it>Qualitative measures</it>, like measures that consider only ups, downs and no change across conditions, are very sensitive to the precise values of changes. As ASR is based on <it>Spearman's rank correlation</it>, it can be considered a good compromise between numerical and qualitative measures.</p>
         </sec>
         <sec>
            <st>
               <p>Configuration representation</p>
            </st>
             <p>PDNS uses a solution representation based on the behavior matrix <it>M' </it>obtained from the preprocessing step described previously. More precisely, given a bicluster <it>B </it>= (<it>I'</it>, <it>J'</it>), we encode it by its behavior matrix <it>s </it>= (<it>I',K</it>), i.e., the sub-matrix of <it>M' </it>that includes only the genes in <it>I' </it>and all the combinations of paired conditions in <it>J' </it>(see the example of Figure <figr fid="F1">1</figr>). Clearly, <it>s </it>has the same rows as <it>B</it>, and its number <it>K </it>of columns is equal to |<it>J'</it>|(|<it>J'</it>| &#8722; 1)/2. In the rest of this paper, <it>s </it>is called a configuration (or solution). As shown in Section "Neighborhood and its exploration" below, this configuration representation enables the definition of dedicated move operators that progressively improve the quality of the generated biclusters.</p>
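             <p>The encoding of a bicluster as a configuration can be illustrated with the hypothetical helper below (Python with NumPy, 0-based indices, pairs of conditions taken in increasing index order, which is our assumption). It produces the same values as selecting, from <it>M'</it>, the rows in <it>I'</it> and the columns of the pairs drawn from <it>J'</it>, without materializing the full behavior matrix.</p>

```python
from itertools import combinations

import numpy as np


def encode_bicluster(M: np.ndarray, genes: list[int], conds: list[int]) -> np.ndarray:
    """Behavior sub-matrix s of the bicluster (genes, conds) of M.

    Rows follow `genes`; there is one column per pair (k, q) of conditions
    with k before q in increasing index order, i.e. |J'|(|J'|-1)/2 columns.
    """
    pairs = list(combinations(sorted(conds), 2))
    first = [k for k, _ in pairs]
    second = [q for _, q in pairs]
    diff = M[np.ix_(genes, second)] - M[np.ix_(genes, first)]
    return np.sign(diff).astype(int)  # 1 = up, 0 = no change, -1 = down


M = np.array([[1, 2, 3, 4],
              [4, 3, 2, 1],
              [5, 5, 6, 0]])
s = encode_bicluster(M, genes=[0, 2], conds=[1, 3, 0])
print(s.shape)      # (2, 3)
print(s.tolist())   # [[1, 1, 1], [0, -1, -1]]
```

             <p>With |<it>J'</it>| = 3 conditions, the configuration has 3 &#215; 2/2 = 3 columns, matching the count given above.</p>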
         </sec>
         <sec>
            <st>
               <p>Initial solution</p>
            </st>
            <p>Our algorithm needs an initial bicluster to start its search, and this bicluster can be provided by any means. For instance, it can be generated randomly, at the risk of starting from a solution of poor quality. A more interesting strategy is to employ a fast greedy algorithm to rapidly obtain a bicluster of reasonable quality. We adopt this strategy in this work and use two well-known algorithms: the algorithm of Cheng and Church <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and OPSM <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. As explained above, each initial bicluster is encoded into its behavior matrix before being improved by PDNS.</p>
         </sec>
         <sec>
            <st>
               <p>Neighborhood and its exploration</p>
            </st>
            <p>The neighborhood is one of the most critical elements of any local search algorithm. It is typically defined by a move operator. Given a solution <it>s</it>, let <it>mv </it>be a move operator that can be applied to <it>s</it>; each application of <it>mv </it>transforms <it>s </it>into a new solution <it>s'</it>, which is typically denoted by <it>s' </it>&#8592; <it>s </it>&#8853;<it>mv</it>. The neighborhood of <it>s </it>is then the set of solutions that can be obtained from <it>s </it>by one application of <it>mv</it>.</p>
            <p>In this paper, we devise two dedicated move operators that operate respectively on the rows (genes) and on the columns (combinations of paired conditions) of a given solution. Both operators are based on the general drop/add operation, which removes some elements from the given solution and adds new ones to it. The critical issue here is the criterion used to determine which elements to remove and which to add. In our case, this decision is based on the "behavior pattern".</p>
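            <p>How move operators drive a descent can be illustrated with a generic loop. This is a minimal sketch under our own assumptions, not the PDNS code: the operators are passed as callables (in PDNS they would be the row and column operators of this section), the score would be ASR, and <it>target </it>plays the role of <it>threshold_ASR</it>. Accepting only strictly improving neighbors is an assumption, and the perturbation phase is omitted.</p>
```python
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")


def descent(
    s: S,
    moves: Sequence[Callable[[S], S]],
    score: Callable[[S], float],
    target: float,
) -> Tuple[S, float]:
    """Repeatedly apply s' = s (+) mv, keeping s' only when it scores higher.

    Stops when a full pass over the move operators brings no improvement
    (a local optimum) or once the score exceeds `target`.
    """
    best = score(s)
    improved = True
    while improved and target >= best:
        improved = False
        for mv in moves:
            candidate = mv(s)
            value = score(candidate)
            if value > best:
                s, best, improved = candidate, value, True
    return s, best
```
            <p>For instance, with integer solutions, the moves "add 1" and "subtract 1" and the score -(x - 5)<sup>2</sup>, the loop started at 0 climbs to the local optimum 5 unless the target is reached first.</p>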
            <p>Our first move operator, denoted by <it>mv<sub>g</sub></it>, removes a number of rows (genes) from the bicluster and adds other genes in order to obtain a more coherent bicluster. Given a solution <it>s </it>= (<it>I', K</it>), we first extract from the behavior matrix <it>M' </it>the associated sub-matrix <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i3"><m:mrow>
   <m:msup>
      <m:mrow>
         <m:mover accent="true">
            <m:mrow>
               <m:mi>M</m:mi>
            </m:mrow>
            <m:mo class="MathClass-op"> &#772;</m:mo>
         </m:mover>
      </m:mrow>
      <m:mrow>
         <m:mi>&#8242;</m:mi>
      </m:mrow>
   </m:msup>
</m:mrow>
</m:math></inline-formula>. Let <it>R </it>and <it>C </it>denote respectively the index sets of the rows and columns of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i3"><m:mrow><m:msup><m:mrow><m:mover accent="true"><m:mrow><m:mi>M</m:mi></m:mrow><m:mo class="MathClass-op"> &#772;</m:mo></m:mover></m:mrow><m:mrow><m:mi>&#8242;</m:mi></m:mrow></m:msup></m:mrow></m:math></inline-formula>. From <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i3"><m:mrow><m:msup><m:mrow><m:mover accent="true"><m:mrow><m:mi>M</m:mi></m:mrow><m:mo class="MathClass-op"> &#772;</m:mo></m:mover></m:mrow><m:mrow><m:mi>&#8242;</m:mi></m:mrow></m:msup></m:mrow></m:math></inline-formula> we build the bicluster pattern <it>P </it>of <it>s</it>, a vector indexed by <it>C</it>: for each <it>j </it>&#8712; <it>C</it>, <it>P</it>[<it>j</it>] takes the dominating value <it>k </it>&#8712; {1, 0, -1}, i.e., the value that occurs most often in column <it>j </it>of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S7-S11-i3"><m:mrow><m:msup><m:mrow><m:mover accent="true"><m:mrow><m:mi>M</m:mi></m:mrow><m:mo class="MathClass-op"> &#772;</m:mo></m:mover></m:mrow><m:mrow><m:mi>&#8242;</m:mi></m:mrow></m:msup></m:mrow></m:math></inline-formula> (see the example of Figure <figr fid="F3">3</figr>).</p>
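            <p>The construction of the pattern <it>P </it>can be sketched in Python as below, with the sub-matrix given as a list of rows over the columns <it>C</it>. The text does not specify how ties between equally frequent values are broken; the sketch assumes the preference order 1, 0, -1, and the function names are hypothetical.</p>
```python
VALUES = (1, 0, -1)  # tie-breaking order for the dominating value (assumption)


def dominating_value(column):
    """Most frequent value of a column; ties resolved in the order of VALUES."""
    return max(VALUES, key=column.count)


def bicluster_pattern(sub):
    """Pattern P: the dominating value of each column of the sub-matrix."""
    return [dominating_value(list(col)) for col in zip(*sub)]
```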
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Row move operator <it>mv<sub>g</sub></it></p></caption><text>
   <p><b>Row move operator <it>mv<sub>g</sub></it></b>. A bad gene (g<sub>4</sub>) is deleted since its quality (50%) is below &#945; = 70%; a good gene (g<sub>10</sub>), whose quality (83%) exceeds &#945; = 70%, is selected and added.</p>
</text><graphic file="1471-2105-13-S7-S11-3"/></fig>
            <p>Now, for each gene <it>g<sub>i</sub></it>, <it>i </it>&#8712; <it>R</it>, of the solution <it>s</it>, we define the quality of <it>g<sub>i </sub></it>as the percentage of concordances between the behavior pattern of <it>g<sub>i</sub></it> and the behavior pattern <it>P </it>of bicluster <it>s</it>, i.e., the percentage of columns <it>j </it>&#8712; <it>C </it>on which the value of <it>g<sub>i</sub></it> equals <it>P</it>[<it>j</it>]. Let &#945; be a fixed quality threshold for genes. Let <it>D </it>denote the set of bad genes of <it>s</it>, whose quality does not reach &#945;, and let <it>G </it>denote the set of good genes missing from <it>s</it>, whose quality exceeds &#945;. Our first move operator <it>mv<sub>g </sub></it>then removes from <it>s </it>all the bad genes of <it>D </it>and adds a number of genes selected from <it>G</it>.</p>
            <p>Figure <figr fid="F3">3</figr> shows an example where one bad gene (<it>g<sub>4</sub></it>) is deleted and one good gene (<it>g<sub>10</sub></it>) is added. <it>g<sub>4 </sub></it>is bad because its behavior pattern has a low concordance with the bicluster behavior pattern (only 50% which is inferior than the quality threshold &#945; = 70%). Similarly, <it>g<sub>10 </sub></it>is good because its quality (83%) is higher than &#945;. This replacement increases thus the coherence of the resulting bicluster. In the general case, the number of deleted gene may differ from the number of added genes. Notice that this move operator does not change the columns of the solution.</p>
            <p>Our second move operator, denoted by <it>mv<sub>c</sub></it>, removes a number of columns (combined conditions) and adds other columns in order to obtain a more coherent bicluster. Like the first move operator, <it>mv<sub>c </sub></it>uses a quality threshold <it>&#946;</it>, here applied to columns. The quality of a column is defined as the percentage of its entries that agree with the value of this column in the bicluster pattern.</p>
            <p>To detect bad columns, <it>mv<sub>c </sub></it>tests, for each column of the current bicluster, whether its dominating value is equal to the corresponding value in the bicluster pattern. If it differs, the column is considered bad and is removed from the current bicluster. To add good columns, <it>mv<sub>c </sub></it>selects, from the behavior matrix <it>M' </it>restricted to the same subset of genes, columns whose quality (the share of their dominating value) exceeds the fixed threshold <it>&#946;</it>. Note that this move operator does not change the rows of the solution (see the example of Figure <figr fid="F4">4</figr>). In general, the number of deleted columns may differ from the number of added columns at each application of this move operator.</p>
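            <p>A possible sketch of the column move <it>mv<sub>c</sub></it> is given below. It combines the quality definition of the previous paragraph with the removal rule above: a column whose dominating value differs from <it>P </it>has at most 50% of its entries equal to <it>P</it>, so with <it>&#946; </it>above 0.5 it is always dropped. Representing <it>M' </it>as a list of rows, skipping previously selected columns when adding, and resolving ties in the order 1, 0, -1 are assumptions of this illustration.</p>
```python
VALUES = (1, 0, -1)  # tie-breaking order for the dominating value (assumption)


def column_values(behavior, rows, col):
    """Values of one combined-condition column of M' over the selected genes."""
    return [behavior[g][col] for g in rows]


def dominant_share(values):
    """Dominating value of a column and the fraction of entries it covers."""
    value = max(VALUES, key=values.count)
    return value, values.count(value) / len(values)


def move_columns(behavior, rows, cols, pattern, beta):
    """Sketch of mv_c: drop columns that disagree with P, add columns above beta.

    behavior is the full behavior matrix M' (genes x combined conditions);
    pattern maps each current column to its value in P. Rows are unchanged.
    """
    new_pattern = {}
    for c in cols:
        vals = column_values(behavior, rows, c)
        # Quality: share of entries equal to P[c]; keep the column if it reaches beta.
        if vals.count(pattern[c]) / len(vals) >= beta:
            new_pattern[c] = pattern[c]
    for c in range(len(behavior[rows[0]])):
        if c in pattern:
            continue  # only columns outside the current bicluster are candidates
        value, share = dominant_share(column_values(behavior, rows, c))
        if share > beta:
            new_pattern[c] = value
    return list(new_pattern), new_pattern
```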
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>Columns move operator <it>mv<sub>c</sub></it></p></caption><text>
   <p><b>Columns move operator <it>mv<sub>c</sub></it></b>. Column c<sub>2</sub>c<sub>3 </sub>has a dominating value different from its value in P and is thus removed from s; column c<sub>2</sub>c<sub>5</sub>, whose quality over the same subset of genes exceeds &#946; = 70%, is selected and added to s.</p>
</text><graphic file="1471-2105-13-S7-S11-4"/></fig>
            <p>For a given solution, our PDNS algorithm applies these two move operators to reach a local optimum <it>s </it>(with an ASR value higher than the fixed threshold <it>threshold_ASR</it>). This local optimum <it>s </it>consists of a group of genes and a set of columns, each column representing the trajectory pattern of two conditions across the group of genes. Among the combinations of conditions in <it>s</it>, some conditions may be combined with only a few other conditions; such conditions are insignificant for the extracted bicluster. For this reason, when decoding <it>s </it>into a bicluster <it>B</it>, we retain only the conditions that are combined with at least 50% of the other selected conditions. For instance, if <it>s </it>= {(<it>g<sub>1</sub>, g<sub>2</sub>, g<sub>3</sub>, g<sub>4</sub></it>); (<it>c<sub>1</sub>c<sub>2</sub>, c<sub>1</sub>c<sub>3</sub>, c<sub>1</sub>c<sub>4</sub>, c<sub>2</sub>c<sub>3</sub></it>)}, condition <it>c<sub>4 </sub></it>is not kept in the final bicluster because it is combined only with <it>c<sub>1</sub></it>, i.e., with one of the three other conditions (33%), and not with <it>c<sub>2 </sub></it>and <it>c<sub>3</sub></it>. The resulting bicluster is thus <it>B </it>= {(<it>g<sub>1</sub>, g<sub>2</sub>, g<sub>3</sub>, g<sub>4</sub></it>); (<it>c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub></it>)}.</p>
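            <p>The 50% retention rule used during decoding can be sketched as follows, with the columns of <it>s </it>given as pairs of condition names. The function name is hypothetical, and the "other selected conditions" are taken to be all conditions appearing in at least one column of <it>s</it>.</p>
```python
from collections import defaultdict


def decode_conditions(pairs):
    """Keep conditions paired with at least 50% of the other selected conditions."""
    partners = defaultdict(set)
    for k, l in pairs:
        partners[k].add(l)
        partners[l].add(k)
    others = len(partners) - 1  # number of other conditions for each condition
    # Integer test 2 * |partners| >= others avoids floating-point ratios.
    return sorted(c for c, p in partners.items() if 2 * len(p) >= others)
```
            <p>On the example above, decode_conditions([("c1", "c2"), ("c1", "c3"), ("c1", "c4"), ("c2", "c3")]) returns ["c1", "c2", "c3"].</p>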
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Experimental protocol</p>
            </st>
            <p>We perform statistical and biological validations of the obtained biclusters and we evaluate our PDNS algorithm against the results of some prominent biclustering algorithms used by the community, namely, CC <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, OPSM <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, ISA <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> and Bimax <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. For these reference methods, we use <it>Biclustering Analysis Toolbox </it>(BicAT) which is a recent software platform for clustering-based data analysis that integrates all these biclustering algorithms <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. We also compare our method with two additional methods (Samba <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and RMSBE <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>).</p>
            <p>For the experiments, we empirically fix &#945;, <it>&#946; </it>and <it>threshold_ASR </it>of the PDNS algorithm as follows. We try a number of parameter combinations (typically several tens), compute for each combination the <it>p</it>-values of the obtained biclusters, and retain the combination with the lowest <it>p</it>-value for the final experiment. For CC, OPSM, ISA and Bimax, the default values used in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> are adopted for the Yeast Cell-Cycle dataset. For all the other experiments, we report the results of the compared algorithms from their original papers. The PDNS algorithm was implemented in Java and run on a PC with an Intel Core 2 Duo T6400 CPU (2.0 GHz) and 3.5 GB of RAM.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Datasets and results</p>
         </st>
         <sec>
            <st>
               <p>Saccharomyces Cerevisiae dataset</p>
            </st>
            <p>The Saccharomyces Cerevisiae dataset (available at <url>http://www.tik.ethz.ch/sop/bimax/</url>) <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> contains the expression levels of 2993 genes under 173 experimental conditions. For this experiment, the parameters of PDNS are experimentally set as follows: &#945; = 0.8, <it>&#946; </it>= 0.8, <it>threshold_ASR </it>= 0.7, <it>Y </it>= 100 and <it>Z </it>= 50. The average running time of PDNS to improve a bicluster was about 4 minutes.</p>
            <p>The results of PDNS are compared against the reported scores of RMSBE, Bimax, OPSM, ISA, Samba and CC from <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B57">57</abbr></abbrgrp>. To evaluate the statistical significance of a bicluster, we determine whether the set of genes it contains shows significant enrichment with respect to a specific <it>Gene Ontology </it>(GO) term. We use the web tool <it>FuncAssociate </it>(available at <url>http://llama.mshri.on.ca/funcassociate/</url>) <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> for this purpose. <it>FuncAssociate </it>computes adjusted significance scores for each bicluster; the underlying <it>p</it>-value is the one-sided <it>p</it>-value of the association between a GO attribute and the query gene set given by Fisher's exact test. The biclusters are then compared at the significance levels <it>p </it>= 5%, 1%, 0.5%, 0.1% and 0.001%, and the best biclusters have an adjusted <it>p</it>-value below 0.001%.</p>
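            <p>The raw one-sided Fisher's exact test underlying such enrichment scores is the upper tail of a hypergeometric distribution, which can be sketched as below. This only illustrates the unadjusted <it>p</it>-value; the multiple-testing adjustment that <it>FuncAssociate </it>applies on top of it is not reproduced, and the argument names are our own.</p>
```python
from math import comb


def enrichment_p_value(population, annotated, query, overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail), P(X >= overlap).

    population: genes in the reference set; annotated: those carrying the GO term;
    query: genes in the bicluster; overlap: bicluster genes carrying the term.
    """
    tail = sum(
        comb(annotated, x) * comb(population - annotated, query - x)
        for x in range(overlap, min(annotated, query) + 1)
    )
    return tail / comb(population, query)
```
            <p>For example, drawing 5 genes out of 10 of which 5 are annotated, the probability that all 5 drawn genes are annotated is 1/252.</p>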
            <p>Figure <figr fid="F5">5</figr> presents different significant scores <it>p </it>for each algorithm over the percentage of total extracted biclusters. On the one hand, PDNS and RMSBE seem to outperform other algorithms. PDNS (resp. RMSBE) results show that 100% (resp. 98%) of discovered biclusters are statistically significant with <it>p </it>&lt; 0.001%. On the other hand, apart from CC, other algorithms have reasonably good performance. In particular, the best of the other compared algorithms, OPSM, 87% of its biclusters has <it>p </it>&lt; 0.001%. CC under-performs because it is unable to find coherent biclusters and its lack of robustness against noise.</p>
            <fig id="F5"><title><p>Figure 5</p></title><caption><p>Proportions of biclusters significantly enriched by GO on Saccharomyces Cerevisiae dataset</p></caption><text>
   <p><b>Proportions of biclusters significantly enriched by GO on Saccharomyces Cerevisiae dataset</b>.</p>
</text><graphic file="1471-2105-13-S7-S11-5"/></fig>
         </sec>
         <sec>
            <st>
               <p>Yeast Cell-Cycle dataset</p>
            </st>
            <p>The Yeast Cell-Cycle dataset (available at <url>http://arep.med.harvard.edu/biclustering/</url>) is described in <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. This dataset was processed in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and is publicly available from <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. It contains the expression profiles of more than 6000 yeast genes measured at 17 conditions over two complete cell cycles. In our experiments, we use the 2884 genes selected by <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>For this dataset, two criteria are used. First, we evaluate the statistical relevance of the extracted biclusters by computing the adjusted <it>p</it>-value as for the Saccharomyces Cerevisiae dataset. Second, we identify the biological annotations of the obtained biclusters. For this experiment, the parameters of PDNS are set as follows: &#945; = 0.5, <it>&#946; </it>= 0.7, <it>threshold_ASR </it>= 0.5, <it>Y </it>= 100 and <it>Z </it>= 50. The average running time of PDNS to improve a bicluster was about 2 minutes.</p>
            <sec>
               <st>
                  <p>Statistical relevance</p>
               </st>
               <p>To evaluate the statistical relevance of PDNS, we again use the <it>p</it>-values computed by the web tool <it>FuncAssociate </it><abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. The results of PDNS are compared against CC, ISA, Bimax and OPSM. Figure <figr fid="F6">6</figr> shows, for each significance level <it>p </it>(<it>p </it>= 5%, 1%, 0.5%, 0.1% and 0.001%) and for each compared algorithm, the percentage of the biclusters extracted by the algorithm that are statistically significant at that level. PDNS outperforms the other algorithms on this dataset: 100% of the biclusters discovered by PDNS are statistically significant with <it>p </it>&lt; 0.001%, whereas the best of the compared algorithms, Bimax, reaches only 64% for <it>p </it>&lt; 0.001%.</p>
               <fig id="F6"><title><p>Figure 6</p></title><caption><p>Proportions of biclusters significantly enriched by GO on Yeast Cell-Cycle dataset</p></caption><text>
   <p><b>Proportions of biclusters significantly enriched by GO on Yeast Cell-Cycle dataset</b>.</p>
</text><graphic file="1471-2105-13-S7-S11-6"/></fig>
            </sec>
            <sec>
               <st>
                  <p>Analysis of biological annotation enrichment of biclusters</p>
               </st>
               <p>To evaluate the biological significance of the obtained biclusters in terms of their associated biological processes, molecular functions and cellular components, we use the Gene Ontology (GO) term finder <it>GOTermFinder </it>(available at <url>http://db.yeastgenome.org/cgi-bin/GO/goTermFinder</url>). The GO project provides a controlled vocabulary to describe gene and gene product attributes in any organism, and it is a collaborative effort to address the need for consistent descriptions of gene products in different databases (cited from <url>http://www.geneontology.org</url>). <it>GOTermFinder </it>finds the significant GO terms shared by the genes within the same bicluster.</p>
               <p>Table <tblr tid="T1">1</tblr> and Table <tblr tid="T2">2</tblr> report the top GO terms shared by the biclusters of CC (<it>id2<sub>CC</sub></it>, <it>id9<sub>CC</sub></it>) and OPSM (<it>id7<sub>OPSM</sub></it>, <it>id10<sub>OPSM</sub></it>), and their improvement by PDNS (<it>id2<sub>PDNS</sub></it>, <it>id9<sub>PDNS</sub></it>, <it>id7<sub>PDNS</sub></it>, <it>id10<sub>PDNS</sub></it>), in terms of biological process, molecular function and cellular component. For each GO, we list only the most significant shared term with the smallest <it>p</it>-value.</p>
               <tbl id="T1"><title><p>Table 1</p></title><caption><p>Most significant shared GO terms (process, function, component) of CC and PDNS for biclusters on Yeast Cell-Cycle dataset</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p><b>Bicluster</b></p>
         </c>
         <c ca="left">
            <p><b>Algorithm</b></p>
         </c>
         <c ca="left">
            <p>
               <b>Biological process</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Molecular function</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Cellular component</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>id9<sub>CC</sub></it>
            </p>
            <p>
               <it>id9<sub>PDNS</sub></it>
            </p>
         </c>
         <c ca="left">
            <p>CC</p>
            <p>PDNS</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>glutamate biosynthetic process</p>
            <p>(10.2%, 8.62e-08)</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>isocitrate dehydrogenase (NAD+) activity</p>
            <p>(18.6%, 0.00300)</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>mitochondrion part</p>
            <p>(48.3%, 5.19e-07)</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>id2<sub>CC</sub></it>
            </p>
            <p>
               <it>id2<sub>PDNS</sub></it>
            </p>
         </c>
         <c ca="left">
            <p>CC</p>
            <p>PDNS</p>
         </c>
         <c ca="left">
            <p>translation</p>
            <p>(46.6%, 1.72e-22)</p>
            <p>translation</p>
            <p>(58.1%, 8.71e-37)</p>
         </c>
         <c ca="left">
            <p>structural constituent of ribosome</p>
            <p>(38.8%, 1.05e-36)</p>
            <p>structural constituent of ribosome</p>
            <p>(51.3%, 4.48e-59)</p>
         </c>
         <c ca="left">
            <p>cytosolic ribosome</p>
            <p>(38.8%, 1.10e-41)</p>
            <p>cytosolic ribosome</p>
            <p>(53.0%, 5.97e-70)</p>
         </c>
      </r>
   </tblbdy></tbl>
               <tbl id="T2"><title><p>Table 2</p></title><caption><p>Most significant shared GO terms (process, function, component) of OPSM and PDNS for biclusters on Yeast Cell-Cycle dataset</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p><b>Bicluster</b></p>
         </c>
         <c ca="left">
            <p><b>Algorithm</b></p>
         </c>
         <c ca="left">
            <p>
               <b>Biological process</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Molecular function</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Cellular component</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>id7<sub>OPSM</sub></it>
            </p>
            <p>
               <it>id7<sub>PDNS</sub></it>
            </p>
         </c>
         <c ca="left">
            <p>OPSM</p>
            <p>PDNS</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>ribosome biogenesis</p>
            <p>(32.1%, 2.02e-07)</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>snoRNA binding</p>
            <p>(5.3%, 5.84e-06)</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>nucleolus</p>
            <p>(32.1%, 6.22e-10)</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>id10<sub>OPSM</sub></it>
            </p>
            <p>
               <it>id10<sub>PDNS</sub></it>
            </p>
         </c>
         <c ca="left">
            <p>OPSM</p>
            <p>PDNS</p>
         </c>
         <c ca="left">
            <p>sister chromatid segregation</p>
            <p>(24.7%, 0.00337)</p>
            <p>nucleic acid metabolic process</p>
            <p>(34.0%, 2.45e-11)</p>
         </c>
         <c ca="left">
            <p>unknown</p>
            <p>phosphatase regulator activity</p>
            <p>(1.7%, 0.00041)</p>
         </c>
         <c ca="left">
            <p>spindle</p>
            <p>(14.1%, 0.00196)</p>
            <p>nucleus</p>
            <p>(44.8%, 3.46e-15)</p>
         </c>
      </r>
   </tblbdy></tbl>
               <p>For the bicluster <it>id9<sub>PDNS </sub></it>(Table <tblr tid="T1">1</tblr>), the genes <it>YCR005C, YHR037W, YLR304C, YNL037C, YNR001C </it>and <it>YOR136W </it>are together involved in the glutamate biosynthetic process. Each GO term is associated with a tuple giving the cluster frequency and the statistical significance. For example, glutamate biosynthetic process (10.2%, 8.62e-08) has a cluster frequency of 10.2%, meaning that 6 of the 59 genes in this bicluster take part in this process (6/59 &#8776; 10.2%), and its statistical significance is given by a <it>p</it>-value of 8.62e-08. Furthermore, PDNS is able to improve all the biclusters of CC and OPSM and to find biologically meaningful biclusters.</p>
               <p>For the worst (resp. best) biclusters obtained by CC and OPSM, i.e., <it>id9<sub>CC </sub></it>and <it>id7<sub>OPSM </sub></it>(resp. <it>id2<sub>CC</sub></it> and <it>id10<sub>OPSM</sub></it>), we verify whether PDNS can turn them into biclusters of greater biological significance. We observe that PDNS does improve both the worst and the best biclusters of CC and OPSM. For the worst biclusters, which have no significant GO annotation ("unknown"), i.e., <it>id9<sub>CC </sub></it>and <it>id7<sub>OPSM</sub></it>, the improved biclusters obtained by PDNS (<it>id9<sub>PDNS </sub></it>and <it>id7<sub>PDNS</sub></it>) tend to be statistically and biologically more significant. Indeed, when a bicluster is of poor quality, PDNS can improve it by replacing the bad genes/conditions with good ones. The best biclusters, i.e., <it>id2<sub>CC </sub></it>and <it>id10<sub>OPSM</sub></it>, are also improved by PDNS (<it>id2<sub>PDNS </sub></it>and <it>id10<sub>PDNS</sub></it>), which yields smaller <it>p</it>-values.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have presented PDNS, a pattern-driven neighborhood search for the biclustering of microarray data. PDNS alternates between a descent-based intensification phase and a perturbation phase. By using a behavior matrix representation of solutions, the descent procedure is guided by a pattern-based neighborhood defined by two move operators. These operators change respectively the rows and the columns of the current solution according to the pattern information of each row and each column of the current solution as well as of the initial matrix. Perturbation is realized by randomly changing a percentage of the rows and columns of the best recorded solution (an option would be to constrain the changes to some critical rows and columns).</p>
         <p>The proposed algorithm has been assessed on two well-known microarray datasets (Yeast Cell-Cycle and Saccharomyces Cerevisiae). The experimental study showed that PDNS is competitive with other popular biclustering algorithms, providing statistically and biologically significant biclusters. PDNS is computationally effective and can also be used to improve biclusters obtained by other methods.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>WA carried out the implementation of the proposed idea, performed the statistical and biological experiments using <it>FuncAssociate </it>and <it>GOTermFinder</it>, and wrote the draft manuscript. JKH supervised the project and co-wrote the manuscript. ME participated in the correction of the final manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was partially supported by the projects 'Bioinformatique Lig&#233;rienne - BIL' (2009-2011, Pays de La Loire, France) and Radapop (2009-2013, Pays de La Loire, France) which are acknowledged. We thank the reviewers of the paper for their comments and suggestions.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 13 Supplement 7, 2012: Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S7</url>.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>The use and analysis of microarray data</p></title><aug><au><snm>Butte</snm><fnm>A</fnm></au></aug><source>Nat Rev Drug Discov</source><pubdate>2002</pubdate><volume>1</volume><fpage>951</fpage><lpage>960</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrd961</pubid><pubid idtype="pmpid" link="fulltext">12461517</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting</p></title><aug><au><snm>Dupuy</snm><fnm>A</fnm></au><au><snm>Simon</snm><fnm>RM</fnm></au></aug><source>J Natl Cancer Inst</source><pubdate>2007</pubdate><volume>99</volume><issue>2</issue><fpage>147</fpage><lpage>157</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/jnci/djk018</pubid><pubid idtype="pmpid" link="fulltext">17227998</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Applications of DNA microarrays in biology</p></title><aug><au><snm>Stoughton</snm><fnm>RB</fnm></au></aug><source>Annu Rev Biochem</source><pubdate>2005</pubdate><volume>74</volume><fpage>53</fpage><lpage>82</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.biochem.74.082803.133212</pubid><pubid idtype="pmpid" link="fulltext">15952881</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Machine learning in bioinformatics</p></title><aug><au><snm>Larranaga</snm><fnm>P</fnm></au><au><snm>Calvo</snm><fnm>B</fnm></au><au><snm>Santana</snm><fnm>R</fnm></au><au><snm>Bielza</snm><fnm>C</fnm></au><au><snm>Galdiano</snm><fnm>J</fnm></au><au><snm>Inza</snm><fnm>I</fnm></au><au><snm>Lozano</snm><fnm>JA</fnm></au><au><snm>Armananzas</snm><fnm>R</fnm></au><au><snm>Santafe</snm><fnm>G</fnm></au><au><snm>Perez</snm><fnm>A</fnm></au><au><snm>Robles</snm><fnm>V</fnm></au></aug><source>Brief Bioinform</source><pubdate>2006</pubdate><volume>7</volume><fpage>86</fpage><lpage>112</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/bib/bbk007</pubid><pubid idtype="pmpid" link="fulltext">16761367</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>A hybrid LDA and genetic algorithm for gene selection and classification of microarray data</p></title><aug><au><snm>Bonilla Huerta</snm><fnm>E</fnm></au><au><snm>Duval</snm><fnm>B</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>Neurocomputing</source><pubdate>2010</pubdate><volume>73</volume><issue>13-15</issue><fpage>2375</fpage><lpage>2383</lpage><xrefbib><pubid idtype="doi">10.1016/j.neucom.2010.03.024</pubid></xrefbib></bibl><bibl id="B6"><title><p>Advances in metaheuristics for gene selection and classification of microarray data</p></title><aug><au><snm>Duval</snm><fnm>B</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>Brief Bioinform</source><pubdate>2010</pubdate><volume>11</volume><issue>1</issue><fpage>127</fpage><lpage>142</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/bbp035</pubid><pubid idtype="pmpid" link="fulltext">19789265</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Support vector machine classification and validation of cancer tissue samples using microarray expression data</p></title><aug><au><snm>Furey</snm><fnm>TS</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au><au><snm>Duffy</snm><fnm>N</fnm></au><au><snm>Bednarski</snm><fnm>DW</fnm></au><au><snm>Schummer</snm><fnm>M</fnm></au><au><snm>Haussler</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2000</pubdate><volume>16</volume><issue>10</issue><fpage>906</fpage><lpage>914</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/16.10.906</pubid><pubid idtype="pmpid" link="fulltext">11120680</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Gene selection for cancer classification using support vector 
machines</p></title><aug><au><snm>Guyon</snm><fnm>I</fnm></au><au><snm>Weston</snm><fnm>J</fnm></au><au><snm>Barnhill</snm><fnm>S</fnm></au><au><snm>Vapnik</snm><fnm>V</fnm></au></aug><source>Machine Learning</source><pubdate>2002</pubdate><volume>46</volume><fpage>389</fpage><lpage>422</lpage><xrefbib><pubid idtype="doi">10.1023/A:1012487302797</pubid></xrefbib></bibl><bibl id="B9"><title><p>A genetic embedded approach for gene selection and classification of microarray data</p></title><aug><au><snm>Hernandez Hernandez</snm><fnm>JC</fnm></au><au><snm>Duval</snm><fnm>B</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>The Fifth European Conference on Evolutionary Computation, Machine Learning and Datamining in Bioinformatics. LNCS</source><pubdate>2007</pubdate><volume>4447</volume><fpage>90</fpage><lpage>101</lpage><xrefbib><pubid idtype="doi">10.1007/978-3-540-71783-6_9</pubid></xrefbib></bibl><bibl id="B10"><title><p>Independent component analysis based penalized discriminant method for tumor classification using gene expression data</p></title><aug><au><snm>Huang</snm><fnm>DS</fnm></au><au><snm>Zheng</snm><fnm>CH</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>15</issue><fpage>1855</fpage><lpage>1862</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl190</pubid><pubid idtype="pmpid" link="fulltext">16709589</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the ga/knn method</p></title><aug><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Weinberg</snm><fnm>CR</fnm></au><au><snm>Darden</snm><fnm>TA</fnm></au><au><snm>Pedersen</snm><fnm>LG</fnm></au></aug><source>Bioinformatics</source><pubdate>2001</pubdate><volume>17</volume><issue>12</issue><fpage>1131</fpage><lpage>1142</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/bioinformatics/17.12.1131</pubid><pubid idtype="pmpid" link="fulltext">11751221</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset</p></title><aug><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Jiang</snm><fnm>W</fnm></au><au><snm>Li</snm><fnm>X</fnm></au><au><snm>Moser</snm><fnm>KL</fnm></au><au><snm>Guo</snm><fnm>Z</fnm></au><au><snm>Du</snm><fnm>L</fnm></au><au><snm>Wang</snm><fnm>Q</fnm></au><au><snm>Topol</snm><fnm>EJ</fnm></au><au><snm>Wang</snm><fnm>Q</fnm></au><au><snm>Rao</snm><fnm>S</fnm></au></aug><source>Genomics</source><pubdate>2005</pubdate><volume>85</volume><issue>1</issue><fpage>16</fpage><lpage>23</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ygeno.2004.09.007</pubid><pubid idtype="pmpid" link="fulltext">15607418</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Metasample-based sparse representation for tumor classification</p></title><aug><au><snm>Zheng</snm><fnm>CH</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Ng</snm><fnm>VTY</fnm></au><au><snm>Shiu</snm><fnm>SCK</fnm></au><au><snm>Huang</snm><fnm>DS</fnm></au></aug><source>IEEE/ACM Trans Comput Biol Bioinform</source><pubdate>2011</pubdate><volume>8</volume><issue>5</issue><fpage>1273</fpage><lpage>1282</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">21282864</pubid></xrefbib></bibl><bibl id="B14"><title><p>Gene expression data classification using consensus independent component analysis</p></title><aug><au><snm>Zheng</snm><fnm>CH</fnm></au><au><snm>Huang</snm><fnm>DS</fnm></au><au><snm>Kong</snm><fnm>XZ</fnm></au><au><snm>Zhao</snm><fnm>XM</fnm></au></aug><source>Genomics Proteomics &amp; Bioinformatics</source><pubdate>2008</pubdate><volume>6</volume><issue>2</issue><fpage>74</fpage><lpage>82</lpage></bibl><bibl id="B15"><title><p>Feature selection in independent component subspace for microarray data 
classification</p></title><aug><au><snm>Zheng</snm><fnm>CH</fnm></au><au><snm>Huang</snm><fnm>DS</fnm></au><au><snm>Shang</snm><fnm>L</fnm></au></aug><source>Neurocomputing</source><pubdate>2006</pubdate><volume>69</volume><issue>16-18</issue><fpage>2407</fpage><lpage>2410</lpage><xrefbib><pubid idtype="doi">10.1016/j.neucom.2006.02.006</pubid></xrefbib></bibl><bibl id="B16"><title><p>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling</p></title><aug><au><snm>Alizadeh</snm><fnm>A</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Davis</snm><fnm>RE</fnm></au><au><snm>Ma</snm><fnm>C</fnm></au><au><snm>Lossos</snm><fnm>IS</fnm></au><au><snm>Rosenwald</snm><fnm>A</fnm></au><au><snm>Boldrick</snm><fnm>JC</fnm></au><au><snm>Sabet</snm><fnm>H</fnm></au><au><snm>Tran</snm><fnm>T</fnm></au><au><snm>Yu</snm><fnm>X</fnm></au><au><snm>Powell</snm><fnm>JI</fnm></au><au><snm>Yang</snm><fnm>L</fnm></au><au><snm>Marti</snm><fnm>GE</fnm></au><au><snm>Moore</snm><fnm>T</fnm></au><au><snm>Hudson</snm><fnm>JJ</fnm></au><au><snm>Lu</snm><fnm>L</fnm></au><au><snm>Lewis</snm><fnm>DB</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Chan</snm><fnm>WC</fnm></au><au><snm>Greiner</snm><fnm>TC</fnm></au><au><snm>Weisenburger</snm><fnm>DD</fnm></au><au><snm>Armitage</snm><fnm>JO</fnm></au><au><snm>Warnke</snm><fnm>R</fnm></au><au><snm>Levy</snm><fnm>R</fnm></au><au><snm>Wilson</snm><fnm>W</fnm></au><au><snm>Grever</snm><fnm>MR</fnm></au><au><snm>Byrd</snm><fnm>JC</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Staudt</snm><fnm>LM</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>403</volume><fpage>503</fpage><lpage>511</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35000501</pubid><pubid idtype="pmpid" link="fulltext">10676951</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Molecular classification of cancer:
class discovery and class prediction by gene expression monitoring</p></title><aug><au><snm>Golub</snm><fnm>T</fnm></au><au><snm>Slonim</snm><fnm>D</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Huard</snm><fnm>C</fnm></au><au><snm>Gaasenbeek</snm><fnm>M</fnm></au><au><snm>Mesirov</snm><fnm>J</fnm></au><au><snm>Coller</snm><fnm>H</fnm></au><au><snm>Loh</snm><fnm>M</fnm></au><au><snm>Downing</snm><fnm>J</fnm></au><au><snm>Caligiuri</snm><fnm>M</fnm></au><au><snm>Bloomfield</snm><fnm>C</fnm></au><au><snm>Lander</snm><fnm>E</fnm></au></aug><source>Science</source><pubdate>1999</pubdate><volume>286</volume><fpage>531</fpage><lpage>537</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.286.5439.531</pubid><pubid idtype="pmpid">10521349</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>A combinational feature selection and ensemble neural network method for classification of gene expression data</p></title><aug><au><snm>Liu</snm><fnm>B</fnm></au><au><snm>Cui</snm><fnm>Q</fnm></au><au><snm>Jiang</snm><fnm>T</fnm></au><au><snm>Ma</snm><fnm>S</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2004</pubdate><volume>5</volume><fpage>136</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-5-136</pubid><pubid idtype="pmcid">522806</pubid><pubid idtype="pmpid" link="fulltext">15450124</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Molecular pattern discovery based on penalized matrix decomposition</p></title><aug><au><snm>Zheng</snm><fnm>CH</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Ng</snm><fnm>VTY</fnm></au><au><snm>Shiu</snm><fnm>SCK</fnm></au><au><snm>Huang</snm><fnm>DS</fnm></au></aug><source>IEEE/ACM Trans Comput Biol Bioinform</source><pubdate>2011</pubdate><volume>8</volume><issue>6</issue><fpage>1592</fpage><lpage>1603</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">21519114</pubid></xrefbib></bibl><bibl id="B20"><title><p>Tumor clustering using non-negative matrix factorization with 
gene selection</p></title><aug><au><snm>Zheng</snm><fnm>CH</fnm></au><au><snm>Huang</snm><fnm>DS</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Kong</snm><fnm>XZ</fnm></au></aug><source>IEEE Trans Inf Technol Biomed</source><pubdate>2009</pubdate><volume>13</volume><issue>4</issue><fpage>599</fpage><lpage>607</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">19369170</pubid></xrefbib></bibl><bibl id="B21"><title><p>Biclustering via optimal reordering of data matrices in systems biology: rigorous methods and comparative studies</p></title><aug><au><snm>DiMaggio</snm><fnm>P</fnm></au><au><snm>McAllister</snm><fnm>S</fnm></au><au><snm>Floudas</snm><fnm>CA</fnm></au><au><snm>Feng</snm><fnm>XJ</fnm></au><au><snm>Rabinowitz</snm><fnm>JD</fnm></au><au><snm>Rabitz</snm><fnm>HA</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><issue>1</issue><fpage>458</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-458</pubid><pubid idtype="pmcid">2605474</pubid><pubid idtype="pmpid" link="fulltext">18954459</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Biclustering algorithms for biological data analysis: a survey</p></title><aug><au><snm>Madeira</snm><fnm>SC</fnm></au><au><snm>Oliveira</snm><fnm>AL</fnm></au></aug><source>IEEE/ACM Trans Comput Biol Bioinform</source><pubdate>2004</pubdate><volume>1</volume><issue>1</issue><fpage>24</fpage><lpage>45</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1109/TCBB.2004.2</pubid><pubid idtype="pmpid" link="fulltext">17048406</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Discovering statistically significant biclusters in gene expression data</p></title><aug><au><snm>Tanay</snm><fnm>A</fnm></au><au><snm>Sharan</snm><fnm>R</fnm></au><au><snm>Shamir</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><fpage>S136</fpage><lpage>S144</lpage><xrefbib><pubidlist><pubid
idtype="doi">10.1093/bioinformatics/18.suppl_1.S136</pubid><pubid idtype="pmpid" link="fulltext">12169541</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Biclustering of expression data</p></title><aug><au><snm>Cheng</snm><fnm>Y</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au></aug><source>Proc Int Conf Intell Syst Mol Biol</source><pubdate>2000</pubdate><volume>8</volume><fpage>93</fpage><lpage>103</lpage><xrefbib><pubid idtype="pmpid">10977070</pubid></xrefbib></bibl><bibl id="B25"><title><p>Biclustering of microarray data</p></title><aug><au><snm>Ayadi</snm><fnm>W</fnm></au><au><snm>Elloumi</snm><fnm>M</fnm></au></aug><source>Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications. Wiley Book Series on Bioinformatics: Computational Techniques and Engineering</source><publisher>New Jersey, USA: John Wiley &amp; Sons Ltd</publisher><pubdate>2011</pubdate><fpage>651</fpage><lpage>664</lpage></bibl><bibl id="B26"><title><p>Biclustering in data mining</p></title><aug><au><snm>Busygin</snm><fnm>S</fnm></au><au><snm>Prokopyev</snm><fnm>O</fnm></au><au><snm>Pardalos</snm><fnm>PM</fnm></au></aug><source>Computers and Operations Research</source><pubdate>2008</pubdate><volume>35</volume><issue>9</issue><fpage>2964</fpage><lpage>2987</lpage><xrefbib><pubid idtype="doi">10.1016/j.cor.2007.01.005</pubid></xrefbib></bibl><bibl id="B27"><title><p>Computing the maximum similarity biclusters of gene expression data</p></title><aug><au><snm>Liu</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>L</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>1</issue><fpage>50</fpage><lpage>56</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl560</pubid><pubid idtype="pmpid" link="fulltext">17090578</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>BicFinder: a biclustering algorithm for microarray data 
analysis</p></title><aug><au><snm>Ayadi</snm><fnm>W</fnm></au><au><snm>Elloumi</snm><fnm>M</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>Knowledge and Information Systems: An International Journal</source><pubdate>2012</pubdate><volume>30</volume><issue>2</issue><fpage>341</fpage><lpage>358</lpage><xrefbib><pubid idtype="doi">10.1007/s10115-011-0383-7</pubid></xrefbib></bibl><bibl id="B29"><title><p>Discovering local structure in gene expression data: the order-preserving submatrix problem</p></title><aug><au><snm>Ben-Dor</snm><fnm>A</fnm></au><au><snm>Chor</snm><fnm>B</fnm></au><au><snm>Karp</snm><fnm>R</fnm></au><au><snm>Yakhini</snm><fnm>Z</fnm></au></aug><source>Proceedings of the Sixth Annual International Conference on Computational Biology</source><publisher>New York, NY, USA</publisher><pubdate>2002</pubdate><fpage>49</fpage><lpage>57</lpage></bibl><bibl id="B30"><title><p>Discovering biclusters by iteratively sorting with weighted correlation coefficient in gene expression data</p></title><aug><au><snm>Teng</snm><fnm>L</fnm></au><au><snm>Chan</snm><fnm>L</fnm></au></aug><source>J Signal Process Syst</source><pubdate>2008</pubdate><volume>50</volume><issue>3</issue><fpage>267</fpage><lpage>280</lpage><xrefbib><pubid idtype="doi">10.1007/s11265-007-0121-2</pubid></xrefbib></bibl><bibl id="B31"><title><p>A condition-enumeration tree method for mining biclusters from DNA microarray data sets</p></title><aug><au><snm>Chen</snm><fnm>JR</fnm></au><au><snm>Chang</snm><fnm>YI</fnm></au></aug><source>Biosystems</source><pubdate>2009</pubdate><volume>97</volume><fpage>44</fpage><lpage>59</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.biosystems.2009.04.003</pubid><pubid idtype="pmpid" link="fulltext">19393714</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>BiMine+: an efficient algorithm for discovering relevant biclusters of DNA microarray 
data</p></title><aug><au><snm>Ayadi</snm><fnm>W</fnm></au><au><snm>Elloumi</snm><fnm>M</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>(Submitted)</source></bibl><bibl id="B33"><title><p>Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization</p></title><aug><au><snm>Cheng</snm><fnm>KO</fnm></au><au><snm>Law</snm><fnm>NF</fnm></au><au><snm>Siu</snm><fnm>WC</fnm></au><au><snm>Liew</snm><fnm>AW</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>210</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-210</pubid><pubid idtype="pmcid">2396181</pubid><pubid idtype="pmpid" link="fulltext">18433478</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>OP-Cluster: clustering by tendency in high dimensional space</p></title><aug><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>W</fnm></au></aug><source>IEEE International Conference on Data Mining</source><pubdate>2003</pubdate><fpage>187</fpage><lpage>194</lpage></bibl><bibl id="B35"><title><p>Iterated local search for biclustering of microarray data</p></title><aug><au><snm>Ayadi</snm><fnm>W</fnm></au><au><snm>Elloumi</snm><fnm>M</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>Proceedings of 5th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB2010.
LNCS</source><publisher>Springer-Verlag</publisher><pubdate>2010</pubdate><volume>6282</volume><fpage>219</fpage><lpage>229</lpage></bibl><bibl id="B36"><title><p>Application of simulated annealing to the biclustering of gene expression data</p></title><aug><au><snm>Bryan</snm><fnm>K</fnm></au><au><snm>Cunningham</snm><fnm>P</fnm></au><au><snm>Bolshakova</snm><fnm>N</fnm></au></aug><source>IEEE Trans Inf Technol Biomed</source><pubdate>2006</pubdate><volume>10</volume><issue>3</issue><fpage>519</fpage><lpage>525</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1109/TITB.2006.872073</pubid><pubid idtype="pmpid">16871720</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Application of reactive GRASP to the biclustering of gene expression data</p></title><aug><au><snm>Das</snm><fnm>S</fnm></au><au><snm>Idicula</snm><fnm>SM</fnm></au></aug><source>Proceedings of the International Symposium on Biocomputing</source><publisher>New York, NY, USA: ACM</publisher><pubdate>2010</pubdate><fpage>1</fpage><lpage>8</lpage></bibl><bibl id="B38"><title><p>Biclustering of gene expression data using reactive greedy randomized adaptive search procedure</p></title><aug><au><snm>Dharan</snm><fnm>A</fnm></au><au><snm>Nair</snm><fnm>AS</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2009</pubdate><volume>10</volume><issue>Suppl 1</issue><fpage>S27</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-10-S1-S27</pubid><pubid idtype="pmcid">2648745</pubid><pubid idtype="pmpid" link="fulltext">19208127</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Biclustering of expression data with evolutionary computation</p></title><aug><au><snm>Divina</snm><fnm>F</fnm></au><au><snm>Aguilar-Ruiz</snm><fnm>JS</fnm></au></aug><source>IEEE Transactions on Knowledge &amp; Data Engineering</source><pubdate>2006</pubdate><volume>18</volume><issue>5</issue><fpage>590</fpage><lpage>602</lpage></bibl><bibl id="B40"><title><p>A multi-objective approach to discover
biclusters in microarray data</p></title><aug><au><snm>Divina</snm><fnm>F</fnm></au><au><snm>Aguilar-Ruiz</snm><fnm>JS</fnm></au></aug><source>Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation</source><publisher>New York, NY, USA: ACM</publisher><pubdate>2007</pubdate><fpage>385</fpage><lpage>392</lpage></bibl><bibl id="B41"><title><p>Microarray biclustering: a novel memetic approach based on the PISA platform</p></title><aug><au><snm>Gallo</snm><fnm>CA</fnm></au><au><snm>Carballido</snm><fnm>JA</fnm></au><au><snm>Ponzoni</snm><fnm>I</fnm></au></aug><source>Proceedings of the 7th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics</source><pubdate>2009</pubdate><fpage>44</fpage><lpage>55</lpage></bibl><bibl id="B42"><title><p>Multi-objective evolutionary biclustering of gene expression data</p></title><aug><au><snm>Mitra</snm><fnm>S</fnm></au><au><snm>Banka</snm><fnm>H</fnm></au></aug><source>Pattern Recognition</source><pubdate>2006</pubdate><volume>39</volume><issue>12</issue><fpage>2464</fpage><lpage>2477</lpage><xrefbib><pubid idtype="doi">10.1016/j.patcog.2006.03.003</pubid></xrefbib></bibl><bibl id="B43"><title><p>Clustering of time-course gene expression data using a mixed-effects model with B-splines</p></title><aug><au><snm>Luan</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>H</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>474</fpage><lpage>482</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg014</pubid><pubid idtype="pmpid" link="fulltext">12611802</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted
inference</p></title><aug><au><snm>Peddada</snm><fnm>SD</fnm></au><au><snm>Lobenhofer</snm><fnm>EK</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Afshari</snm><fnm>CA</fnm></au><au><snm>Weinberg</snm><fnm>CR</fnm></au><au><snm>Umbach</snm><fnm>DM</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>834</fpage><lpage>841</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg093</pubid><pubid idtype="pmpid" link="fulltext">12724293</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Using hidden Markov models to analyze gene expression time course data</p></title><aug><au><snm>Schliep</snm><fnm>A</fnm></au><au><snm>Sch&#246;nhuth</snm><fnm>A</fnm></au><au><snm>Steinhoff</snm><fnm>C</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>i255</fpage><lpage>i263</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg1036</pubid><pubid idtype="pmpid" link="fulltext">12855468</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Iterated local search</p></title><aug><au><snm>Louren&#231;o</snm><fnm>HR</fnm></au><au><snm>Martin</snm><fnm>O</fnm></au><au><snm>St&#252;tzle</snm><fnm>T</fnm></au></aug><source>Handbook of Metaheuristics</source><publisher>Springer-Verlag</publisher><editor>Glover F, Kochenberger G</editor><pubdate>2003</pubdate><fpage>321</fpage><lpage>353</lpage></bibl><bibl id="B47"><title><p>Discovering pattern-based subspace clusters by pattern tree</p></title><aug><au><snm>Guan</snm><fnm>J</fnm></au><au><snm>Gan</snm><fnm>Y</fnm></au><au><snm>Wang</snm><fnm>H</fnm></au></aug><source>Knowledge-Based Systems</source><pubdate>2009</pubdate><volume>22</volume><issue>8</issue><fpage>569</fpage><lpage>579</lpage><xrefbib><pubid idtype="doi">10.1016/j.knosys.2009.02.011</pubid></xrefbib></bibl><bibl id="B48"><title><p>Random walk biclustering for microarray
data</p></title><aug><au><snm>Angiulli</snm><fnm>F</fnm></au><au><snm>Cesario</snm><fnm>E</fnm></au><au><snm>Pizzuti</snm><fnm>C</fnm></au></aug><source>Information Sciences</source><pubdate>2008</pubdate><volume>178</volume><issue>6</issue><fpage>1479</fpage><lpage>1497</lpage><xrefbib><pubid idtype="doi">10.1016/j.ins.2007.11.007</pubid></xrefbib></bibl><bibl id="B49"><title><p>An EA framework for biclustering of gene expression data</p></title><aug><au><snm>Bleuler</snm><fnm>S</fnm></au><au><snm>Prelic</snm><fnm>A</fnm></au><au><snm>Zitzler</snm><fnm>E</fnm></au></aug><source>Proceedings of Congress on Evolutionary Computation</source><pubdate>2004</pubdate><fpage>166</fpage><lpage>173</lpage></bibl><bibl id="B50"><title><p>Enhanced biclustering on expression data</p></title><aug><au><snm>Yang</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>H</fnm></au><au><snm>Wang</snm><fnm>W</fnm></au><au><snm>Yu</snm><fnm>P</fnm></au></aug><source>Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering</source><publisher>Washington, DC, USA: IEEE Computer Society</publisher><pubdate>2003</pubdate><fpage>321</fpage><lpage>327</lpage></bibl><bibl id="B51"><title><p>Mining deterministic biclusters in gene expression data</p></title><aug><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Teo</snm><fnm>A</fnm></au><au><snm>Ooi</snm><fnm>BC</fnm></au><au><snm>Tan</snm><fnm>KL</fnm></au></aug><source>IEEE International Symposium on Bioinformatics and Bioengineering</source><pubdate>2004</pubdate><fpage>283</fpage><lpage>290</lpage></bibl><bibl id="B52"><title><p>Shifting and scaling patterns from gene expression data</p></title><aug><au><snm>Aguilar-Ruiz</snm><fnm>JS</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>3840</fpage><lpage>3845</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti641</pubid><pubid idtype="pmpid" link="fulltext">16144809</pubid></pubidlist></xrefbib></bibl><bibl 
id="B53"><title><p>Virtual error: a new measure for evolutionary biclustering</p></title><aug><au><snm>Pontes</snm><fnm>B</fnm></au><au><snm>Divina</snm><fnm>F</fnm></au><au><snm>Gir&#225;ldez</snm><fnm>R</fnm></au><au><snm>Aguilar-Ruiz</snm><fnm>JS</fnm></au></aug><source>Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics</source><pubdate>2007</pubdate><fpage>217</fpage><lpage>226</lpage></bibl><bibl id="B54"><title><p>A biclustering algorithm based on a bicluster enumeration tree: application to DNA microarray data</p></title><aug><au><snm>Ayadi</snm><fnm>W</fnm></au><au><snm>Elloumi</snm><fnm>M</fnm></au><au><snm>Hao</snm><fnm>JK</fnm></au></aug><source>BioData Min</source><pubdate>2009</pubdate><volume>2</volume><issue>1</issue><fpage>9</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1756-0381-2-9</pubid><pubid idtype="pmcid">2804695</pubid><pubid idtype="pmpid" link="fulltext">20015398</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><aug><au><snm>Lehmann</snm><fnm>EL</fnm></au><au><snm>D'Abrera</snm><fnm>HJM</fnm></au></aug><source>Nonparametrics: Statistical Methods Based on Ranks</source><publisher>Englewood Cliffs, NJ: Prentice Hall</publisher><pubdate>1998</pubdate><fpage>292</fpage><lpage>323</lpage></bibl><bibl id="B56"><title><p>Defining transcription modules using large-scale gene expression data</p></title><aug><au><snm>Bergmann</snm><fnm>S</fnm></au><au><snm>Ihmels</snm><fnm>J</fnm></au><au><snm>Barkai</snm><fnm>N</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>13</issue><fpage>1993</fpage><lpage>2003</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth166</pubid><pubid idtype="pmpid" link="fulltext">15044247</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>A systematic comparison and evaluation of biclustering methods for gene expression 
data</p></title><aug><au><snm>Prelic</snm><fnm>A</fnm></au><au><snm>Bleuler</snm><fnm>S</fnm></au><au><snm>Zimmermann</snm><fnm>P</fnm></au><au><snm>B&#252;hlmann</snm><fnm>P</fnm></au><au><snm>Gruissem</snm><fnm>W</fnm></au><au><snm>Hennig</snm><fnm>L</fnm></au><au><snm>Thiele</snm><fnm>L</fnm></au><au><snm>Zitzler</snm><fnm>E</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>9</issue><fpage>1122</fpage><lpage>1129</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl060</pubid><pubid idtype="pmpid" link="fulltext">16500941</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>BicAT: a biclustering analysis toolbox</p></title><aug><au><snm>Barkow</snm><fnm>S</fnm></au><au><snm>Bleuler</snm><fnm>S</fnm></au><au><snm>Prelic</snm><fnm>A</fnm></au><au><snm>Zimmermann</snm><fnm>P</fnm></au><au><snm>Zitzler</snm><fnm>E</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>10</issue><fpage>1282</fpage><lpage>1283</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl099</pubid><pubid idtype="pmpid" link="fulltext">16551664</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>Genomic expression programs in the response of yeast cells to environmental changes</p></title><aug><au><snm>Gasch</snm><fnm>AP</fnm></au><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Kao</snm><fnm>CM</fnm></au><au><snm>Carmel-Harel</snm><fnm>O</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Storz</snm><fnm>G</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Mol Biol Cell</source><pubdate>2000</pubdate><volume>11</volume><issue>12</issue><fpage>4241</fpage><lpage>4257</lpage><xrefbib><pubidlist><pubid idtype="pmcid">15070</pubid><pubid idtype="pmpid" link="fulltext">11102521</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>Characterizing gene sets with
FuncAssociate</p></title><aug><au><snm>Berriz</snm><fnm>GF</fnm></au><au><snm>King</snm><fnm>OD</fnm></au><au><snm>Bryant</snm><fnm>B</fnm></au><au><snm>Sander</snm><fnm>C</fnm></au><au><snm>Roth</snm><fnm>FP</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><issue>18</issue><fpage>2502</fpage><lpage>2504</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg363</pubid><pubid idtype="pmpid" link="fulltext">14668247</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>Systematic determination of genetic network architecture</p></title><aug><au><snm>Tavazoie</snm><fnm>S</fnm></au><au><snm>Hughes</snm><fnm>JD</fnm></au><au><snm>Campbell</snm><fnm>MJ</fnm></au><au><snm>Cho</snm><fnm>RJ</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au></aug><source>Nat Genet</source><pubdate>1999</pubdate><volume>22</volume><fpage>281</fpage><lpage>285</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/10343</pubid><pubid idtype="pmpid" link="fulltext">10391217</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Biclustering of expression data</p></title><aug><au><snm>Cheng</snm><fnm>Y</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au></aug><source>Technical Report, (Supplementary Information)</source><pubdate>2006</pubdate></bibl></refgrp>
   </bm>
</art>