<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-76</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>An Entropy-based gene selection method for cancer classification using microarray data</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Liu</snm>
               <fnm>Xiaoxing</fnm>
               <insr iid="I1"/>
               <email>xiaoxing@bii.a-star.edu.sg</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Krishnan</snm>
               <fnm>Arun</fnm>
               <insr iid="I1"/>
               <email>arun@bii.a-star.edu.sg</email>
            </au>
            <au id="A3">
               <snm>Mondry</snm>
               <fnm>Adrian</fnm>
               <insr iid="I1"/>
               <email>adrian@bii.a-star.edu.sg</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Institute, 30, Biopolis Street, #07-01, (S) 138671, Singapore</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>76</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/76</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15790388</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-76</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>7</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Liu et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Accurate diagnosis of cancer subtypes remains a challenging problem. Building classifiers based on gene expression data is a promising approach; yet the selection of non-redundant but relevant genes is difficult.</p>
               <p>The selected gene set should be small enough to allow diagnosis even in regular clinical laboratories and ideally identify genes involved in cancer-specific regulatory pathways. Here an entropy-based method is proposed that selects genes related to the different cancer classes while at the same time reducing the redundancy among the genes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The present study identifies a subset of features by maximizing the relevance and minimizing the redundancy of the selected genes. A measure called <it>normalized mutual information </it>is used to quantify both the relevance and the redundancy of the genes. To find a more representative subset of features, an iterative procedure is adopted that incorporates an initial clustering, followed by data partitioning and the application of the algorithm to each of the partitions. A leave-one-out approach then identifies the genes most commonly selected across all the runs, and the gene selection algorithm is applied again to pare down this list until a <it>minimal </it>subset that gives satisfactory classification accuracy is obtained.</p>
               <p>The algorithm was applied to three different data sets, and the results obtained were compared to work done by others using the same data sets.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This study presents an entropy-based iterative algorithm for selecting genes from microarray data that are able to classify various cancer sub-types with high accuracy. In addition, the feature set obtained is very compact, that is, the redundancy between genes is reduced to a large extent. This implies that classifiers can be built with a smaller subset of genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>DNA microarrays have become ubiquitous in analyzing the expression profiles of genes in the hope of distinguishing between disease types, for example between various cancer sub-types. Differential expression of genes is analyzed statistically, and genes are assigned to classes, which may or may not enhance the understanding of the underlying biological processes. Alternatively, a reduced set of genes may be singled out and used as biomarkers for diagnosis and prognosis.</p>
         <p>Microarray data is typically used both to discover new classes and to predict class membership. Discovery of new classes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> is usually achieved with clustering techniques such as hierarchical clustering <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, k-means clustering <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and self-organizing maps (SOM) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Class prediction, the assignment of labels to samples based on their expression patterns, is typically based on statistical or supervised machine learning methods. These range from simple techniques such as nearest neighbor algorithms <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, through classical methods such as linear discriminant analysis <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, to more advanced techniques such as neural networks <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, support vector machines <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, fuzzy logic <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and decision trees <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The challenge in dealing with microarray data lies in the orders-of-magnitude difference between the number of samples (typically fewer than a hundred) and the number of genes studied (typically tens of thousands). The measurements also typically contain both measurement noise and systematic noise, which can have a significant impact on classification accuracy. Classification must therefore be preceded by a step known as feature selection, in which a subset of relevant features is identified.</p>
         <p>There are a number of advantages to feature set selection. The first lies in reducing the cost of clinical diagnosis. It is much cheaper to focus only on the expression of a few genes rather than on thousands of genes for diagnosis <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Feature set selection can also lead to a reduction in computational cost as a result of a reduction in problem dimensionality. Furthermore, feature set selection often gives rise to a much smaller and a more compact gene set. This could make it easier to identify genes of particular importance to the problem under study. Moreover, given the disparity in the magnitudes of the numbers of genes and samples, it is difficult to justify the development of a classifier based on a gene set where the number of genes is greater than the number of samples.</p>
         <p>One way to categorize feature set selection approaches is as either filter methods (such as those based on statistical tests like the <it>t</it>-test or <it>F</it>-test) or wrapper <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> methods. Filter methods have the advantage of very low computational complexity as well as better generalization potential, since they are independent of the learning method.</p>
         <p>Wrapper approaches are those in which the feature selection method is bundled together with the learning method, so that the usefulness of a feature is validated by the estimated accuracy of the learning method. Consequently, a small feature subset with very high prediction accuracy can often be obtained, because the characteristics of the selected features match those of the learning method well.</p>
         <p>Another way of categorizing feature set selection approaches is as univariate or multivariate <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Univariate methods <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B19">19</abbr></abbrgrp> consider the contribution of each gene to the classification independently. In contrast, multivariate methods such as recursive feature elimination (RFE) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, the leave-one-out (LOO) method <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and mutual information based approaches <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> measure the relative contribution of a gene to the classification while taking the effect of other genes into account at the same time.</p>
         <p>A serious deficiency of currently used multivariate approaches to feature set selection is that they select genes that are maximally relevant with respect to the classes. The problem with this approach is that the selected set may still contain genes that are heavily correlated with each other, leading to redundancy in the selected feature set. Ding <it>et al</it>. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> used mutual information to select genes with maximum relevance and minimal redundancy by solving a simple two-objective optimization problem.</p>
         <p>In the study presented here, a similar approach was followed for feature set selection, maximizing the relevance and minimizing the redundancy of the selected genes. However, <it>normalized mutual information </it>was used instead of mutual information. In addition, both Battiti's greedy selection algorithm <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and a simulated annealing based approach <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> were used. To find a more representative subset of features, an iterative procedure was adopted that incorporates an initial clustering, followed by data partitioning and the application of the algorithm to each of the partitions. A leave-one-out approach then identifies the genes most commonly selected across all the runs, and the gene selection algorithm is applied again to pare down this list until a <it>minimal </it>subset that gives satisfactory classification accuracy is obtained. The algorithm was applied to three different data sets, and the results obtained were compared to work done by others using the same data sets. Additionally, the algorithm was compared to the work of Ding and Peng <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> on three different data sets.</p>
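         <p>Both the relevance and the redundancy terms rest on mutual information between discretized expression profiles and class labels. The precise normalization used in this study is not restated in this section, so the Python sketch below assumes one common choice, NMI(X;Y) = I(X;Y) / min(H(X), H(Y)), and assumes that expression values have already been discretized into a few levels. The function names are illustrative only.</p>
         <p><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mutual_information(xs, ys):
    """ASSUMED normalization: I(X;Y) / min(H(X), H(Y)), which lies in [0, 1]."""
    denom = min(entropy(xs), entropy(ys))
    return 0.0 if denom == 0 else mutual_information(xs, ys) / denom


# Toy example with hypothetical discretized profiles
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = [2, 2, 0, 0]  # splits the samples exactly as the labels do -> NMI = 1
gene_b = [1, 0, 1, 0]  # independent of the labels -> NMI = 0
print(normalized_mutual_information(gene_a, labels))
print(normalized_mutual_information(gene_b, labels))
```
]]></p>
         <p>Under this normalization a gene that partitions the samples exactly as the class labels do scores 1, and a gene statistically independent of the labels scores 0, which puts relevance and redundancy on a common scale.</p>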
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>Three public microarray data sets were used to assess the performance of the algorithm.</p>
            <sec>
               <st>
                  <p>SRBCT data</p>
               </st>
               <p>This data set, from <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, includes 88 cDNA arrays: 63 training samples and 25 test samples. All samples were combined and the 5 non-SRBCT samples were removed, leaving 83 samples. The data set covers four types of childhood tumors: Ewing's sarcoma (EWS), rhabdomyosarcoma (RMS), neuroblastoma (NB) and Burkitt lymphoma (BL). After the filtering performed in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, 2308 genes remained in the data set. The data was transformed to natural logarithmic values, and each sample was then standardized to zero mean and unit variance.</p>
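               <p>A minimal Python sketch of this pre-processing (natural logarithm, then per-sample standardization) follows. It assumes a samples-by-genes matrix of strictly positive values stored as nested lists, and it uses the population standard deviation; neither the storage layout nor the variance estimator is specified in the text.</p>
               <p><![CDATA[
```python
import math
import statistics


def standardize(values):
    """Shift and scale one sample to zero mean and unit (population) variance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # guard: a constant sample is only centered
    return [(v - mu) / sd for v in values]


def preprocess_srbct(samples):
    """samples: list of samples, each a list of strictly positive expression values.

    Applies the natural logarithm, then standardizes each sample separately.
    """
    return [standardize([math.log(v) for v in sample]) for sample in samples]


# Two hypothetical samples with four genes each
data = [[1.0, 2.0, 4.0, 8.0], [0.5, 1.5, 3.0, 6.0]]
for row in preprocess_srbct(data):
    print([round(x, 3) for x in row])
```
]]></p>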
            </sec>
            <sec>
               <st>
                  <p>Breast cancer data</p>
               </st>
               <p>This data set, from <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, contains expression levels of 7129 genes in 49 breast tumor samples. The samples were classified according to their estrogen receptor (ER) status: 25 samples were ER positive and the other 24 were ER negative. In pre-processing, the data was thresholded with a floor of 100 and a ceiling of 16000 Affymetrix intensity units. Genes with <graphic file="1471-2105-6-76-i1.gif"/> &#8804; 5 or <it>max - min </it>&#8804; 500 were then excluded. The filtered data was transformed to base 10 logarithmic values, and each sample was then standardized to zero mean and unit variance.</p>
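               <p>The breast cancer pre-processing steps can be sketched in Python as follows. The inline formula in the filtering criterion is read here as the fold change <it>max</it>/<it>min</it> across samples; this reading, the nested-list layout and the population standard deviation are assumptions of the sketch rather than details confirmed by the text.</p>
               <p><![CDATA[
```python
import math
import statistics

FLOOR, CEILING = 100.0, 16000.0  # Affymetrix intensity units, from the text


def preprocess_breast(samples, min_fold=5.0, min_range=500.0):
    """samples: list of samples, each a list of raw intensities (same gene order).

    Returns (indices of retained genes, processed samples).
    """
    # 1. Threshold every intensity to the interval [FLOOR, CEILING].
    clipped = [[min(max(v, FLOOR), CEILING) for v in s] for s in samples]

    # 2. Gene filter across samples. ASSUMPTION: the inline formula is max/min.
    #    A gene is excluded if max/min <= 5 or max - min <= 500.
    kept = []
    for j in range(len(clipped[0])):
        column = [s[j] for s in clipped]
        hi, lo = max(column), min(column)
        if hi / lo > min_fold and hi - lo > min_range:
            kept.append(j)

    # 3. Base 10 logarithm of the retained genes.
    logged = [[math.log10(s[j]) for j in kept] for s in clipped]

    # 4. Standardize each sample to zero mean and unit (population) variance.
    processed = []
    for s in logged:
        mu = statistics.fmean(s)
        sd = statistics.pstdev(s) or 1.0  # guard: a constant sample is only centered
        processed.append([(v - mu) / sd for v in s])
    return kept, processed


# Three hypothetical samples, four genes each
raw = [
    [50.0, 300.0, 110.0, 10.0],
    [2000.0, 400.0, 700.0, 20.0],
    [20000.0, 350.0, 120.0, 30.0],
]
kept, processed = preprocess_breast(raw)
print(kept)  # genes 1 and 3 fail the filter
```
]]></p>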
            </sec>
            <sec>
               <st>
                  <p>Colon cancer data</p>
               </st>
               <p>This data set contains expression levels of 40 tumor and 22 normal colon tissue samples. Only the 2000 genes with the highest minimal intensity across samples were retained by <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The data was pre-processed by transforming the raw intensities to base 10 logarithmic values and standardizing each sample to zero mean and unit variance.</p>
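               <p>The 2000-gene pre-selection was carried out in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, not in the present study. The Python sketch below only illustrates the stated criterion (keep the genes whose minimum intensity across samples is highest) on a hypothetical raw matrix, followed by the base 10 logarithm and per-sample standardization; the function names and the small <it>k</it> in the example are illustrative.</p>
               <p><![CDATA[
```python
import heapq
import math
import statistics


def top_genes_by_min_intensity(samples, k):
    """Indices (in original order) of the k genes with the highest minimum intensity."""
    n_genes = len(samples[0])
    min_intensity = [min(s[j] for s in samples) for j in range(n_genes)]
    return sorted(heapq.nlargest(k, range(n_genes), key=min_intensity.__getitem__))


def preprocess_colon(samples, k=2000):
    """Keep the top-k genes, take base 10 logarithms, then standardize each sample."""
    keep = top_genes_by_min_intensity(samples, k)
    processed = []
    for s in samples:
        logged = [math.log10(s[j]) for j in keep]
        mu = statistics.fmean(logged)
        sd = statistics.pstdev(logged) or 1.0  # guard: a constant sample is only centered
        processed.append([(v - mu) / sd for v in logged])
    return keep, processed


# Hypothetical raw intensities: 3 samples x 4 genes, keeping k = 2 genes
raw = [
    [10.0, 500.0, 40.0, 300.0],
    [20.0, 400.0, 35.0, 900.0],
    [15.0, 450.0, 50.0, 250.0],
]
keep, processed = preprocess_colon(raw, k=2)
print(keep)  # minimum intensities are 10, 400, 35, 250, so genes 1 and 3 are kept
```
]]></p>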
            </sec>
         </sec>
         <sec>
            <st>
               <p>Results</p>
            </st>
            <p>The results of applying the full algorithm, using either the greedy selection algorithm or the simulated annealing algorithm to solve Problem 2, are shown in Table <tblr tid="T1">1</tblr>. The associated clustering dendrograms for the SRBCT, breast cancer and colon cancer data are shown in Figures <figr fid="F1">1</figr>, <figr fid="F3">3</figr> and <figr fid="F5">5</figr>, respectively. In all the dendrograms, the samples are presented along the x-axis and the genes along the y-axis. <it>Orange </it>indicates increased expression, while <it>yellow </it>indicates little or no expression.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of SRBCT data &#8211; First Iteration</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of SRBCT data &#8211; First Iteration.</p>
               </text>
               <graphic file="1471-2105-6-76-1"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of breast cancer data &#8211; First Iteration</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of breast cancer data &#8211; First Iteration.</p>
               </text>
               <graphic file="1471-2105-6-76-3"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of colon cancer data &#8211; First Iteration</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of colon cancer data &#8211; First Iteration.</p>
               </text>
               <graphic file="1471-2105-6-76-5"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Classification accuracies and the number of selected genes for the two optimization methods (Greedy and Simulated Annealing (SA)). For Greedy selection, the accuracies and the numbers of genes selected in iterations 1 and 2 are shown as iteration 1/iteration 2; SA was used only in the first iteration (-). The reported accuracies are LOOCV accuracies, and the number of genes is the size of the gene subset common to all LOOCV experiments.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>Algorithm</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Colon</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Breast</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>SRBCT</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Greedy</p>
                     </c>
                     <c ca="center">
                        <p>90.3/91.9</p>
                     </c>
                     <c ca="center">
                        <p>29/9</p>
                     </c>
                     <c ca="center">
                        <p>89.8/89.8</p>
                     </c>
                     <c ca="center">
                        <p>31/12</p>
                     </c>
                     <c ca="center">
                        <p>100/100</p>
                     </c>
                     <c ca="center">
                        <p>58/14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SA</p>
                     </c>
                     <c ca="center">
                        <p>87.1/-</p>
                     </c>
                     <c ca="center">
                        <p>26/-</p>
                     </c>
                     <c ca="center">
                        <p>89.8/-</p>
                     </c>
                     <c ca="center">
                        <p>44/-</p>
                     </c>
                     <c ca="center">
                        <p>100/-</p>
                     </c>
                     <c ca="center">
                        <p>58/-</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The best results were obtained for the SRBCT data, with 100% accuracy. The number of genes selected in this case was 58, as opposed to the 96 genes selected by Khan <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Interestingly, when the binary optimization algorithm was used to select genes for the SRBCT data, 50 of the genes selected were the same as those selected with the greedy algorithm. The accuracy for the breast cancer data was the same for both methods (89.8%), with 5 of the 49 samples misclassified. The final gene set for this data set contained 31 genes with greedy selection. For the colon cancer data, greedy selection gave 6 misclassifications, an overall accuracy of 90.3%, with 29 genes in the final selected gene set.</p>
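            <p>Because LOOCV accuracy is simply the percentage of held-out samples that are classified correctly, the percentages in Table <tblr tid="T1">1</tblr> can be converted back into misclassification counts, given 62 colon samples (40 tumor and 22 normal), 49 breast samples and the 83 SRBCT samples remaining after the 5 non-SRBCT arrays were removed. The short Python check below reproduces the reported figures; by the same arithmetic, the second-iteration colon accuracy of 91.9% corresponds to 5 errors and the SA colon accuracy of 87.1% to 8 errors.</p>
            <p><![CDATA[
```python
def loocv_accuracy(n_samples, n_errors):
    """Percentage of samples classified correctly in leave-one-out cross-validation."""
    return 100.0 * (n_samples - n_errors) / n_samples


# (data set and method, number of samples, number of misclassified samples)
cases = [
    ("colon, greedy, iteration 1", 62, 6),   # 90.3%
    ("colon, greedy, iteration 2", 62, 5),   # 91.9%
    ("colon, SA", 62, 8),                    # 87.1%
    ("breast", 49, 5),                       # 89.8%
    ("SRBCT", 83, 0),                        # 100.0%
]
for name, n, errors in cases:
    print(f"{name}: {loocv_accuracy(n, errors):.1f}%")
```
]]></p>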
            <p>There appears to be no substantial difference between using the greedy selection algorithm and the binary optimization algorithm. Moreover, since the simulated annealing procedure requires an inordinate amount of computation time (on the order of days) compared with the greedy selection algorithm (on the order of a few hours), the iterative procedure was implemented with the greedy algorithm. The iterative approach shown in Figure <figr fid="F8">8</figr> was used for all three data sets, and the clustering dendrograms with the reduced feature sets are shown in Figures <figr fid="F2">2</figr>, <figr fid="F4">4</figr> and <figr fid="F6">6</figr>, respectively. Notably, the classification accuracy is not degraded by using a much smaller feature set; for the colon cancer data, the accuracy actually improved to 91.9%.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of SRBCT data &#8211; Reduced Feature Set</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of SRBCT data &#8211; Reduced Feature Set.</p>
               </text>
               <graphic file="1471-2105-6-76-2"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of breast cancer data &#8211; Reduced Feature Set</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of breast cancer data &#8211; Reduced Feature Set.</p>
               </text>
               <graphic file="1471-2105-6-76-4"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Clustering dendrogram of colon cancer data &#8211; Reduced Feature Set</p>
               </caption>
               <text>
                  <p>Clustering dendrogram of colon cancer data &#8211; Reduced Feature Set.</p>
               </text>
               <graphic file="1471-2105-6-76-6"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Optimal Feature Set Selection Algorithm</p>
               </caption>
               <text>
                  <p><b>Optimal Feature Set Selection Algorithm</b>. The function CLUSTER uses the k-means clustering approach to partition the initial gene set into the desired number of partitions <it>K</it>, with <it>G </it>genes in each partition. <it>K </it>and <it>G </it>are user-specified. The function SELECT_GENES uses either the greedy approach (Figure 7) or the heuristic simulated annealing approach to solve Problem 2. The function CLASSIFICATION_ERROR uses the k-nearest-neighbor (kNN) classification method to assess the discriminant power of the selected genes and returns the classification error.</p>
               </text>
               <graphic file="1471-2105-6-76-8"/>
            </fig>
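            <p>As a rough illustration of what CLASSIFICATION_ERROR computes, the Python sketch below estimates the leave-one-out error rate of a kNN classifier restricted to a candidate gene subset. The Euclidean distance, the default <it>k</it> = 3 and the tie-breaking rule are assumptions, since Figure <figr fid="F8">8</figr> does not specify them, and the function signatures are hypothetical.</p>
            <p><![CDATA[
```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training samples closest to `query` (Euclidean).

    Vote ties go to the label seen first among the nearest neighbors, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def classification_error(samples, labels, genes, k=3):
    """Leave-one-out error rate of kNN using only the gene indices in `genes`."""
    x = [[s[g] for g in genes] for s in samples]
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]          # all samples except the held-out one
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) != labels[i]:
            errors += 1
    return errors / len(x)


# Hypothetical data: gene 0 separates classes A and B, gene 1 is noise
samples = [[0.0, 5.0], [0.2, -3.0], [0.1, 1.0], [5.0, 4.0], [5.2, -2.0], [4.9, 0.5]]
labels = ["A", "A", "A", "B", "B", "B"]
print(classification_error(samples, labels, genes=[0], k=1))
print(classification_error(samples, labels, genes=[1], k=1))
```
]]></p>
            <p>In this toy data every held-out sample is classified correctly using gene 0 alone (error 0), whereas using gene 1 alone its nearest neighbor always belongs to the other class (error 1).</p>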
            <p>One of the main concerns when carrying out a multi-objective optimization is the choice of the weight factor <it>&#946;</it>, which is usually made heuristically. Battiti <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> suggested that a value of <it>&#946; </it>between 0.5 and 1.0 is appropriate for most cases. The effect of <it>&#946; </it>was studied on the colon cancer data set by varying its value from 0 to 1 in steps of 0.2 and calculating the classification accuracy (Table <tblr tid="T2">2</tblr>). A value between 0.5 and 1.0 for <it>&#946; </it>seems appropriate. The order in which the first 10 genes were selected was also examined (Table <tblr tid="T3">3</tblr>). Varying <it>&#946; </it>appears to affect the gene selection order to a certain extent. For example, comparing the selection orders for <it>&#946; </it>= 0.6 and 0.8 shows that genes 267 and 513 swap places, while genes 1256 (for <it>&#946; </it>= 0.6) and 1727 (for <it>&#946; </it>= 0.8) are not common to both cases. However, the selection order here is <it>not </it>indicative of the relative importance of the genes, since a greedy algorithm is being used.</p>
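            <p>The role of <it>&#946;</it> can be illustrated with a Python sketch of Battiti's greedy criterion <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, which at each step adds the gene <it>f</it> maximizing I(C;f) - <it>&#946;</it> &#215; (sum of I(f;s) over the already selected genes s). The sketch uses plain mutual information on discretized profiles, as in Battiti's original formulation, whereas this study uses normalized mutual information; the gene names and data are invented for illustration.</p>
            <p><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mifs_select(genes, labels, n_select, beta):
    """Greedy selection in the style of Battiti's MIFS.

    genes: dict mapping a gene name to its discretized expression profile.
    Each step adds the unselected gene maximizing
        I(C; f) - beta * sum over selected s of I(f; s).
    """
    relevance = {g: mutual_information(x, labels) for g, x in genes.items()}
    selected = []

    def score(g):
        redundancy = sum(mutual_information(genes[g], genes[s]) for s in selected)
        return relevance[g] - beta * redundancy

    while len(selected) < min(n_select, len(genes)):
        candidates = [g for g in genes if g not in selected]
        selected.append(max(candidates, key=score))
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],   # highly relevant
    "g1b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of g1: fully redundant
    "g2": [0, 0, 1, 0, 1, 1, 1, 0],   # less relevant, nearly independent of g1
}
for beta in (0.0, 0.5, 1.0):
    print(beta, mifs_select(genes, labels, 2, beta))
```
]]></p>
            <p>With <it>&#946;</it> = 0 the exact copy g1b is chosen second even though it adds no new information; with <it>&#946;</it> = 0.5 or 1.0 the redundancy penalty makes the less relevant but complementary gene g2 preferable.</p>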
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Effect of varying <it>&#946; </it>on classification accuracy, studied for the colon cancer data set. A value between 0.5 and 1, as suggested by Battiti [21], seems appropriate.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>
                           <it>&#946;</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>0.2</p>
                     </c>
                     <c ca="center">
                        <p>0.4</p>
                     </c>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                     <c ca="center">
                        <p>0.8</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Accuracy</p>
                     </c>
                     <c ca="center">
                        <p>87.1%</p>
                     </c>
                     <c ca="center">
                        <p>88.7%</p>
                     </c>
                     <c ca="center">
                        <p>90.3%</p>
                     </c>
                     <c ca="center">
                        <p>90.3%</p>
                     </c>
                     <c ca="center">
                        <p>90.3%</p>
                     </c>
                     <c ca="center">
                        <p>90.3%</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Effect of varying <it>&#946; </it>on the selection order of genes. The first ten genes selected for each value of <it>&#946; </it>are shown here. The numbers correspond to the gene numbers for the colon cancer data set. Varying <it>&#946; </it>does seem to affect the order in which the genes are selected. However, selection order is not indicative of the relative importance of genes since a greedy-selection method is being used.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>
                           <it>&#946;</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>g<sup>1</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>3</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>4</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>5</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>6</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>7</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>8</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>9</sup></p>
                     </c>
                     <c ca="center">
                        <p>g<sup>10</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>765</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>1635</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>245</p>
                     </c>
                     <c ca="center">
                        <p>780</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>0.2</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>765</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>1635</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>780</p>
                     </c>
                     <c ca="center">
                        <p>1423</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>0.4</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>765</p>
                     </c>
                     <c ca="center">
                        <p>1635</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>780</p>
                     </c>
                     <c ca="center">
                        <p>1491</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>1491</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>1635</p>
                     </c>
                     <c ca="center">
                        <p>765</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>1256</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>0.8</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>1491</p>
                     </c>
                     <c ca="center">
                        <p>1727</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>1635</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>765</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                     <c ca="center">
                        <p>377</p>
                     </c>
                     <c ca="center">
                        <p>1582</p>
                     </c>
                     <c ca="center">
                        <p>1491</p>
                     </c>
                     <c ca="center">
                        <p>513</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>1727</p>
                     </c>
                     <c ca="center">
                        <p>1244</p>
                     </c>
                     <c ca="center">
                        <p>1256</p>
                     </c>
                     <c ca="center">
                        <p>1671</p>
                     </c>
                     <c ca="center">
                        <p>1873</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We also compared our methodology with that of Ding and Peng <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> on three datasets: the colon cancer dataset <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, the leukemia dataset <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and the NCI dataset <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The results are tabulated in Table <tblr tid="T4">4</tblr>. The Uncertainty-based (UB) method (our method) appeared to perform better than the DP (Ding and Peng) method on the colon dataset, whereas DP was superior on the leukemia dataset. On the NCI dataset both methods performed poorly, with DP achieving the higher accuracy. It should be noted, however, that the NCI dataset consists of 9 classes and only 60 samples; assigning such a small number of samples to 9 classes using only 15 genes is very difficult.</p>
            <p>A further difficulty in comparing methodologies is that the initial preprocessing step may also affect classification accuracy. In the absence of uniform preprocessing of the datasets, it is difficult to draw general conclusions about the relative performance of two methodologies.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Classification accuracies and numbers of selected genes for the two mutual-information-based methodologies (Uncertainty-based (UB) and Ding and Peng's (DP)). For the UB method, the accuracies and numbers of genes selected in iterations 1 and 2 are shown; for the DP method, the accuracies and numbers of genes for two different runs are shown. For both methodologies, the reported accuracies are LOOCV accuracies.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>Algorithm</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Colon</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Leukemia</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>NCI</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                     <c ca="center">
                        <p>% Acc</p>
                     </c>
                     <c ca="center">
                        <p># Genes</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>UB (ours)</p>
                     </c>
                     <c ca="center">
                        <p>90.3/91.9</p>
                     </c>
                     <c ca="center">
                        <p>29/9</p>
                     </c>
                     <c ca="center">
                        <p>80.6/76.4</p>
                     </c>
                     <c ca="center">
                        <p>21/5</p>
                     </c>
                     <c ca="center">
                        <p>57.6/52.5</p>
                     </c>
                     <c ca="center">
                        <p>59/15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DP</p>
                     </c>
                     <c ca="center">
                        <p>75.8/91.9</p>
                     </c>
                     <c ca="center">
                        <p>50/20</p>
                     </c>
                     <c ca="center">
                        <p>98.6/100</p>
                     </c>
                     <c ca="center">
                        <p>50/10</p>
                     </c>
                     <c ca="center">
                        <p>73.3/61.7</p>
                     </c>
                     <c ca="center">
                        <p>50/20</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>As is common when analytical algorithms are compared, the results are mixed. While Ding and Peng's algorithm outperformed the one presented here on two of the three datasets (see Table <tblr tid="T4">4</tblr>), it should be noted that the description of methods in their article did not allow us to compare both algorithms on equal terms: no gene ranking was provided, so the biological significance of their findings could not be assessed.</p>
            <p>A comparison between the accuracies obtained by the original papers (from which the datasets were obtained) and our method is given in Table <tblr tid="T5">5</tblr>. The list of genes selected for each dataset and their ranks in the original papers are given in the supplementary file.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Classification accuracies (%) reported in the original papers and obtained with the UB method</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Dataset</p>
                     </c>
                     <c ca="center">
                        <p>Colon</p>
                     </c>
                     <c ca="center">
                        <p>Breast</p>
                     </c>
                     <c ca="center">
                        <p>SRBCT</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Accuracy (original)</p>
                     </c>
                     <c ca="center">
                        <p>clustering only (no accuracy reported)</p>
                     </c>
                     <c ca="center">
                        <p>89.47</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Accuracy (UB)</p>
                     </c>
                     <c ca="center">
                        <p>91.9</p>
                     </c>
                     <c ca="center">
                        <p>89.8</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Details of the selected genes and their comparison with the original data are listed in the supplementary material (Additional File 1) for all three datasets. This section discusses how the genes selected by the algorithm presented in this work compare with those reported in earlier work (for the breast cancer and SRBCT data, in the original publications).</p>
         <sec>
            <st>
               <p>SRBCT data set</p>
            </st>
            <p>A total of 41 genes overlapped between the selection made by the method presented in this work and that of <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The common genes came from all rank levels of the original method. Genes that were left out often, but not always, coded for proteins from a functional system similar to that of genes still selected here, as in the case of no. 233721 (insulin-like growth factor binding protein; not selected here) and nos. 296448 (insulin-like growth factor 2) and 207274 (insulin-like growth factor 2, exon 7 and additional ORF), which were selected by both methods. Interestingly, two viral oncogene sequences were not selected (nos. 417226 and 812965, v-myc avian myelocytomatosis viral oncogene homologs), nor were some extracellular matrix-associated genes (nos. 122159 and 809901, collagens type III and XV); in both cases no similar genes were selected in their place. The seventeen newly selected genes that were not part of the original selection come from various functional systems. Of note, while the original gene no. 245330 (Human krueppel-related zinc finger protein H-plk) was left out, gene no. 767495 (GLI-Krueppel family member GLI3) was newly selected. Such "nuclear localization signals" have been shown to be involved in processes determining proper nuclear localization <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, but may also be determinants of progression towards cancer <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Breast cancer data set</p>
            </st>
            <p>Of the 31 genes selected here, 16 were not selected in the original publication <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, which selected 60 genes. The 45 genes not selected by the present method covered a large variety of physiological functions, without any obvious pattern. Two genes linked to ILGF were left out (nos. s37730 and m62403) and were not replaced. ILGF is linked to the development of a number of cancers (reviewed in <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>). The omission of ILGF-linked genes can be interpreted in two diametrically opposite ways. On the one hand, leaving these genes out of the classification set may cause the tissue's potential to induce further cancerous growth to be overlooked. More likely, though, whatever physiological role these genes play in the tissue, they do not contribute to distinguishing between the various types of cancer.</p>
         </sec>
         <sec>
            <st>
               <p>Colon cancer data set</p>
            </st>
            <p>Unlike the other two test sets, for the colon cancer data the original publication did not rank the retrieved gene set, so a direct comparison of results was not possible. The same dataset, however, was previously re-analyzed by Silvio Bicciato <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> using an auto-associative neural network model, which yielded a ranked gene list. With the exception of Tetraspan-1, which heads that rank list with a weight of 0.9391, the top genes found by Bicciato for the reconstruction of the normal class concur with the rank list presented here. For the reconstruction of the tumor class, by contrast, only one gene in the list of <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> (heat shock 60 kD protein 1) is selected by both methods. The tetraspan family of proteins is involved in cell adhesion processes at gap junctions, and one related protein was found to be enhanced in highly metastatic gastric cancer <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Relative to the classification methods described in the original articles or in previous third-party analyses, the algorithm described here compares favorably in its capacity to select small sets of genes that distinguish between various cancer types. The observation that it leaves out several genes known to be involved in cancer development may indicate that the method's advantage lies in accurate classification rather than in the detection of new dysfunctional regulatory mechanisms.</p>
         <p>Although preliminary results using a greedy selection algorithm are encouraging, further work is needed to develop alternative multi-objective optimization methods that can select a better and more representative set of genes for discriminating between cancer subtypes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Algorithms for microarray data analysis typically focus on obtaining a set of genes that can distinguish between the different classes in a given sample set. Thus, the primary concern is to ensure the relevance of the genes to the classes under consideration.</p>
         <p>Given a microarray dataset with <it>m </it>samples belonging to <it>k </it>known classes and <it>n </it>genes, we want to select the genes that are able to predict the differences in gene expression patterns between the sample classes. Define <graphic file="1471-2105-6-76-i2.gif"/>, |<it>c</it>| = <it>k</it>, as the vector labeling the classes of the samples, and <graphic file="1471-2105-6-76-i3.gif"/>, <it>i </it>&#8712; {1, ..., <it>n</it>}, as the expression profile of gene <it>i</it>. Let <graphic file="1471-2105-6-76-i4.gif"/> be the feature set of all genes and let <it>S </it>be the set of selected genes. Then the feature set selection problem can be defined as follows:</p>
         <sec>
            <st>
               <p>Problem 1</p>
            </st>
            <p><it>Select a set S of genes, S </it>&#8834; <graphic file="1471-2105-6-76-i4.gif"/><it> such that &#8704; gene s </it>&#8712; <it>S the relevance of s with </it><graphic file="1471-2105-6-76-i2.gif"/><it>is maximized</it>.</p>
            <p>However, a feature set selected in this way will contain a number of redundant genes, sometimes with little relevance to the classes. Because closely related genes carry largely the same information, their presence means that genes orthogonal to those in the selected set may be left out of the final feature set. Moreover, the presence of genes with little relevance to the classes reduces the amount of "useful information".</p>
            <p>Ideally, the selected genes should have high relevance to the classes while the redundancy among them is low. Most previous studies emphasized the selection of highly relevant genes; Ding and Peng <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> addressed the issue of redundancy among the selected genes. Genes with high relevance are expected to be able to predict the classes of the samples, but the prediction power is reduced if many redundant genes are selected. By contrast, a feature set containing genes that have not only high relevance to the classes but also low mutual redundancy is more effective at prediction.</p>
         </sec>
         <sec>
            <st>
               <p>Problem formulation</p>
            </st>
            <p>To assess the effectiveness of the genes, both the relevance and the redundancy need to be measured quantitatively. An entropy-based correlation measure is chosen here. According to Shannon's information theory <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, the entropy of a random variable <it>X </it>can be defined as:</p>
            <p>
               <graphic file="1471-2105-6-76-i5.gif"/>
            </p>
            <p>Entropy measures the uncertainty of a random variable. For the measurement of the interdependency of two random variables <it>X </it>and <it>Y</it>, some researchers <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> used mutual information, which is defined as:</p>
            <p><it>I</it>(<it>X</it>, <it>Y</it>) = <it>H</it>(<it>X</it>) + <it>H</it>(<it>Y</it>) - <it>H</it>(<it>X, Y</it>)&#160;&#160;&#160;(2)</p>
            <p>In order to ensure that different values are comparable and have similar effects, <it>normalized mutual information </it>is used as a measure and is defined as:</p>
            <p>
               <graphic file="1471-2105-6-76-i6.gif"/>
            </p>
            <p><it>U</it>(<it>X, Y</it>) is symmetric and ranges from 0 to 1: a value of 1 indicates that knowledge of one variable completely predicts the other (high mutual relevance), while a value of 0 indicates that <it>X </it>and <it>Y </it>are independent (low mutual relevance).</p>
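            <p>To make the entropy, mutual information and normalized measure concrete, the following Python sketch estimates them from discretized expression values. It assumes that <it>U</it> takes the symmetric-uncertainty form 2<it>I</it>(<it>X</it>, <it>Y</it>)/(<it>H</it>(<it>X</it>) + <it>H</it>(<it>Y</it>)), which has the stated properties (symmetric, between 0 and 1), and it uses an equal-width binning of our own choosing; neither the discretization scheme nor the function names are taken from the article.</p>
            <p>
```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning of one gene's expression values (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant gene: a single bin
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(xs):
    """Empirical Shannon entropy H(X), in bits, of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), as in Equation (2)."""
    joint = entropy(list(zip(xs, ys)))  # H(X, Y) from paired observations
    return entropy(xs) + entropy(ys) - joint


def symmetric_uncertainty(xs, ys):
    """Assumed normalization U(X, Y) = 2 I(X, Y) / (H(X) + H(Y)), in [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    if denom == 0:
        return 0.0  # both variables constant: no information to share
    return 2.0 * mutual_information(xs, ys) / denom


# Toy example: a gene whose discretized level tracks the class label exactly.
labels = ["ALL", "ALL", "AML", "AML"]
gene = discretize([0.1, 0.2, 2.0, 2.1], bins=2)
print(gene, symmetric_uncertainty(gene, labels))
```
            </p>
            <p>For this toy gene, whose two expression levels separate the two classes perfectly, the sketch prints [0, 0, 1, 1] 1.0; a gene whose levels are independent of the labels would score 0.</p>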
            <p>The mutual relevance between <graphic file="1471-2105-6-76-i3.gif"/> and <graphic file="1471-2105-6-76-i2.gif"/> can then be modeled by <it>U </it>(<graphic file="1471-2105-6-76-i7.gif"/>) while the dependency between two genes is <it>U </it>(<graphic file="1471-2105-6-76-i8.gif"/>).</p>
            <p>The total relevance of all selected genes is given by</p>
            <p>
               <graphic file="1471-2105-6-76-i9.gif"/>
            </p>
            <p>The total redundancy among the selected genes is given by</p>
            <p>
               <graphic file="1471-2105-6-76-i10.gif"/>
            </p>
            <p>Therefore, the problem of selecting genes can be reformulated as follows:</p>
         </sec>
         <sec>
            <st>
               <p>Problem 2</p>
            </st>
            <p><it>Select a set S of genes, S </it>&#8834; <graphic file="1471-2105-6-76-i4.gif"/><it> such that &#8704; g<sub>i </sub>&#8712; S, the total relevance of all the selected genes with </it><graphic file="1471-2105-6-76-i2.gif"/>, <it>J</it><sub>1</sub>, <it>is maximized while the total redundancy among all the selected genes g<sub>i </sub>&#8712; S, J</it><sub>2</sub>, <it>is minimized</it>.</p>
            <p>This is a two-objective optimization problem. A simple way to solve it is to combine the two objectives into one:</p>
            <p>
               <graphic file="1471-2105-6-76-i11.gif"/>
            </p>
            <p>where <it>&#946; </it>is a weight parameter.</p>
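            <p>As a small illustration of the combined objective, the sketch below evaluates <it>J</it><sub>1</sub>, <it>J</it><sub>2</sub> and their weighted combination for a candidate gene set, given precomputed relevance values <it>U</it>(<it>g</it><sub><it>i</it></sub>, <it>c</it>) and a symmetric matrix of pairwise values <it>U</it>(<it>g</it><sub><it>i</it></sub>, <it>g</it><sub><it>j</it></sub>). It assumes the combination takes the form <it>J</it><sub>1</sub> - <it>&#946;J</it><sub>2</sub>, with <it>J</it><sub>2</sub> summed over unordered pairs of distinct genes and not normalized by the set size; the three-gene values are invented for illustration.</p>
            <p>
```python
from itertools import combinations


def total_relevance(selected, relevance):
    """J1: sum of U(g_i, c) over the selected genes."""
    return sum(relevance[i] for i in selected)


def total_redundancy(selected, redundancy):
    """J2: sum of U(g_i, g_j) over unordered pairs of selected genes (assumed form)."""
    return sum(redundancy[i][j] for i, j in combinations(sorted(selected), 2))


def objective(selected, relevance, redundancy, beta):
    """Combined objective J = J1 - beta * J2."""
    return (total_relevance(selected, relevance)
            - beta * total_redundancy(selected, redundancy))


# Hypothetical precomputed values for three genes: U(g_i, c) and symmetric U(g_i, g_j).
relevance = [0.9, 0.8, 0.5]
redundancy = [[1.0, 0.7, 0.1],
              [0.7, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
for pair in ({0, 1}, {0, 2}):
    print(sorted(pair), round(objective(pair, relevance, redundancy, beta=1.0), 3))
```
            </p>
            <p>With <it>&#946;</it> = 1, the pair {0, 2} scores 1.3 and beats the more relevant but more redundant pair {0, 1}, which scores 1.0; larger values of <it>&#946;</it> penalize redundancy more strongly.</p>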
          </sec>
          <sec>
             <st>
                <p>Algorithm</p>
             </st>
            <p>To solve the above problem, Battiti <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> proposed a greedy algorithm. The procedure can be described as follows (see Figure <figr fid="F7">7</figr>):</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Greedy Algorithm</p>
               </caption>
               <text>
                  <p>Greedy Algorithm.</p>
               </text>
               <graphic file="1471-2105-6-76-7"/>
            </fig>
            <p>1. Initialization: <it>F </it>&#8592; all genes, <it>S </it>&#8592; &#8709;.</p>
            <p>2. First gene: select the gene <it>i </it>with the highest relevance <it>U </it>(<graphic file="1471-2105-6-76-i7.gif"/>); set <it>S </it>&#8592; {<it>g</it><sub><it>i</it></sub>} and <it>F </it>&#8592; <it>F </it>\ {<it>i</it>}.</p>
            <p>3. Remaining genes: from <it>F</it>, select the gene <it>j </it>that maximizes <graphic file="1471-2105-6-76-i12.gif"/>, add it to <it>S </it>and remove it from <it>F</it>.</p>
            <p>4. Repeat step 3 until the desired number of genes is obtained.</p>
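            <p>The four steps above can be sketched in Python as follows, again on precomputed relevance and redundancy values. The sketch assumes that the quantity maximized in step 3 is the gene's relevance minus <it>&#946;</it> times its summed redundancy with the genes already in <it>S</it>, i.e. the increase in the combined objective obtained by adding that gene; the function name and the deterministic tie-breaking are illustrative choices.</p>
            <p>
```python
def greedy_select(relevance, redundancy, n_select, beta=1.0):
    """Greedy forward selection on precomputed U values (a sketch of steps 1-4)."""
    remaining = set(range(len(relevance)))  # step 1: F = all genes
    selected = []                           # step 1: S = empty set

    def score(j):
        # Marginal gain of adding gene j: its relevance minus beta times its
        # summed redundancy with the genes already in S. With S empty this is
        # just the relevance, so step 2 is the first pass of the loop below.
        return relevance[j] - beta * sum(redundancy[j][s] for s in selected)

    while remaining and n_select > len(selected):  # step 4: repeat
        best = max(sorted(remaining), key=score)    # step 3; sorting fixes tie order
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = [0.9, 0.8, 0.5]
redundancy = [[1.0, 0.7, 0.1],
              [0.7, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
print(greedy_select(relevance, redundancy, 2, beta=0.0))  # relevance only
print(greedy_select(relevance, redundancy, 2, beta=1.0))  # redundancy-aware
```
            </p>
            <p>On this three-gene example, <it>&#946;</it> = 0 returns [0, 1] (pure relevance ranking), while <it>&#946;</it> = 1 returns [0, 2], because gene 1 is largely redundant with gene 0.</p>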
            <p>The maximization problem (6) can also be reformulated as a binary optimization problem. Let <it>x</it><sub><it>i </it></sub>be a binary variable that takes the value 1 if gene <it>i </it>is selected and 0 otherwise. Equation (6) can then be rewritten as:</p>
            <p>
               <graphic file="1471-2105-6-76-i13.gif"/>
            </p>
            <p>It can be further rewritten in matrix form:</p>
            <p>max <it>U</it><sub><it>c</it></sub><sup><it>T</it></sup><it>x </it>- <it>&#946;</it><it>x</it><sup><it>T</it></sup><it>U</it><sub><it>p</it></sub><it>x</it>&#160;&#160;&#160;(8)</p>
            <p>where <it>U</it><sub><it>c </it></sub>is the vector of relevances <it>U</it>(<it>g</it><sub><it>i</it></sub>, <it>c</it>) and <it>U</it><sub><it>p </it></sub>is the matrix of pairwise redundancies <it>U</it>(<it>g</it><sub><it>i</it></sub>, <it>g</it><sub><it>j</it></sub>).</p>
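            <p>The binary formulation can be checked numerically. The sketch below evaluates the objective of Equation (8) for a 0/1 vector. It assumes that <it>U</it><sub><it>p</it></sub> is symmetric with a zero diagonal, in which case <it>x</it><sup><it>T</it></sup><it>U</it><sub><it>p</it></sub><it>x</it> counts every selected pair twice; whether this reproduces Equation (6) exactly or only up to a rescaling of <it>&#946;</it> depends on how <it>J</it><sub>2</sub> is defined, which the sketch does not settle. The function names and values are hypothetical.</p>
            <p>
```python
def indicator(selected, n):
    """Binary encoding: x_i = 1 if gene i is selected, else 0."""
    return [1 if i in selected else 0 for i in range(n)]


def bqp_objective(x, u_c, u_p, beta):
    """Evaluate Equation (8): u_c^T x - beta * x^T U_p x."""
    n = len(x)
    linear = sum(u_c[i] * x[i] for i in range(n))
    quadratic = sum(x[i] * u_p[i][j] * x[j] for i in range(n) for j in range(n))
    return linear - beta * quadratic


# Assumed: U_p is symmetric with a zero diagonal, so x^T U_p x counts each
# selected pair twice and equals 2 * J2 when J2 sums over unordered pairs.
u_c = [0.9, 0.8, 0.5]
u_p = [[0.0, 0.7, 0.1],
       [0.7, 0.0, 0.2],
       [0.1, 0.2, 0.0]]
x = indicator({0, 2}, 3)
print(x, round(bqp_objective(x, u_c, u_p, beta=0.5), 3))
```
            </p>
            <p>For <it>x</it> = (1, 0, 1) and <it>&#946;</it> = 0.5 the sketch prints [1, 0, 1] 1.3, the same value that <it>J</it><sub>1</sub> - <it>J</it><sub>2</sub> gives for the set {0, 2} when <it>J</it><sub>2</sub> sums over unordered pairs.</p>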
            <p>Beasley et al. <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> discussed several heuristic algorithms for solving such binary quadratic programming problems. Here, a heuristic simulated annealing method was employed to solve the problem; pseudocode for simulated annealing can be found in <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
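            <p>Because the annealing variant is specified only by reference to <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, the following is a generic single-bit-flip simulated annealing sketch for Equation (8), not a reproduction of that pseudocode. The initial temperature, geometric cooling rate, iteration count and function names are illustrative assumptions. For clarity it recomputes the full objective at every step, whereas a practical implementation would update it incrementally.</p>
            <p>
```python
import math
import random


def bqp_value(x, u_c, u_p, beta):
    """u_c^T x - beta * x^T U_p x for a 0/1 list x."""
    n = len(x)
    linear = sum(u_c[i] * x[i] for i in range(n))
    quadratic = sum(x[i] * u_p[i][j] * x[j] for i in range(n) for j in range(n))
    return linear - beta * quadratic


def simulated_annealing(u_c, u_p, beta, n_iter=5000, t_start=1.0,
                        cooling=0.999, seed=0):
    """Generic single-bit-flip annealing for max u_c^T x - beta x^T U_p x.

    The parameters and the geometric cooling schedule are illustrative
    assumptions, not settings reported in the article.
    """
    rng = random.Random(seed)
    n = len(u_c)
    x = [rng.randint(0, 1) for _ in range(n)]   # random initial selection
    value = bqp_value(x, u_c, u_p, beta)
    best_x, best_value = list(x), value
    temp = t_start
    for _ in range(n_iter):
        k = rng.randrange(n)
        x[k] = 1 - x[k]                          # move gene k into or out of S
        new_value = bqp_value(x, u_c, u_p, beta)
        delta = new_value - value
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or math.exp(delta / temp) > rng.random():
            value = new_value
            if value > best_value:
                best_x, best_value = list(x), value
        else:
            x[k] = 1 - x[k]                      # reject: undo the flip
        temp *= cooling                          # geometric cooling
    return best_x, best_value


u_c = [0.9, 0.8, 0.5]
u_p = [[0.0, 0.7, 0.1],
       [0.7, 0.0, 0.2],
       [0.1, 0.2, 0.0]]
best_x, best_value = simulated_annealing(u_c, u_p, beta=0.5)
print(best_x, round(best_value, 3))
```
            </p>
            <p>On this three-gene toy problem, enumerating all eight possible selections gives the optimum <it>x</it> = (1, 0, 1) with value 1.3, which the sketch recovers; the point of annealing is that it remains usable when such enumeration is infeasible.</p>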
            <p>Both approaches, however, have limitations. The solution obtained for Problem 2 may be only a local optimum, which could result in a sub-optimal feature set and thereby affect the prediction accuracy. To expand the search space, an iterative procedure was adopted. The data were first partitioned into <it>K </it>groups, <it>C</it><sub>1</sub>, <it>C</it><sub>2</sub>, ..., <it>C</it><sub><it>K</it></sub>, by k-means clustering, the idea being to group genes with similar expression patterns together. The greedy or heuristic simulated annealing procedure was then applied to select a subset of genes, <it>S</it><sub><it>k</it></sub>, from each partition <it>k</it>, such that the selected genes had low mutual redundancy while at the same time having maximal relevance to the classes. The genes selected from each partition were then combined into a single gene set, <it>S </it>= <it>S</it><sub>1 </sub>&#8746; <it>S</it><sub>2 </sub>&#8746; ... &#8746; <it>S</it><sub><it>K</it></sub>.</p>
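            <p>The partition-then-select strategy can be sketched as follows. The sketch uses a plain k-means with deterministic initialization from the first <it>K</it> profiles (a simplification; practical implementations typically use random or k-means++ restarts) and applies the greedy relevance-minus-redundancy criterion within each cluster. The number of genes taken per cluster is a hypothetical parameter, since the text does not state how the size of each <it>S</it><sub><it>k</it></sub> is chosen.</p>
            <p>
```python
import math


def kmeans(profiles, k, n_iter=50):
    """Plain k-means on gene expression profiles (one list of floats per gene).

    Centroids start at the first k profiles, a simplification that keeps
    this sketch deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each gene goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid moves to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def greedy_select(candidates, relevance, redundancy, n_select, beta):
    """Greedy relevance-minus-redundancy selection restricted to `candidates`."""
    remaining, selected = set(candidates), []
    while remaining and n_select > len(selected):
        def score(j):
            return relevance[j] - beta * sum(redundancy[j][s] for s in selected)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def partitioned_select(profiles, relevance, redundancy, k, per_cluster, beta=1.0):
    """Select up to `per_cluster` genes from each k-means cluster; return the union."""
    labels = kmeans(profiles, k)
    union = set()
    for c in range(k):
        members = [g for g, lab in enumerate(labels) if lab == c]
        union.update(greedy_select(members, relevance, redundancy, per_cluster, beta))
    return sorted(union)


profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # two gene groups
relevance = [0.9, 0.8, 0.7, 0.6]
redundancy = [[0.0] * 4 for _ in range(4)]
print(kmeans(profiles, 2))
print(partitioned_select(profiles, relevance, redundancy, 2, per_cluster=1))
```
            </p>
            <p>With two well-separated groups of gene profiles, k-means returns the labels [0, 0, 1, 1], and taking one gene per cluster yields genes [0, 2], one representative of each expression pattern, even though gene 1 is more relevant than gene 2.</p>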
            <p>The final set of genes is selected by carrying out leave-one-out cross-validation (LOOCV). In each run, one sample is held out for testing while the remaining <it>N </it>- 1 samples are used to train the classifier. The algorithm selects genes using the training samples only, and these genes are then used to classify the test sample. The overall accuracy is calculated from the correctness of the classification of each test sample. To gain a deeper understanding of the selected genes, the genes common to all <it>N </it>runs of the LOOCV experiment are listed for further investigation. The gene selection process is then repeated on this feature set, selecting a subset of genes that gives a classification error below a user-defined threshold <it>&#949;</it>. The k-nearest neighbor (k-NN) classification method is used to assess the discriminative power of the genes selected by the method. The process stops when the error exceeds <it>&#949;</it>. The full algorithm is presented in Figure <figr fid="F8">8</figr>.</p>
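            <p>A minimal sketch of the LOOCV loop follows, with gene selection repeated inside every training fold so that the held-out sample never influences which genes are chosen. The selector is passed in as a hypothetical callback, k-NN uses Euclidean distance on the selected genes (an assumption, since the distance is not specified), and the outer refinement against the threshold <it>&#949;</it> is omitted.</p>
            <p>
```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, genes, k=1):
    """Majority vote among the k nearest training samples, using only `genes`."""
    neighbours = sorted(
        (math.dist([s[g] for g in genes], [query[g] for g in genes]), i)
        for i, s in enumerate(train_x)
    )
    votes = Counter(train_y[i] for _, i in neighbours[:k])
    return votes.most_common(1)[0][0]


def loocv(samples, labels, select_genes, k=1):
    """Leave-one-out CV in which gene selection is redone on every training fold.

    `select_genes(train_x, train_y)` is a hypothetical callback returning gene
    indices (for example, a greedy mutual-information selector). Returns the
    LOOCV accuracy and the genes chosen in every one of the N runs.
    """
    correct, gene_sets = 0, []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]   # hold out sample i
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train_x, train_y)   # selection sees training data only
        gene_sets.append(set(genes))
        if knn_predict(train_x, train_y, samples[i], genes, k) == labels[i]:
            correct += 1
    common = set.intersection(*gene_sets) if gene_sets else set()
    return correct / len(samples), sorted(common)


samples = [[0.0, 5.0], [0.2, 1.0], [3.0, 4.0], [3.1, 0.5]]  # gene 0 separates A from B
labels = ["A", "A", "B", "B"]
accuracy, common = loocv(samples, labels, lambda x, y: [0])
print(accuracy, common)
```
            </p>
            <p>With a selector that always returns gene 0, which separates the two classes, the sketch prints 1.0 [0]; selecting genes on the full dataset before the loop would instead leak information from the held-out sample and inflate the accuracy estimate.</p>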
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>LXX was responsible for the development and implementation of the algorithm as well as for writing parts of the paper. AK was involved in algorithm development as well as in writing the manuscript. AM was responsible for the analysis of the results as well as manuscript preparation. All authors read and approved the manuscript.</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p>Selected Genes for All Datasets. The file contains the list of selected genes for each of the three datasets used in this study as well as the corresponding ranks of those selected genes in the original papers.</p>
            </text>
            <file name="1471-2105-6-76-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank the anonymous reviewers for their suggestions and critical reviews of the paper.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
                <p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p>
            </title>
            <aug>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caligiuri</snm>
                  <fnm>MA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1126/science.286.5439.531</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Alizadeh</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lossos</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Rosenwald</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boldrick</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Sabet</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>X</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>503</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35000501</pubid>
                  <pubid idtype="pmpid" link="fulltext">10676951</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Molecular classification of cutaneous malignant melanoma by gene expression profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Bittner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meltzer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Seftor</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hendrix</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Radmacher</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yakhini</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ben-Dor</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35020115</pubid>
                  <pubid idtype="pmpid" link="fulltext">10952317</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Molecular portraits of human breast tumours</p>
            </title>
            <aug>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Sorlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>van de Rijn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Pollack</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Akslen</snm>
                  <fnm>LA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>747</fpage>
            <lpage>752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35021093</pubid>
                  <pubid idtype="pmpid" link="fulltext">10963602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>An information-intensive approach to the molecular pharmacology of cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>TG</fnm>
               </au>
               <au>
                  <snm>O'Connor</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Friend</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Fornace</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Kohn</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Fojo</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bates</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Rubinstein</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>NL</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>275</volume>
            <fpage>343</fpage>
            <lpage>349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.275.5298.343</pubid>
                  <pubid idtype="pmpid" link="fulltext">8994024</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Systematic determination of genetic network architecture</p>
            </title>
            <aug>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>281</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/10343</pubid>
                  <pubid idtype="pmpid" link="fulltext">10391217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kitareewan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dmitrovsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>2907</fpage>
            <lpage>2912</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.96.6.2907</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comparison of discrimination methods for the classification of tumors using gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fridlyand</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Tech rep</source>
            <publisher>University of California, Berkeley</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Feature (Gene) Selection in Gene Expression-Based Tumor Classification</p>
            </title>
            <aug>
               <au>
                  <snm>Xiong</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Boerwinkle</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Molecular Genetics and Metabolism</source>
            <pubdate>2001</pubdate>
            <volume>73</volume>
            <fpage>239</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/mgme.2001.3193</pubid>
                  <pubid idtype="pmpid" link="fulltext">11461191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks</p>
            </title>
            <aug>
               <au>
                  <snm>Khan</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature Medicine</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <issue>6</issue>
            <fpage>673</fpage>
            <lpage>679</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/89044</pubid>
                  <pubid idtype="pmpid" link="fulltext">11385503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Support vector machine classification of cancer tissue samples using microarray expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Cristianini</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Duffy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bednarski</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Schummer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>906</fpage>
            <lpage>914</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.10.906</pubid>
                  <pubid idtype="pmpid" link="fulltext">11120680</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Gene selection for cancer classification using support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Guyon</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Weston</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barnhill</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Machine Learn</source>
            <pubdate>2002</pubdate>
            <volume>46</volume>
            <fpage>389</fpage>
            <lpage>422</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1023/A:1012487302797</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <aug>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Verri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Support vector machine classification of microarray data. AI memo. CBCL paper 182</source>
            <publisher>MIT Press, Cambridge, MA</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Classification of gene expression data using fuzzy logic</p>
            </title>
            <aug>
               <au>
                  <snm>Ohno-Machado</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vinterbo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Journal of Intelligent and Fuzzy Systems</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>1</issue>
            <fpage>19</fpage>
            <lpage>24</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Classification of gene cancer types by support vector machines using microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dayanik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hasan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Terauchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Grundy</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>International Conference on Intelligent Systems for Molecular Biology</source>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Setiono</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Applied Genomics and Proteomics</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <issue>2</issue>
            <fpage>79</fpage>
            <lpage>91</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Wrappers for feature subset selection</p>
            </title>
            <aug>
               <au>
                  <snm>Kohavi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>John</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Artificial Intelligence</source>
            <pubdate>1997</pubdate>
            <volume>97</volume>
            <issue>1&#8211;2</issue>
            <fpage>273</fpage>
            <lpage>324</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0004-3702(97)00043-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Motoda</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Feature selection for knowledge discovery and data mining</source>
            <publisher>Boston, Kluwer Acad. Publishers</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Class prediction and discovery using gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>4th Annual International Conference on Computational Molecular Biology (RECOMB), 2000 Apr 8&#8211;11; Tokyo, Japan</source>
            <publisher>Tokyo: Universal Academy Press</publisher>
            <pubdate>2000</pubdate>
            <fpage>263</fpage>
            <lpage>272</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Minimum redundancy feature selection from microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Ding</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Peng</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Computational Systems Bioinformatics</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Using Mutual Information for Selecting Features in Supervised Neural Networks</p>
            </title>
            <aug>
               <au>
                  <snm>Battiti</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Neural Networks</source>
            <pubdate>1994</pubdate>
            <volume>5</volume>
            <issue>4</issue>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Optimization by simulated annealing</p>
            </title>
            <aug>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gelatt</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Vecchi</snm>
                  <fnm>MP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1983</pubdate>
            <volume>220</volume>
            <fpage>671</fpage>
            <lpage>680</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Predicting the clinical status of human breast cancer by using gene expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>West</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>20</issue>
            <fpage>11462</fpage>
            <lpage>11467</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">58752</pubid>
                  <pubid idtype="pmpid" link="fulltext">11562467</pubid>
                  <pubid idtype="doi">10.1073/pnas.201162998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Alon</snm>
                  <fnm>U</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>12</issue>
            <fpage>6745</fpage>
            <lpage>6750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">21986</pubid>
                  <pubid idtype="pmpid" link="fulltext">10359783</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.12.6745</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Systematic variation in gene expression patterns in human cancer cell lines</p>
            </title>
            <aug>
               <au>
                  <snm>Ross</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Scherf</snm>
                  <fnm>U</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2000</pubdate>
            <volume>24</volume>
            <issue>3</issue>
            <fpage>227</fpage>
            <lpage>234</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/73432</pubid>
                  <pubid idtype="pmpid" link="fulltext">10700174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Kruppel-like zinc fingers bind to nuclear import proteins and are required for efficient nuclear localization of erythroid Kruppel-like factor</p>
            </title>
            <aug>
               <au>
                  <snm>Quadrini</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bieker</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2002</pubdate>
            <volume>277</volume>
            <issue>35</issue>
            <fpage>32243</fpage>
            <lpage>32252</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M205677200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12072445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Phosphorylation near nuclear localization signal regulates nuclear import of adenomatous polyposis coli protein</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Neufeld</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <issue>23</issue>
            <fpage>12577</fpage>
            <lpage>12582</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18806</pubid>
                  <pubid idtype="pmpid" link="fulltext">11050185</pubid>
                  <pubid idtype="doi">10.1073/pnas.230435597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Insulin-like growth factor (IGF)-I, IGF binding protein-3, and cancer risk: systematic review and meta-regression analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Renehan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zwahlen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Minder</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>O'Dwyer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shalet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Egger</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Lancet</source>
            <pubdate>2004</pubdate>
            <volume>363</volume>
            <issue>9418</issue>
            <fpage>1346</fpage>
            <lpage>1353</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Pattern Identification and Classification in Gene Expression Data Using an Autoassociative Neural Network Model</p>
            </title>
            <aug>
               <au>
                  <snm>Bicciato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pandin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Didone</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Di Bello</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Biotechnology and bioengineering</source>
            <pubdate>2003</pubdate>
            <volume>81</volume>
            <issue>5</issue>
            <fpage>594</fpage>
            <lpage>606</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/bit.10505</pubid>
                  <pubid idtype="pmpid" link="fulltext">12514809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Identification of genes differentially expressed between gastric cancers and normal gastric mucosa with cDNA microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Baek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ha</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DK</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>DI</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cancer Letters</source>
            <pubdate>2002</pubdate>
            <volume>184</volume>
            <issue>2</issue>
            <fpage>197</fpage>
            <lpage>206</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0304-3835(02)00197-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12127692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Weaver</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>The Mathematical Theory of Communication</source>
            <publisher>University of Illinois Press</publisher>
            <pubdate>1949</pubdate>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Heuristic algorithms for the unconstrained binary quadratic programming problem</p>
            </title>
            <aug>
               <au>
                  <snm>Beasley</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>London, England</source>
            <pubdate>1998</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
