<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-10-S1-S19</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A voting approach to identify a small number of highly predictive genes using multiple classifiers</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Hassan</snm>
               <fnm>Md Rafiul</fnm>
               <insr iid="I1"/>
               <email>mrhassan@csse.unimelb.edu.au</email>
            </au>
            <au ca="yes" id="A2">
               <snm>Hossain</snm>
               <fnm>M Maruf</fnm>
               <insr iid="I1"/>
               <email>hossain@csse.unimelb.edu.au</email>
            </au>
            <au id="A3">
               <snm>Bailey</snm>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jbailey@csse.unimelb.edu.au</email>
            </au>
            <au id="A4">
               <snm>Macintyre</snm>
               <fnm>Geoff</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>gmaci@csse.unimelb.edu.au</email>
            </au>
            <au id="A5">
               <snm>Ho</snm>
               <mi>WK</mi>
               <fnm>Joshua</fnm>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>joshua@it.usyd.edu.au</email>
            </au>
            <au id="A6">
               <snm>Ramamohanarao</snm>
               <fnm>Kotagiri</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>rao@csse.unimelb.edu.au</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia</p>
            </ins>
            <ins id="I2">
               <p>NICTA Victoria Laboratory, The University of Melbourne, Victoria 3010, Australia</p>
            </ins>
            <ins id="I3">
               <p>School of Information Technologies, The University of Sydney, NSW 2006, Australia</p>
            </ins>
            <ins id="I4">
               <p>NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <editor>Michael Q Zhang, Michael S Waterman and Xuegong Zhang</editor>
            <note>Research</note>
         </supplement>
         <conference>
            <title>
               <p>The Seventh Asia Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <location>Beijing, China</location>
            <date-range>13&#8211;16 January 2009</date-range>
            <url>http://bioinfo.au.tsinghua.edu.cn/apbc2009/</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>Suppl 1</issue>
         <fpage>S19</fpage>
         <url>http://www.biomedcentral.com/1471-2105/10/S1/S19</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">19208118</pubid><pubid idtype="doi">10.1186/1471-2105-10-S1-S19</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>30</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Hassan et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. Taking a biological viewpoint, from the literature, most of the genes in our sets are known to be strongly related to cancer.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Gene microarrays are a popular technology for assisting with the prediction and understanding of diseases <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Cancer is one such disease where this technology has proved to be particularly powerful. An important challenge in this area is the discovery of gene sets which can be used as predictors of cancer. For a gene set to be useful as the basis of developing a clinical predictor for cancer, there are a number of desirable properties it should have:</p>
         <p>&#8226; <it>Compactness</it>: There should not be too many genes in the set. This reduces the cost involved in developing a clinical diagnostic test using these genes.</p>
         <p>&#8226; <it>Accuracy</it>: When the genes are input to a machine learning algorithm as features, it should be possible to achieve a high true positive rate and a low false positive rate.</p>
         <p>&#8226; <it>Classifier independence</it>: It should be possible to achieve high accuracy using a range of different machine learning classifiers with the gene set. This increases the confidence that biologists have in the stability and generality of the gene set.</p>
         <p>&#8226; <it>Biological relevance</it>: Most of the genes in the gene set should have a known relationship to cancer, based on the literature.</p>
         <p>&#8226; <it>Biological coverage</it>: The genes in the gene set should span a number of distinct biological processes and each gene should be independently useful for prediction. The set of genes should not be confined to a single pathway. This increases the robustness of prediction and allows more uniform classification power across different subtypes of cancer.</p>
         <p>In this paper, we propose a new classifier voting approach to discover a gene set for breast cancer prognosis that satisfies these five properties. We are able to discover a gene set using the van 't Veer <it>et al. </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset that consists of 7 genes and delivers highly accurate prediction results for a range of classifiers. In addition, we were able to discover a 6 gene set that delivers high accuracy on the Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> dataset, for which we also validated our performance on an additional independent dataset exhibiting the same biological conditions <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The majority of these genes have been previously mentioned in the cancer literature, and we have found the genes in these sets to be relatively independent in terms of function, meaning our genes are able to cover a number of different processes involved in cancer. In comparison with other studies on these datasets, our gene sets are considerably smaller and deliver considerably higher performance across a range of machine learning classifiers.</p>
         <p>Our proposed technique is based on the use of multiple voting classifiers to identify the final gene set. Its use of multiple classifiers makes it different from previous work for microarray classification, such as wrapper based methods, which only target a single classifier.</p>
         <p>An important aspect of our method is that it does not employ any biological domain knowledge (e.g. the Gene Ontology) as part of the algorithm for identifying the gene set. This makes it particularly applicable for deployment in scenarios where the literature is sparse or the state of the art is immature. Nevertheless, for the datasets we use, we are able to confirm that the individual genes in the sets that are discovered are biologically relevant.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Our proposed approach comprises two steps. In the first step, we rank all genes of the training set. In the second step, we investigate the classification performance of combinations of genes using a voting approach from the ranked genes obtained from the first step, but employing a number of classifiers instead of just one classifier. The steps are described in detail in the subsequent sections.</p>
         <sec>
            <st>
               <p>The receiver operating characteristic (ROC) curve: preliminaries</p>
            </st>
            <p>In machine learning, the receiver operating characteristic (ROC) curve is used to evaluate the discriminative performance of binary classifiers. This is obtained by plotting the curve of the true positive rate (<it>Sensitivity</it>) versus the false positive rate (1 &#8211; <it>Specificity</it>) for a binary classifier by varying the discrimination threshold.</p>
            <p>Each pair of true positive rate and false positive rate is obtained at a particular classifier threshold. By varying the threshold, a set of such pairs is obtained, and plotting them in a two-dimensional Cartesian graph yields the ROC curve. The ROC curve therefore takes into account all operating points reachable by varying the discrimination threshold. The best performance is achieved if the ROC curve passes through the upper left corner of the ROC space (which corresponds to 100% <it>sensitivity </it>and 100% <it>specificity</it>). The closer the ROC curve is to the upper left corner of the ROC space, the better the performance of the classifier.</p>
            <p>An ROC curve is a two dimensional illustration of classifier performance. Reducing ROC performance to a single scalar value to represent expected performance helps compare classifiers. A popular method is to calculate the area under the ROC curve (AUC) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
            <p>The AUC, being a part of the area of the unit square, has a value between 0 and 1. Since random guessing could produce the diagonal line between (0, 0) and (1, 1) with an area of 0.5, a classifier with an AUC less than 0.5 is undesirable <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. An AUC value close to 1 indicates better performance for a binary classifier <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
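            <p>The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names roc_points and auc_trapezoid are hypothetical, labels are assumed to be coded 1 (positive) and 0 (negative), and an example is assumed to be called positive when its score is at least the threshold.</p>

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold over the scores."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Visit thresholds from the highest score downwards; an example is called
    # positive when its score is at least the threshold (an assumption).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

            <p>For the illustrative scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0, the sweep passes through (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75; a perfectly separated example set gives an AUC of 1.</p>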
         </sec>
         <sec>
            <st>
               <p>Feature ranking using ROC</p>
            </st>
            <p>We rank all genes of the training set using the first step of the ROC-based method of Mamitsuka <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The ranking score is the AUC, which is equivalent to the Mann-Whitney U statistic <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> normalized by the number of possible pairings of positive and negative values; this statistic is also known as the two-sample Wilcoxon rank sum statistic <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The AUC represents the probability that a randomly chosen positive example is rated (ranked) with greater suspicion than a randomly chosen negative example.</p>
            <p>Let us consider a training dataset <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S19-i1"><m:semantics><m:mi mathvariant="script">D</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83aXteaaa@374F@</m:annotation></m:semantics></m:math></inline-formula> of <it>n </it>examples, where each example comprises <it>m </it>attributes: <it>x</it><sub>1</sub>, <it>x</it><sub>2</sub>, <it>x</it><sub>3</sub>,..., <it>x</it><sub><it>m</it></sub>. Each of the <it>m </it>attributes has a differing discriminative power reflected by its respective AUC. To calculate the discriminative power that is expressed in terms of AUC, we plot the ROC curve for each gene paired with the class label, (i.e., {<it>x</it><sub><it>i</it></sub>, <it>Y</it><sub><it>i</it></sub>}, where 1 &#8804; <it>i </it>&#8804; <it>m </it>and <it>Y </it>is the vector of class labels) and calculate the AUC of the ROC curve. Now, we order the genes based on their respective AUCs.</p>
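            <p>The ranking step can be illustrated with the pairwise (normalized Mann-Whitney U) form of the AUC. The sketch below is an assumption-laden illustration rather than the code used in the study: gene_auc and rank_genes are hypothetical names, labels are assumed to be coded 1 and 0, tied values are assumed to count one half, and genes are ordered by AUC directly, so a gene whose expression is lower in the positive class is treated as uninformative.</p>

```python
def gene_auc(values, labels):
    """AUC of one gene: fraction of (positive, negative) pairs in which the
    positive example has the larger expression value; ties count one half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    # Normalize by the number of possible positive/negative pairings.
    return wins / (len(pos) * len(neg))


def rank_genes(expression, labels):
    """expression maps a gene name to its list of values (one per example).
    Returns (gene names sorted by decreasing AUC, AUC per gene)."""
    aucs = {g: gene_auc(v, labels) for g, v in expression.items()}
    return sorted(aucs, key=aucs.get, reverse=True), aucs
```

            <p>The quadratic pairwise loop is chosen for readability; a rank-based computation gives the same value and scales better when there are many examples.</p>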
         </sec>
         <sec>
            <st>
               <p>Multi-classifier voting approach to select genes</p>
            </st>
            <p>We attempt to classify the validation dataset with the top ranked genes. In the first pass, we give each of the top 10 genes individually to all classifiers and note the classification accuracy of each classifier on the validation set. We select the gene for which the most classifiers achieve their highest accuracy. For the second pass, we pair the selected gene with each of the remaining nine genes, input these nine pairs to the classifiers, and again note the classification accuracy. The pair on which the most classifiers achieve their highest accuracy is selected and carried into the third pass. We continue adding single genes to form 3-gene combinations, and so on.</p>
            <p>A diverse set of fifteen classifiers was used for this process. They are: Logistic Model Tree (LMT) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, Na&#239;ve Bayes Tree (NBTree) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, Na&#239;ve Bayes, Random Forest, C4.5, <it>k</it>-Nearest Neighbour (<it>k</it>-NN), Artificial Neural Network (ANN), Logistic Regression, Support Vector Machine (SVM), and bagging <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and boosting (ADABoost.M1) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> for Na&#239;ve Bayes, Random Forest and C4.5.</p>
            <p>We stop growing the gene set once more than 50% of the classifiers have their accuracy lowered on the validation set by the addition of an individual gene. In the case of any tie between two or more genes, we note the total accuracy (out of 1500%) for the tied genes and the gene with the largest total accuracy is chosen for the next pass. This majority voting approach allows us to select a small subset of genes that can boost classification accuracy on a number of classifiers. One can then tune the performance of an individual classifier by choosing the prefix of the genes (ranked using the voting approach) that delivers best accuracy. For example, for C4.5 with boosting on the van 't Veer data, adding the 5th gene to the first four genes actually degrades the performance of that specific classifier. So for that individual classifier, rather than using our final selection of 7 genes, we can instead use only a subset of 4 (out of the 7) genes.</p>
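            <p>The selection loop and its stopping rule can be sketched as follows. This is one possible reading of the procedure, not the authors' code: vote_select is a hypothetical name, classifier training and validation are hidden behind a hypothetical accuracy callable supplied by the caller, and a classifier that is equally accurate on several candidate genes is assumed to vote for all of them.</p>

```python
def vote_select(candidates, classifiers, accuracy):
    """Greedy forward selection by classifier voting.

    candidates  : list of top-ranked gene names (e.g. the top 10 by AUC)
    classifiers : list of classifier identifiers
    accuracy    : callable (classifier, tuple_of_genes) -> validation accuracy
    """
    selected = []
    previous = None  # accuracy of each classifier on the current gene set
    remaining = list(candidates)
    while remaining:
        # Accuracy of every classifier on the current set plus one extra gene.
        trial = {g: {c: accuracy(c, tuple(selected + [g])) for c in classifiers}
                 for g in remaining}
        # Each classifier votes for the gene(s) on which it is most accurate.
        votes = {g: 0 for g in remaining}
        for c in classifiers:
            best = max(trial[g][c] for g in remaining)
            for g in remaining:
                if trial[g][c] == best:
                    votes[g] += 1
        # Most votes wins; ties are broken by total accuracy over classifiers.
        winner = max(remaining,
                     key=lambda g: (votes[g], sum(trial[g].values())))
        # Stop once the winning gene lowers accuracy for more than half
        # of the classifiers.
        if previous is not None:
            lowered = sum(1 for c in classifiers
                          if previous[c] > trial[winner][c])
            if lowered > len(classifiers) / 2:
                break
        selected.append(winner)
        previous = trial[winner]
        remaining.remove(winner)
    return selected
```

            <p>Keeping the classifiers behind a single callable means the same loop works with any learning library; the order of the returned list is the order in which genes were added, which is what allows a prefix to be chosen for an individual classifier.</p>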
         </sec>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>In each of the three datasets used in our analysis, the prognostic outcome to be predicted is whether distant metastases will occur within 5 years (poor prognosis) or whether the patient is disease-free after 5 years (good prognosis).</p>
            <p><it>van 't Veer data </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>: The dataset comprises 97 breast cancer patients treated through modified radical mastectomy or breast-conserving treatment followed by radiotherapy. The patients were split into a training set of 68, a validation set of 10, and a test set of 19 cases. The training set consists of 29 positive (poor prognosis) and 39 negative (good prognosis) cases, the validation set comprises five positive and five negative cases, and the test set is made up of 12 positive and 7 negative cases. Further, we created a merged dataset from van 't Veer's <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> training (our training and validation set) and test sets to apply <it>k</it>-fold cross validation (CV). In <it>k</it>-fold CV, 10 cases out of the training set are randomly selected for the validation set before applying the ROC-based feature ranking.</p>
            <p><it>Ma et al. data </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp>: This dataset contains 60 breast cancer patients treated through standard breast surgery followed by continued adjuvant tamoxifen therapy. There were 28 positive cases (poor prognosis) and 32 negative cases (good prognosis). We separate the first 5 positive cases and the last 5 negative cases to form the test set and use the remaining cases for training.</p>
            <p><it>Loi et al. data </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp>: The dataset is made up of 77 breast cancer patients obtained from the GUYT2 test data used in the Loi <it>et al</it>. study, with similar treatments to those performed in the Ma <it>et al. </it>dataset. There were 10 positive cases (poor prognoses) and 67 negative cases (good prognoses). This dataset was included in our study as a completely independent test dataset. All patients were considered as test cases to gauge performance of the classifiers trained on the Ma <it>et al. </it>dataset.</p>
         </sec>
         <sec>
            <st>
               <p>Evaluation</p>
            </st>
            <p>For the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> datasets, we used a holdout validation procedure, where a training set is used to train the classifiers and a separate test set is used to evaluate them. We have also used the more general <it>k</it>-fold cross validation (CV) scheme to evaluate the performance on the van 't Veer data. In <it>k</it>-fold CV, the original sample is partitioned into <it>k </it>subsamples. Of the <it>k </it>subsamples, a single subsample is retained as the validation data for testing the model, and the remaining <it>k </it>- 1 subsamples are used as training data. The CV process is then repeated <it>k </it>times (the folds), with each of the <it>k </it>subsamples used exactly once as the validation data. The <it>k </it>results from the folds can then be averaged (or otherwise combined) to produce a single estimate. Here, we have chosen <it>k </it>to be 5. The results for <it>k</it>-fold CV are presented in Additional file <supplr sid="S1">1</supplr>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>This file contains the ranked gene list used in each fold of 5-fold CV, and the performance of each fold using the selected genes for the different classifiers.</p>
               </text>
               <file name="1471-2105-10-S1-S19-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
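            <p>The <it>k</it>-fold scheme described above can be sketched as below. This is a generic illustration under stated assumptions: k_fold_indices and cross_validate are hypothetical names, the folds are drawn by plain random shuffling (whether the study stratified folds by class is not stated), and the caller supplies an evaluate callable that trains on one index list and scores on the other.</p>

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split range(n) into k disjoint folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(n, k, evaluate, seed=0):
    """evaluate(train_idx, test_idx) -> score; returns the mean over k folds."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one contributes to the training indices.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

            <p>With 97 cases and <it>k </it>= 5, this partition yields two folds of 20 cases and three folds of 19 cases, and each case is used for testing exactly once.</p>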
            <p>All of our evaluation results are reported in weighted accuracy <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, which is calculated by the formula shown in Eq. (1).</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S19-i2">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>Weighted&#160;Accuracy</m:mtext>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:mi>T</m:mi>
                                       <m:mi>P</m:mi>
                                    </m:mrow>
                                    <m:mi>P</m:mi>
                                 </m:mfrac>
                                 <m:mo>+</m:mo>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:mi>T</m:mi>
                                       <m:mi>N</m:mi>
                                    </m:mrow>
                                    <m:mi>N</m:mi>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaee4vaCLaeeyzauMaeeyAaKMaee4zaCMaeeiAaGMaeeiDaqNaeeyzauMaeeizaqMaeeiiaaIaeeyqaeKaee4yamMaee4yamMaeeyDauNaeeOCaiNaeeyyaeMaee4yamMaeeyEaKNaeyypa0tcfa4aaSaaaeaadaWcaaqaaiabdsfaujabdcfaqbqaaiabdcfaqbaacqGHRaWkdaWcaaqaaiabdsfaujabd6eaobqaaiabd6eaobaaaeaacqaIYaGmaaaaaa@4CA8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
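            <p>where <it>TP </it>= number of positive cases classified correctly, <it>TN </it>= number of negative cases classified correctly, <it>P </it>= Total number of positive cases,</p>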
            <p><it>N </it>= Total number of negative cases.</p>
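            <p>Eq. (1) translates directly into code. The following is a minimal sketch, assuming labels coded 1 (positive) and 0 (negative); the function name weighted_accuracy is hypothetical.</p>

```python
def weighted_accuracy(y_true, y_pred):
    """Mean of sensitivity (TP/P) and specificity (TN/N); labels are 1 or 0."""
    p = sum(1 for y in y_true if y == 1)
    n = sum(1 for y in y_true if y == 0)
    tp = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in zip(y_true, y_pred) if y == 0 and yhat == 0)
    return (tp / p + tn / n) / 2.0
```

            <p>As a hypothetical example on a test set with 12 positive and 7 negative cases, a classifier that identifies 9 of the positives and all 7 negatives has a weighted accuracy of (9/12 + 7/7)/2 = 0.875, whereas its unweighted accuracy is 16/19. The weighted form prevents the larger class from dominating the score.</p>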
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Using the van 't Veer data <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, our final selected gene set consists of 7 genes (<it>TSPYL5</it>, <it>NMU</it>, <it>CA9</it>, <it>AGTPBP1</it>, <it>LIN9</it>, <it>ASPM</it>, and <it>DIAPH1</it>), and as we shall show, this compact gene set delivers highly accurate performance across a range of classifiers. The functions of these genes are summarised in Table <tblr tid="T1">1</tblr>. Additionally, we used the Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> data to test whether our method was successful on a dataset with different biology to the van 't Veer dataset. On this dataset we obtained a different set of 6 genes (<it>RGS19</it>, <it>ZIC2</it>, <it>SRD5A3</it>, <it>PPARD</it>, <it>GM2A</it>, <it>CD55</it>). We also tested the generalisability of this 6 gene set on an additional independent dataset exhibiting the same biological conditions <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Most of the later discussion uses the van 't Veer data as an example, unless otherwise stated.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Our set of 7 genes selected by majority voting and ordered by area under ROC curve</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>
                        <b>GenBank Accession Number</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>AUC</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Gene Symbol</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Gene Description</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="AL080059" ext-link-type="gen">AL080059</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.800802</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>TSPYL5</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>TSPY-like 5</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="NM_006681" ext-link-type="gen">NM_006681</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.794786</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>NMU</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Neuromedin U</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="NM_001216" ext-link-type="gen">NM_001216</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.794786</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>CA9</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Carbonic Anhydrase IX</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="AA830802" ext-link-type="gen">AA830802</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.792781</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>AGTPBP1</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>ATP/GTP binding protein 1</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="AA834945" ext-link-type="gen">AA834945</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.774733</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>LIN9</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Lin-9 homolog (C. elegans)</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="AA748494" ext-link-type="gen">AA748494</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.766711</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>ASPM</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>ASP (abnormal spindle) homolog, microcephaly associated (Drosophila)</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <ext-link ext-link-id="NM_005219" ext-link-type="gen">NM_005219</ext-link>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.764706</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>DIAPH1</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Diaphanous homolog 1 (Drosophila)</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <sec>
            <st>
               <p>Biological significance of the compact gene sets</p>
            </st>
            <p>As the treatment procedures applied to the patients in the van 't Veer study <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> (no adjuvant therapy) and the Ma <it>et al. </it>study <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> (adjuvant therapy with tamoxifen) are vastly different, it is not surprising that there is no overlap between the two gene sets identified as the best predictors of prognostic outcome. The biology driving the chance of distant metastasis in each dataset is likely to be significantly different, and as such it would not make sense to expect the gene lists to overlap. Therefore, we will consider each gene set independently.</p>
            <p>Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the two gene sets shows that the genes are diverse in function and appear to be unrelated in terms of the biological processes in which they are involved. What is interesting, however, is that the majority of the genes have been previously shown to be related to cancer in the literature (as shown below). This suggests that our feature selection procedure yields a compact sampling of the diverse biological processes represented by the microarray, which are highly representative of the prognostic potential of the patient. In concordance with this, in the 7 gene set, <it>TSPYL5 </it>and <it>CA9 </it>have been previously used as prognostic biomarkers in cancer <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Furthermore, four of the top 7 genes selected by our method are in the set of 231 genes used in the study by van 't Veer <it>et al. </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and the most important individual gene in improving the performance of a number of the classifiers on the test set (<it>TSPYL5</it>) is present in the 17 genes selected by Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. In the 6 gene set, <it>CD55 </it>has been used previously as a prognostic biomarker in gastric cancer <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Links between identified genes and the cancer literature</p>
            </st>
            <sec>
               <st>
                  <p>7 gene set</p>
               </st>
               <p>Each of the 7 genes can be directly linked to potential cancer recurrence through its biological function. <it>TSPYL5 </it>is involved in nucleosome assembly, a process which, if destabilised, can alter the regulatory mechanisms of a cell <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, which is likely to occur in cancer. <it>NMU </it>has been shown to be related to metastatic potential and cancer cachexia <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, which would have a significant impact on the potential for recurrence of the cancer. <it>CA9 </it>is involved in nitrogen metabolism and is linked to cell proliferation, and <it>ASPM </it>is involved in mitotic spindle regulation and is expressed in proliferating tissues. (Proliferation is a mechanism which is well known to be related to the cancerous potential of cells.) <it>LIN9 </it>is involved in progression through the cell cycle <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and is a tumor suppressor <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> that inhibits DNA synthesis, thus having significant cancerous potential. Regulation of the <it>DIAPH1 </it>gene <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> is important in regulating the transcription factor Mitf, which in turn regulates the invasiveness of melanoma. Finally, while no significant link to cancer processes was found for <it>AGTPBP1</it>, somatic mutations in its coding sequence have been found in colorectal cancers <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>6 gene set</p>
               </st>
               <p>Five of the 6 genes in this set not only have links to cancer in the literature but have, in most cases, been shown to be directly linked to prognostic outcome. A study into antibodies against <it>ZIC2 </it>in small cell lung carcinoma showed that the concentration of antibodies is a good indicator of prognosis <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. <it>SRDA53 </it>overexpression in hormone-refractory prostate cancers was shown to be crucial for cell viability <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and is a likely factor in resistance to hormone-based therapies in prostate cancers. <it>PPAR </it>has been shown previously to be aberrantly expressed in colon cancer cells <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and is an important player in the proliferation and growth of these cells <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. A protein involved in innate immune response which is critical to the regulation of the complement cascade, <it>CD55</it>, has been shown to be important in prostate growth <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, gastric tumor invasiveness <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and breast cancer prognosis <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Finally, <it>RGS19 </it>has been implicated in the control of autophagy in colon cancer cell lines <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
               <p>The demonstrated links between these genes and the literature highlight the relevance of each gene to cancer and demonstrate their potential to represent biological processes which are directly related to the prognostic potential (chance of cancer recurrence) of a patient.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Classification performance on test set</p>
            </st>
            <sec>
               <st>
                  <p>7 gene set</p>
               </st>
               <p>Table <tblr tid="T2">2</tblr> shows the results for a range of different classifiers tested on the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset, when configured to select their preferred subset of genes from our 7 gene set. As most classifiers have their own internal mechanism to rank and select the features used for classification, it is to be expected that not all classifiers will perform similarly with the same subset of genes. The 'majority voting' scheme, used to select the significant 7 genes in our multi-classifier voting approach from one pass to another, helped improve the performance of all the classifiers. However, several classifiers &#8211; namely C4.5, C4.5 with bagging, Na&#239;ve Bayes, Na&#239;ve Bayes with boosting, LMT, NBTree and <it>k</it>-NN &#8211; showed their best performance using only a single gene.</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Accuracy achieved on test set by different classifiers using various subsets of our 7 genes</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Classifier</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Gene Combination</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with boosting (ADABoost.M1)</p>
                        </c>
                        <c ca="center">
                           <p>91.67%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-AGTPBP1</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with bagging</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with boosting</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with bagging</p>
                        </c>
                        <c ca="center">
                           <p>88.69%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-NMU</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LMT</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>NBTree</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-ASPM</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with boosting</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-ASPM</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with bagging</p>
                        </c>
                        <c ca="center">
                           <p>88.69%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-ASPM-NMU</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>k</it>-NN</p>
                        </c>
                        <c ca="center">
                           <p>80.36%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Logistic Regression</p>
                        </c>
                        <c ca="center">
                           <p>81.55%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-DIAPH1-CA9</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ANN</p>
                        </c>
                        <c ca="center">
                           <p>77.38%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-CA9</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SVM</p>
                        </c>
                        <c ca="center">
                           <p>83.33%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>TSPYL5-LIN9</it>
                           </p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>The best performance obtained on the test dataset was 91.67%, for C4.5 with boosting. This performance was achieved using only three genes: <it>TSPYL5, DIAPH1</it>, and <it>AGTPBP1</it>. It is worth noting that the gene subset {<it>TSPYL5</it>, <it>DIAPH1</it>} was found to be significant for six of the 15 classifiers considered (see Table <tblr tid="T2">2</tblr>). It was also found that the gene <it>TSPYL5 </it>is the most influential and was chosen by all the classifiers considered. ANN and SVM performed best with the gene subsets {<it>TSPYL5</it>, <it>CA9</it>} and {<it>TSPYL5</it>, <it>LIN9</it>}, respectively. The gene <it>LIN9 </it>was found to be important only when using SVM. Similarly, the gene <it>CA9 </it>was found to be suitable for ANN and Logistic Regression, along with other genes. An analysis of the experimental results reveals that similar types of classifiers tended to choose the same subset of genes (differing by at most one or two genes) to obtain their best performance. For instance, Random Forest, Random Forest with bagging and Random Forest with boosting are essentially similar classifiers with some small variation. All these classifiers chose the gene subset {<it>TSPYL5</it>, <it>DIAPH1</it>, <it>ASPM</it>} for classifying the dataset. However, Random Forest with bagging produced the best accuracy of the three types of Random Forest considered in this study, adding the gene <it>NMU </it>to the common gene subset {<it>TSPYL5</it>, <it>DIAPH1</it>, <it>ASPM</it>}. Thus, a subset of genes used by all classifiers is selected as the important gene subset. Table <tblr tid="T3">3</tblr> summarises the classification accuracy for both the individual test set of van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and for 5-fold cross-validation (CV), comparing the use of subsets of the 7 genes with the scenario where all 25,000 genes are used.</p>
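               <p>The gene-usage observations drawn from Table <tblr tid="T2">2</tblr> can be reproduced with a simple tally. The following Python sketch is illustrative only and is not part of the original analysis: the gene combinations are transcribed from Table <tblr tid="T2">2</tblr>, while the names <it>TABLE2</it>, <it>gene_frequency</it> and <it>classifiers_using</it> are hypothetical helpers introduced here.</p>

```python
from collections import Counter

# Best-performing gene combination per classifier, transcribed from Table 2.
# The dictionary and function names are illustrative assumptions.
TABLE2 = {
    "C4.5": ["TSPYL5"],
    "C4.5 with boosting": ["TSPYL5", "DIAPH1", "AGTPBP1"],
    "C4.5 with bagging": ["TSPYL5"],
    "Naive Bayes": ["TSPYL5"],
    "Naive Bayes with boosting": ["TSPYL5"],
    "Naive Bayes with bagging": ["TSPYL5", "DIAPH1", "NMU"],
    "LMT": ["TSPYL5"],
    "NBTree": ["TSPYL5"],
    "Random Forest": ["TSPYL5", "DIAPH1", "ASPM"],
    "Random Forest with boosting": ["TSPYL5", "DIAPH1", "ASPM"],
    "Random Forest with bagging": ["TSPYL5", "DIAPH1", "ASPM", "NMU"],
    "k-NN": ["TSPYL5"],
    "Logistic Regression": ["TSPYL5", "DIAPH1", "CA9"],
    "ANN": ["TSPYL5", "CA9"],
    "SVM": ["TSPYL5", "LIN9"],
}


def gene_frequency(table):
    """Count how many classifiers include each gene in their best subset."""
    return Counter(gene for genes in table.values() for gene in genes)


def classifiers_using(table, required):
    """List the classifiers whose best subset contains every gene in `required`."""
    required = set(required)
    return [name for name, genes in table.items() if required.issubset(genes)]


if __name__ == "__main__":
    # Genes ordered from most to least frequently selected.
    for gene, count in gene_frequency(TABLE2).most_common():
        print(f"{gene:10s} selected by {count} of {len(TABLE2)} classifiers")
    print(classifiers_using(TABLE2, ["TSPYL5", "DIAPH1"]))
```

               <p>Run as a script, the sketch lists each gene with the number of classifiers that selected it, followed by the classifiers whose best subset contains both <it>TSPYL5 </it>and <it>DIAPH1</it>.</p>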
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Comparison of the weighted accuracy of different classifiers using i) subsets of our 7 genes and ii) all 25,000 genes</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Classifier</b>
                           </p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>
                              <b>Subsets of our 7 genes</b>
                           </p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>
                              <b>All 25,000 genes</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Test set (19)</p>
                        </c>
                        <c ca="center">
                           <p>All data (5-fold CV)</p>
                        </c>
                        <c ca="center">
                           <p>Test set (19)</p>
                        </c>
                        <c ca="center">
                           <p>All data (5-fold CV)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>88.49%</p>
                        </c>
                        <c ca="center">
                           <p>79.17%</p>
                        </c>
                        <c ca="center">
                           <p>62.36%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with boosting (ADABoost)</p>
                        </c>
                        <c ca="center">
                           <p>91.67%</p>
                        </c>
                        <c ca="center">
                           <p>89.54%</p>
                        </c>
                        <c ca="center">
                           <p>63.10%</p>
                        </c>
                        <c ca="center">
                           <p>62.89%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with bagging</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>88.94%</p>
                        </c>
                        <c ca="center">
                           <p>48.81%</p>
                        </c>
                        <c ca="center">
                           <p>63.98%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>92.13%</p>
                        </c>
                        <c ca="center">
                           <p>50.00%</p>
                        </c>
                        <c ca="center">
                           <p>52.17%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with bagging</p>
                        </c>
                        <c ca="center">
                           <p>88.69%</p>
                        </c>
                        <c ca="center">
                           <p>86.82%</p>
                        </c>
                        <c ca="center">
                           <p>50.00%</p>
                        </c>
                        <c ca="center">
                           <p>52.17%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with boosting</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>87.65%</p>
                        </c>
                        <c ca="center">
                           <p>50.00%</p>
                        </c>
                        <c ca="center">
                           <p>52.17%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LMT</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>88.11%</p>
                        </c>
                        <c ca="center">
                           <p>77.38%</p>
                        </c>
                        <c ca="center">
                           <p>60.29%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>NBTree</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>83.69%</p>
                        </c>
                        <c ca="center">
                           <p>66.07%</p>
                        </c>
                        <c ca="center">
                           <p>58.76%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>90.59%</p>
                        </c>
                        <c ca="center">
                           <p>66.07%</p>
                        </c>
                        <c ca="center">
                           <p>62.47%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with bagging</p>
                        </c>
                        <c ca="center">
                           <p>88.69%</p>
                        </c>
                        <c ca="center">
                           <p>90.59%</p>
                        </c>
                        <c ca="center">
                           <p>73.21%</p>
                        </c>
                        <c ca="center">
                           <p>64.75%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with boosting</p>
                        </c>
                        <c ca="center">
                           <p>84.52%</p>
                        </c>
                        <c ca="center">
                           <p>88.48%</p>
                        </c>
                        <c ca="center">
                           <p>66.07%</p>
                        </c>
                        <c ca="center">
                           <p>62.45%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>k</it>-NN</p>
                        </c>
                        <c ca="center">
                           <p>80.36%</p>
                        </c>
                        <c ca="center">
                           <p>83.00%</p>
                        </c>
                        <c ca="center">
                           <p>63.69%</p>
                        </c>
                        <c ca="center">
                           <p>61.94%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Logistic Regression</p>
                        </c>
                        <c ca="center">
                           <p>81.55%</p>
                        </c>
                        <c ca="center">
                           <p>88.11%</p>
                        </c>
                        <c ca="center">
                           <p>Out of memory*</p>
                        </c>
                        <c ca="center">
                           <p>Out of memory*</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ANN</p>
                        </c>
                        <c ca="center">
                           <p>77.38%</p>
                        </c>
                        <c ca="center">
                           <p>83.44%</p>
                        </c>
                        <c ca="center">
                           <p>Out of memory*</p>
                        </c>
                        <c ca="center">
                           <p>Out of memory*</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SVM</p>
                        </c>
                        <c ca="center">
                           <p>83.33%</p>
                        </c>
                        <c ca="center">
                           <p>76.23%</p>
                        </c>
                        <c ca="center">
                           <p>63.69%</p>
                        </c>
                        <c ca="center">
                           <p>68.12%</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*Our experiments were carried out on a standard desktop computer with an Intel Core 2 Duo 2.4 GHz CPU and 2 GB of RAM.</p>
                  </tblfn>
               </tbl>
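               <p>As a reading aid for Table <tblr tid="T3">3</tblr>, the test-set gain from using subsets of the 7 genes instead of all 25,000 genes can be computed per classifier. The Python sketch below is illustrative only and is not part of the original analysis: the accuracies are transcribed from the two test-set columns of Table <tblr tid="T3">3</tblr>, the names <it>TABLE3_TEST </it>and <it>accuracy_gains </it>are hypothetical, and Logistic Regression and ANN are omitted because their all-gene runs ran out of memory.</p>

```python
# Test-set accuracies (%) transcribed from Table 3:
# classifier -> (subsets of the 7 genes, all 25,000 genes).
# Logistic Regression and ANN are left out (all-gene runs ran out of memory).
# The dictionary and function names are illustrative assumptions.
TABLE3_TEST = {
    "C4.5": (84.52, 79.17),
    "C4.5 with boosting": (91.67, 63.10),
    "C4.5 with bagging": (84.52, 48.81),
    "Naive Bayes": (84.52, 50.00),
    "Naive Bayes with bagging": (88.69, 50.00),
    "Naive Bayes with boosting": (84.52, 50.00),
    "LMT": (84.52, 77.38),
    "NBTree": (84.52, 66.07),
    "Random Forest": (84.52, 66.07),
    "Random Forest with bagging": (88.69, 73.21),
    "Random Forest with boosting": (84.52, 66.07),
    "k-NN": (80.36, 63.69),
    "SVM": (83.33, 63.69),
}


def accuracy_gains(table):
    """Return classifier -> gain (percentage points) of the 7-gene subsets."""
    return {name: round(small - full, 2) for name, (small, full) in table.items()}


gains = accuracy_gains(TABLE3_TEST)
mean_gain = sum(gains.values()) / len(gains)

# Print classifiers from largest to smallest gain, then the mean gain.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:30s} +{gain:.2f}")
print(f"mean gain over {len(gains)} classifiers: {mean_gain:.2f} percentage points")
```

               <p>The same helper could be applied to the 5-fold CV columns by transcribing those values into a second dictionary of the same shape.</p>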
            </sec>
            <sec>
               <st>
                  <p>6 gene set</p>
               </st>
               <p>Table <tblr tid="T4">4</tblr> shows the classification performance of different classifiers on the Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and Loi <it>et al. </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> data. In addition to our selection of 6 genes, we have also used the 2 gene biomarker proposed by Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> for comparison, bearing in mind that this is somewhat of a simplification, as the two genes are actually used as a ratio in their study. Our selection of 6 genes performs much better than the 2 genes on the Ma <it>et al</it>. dataset. Of the 15 classifiers, 13 achieve 100% accuracy with the 6 genes, whereas the 2 gene biomarker reached a maximum accuracy of only 80%, with a single classifier. For most classifiers, the 2 gene biomarker showed only 60% to 70% accuracy. When testing on the Loi <it>et al. </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> dataset, the 6 gene set also compares favourably with the 2 gene set (9 wins by the 6 gene set, 3 wins by the 2 gene set and 3 draws over the 15 classifiers).</p>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Comparison of the accuracy of different classifiers using 2 known biomarker genes and our selection of 6 genes on Ma <it>et al. </it>and Loi <it>et al. </it>data</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Classifier</b>
                           </p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>
                              <b>Ma <it>et al.</it></b>
                              <b>data</b>
                           </p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>
                              <b>Loi <it>et al. </it></b>
                              <b>data</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>2 genes</p>
                        </c>
                        <c ca="center">
                           <p>6 genes</p>
                        </c>
                        <c ca="center">
                           <p>2 genes</p>
                        </c>
                        <c ca="center">
                           <p>6 genes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>75.64%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>80.77%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with boosting (ADABoost)</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>66.67%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>82.05%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>C4.5 with bagging</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>67.95%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>75.64%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with boosting</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>80.00%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>77.95%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Na&#239;ve Bayes with bagging</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>75.64%</p>
                        </c>
                        <c ca="center">
                           <p>75.64%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LMT</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>76.92%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>79.49%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>NBTree</p>
                        </c>
                        <c ca="center">
                           <p>80.00%</p>
                        </c>
                        <c ca="center">
                           <p>80.00%</p>
                        </c>
                        <c ca="center">
                           <p>75.64%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>82.05%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>75.38%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with boosting</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>67.95%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>74.36%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Random Forest with bagging</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>74.36%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>71.79%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>k</it>-NN</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>73.08%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>71.79%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Logistic Regression</p>
                        </c>
                        <c ca="center">
                           <p>70.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>76.92%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ANN</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>76.67%</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SVM</p>
                        </c>
                        <c ca="center">
                           <p>60.00%</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>100%</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                        <c ca="center">
                           <p>74.36%</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
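               <p>The win, draw and loss counts quoted for Table 4 can be re-derived directly from the tabulated accuracies. The short Python sketch below does so; the values are transcribed from Table 4, while the names <it>TABLE4</it> and <it>tally</it> are illustrative and do not come from the study.</p>
               <p><![CDATA[
```python
# Accuracies (%) transcribed from Table 4, one tuple per classifier:
# (Ma 2 genes, Ma 6 genes, Loi 2 genes, Loi 6 genes)
TABLE4 = {
    "C4.5": (60.00, 100.0, 75.64, 80.77),
    "C4.5 with boosting": (70.00, 100.0, 66.67, 82.05),
    "C4.5 with bagging": (70.00, 100.0, 67.95, 75.64),
    "Naive Bayes": (60.00, 100.0, 74.36, 74.36),
    "Naive Bayes with boosting": (60.00, 80.00, 74.36, 77.95),
    "Naive Bayes with bagging": (60.00, 100.0, 75.64, 75.64),
    "LMT": (70.00, 100.0, 76.92, 79.49),
    "NBTree": (80.00, 80.00, 75.64, 82.05),
    "Random Forest": (60.00, 100.0, 74.36, 75.38),
    "Random Forest with boosting": (70.00, 100.0, 67.95, 74.36),
    "Random Forest with bagging": (70.00, 100.0, 74.36, 71.79),
    "k-NN": (70.00, 100.0, 73.08, 71.79),
    "Logistic Regression": (70.00, 100.0, 76.92, 74.36),
    "ANN": (60.00, 100.0, 74.36, 76.67),
    "SVM": (60.00, 100.0, 74.36, 74.36),
}


def tally(rows, two_idx, six_idx):
    """Count (wins for 6 genes, wins for 2 genes, draws) over all classifiers."""
    wins_six = sum(1 for r in rows.values() if r[six_idx] > r[two_idx])
    wins_two = sum(1 for r in rows.values() if r[two_idx] > r[six_idx])
    draws = len(rows) - wins_six - wins_two
    return wins_six, wins_two, draws


loi_tally = tally(TABLE4, 2, 3)   # Loi et al. columns
ma_tally = tally(TABLE4, 0, 1)    # Ma et al. columns
perfect_on_ma = sum(1 for r in TABLE4.values() if r[1] == 100.0)
best_two_gene_on_ma = max(r[0] for r in TABLE4.values())

print("Loi (6-gene wins, 2-gene wins, draws):", loi_tally)
print("Ma  (6-gene wins, 2-gene wins, draws):", ma_tally)
print("Classifiers at 100% on Ma with 6 genes:", perfect_on_ma)
print("Best 2-gene accuracy on Ma:", best_two_gene_on_ma)
```
]]></p>
               <p>On the Loi <it>et al. </it>columns the tally is 9 wins, 3 losses and 3 draws for the 6 gene set, and 13 of the 15 classifiers reach 100% on the Ma <it>et al. </it>data, in agreement with the figures given in the text.</p>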
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In our multi-classifier voting approach, a subset of genes is first selected using an ROC based ranking. A classifier voting phase then refines this list of genes further. The improved performance can be attributed to two factors. First, ROC is a classifier independent ranking method that does not depend on the standard deviation of the features. Second, the multi-classifier voting step yields a combination of genes that is supported by a majority of the classifiers. Together, these two properties contribute to better classification performance on the unseen datasets. The substantial reduction in the number of genes is a further advantage of our approach.</p>
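         <p>As a rough illustration of these two stages, the following Python sketch ranks genes by their single-gene ROC area and then keeps the genes chosen by a strict majority of classifiers. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names, the toy expression values, the symmetric scoring max(AUC, 1 - AUC), and the per-classifier gene subsets in <it>votes</it> are all hypothetical.</p>
         <p><![CDATA[
```python
from collections import Counter


def auc(scores, labels):
    """ROC area for one gene, by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def rank_genes_by_roc(expression, labels, top_k):
    """Return the top_k gene names, best single-gene separation first."""
    # Assumption of this sketch: over- and under-expression are treated
    # symmetrically, so a gene with AUC 0.0 is as useful as one with AUC 1.0.
    strength = {}
    for gene, values in expression.items():
        a = auc(values, labels)
        strength[gene] = max(a, 1.0 - a)
    ordered = sorted(strength, key=lambda g: (-strength[g], g))
    return ordered[:top_k]


def majority_vote(votes):
    """Keep the genes selected by a strict majority of the classifiers."""
    counts = Counter(g for chosen in votes.values() for g in chosen)
    needed = len(votes) // 2 + 1
    return sorted(g for g, c in counts.items() if c >= needed)


# Toy data (hypothetical): 3 samples of class 1 followed by 3 of class 0.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "geneA": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "geneB": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    "geneC": [0.5, 0.1, 0.9, 0.4, 0.6, 0.2],
    "geneD": [0.6, 0.9, 0.4, 0.5, 0.3, 0.2],
}

shortlist = rank_genes_by_roc(expression, labels, top_k=3)

# Hypothetical stage-two input: the subset of the shortlist that each
# classifier prefers (for example, chosen on the training data only).
votes = {
    "clf1": {"geneA", "geneB"},
    "clf2": {"geneA", "geneD"},
    "clf3": {"geneA", "geneB"},
}
selected = majority_vote(votes)

print(shortlist)
print(selected)
```
]]></p>
         <p>Because the ranking uses only the ordering of expression values between the two classes, rescaling a gene does not change its rank, which is the sense in which the ranking is independent of the feature's standard deviation.</p>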
         <p>Previous studies that link gene expression profiles to clinical outcomes in breast cancer cases have demonstrated that the potential for distant metastasis and overall survival probability may be attributable to the biological characteristics of the primary tumor at the time of diagnosis <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. In particular, a 70-gene expression signature by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> has proven to be a strong prognostic factor, outperforming all known clinicopathological parameters. The accuracy in distinguishing cases of Poor and Good breast cancer prognosis, provided by the subset of 70 genes selected by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, was revalidated and confirmed by van de Vijver <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> in a different cohort of patients. However, 70 genes is not a compact set, which greatly increases the expense of developing a clinical predictor. Even the set of 17 genes by Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> is more than twice as large as our 7 gene set. Our method yielded better accuracy with 7 genes, and did so without relying on a specific classifier. Having a compact set of genes is extremely important from a treatment and drug development viewpoint, where clinical and experimental validation is costly and it is vital to restrict the number of hypotheses or targets (genes) to be followed up. We also show, in the comparison of our 6 gene set with the 2 gene set of Ma <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> in Table <tblr tid="T4">4</tblr> using the Loi <it>et al. </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> data, that our approach avoids generating an overly compact gene set that may not generalise well to microarray data from another lab. This is extremely important when attempting to develop a robust predictor in a clinical setting.</p>
         <p>We have also compared our best results for the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset against some well known cancer treatment guidelines (see Table <tblr tid="T5">5</tblr>). The comparison indicates that machine learning approaches are effective techniques for classifying breast cancer prognosis.</p>
         <tbl id="T5">
            <title>
               <p>Table 5</p>
            </title>
            <caption>
               <p>Comparison of the weighted accuracy on the test set of the best result from our voting method versus some well known cancer treatment guidelines</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>
                        <b>Classifier</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Weighted accuracy on Test set</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>C4.5 with boosting</p>
                  </c>
                  <c ca="center">
                     <p>91.67%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>St. Gallen 1998*</p>
                  </c>
                  <c ca="center">
                     <p>68%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NIH 2000*</p>
                  </c>
                  <c ca="center">
                     <p>79%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NPI*</p>
                  </c>
                  <c ca="center">
                     <p>58%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>70-genes*</p>
                  </c>
                  <c ca="center">
                     <p>74%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>BPIM*</p>
                  </c>
                  <c ca="center">
                     <p>68%</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>BDIM*</p>
                  </c>
                  <c ca="center">
                     <p>58%</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>*Results obtained from Gevaert <it>et al. </it><abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, who reported them as numbers of true positives and true negatives.</p>
            </tblfn>
         </tbl>
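         <p>Table 5 reports weighted accuracy, whereas the guideline results were available as counts of true positives and true negatives. A minimal Python sketch of such a conversion is given below. It assumes that weighted accuracy denotes the mean of sensitivity and specificity, since this section does not restate the definition; the function names are illustrative and the counts are invented for the example, not taken from any of the cited studies.</p>
         <p><![CDATA[
```python
def weighted_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Mean of sensitivity and specificity (assumed definition)."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return (sensitivity + specificity) / 2.0


def plain_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Fraction of all samples classified correctly, for comparison."""
    total = true_pos + false_neg + true_neg + false_pos
    return (true_pos + true_neg) / total


# Illustrative counts only: 20 positive and 10 negative samples.
example = dict(true_pos=15, false_neg=5, true_neg=6, false_pos=4)

wa = weighted_accuracy(**example)
pa = plain_accuracy(**example)

print("weighted accuracy: {:.2%}".format(wa))
print("plain accuracy:    {:.2%}".format(pa))
```
]]></p>
         <p>Under this assumed definition each class contributes equally regardless of its size, so the score is not dominated by the larger class when the test set is imbalanced.</p>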
         <sec>
            <st>
               <p>Comparison with other studies</p>
            </st>
            <p>A number of efforts have been made in this direction for breast cancer prognosis but without major success. Ritz <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> combined both genetic and clinical information in a neural network for breast cancer prognosis, but found that the combination did not improve the performance.</p>
            <p>Dettling <it>et al. </it><abbrgrp><abbr bid="B41">41</abbr></abbrgrp> applied penalized logistic regression analysis to predict cancer prognosis for the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset. They found that none of the clinical variables entered the model and concluded that the clinical data did not contain any useful independent information for prediction, given the gene expression profile.</p>
            <p>To predict prognosis on the breast cancer dataset, Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> applied the Logical Analysis of Data (LAD) tool to analyze microarray data. They identified 17 genes out of 25,000 possible genes that could distinguish patients with Poor or Good prognoses. Among the 17 genes, the LAD tool identified three and five genes that were associated with Poor and Good prognoses, respectively. Two wholly new classes of patients (defined by similar sets of covering patterns, gene expression ranges, and clinical features) were discovered. It was also demonstrated that the training and test sets of van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> differ in their characteristics. However, this study of Alexe <it>et al. </it>was overly specific to the chosen classifier (the LAD tool), as we shall shortly see.</p>
            <p>We assessed the classification performance of five different subsets of genes on the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset: the 231 and 70 genes selected by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, the set of 17 genes selected by the LAD technique <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, a set of 17 genes selected by ROC (FROC) with Markov blanket <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and the set of 7 genes selected by our voting approach. For ROC (FROC) with Markov blanket we used the same number of genes to select as given in that paper, namely 50, followed in the second step by an area between two ROC curves (ABR) value &gt; 20, chosen to yield the most competitive performance for the technique. We applied the five classification methods used by Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> together with our top performing classifier, C4.5 with boosting on the gene set of size 4. These classification methods are ANN, SVM, Logistic Regression, <it>k</it>-NN, C4.5 decision tree and C4.5 with boosting (see Table <tblr tid="T6">6</tblr>). Predictive models were then constructed on the training set and tested on the supplied test set of 19 samples.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Comparison of the classifier performance using i) a variable subset of our 7 genes, ii) a set of 17 genes identified by ROC with Markov Blanket <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, iii) a set of 17 genes identified by LAD <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, iv) a set of 70 genes identified by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and v) a set of 231 genes identified by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp></p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Classifier</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Subsets of our 7 genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Our 7 genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Set of 17 genes </b>
                           <abbrgrp>
                              <abbr bid="B9">9</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Set of 17 genes </b>
                           <abbrgrp>
                              <abbr bid="B15">15</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Set of 70 genes </b>
                           <abbrgrp>
                              <abbr bid="B3">3</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Set of 231 genes </b>
                           <abbrgrp>
                              <abbr bid="B3">3</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C4.5 with boosting</p>
                     </c>
                     <c ca="center">
                        <p>91.67%</p>
                     </c>
                     <c ca="center">
                        <p>84.52%</p>
                     </c>
                     <c ca="center">
                        <p>68.42%</p>
                     </c>
                     <c ca="center">
                        <p>59.52%</p>
                     </c>
                     <c ca="center">
                        <p>54.76%</p>
                     </c>
                     <c ca="center">
                        <p>76.19%</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C4.5</p>
                     </c>
                     <c ca="center">
                        <p>84.52%</p>
                     </c>
                     <c ca="center">
                        <p>84.52%</p>
                     </c>
                     <c ca="center">
                        <p>68.42%</p>
                     </c>
                     <c ca="center">
                        <p>57.90%*</p>
                     </c>
                     <c ca="center">
                        <p>42.11%*</p>
                     </c>
                     <c ca="center">
                        <p>73.68%*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>k</it>-NN</p>
                     </c>
                     <c ca="center">
                        <p>80.36%</p>
                     </c>
                     <c ca="center">
                        <p>77.38%</p>
                     </c>
                     <c ca="center">
                        <p>74.21%</p>
                     </c>
                     <c ca="center">
                        <p>63.16%*</p>
                     </c>
                     <c ca="center">
                        <p>63.16%*</p>
                     </c>
                     <c ca="center">
                        <p>78.94%*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Logistic Regression</p>
                     </c>
                     <c ca="center">
                        <p>81.55%</p>
                     </c>
                     <c ca="center">
                        <p>77.38%</p>
                     </c>
                     <c ca="center">
                        <p>73.68%</p>
                     </c>
                     <c ca="center">
                        <p>73.68%*</p>
                     </c>
                     <c ca="center">
                        <p>47.37%*</p>
                     </c>
                     <c ca="center">
                        <p>73.68%*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ANN</p>
                     </c>
                     <c ca="center">
                        <p>77.38%</p>
                     </c>
                     <c ca="center">
                        <p>76.19%</p>
                     </c>
                     <c ca="center">
                        <p>84.21%</p>
                     </c>
                     <c ca="center">
                        <p>84.21%*</p>
                     </c>
                     <c ca="center">
                        <p>42.11%*</p>
                     </c>
                     <c ca="center">
                        <p>73.68%*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SVM</p>
                     </c>
                     <c ca="center">
                        <p>83.33%</p>
                     </c>
                     <c ca="center">
                        <p>76.19%</p>
                     </c>
                     <c ca="center">
                        <p>79.47%</p>
                     </c>
                     <c ca="center">
                        <p>63.16%*</p>
                     </c>
                     <c ca="center">
                        <p>57.90%*</p>
                     </c>
                     <c ca="center">
                        <p>73.68%*</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Results adopted from Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
               </tblfn>
            </tbl>
            <p>The weighted accuracy in distinguishing patients with Good and Poor breast cancer prognoses is highest for most classifiers when using the 7 genes selected by our voting approach, and is much higher than that of the models using 17 (by LAD), 70 and 231 genes. The exception is ANN, for which the 17 genes selected by Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, or those selected using ROC with a Markov blanket <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, performed better. However, the methodology of Alexe <it>et al. </it>incorporated a "selection bias" <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> in finding their subset of 17 genes, since the test set was used during selection. In contrast, our voting approach selected genes using only the training dataset, keeping the test set completely unseen. The performance of the classifiers using the 70 and 231 genes selected by van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> was inferior to that of our approach. Furthermore, neither 70 nor 231 genes is a compact set, whereas our voting method can obtain a better accuracy using at most four genes (see Table <tblr tid="T2">2</tblr>).</p>
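            <p>The difference between selecting genes with and without access to the test set can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the filter score (absolute difference of class means), the function names and the random toy data are all hypothetical.</p>
            <p>
```python
import random


def class_mean_gap(values, labels):
    """Simple filter score: absolute difference between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_top_genes(data, labels, sample_ids, k):
    """Rank genes using ONLY the samples listed in sample_ids; return the top k."""
    sub_labels = [labels[i] for i in sample_ids]
    scores = {
        gene: class_mean_gap([vals[i] for i in sample_ids], sub_labels)
        for gene, vals in data.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 200 noise "genes" measured on 20 samples.
rng = random.Random(0)
n_samples, n_genes = 20, 200
labels = [i % 2 for i in range(n_samples)]
data = {f"g{j}": [rng.gauss(0, 1) for _ in range(n_samples)] for j in range(n_genes)}

train_ids = list(range(0, 14))
test_ids = list(range(14, 20))

# Selection restricted to the training samples: the test set stays unseen.
unbiased = select_top_genes(data, labels, train_ids, k=5)

# Selection on all samples: test labels leak into the chosen gene subset.
biased = select_top_genes(data, labels, train_ids + test_ids, k=5)
```
            </p>
            <p>In this sketch, only the gene list named unbiased could be evaluated fairly on the held-out samples, because it is unaffected by any change to their values or labels.</p>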
            <p>Gevaert <it>et al. </it><abbrgrp><abbr bid="B43">43</abbr></abbrgrp> proposed a Bayesian network-based strategy to treat clinical and microarray data along the same lines as above using the same dataset. A probabilistic model was used because it integrates the data sources in several ways, and explores and documents the model structure and parameters. The concept of a Markov Blanket is used to identify all the variables that shield the class variable from being affected by the rest of the network. However, all the processes are integrated in the classifier, and hence the performance of the system depends on the choice of classifier. Furthermore, the performance of the classifier would depend on the selection of the initial distribution for the model.</p>
         </sec>
         <sec>
            <st>
               <p>Comparison with other filter approaches</p>
            </st>
            <p>Jeffery <it>et al. </it><abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, applying 9 feature selection techniques to 9 gene expression datasets, demonstrated that the ROC is an accurate way to identify differentially regulated genes in a microarray dataset and that it can produce robust classifiers. For datasets with 15 or more samples, the ROC was shown to be the most accurate. Other filter approaches such as the <it>t</it>-test and Principal Component Analysis (PCA) produce reasonable results, but ROC yields better results (see Figure <figr fid="F1">1</figr>) on the van 't Veer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> dataset. ROC is particularly useful for gene expression data, as it does not depend directly on the standard deviation of the expression of each gene, as the <it>t</it>-test does, or only on the correlation among genes, as PCA does. Moreover, unlike PCA, ROC is not sensitive to the scaling of the data.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>ROC curves of three classifiers with selected genes using three filter approaches FROC, <it>t</it>-test and PCA</p>
               </caption>
               <text>
                  <p><b>ROC curves of three classifiers with selected genes using three filter approaches FROC, <it>t</it>-test and PCA</b>. A group of three graphs showing ROC curves for three classifiers with selected genes using three filter approaches FROC, <it>t</it>-test and PCA.</p>
               </text>
               <graphic file="1471-2105-10-S1-S19-1"/>
            </fig>
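            <p>As a minimal sketch of an ROC-based filter, the area under the ROC curve (AUC) of a single gene can be computed as the fraction of (class 1, class 0) sample pairs that the gene orders correctly, with ties counted as one half. Ranking genes by the distance of their AUC from 0.5 is an assumption made here for illustration; the function names and toy expression values are hypothetical and do not come from the datasets used in this study.</p>
            <p>
```python
def auc(values, labels):
    """AUC of one gene: P(class-1 value > class-0 value), ties count 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes(expression, labels):
    """Order genes by |AUC - 0.5|, so both up- and down-regulation count."""
    scores = {g: abs(auc(vals, labels) - 0.5) for g, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 3 genes, 6 samples, label 1 = poor prognosis.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [2.1, 1.9, 2.5, 0.3, 0.4, 0.2],  # higher in class 1
    "gene_b": [1.0, 0.2, 0.9, 0.8, 0.3, 1.1],  # weakly informative
    "gene_c": [0.1, 0.2, 0.1, 0.9, 1.2, 0.8],  # lower in class 1
}
ranking = rank_genes(expression, labels)

# Rescaling a gene by an increasing linear map leaves its AUC unchanged,
# because only the ordering of the values matters.
rescaled = [10 * v + 3 for v in expression["gene_b"]]
same_auc = auc(rescaled, labels) == auc(expression["gene_b"], labels)
```
            </p>
            <p>Because the score depends only on the ordering of expression values, it illustrates why an ROC-based filter is insensitive to the scaling of the data.</p>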
         </sec>
         <sec>
            <st>
               <p>Gene set significance tests</p>
            </st>
            <p>It is interesting to consider the results of gene set enrichment analysis (GSEA) <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> on our set of 7 genes obtained by voting, against the other gene sets proposed by van 't Veer <it>et al. </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp> (size 70 and 231 genes, respectively), Alexe <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> (17 genes), and the one we have obtained using Mamitsuka's <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> technique (17 genes). Three out of the five gene sets are enriched in phenotype 1 (i.e. relapse). Of these, Mamitsuka's and our gene sets have FDR <it>q</it>-values of 0 and 0.005, with enrichment scores (ES) of 0.78 and 0.80, respectively. Members of the leading edge subset (i.e., tags = 100%, list = 20% and signal = 125%) also indicate that our gene set contains only genes that contribute to the enrichment score, whereas in the other gene sets only a fraction of the genes contribute to it (see Additional File <supplr sid="S2">2</supplr>).</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>This file contains the result of gene set enrichment analysis (GSEA).</p>
               </text>
               <file name="1471-2105-10-S1-S19-S2.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
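            <p>To make the enrichment score and the leading edge subset concrete, the following sketch computes an unweighted running-sum statistic over a ranked gene list. It is a simplification under stated assumptions: the GSEA software by default weights each hit by its correlation with the phenotype, only the positive-ES case is handled here, and the gene names are hypothetical.</p>
            <p>
```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum ES and leading-edge genes (positive-ES case only)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best, best_idx = 0.0, 0.0, -1
    for i, is_hit in enumerate(hits):
        # Step up at each gene in the set, step down at every other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if running > best:
            best, best_idx = running, i
    # Leading edge: set members ranked at or before the peak of the running sum.
    leading_edge = [g for g in ranked_genes[: best_idx + 1] if g in gene_set]
    return best, leading_edge


# Hypothetical ranked list (most relapse-associated gene first) and gene set.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
gene_set = {"g1", "g2", "g4"}

es, leading = enrichment_score(ranked, gene_set)
tags = len(leading) / len(gene_set)  # fraction of set members in the leading edge
```
            </p>
            <p>In this toy case every member of the gene set is ranked before the peak of the running sum, which corresponds to tags = 100%.</p>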
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have proposed and implemented a multi-classifier voting approach for gene selection, to effectively classify the prognosis of breast cancer patients using data from two distinct treatment cases. The novelty of our approach is that it can identify a very small number of genes that are predictive across a large range of classifiers. We applied our voting approach to three well-known microarray datasets related to breast cancer. Experimental analysis demonstrated high prediction accuracies for the gene sets discovered, compared to previous studies. The gene sets discovered were also biologically relevant and had good biological process coverage.</p>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations used</p>
         </st>
         <p>ROC: Receiver Operating Characteristic; AUC: Area Under the ROC Curve; LMT: Logistic Model Tree; <it>k</it>-NN: <it>k</it>-Nearest Neighbour; ANN: Artificial Neural Network; SVM: Support Vector Machine; NBTree: Na&#239;ve Bayes Tree; CV: Cross Validation; PCA: Principal Component Analysis; GSEA: Gene Set Enrichment Analysis; ES: Enrichment Score; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MRH, MMH and KR conceived the design of the ROC ranking and voting algorithm. JB and GM contributed to the experimental design, and the experiments were performed by MMH and MRH. The biological significance was investigated by GM and JWKH and the writing was performed with input from all authors.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work is partially supported by National ICT Australia. National ICT Australia is funded by the Australian Government's Backing Australia's Ability initiative in part through the Australian Research Council. MMH is supported by a Melbourne Research Scholarship. JWKH and GM are supported by an Australian Postgraduate Award and a NICTA Research Project Award.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/10?issue=S1</url></p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Microarray Analysis</source>
            <publisher>Hoboken, NJ, USA: Wiley-Liss</publisher>
            <pubdate>2003</pubdate>
            <fpage>630</fpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>DNA Microarray Technology: Devices, Systems, and Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Heller</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Annual Review of Biomedical Engineering</source>
            <pubdate>2002</pubdate>
            <volume>4</volume>
            <fpage>129</fpage>
            <lpage>153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bioeng.4.020702.153438</pubid>
                  <pubid idtype="pmpid" link="fulltext">12117754</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Gene expression profiling predicts clinical outcome of breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>van't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Vijver</snm>
                  <mnm>van de</mnm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AAM</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peterse</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Kooy</snm>
                  <mnm>van der</mnm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Witteveen</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Kerkhoven</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Linsley</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Bernards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Friend</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>530</fpage>
            <lpage>535</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415530a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11823860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>XJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ryan</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Isakoff</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Barmettler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fuller</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Muir</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mohapatra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Salunga</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tuggle</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tassin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Amon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Stecker</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Estepa-Sabal</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Younger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Balis</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Michaelson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bhan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Habin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Baer</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Brugge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haber</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Erlander</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Sgroi</snm>
                  <fnm>DC</fnm>
               </au>
            </aug>
            <source>Cancer Cell</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>607</fpage>
            <lpage>616</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ccr.2004.05.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">15193263</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen</p>
            </title>
            <aug>
               <au>
                  <snm>Loi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Haibe-Kains</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Desmedt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wirapati</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lallemand</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Tutt</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Gillet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Reid</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Daidone</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Pierotti</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Berns</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Foekens</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Delorenzi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bontempi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Piccart</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Sotiriou</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>239</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2423197</pubid>
                  <pubid idtype="pmpid" link="fulltext">18498629</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-9-239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The meaning and use of the area under a receiver operating characteristic (ROC) curve</p>
            </title>
            <aug>
               <au>
                  <snm>Hanley</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>McNeil</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Radiology</source>
            <pubdate>1982</pubdate>
            <volume>143</volume>
            <fpage>29</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7063747</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>ROC Graphs: Notes and Practical Considerations for Researchers</p>
            </title>
            <aug>
               <au>
                  <snm>Fawcett</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>Technical Report MS 1143 &#8211; Extended version of HPL-2003-4, HP Laboratories</publisher>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B8">
            <aug>
               <au>
                  <snm>Egan</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Signal Detection Theory and ROC Analysis</source>
            <publisher>Academic Press Series in Cognition and Perception, London, UK: Academic Press</publisher>
            <pubdate>1975</pubdate>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Selecting features in microarray classification using ROC curves</p>
            </title>
            <aug>
               <au>
                  <snm>Mamitsuka</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Pattern Recognition</source>
            <pubdate>2006</pubdate>
            <volume>39</volume>
            <issue>12</issue>
            <fpage>2393</fpage>
            <lpage>2404</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.patcog.2006.07.010</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The area above the ordinal dominance graph and the area below the receiver operating characteristic graph</p>
            </title>
            <aug>
               <au>
                  <snm>Bamber</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Journal of Mathematical Psychology</source>
            <pubdate>1975</pubdate>
            <volume>12</volume>
            <fpage>387</fpage>
            <lpage>415</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0022-2496(75)90001-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Logistic Model Trees</p>
            </title>
            <aug>
               <au>
                  <snm>Landwehr</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Proceedings of the 16th European Conference on Machine Learning (ECML 2003)</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Scaling up the accuracy of Na&#239;ve-Bayes classifiers: A decision tree hybrid</p>
            </title>
            <aug>
               <au>
                  <snm>Kohavi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proceedings of the Second International Conference on Knowledge Discovery and Data Mining</source>
            <pubdate>1996</pubdate>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Bagging predictors</p>
            </title>
            <aug>
               <au>
                  <snm>Breiman</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <issue>2</issue>
            <fpage>123</fpage>
            <lpage>140</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Experiments with a new boosting algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Freund</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Schapire</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Proceedings of International Conference on Machine Learning</source>
            <publisher>San Francisco: Morgan Kaufmann</publisher>
            <pubdate>1996</pubdate>
            <fpage>148</fpage>
            <lpage>156</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Breast cancer prognosis by combinatorial analysis of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Alexe</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Alexe</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Axelrod</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Bonates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Lozina</snm>
                  <fnm>II</fnm>
               </au>
               <au>
                  <snm>Reiss</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hammer</snm>
                  <fnm>PL</fnm>
               </au>
            </aug>
            <source>Breast Cancer Research</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <fpage>R41</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1779471</pubid>
                  <pubid idtype="pmpid" link="fulltext">16859500</pubid>
                  <pubid idtype="doi">10.1186/bcr1512</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Quantitative analysis of carbonic anhydrase IX mRNA in human non-small cell lung cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Simi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Venturini</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Malentacchi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gelmini</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Andreani</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Janni</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pastorekova</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Supuran</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Pazzagli</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Orlando</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Lung Cancer</source>
            <pubdate>2006</pubdate>
            <volume>52</volume>
            <fpage>59</fpage>
            <lpage>66</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.lungcan.2005.11.017</pubid>
                  <pubid idtype="pmpid" link="fulltext">16513206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Improved breast cancer prognosis through the combination of clinical and genetic markers</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Goodison</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Farmerie</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>30</fpage>
            <lpage>37</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl543</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Expression of carbonic anhydrase IX in astrocytic tumors predicts poor prognosis</p>
            </title>
            <aug>
               <au>
                  <snm>Rantala</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Soini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Parkkila</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Pastorekova</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pastorek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Parkkila</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Haapasalo</snm>
                  <fnm>HK</fnm>
               </au>
            </aug>
            <source>Clinical Cancer Research</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>473</fpage>
            <lpage>477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/1078-0432.CCR-05-0848</pubid>
                  <pubid idtype="pmpid" link="fulltext">16428489</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>HIF-1alpha and CA IX staining in invasive breast carcinomas: prognosis and treatment outcome</p>
            </title>
            <aug>
               <au>
                  <snm>Trastour</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Benizri</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ettore</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ramaioli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chamorey</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pouyss&#233;gur</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Berra</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>International Journal of Cancer</source>
            <pubdate>2007</pubdate>
            <volume>120</volume>
            <issue>7</issue>
            <fpage>1451</fpage>
            <lpage>1458</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/ijc.22436</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Role of CD97(stalk) and CD55 as molecular markers for prognosis and therapy of gastric carcinoma patients</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Peng</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>ZX</fnm>
               </au>
               <au>
                  <snm>Hoang-Vu</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Journal of Zhejiang University Science B</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>9</issue>
            <fpage>913</fpage>
            <lpage>918</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1389911</pubid>
                  <pubid idtype="pmpid" link="fulltext">16130195</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Nucleosome destabilization in the epigenetic regulation of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nature Reviews Genetics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>15</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg2206</pubid>
                  <pubid idtype="pmpid" link="fulltext">18059368</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Neuromedin U is regulated by the metastasis suppressor RhoGDI2 and is a novel promoter of tumor formation, lung metastasis and cancer cachexia</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>McRoberts</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Berr</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Frierson</snm>
                  <fnm>HF</fnm>
               </au>
               <au>
                  <snm>Conaway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Theodorescu</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2007</pubdate>
            <volume>26</volume>
            <issue>5</issue>
            <fpage>765</fpage>
            <lpage>773</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1209835</pubid>
                  <pubid idtype="pmpid" link="fulltext">16878152</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Mip/LIN-9 regulates the expression of B-Myb and the induction of cyclin A, cyclin B, and CDK1</p>
            </title>
            <aug>
               <au>
                  <snm>Pilkinton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sandoval</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ness</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Colamonici</snm>
                  <fnm>OR</fnm>
               </au>
            </aug>
            <source>Journal of Biological Chemistry</source>
            <pubdate>2007</pubdate>
            <volume>282</volume>
            <fpage>168</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M609924200</pubid>
                  <pubid idtype="pmpid" link="fulltext">17098733</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Inhibition of oncogenic transformation by mammalian Lin-9, a pRB-associated protein</p>
            </title>
            <aug>
               <au>
                  <snm>Gagrica</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hauser</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kolfschoten</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Osterloh</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Agami</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gaubatz</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>EMBO Journal</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <issue>23</issue>
            <fpage>4627</fpage>
            <lpage>4638</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">533054</pubid>
                  <pubid idtype="pmpid" link="fulltext">15538385</pubid>
                  <pubid idtype="doi">10.1038/sj.emboj.7600470</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Mitf regulation of Dia1 controls melanoma proliferation and invasiveness</p>
            </title>
            <aug>
               <au>
                  <snm>Carreira</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goodall</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Denat</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nuciforo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hoek</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Testori</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larue</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Goding</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Genes and Development</source>
            <pubdate>2006</pubdate>
            <volume>20</volume>
            <issue>24</issue>
            <fpage>3426</fpage>
            <lpage>3429</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1101/gad.406406</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The consensus coding sequences of human breast and colorectal cancers</p>
            </title>
            <aug>
               <au>
                  <snm>Sj&#246;blom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Mandelker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Leary</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Ptak</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Silliman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Szabo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Buckhaults</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Farrell</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Meeh</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Markowitz</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Willis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dawson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Willson</snm>
                  <fnm>JKV</fnm>
               </au>
               <au>
                  <snm>Gazdar</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Hartigan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Parmigiani</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Bachman</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Papadopoulos</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>314</volume>
            <issue>5797</issue>
            <fpage>268</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1133427</pubid>
                  <pubid idtype="pmpid" link="fulltext">16959974</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Frequency of SOX Group B (SOX1, 2, 3) and ZIC2 antibodies in Turkish patients with small cell lung carcinoma and their correlation with clinical parameters</p>
            </title>
            <aug>
               <au>
                  <snm>Vural</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Saip</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Ustuner</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gonen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>AJG</fnm>
               </au>
               <au>
                  <snm>Old</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Ozbek</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Gure</snm>
                  <fnm>AO</fnm>
               </au>
            </aug>
            <source>Cancer</source>
            <pubdate>2005</pubdate>
            <volume>103</volume>
            <issue>12</issue>
            <fpage>2575</fpage>
            <lpage>2583</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/cncr.21088</pubid>
                  <pubid idtype="pmpid" link="fulltext">15880380</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Novel 5 alpha-steroid reductase (SRD5A3, type-3) is overexpressed in hormone-refractory prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Uemura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Honma</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Okuyama</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakagawa</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Cancer Science</source>
            <pubdate>2008</pubdate>
            <volume>99</volume>
            <fpage>81</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17986282</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Prostacyclin-mediated activation of peroxisome proliferator-activated receptor delta in colorectal cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Gupta</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Geraci</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Willson</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Dey</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>DuBois</snm>
                  <fnm>RN</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <issue>24</issue>
            <fpage>13275</fpage>
            <lpage>13280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">27215</pubid>
                  <pubid idtype="pmpid" link="fulltext">11087869</pubid>
                  <pubid idtype="doi">10.1073/pnas.97.24.13275</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Genetic disruption of PPARdelta decreases the tumorigenicity of human colon cancer cells</p>
            </title>
            <aug>
               <au>
                  <snm>Park</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>2598</fpage>
            <lpage>2603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">30184</pubid>
                  <pubid idtype="pmpid" link="fulltext">11226285</pubid>
                  <pubid idtype="doi">10.1073/pnas.051630998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Inhibition of decay-accelerating factor (CD55) attenuates prostate cancer growth and survival in vivo</p>
            </title>
            <aug>
               <au>
                  <snm>Loberg</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kalikin</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Pienta</snm>
                  <fnm>KJ</fnm>
               </au>
            </aug>
            <source>Neoplasia</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <fpage>69</fpage>
            <lpage>78</lpage>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Gene Expression Data Classification With Kernel Principal Component Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bensmail</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Journal of Biomedicine and Biotechnology</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>155</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1155/JBB.2005.155</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Loss of CD55 is associated with aggressive breast tumors</p>
            </title>
            <aug>
               <au>
                  <snm>Madjd</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Durrant</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Bradley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Spendlove</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
               <au>
                  <snm>Pinder</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Clinical Cancer Research</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>8</issue>
            <fpage>2797</fpage>
            <lpage>2803</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15102687</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Erk1/2-dependent Phosphorylation of Galpha-interacting Protein Stimulates Its GTPase Accelerating Activity and Autophagy in Human Colon Cancer Cells</p>
            </title>
            <aug>
               <au>
                  <snm>Ogier-Denis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pattingre</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>El Benna</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Codogno</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Journal of Biological Chemistry</source>
            <pubdate>2000</pubdate>
            <volume>275</volume>
            <issue>50</issue>
            <fpage>39090</fpage>
            <lpage>39095</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M006198200</pubid>
                  <pubid idtype="pmpid" link="fulltext">10993892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Gene expression predictors of breast cancer outcomes</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Dressman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pittman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tsou</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Horng</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Bild</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Iversen</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>West</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nevins</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>AT</fnm>
               </au>
            </aug>
            <source>Lancet</source>
            <pubdate>2003</pubdate>
            <volume>361</volume>
            <issue>9369</issue>
            <fpage>1590</fpage>
            <lpage>1596</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0140-6736(03)13308-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12747878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Gene Expression Patterns of Breast Carcinomas Distinguish Tumor Subclasses with Clinical Implications</p>
            </title>
            <aug>
               <au>
                  <snm>S&#248;rlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aas</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Geisler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Rijn</snm>
                  <mnm>van de</mnm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Thorsen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Quist</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>L&#248;nning</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>B&#248;rresen-Dale</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>10869</fpage>
            <lpage>10874</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.191367098</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets</p>
            </title>
            <aug>
               <au>
                  <snm>S&#248;rlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Parker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Marron</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Nobel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>SDH</fnm>
               </au>
               <au>
                  <snm>Pesich</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Geisler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Demeter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>L&#248;nning</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>B&#248;rresen-Dale</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>14</issue>
            <fpage>8418</fpage>
            <lpage>8423</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.0932692100</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Breast cancer classification and prognosis based on gene expression profiles from a population-based study</p>
            </title>
            <aug>
               <au>
                  <snm>Sotiriou</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Neo</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>McShane</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Korn</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Jazaeri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Martiat</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ET</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>18</issue>
            <fpage>10393</fpage>
            <lpage>10398</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.1732912100</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A gene-expression signature as a predictor of survival in breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Vijver</snm>
                  <mnm>van de</mnm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>van't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Voskuil</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Peterse</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Parrish</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Atsma</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Witteveen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Glas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delahaye</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Velde</snm>
                  <mnm>van der</mnm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bartelink</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rodenhuis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rutgers</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Friend</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Bernards</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>The New England Journal of Medicine</source>
            <pubdate>2002</pubdate>
            <volume>347</volume>
            <issue>25</issue>
            <fpage>1999</fpage>
            <lpage>2009</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1056/NEJMoa021967</pubid>
                  <pubid idtype="pmpid" link="fulltext">12490681</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Comparing prognostic markers for metastases in breast cancer using artificial neural networks</p>
            </title>
            <aug>
               <au>
                  <snm>Ritz</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Master's thesis</source>
            <publisher>Lund University, Sweden</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Finding predictive gene groups from microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Dettling</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Buehlmann</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Journal of Multivariate Analysis</source>
            <pubdate>2004</pubdate>
            <volume>90</volume>
            <fpage>106</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.jmva.2004.02.012</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Selection bias in gene extraction on the basis of microarray gene-expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Ambroise</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>McLachlan</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>6562</fpage>
            <lpage>6566</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.102102699</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Gevaert</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>de Smet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Timmerman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>de Moor</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e184</fpage>
            <lpage>e190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl230</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873470</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Jeffery</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Culhane</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>359</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1544358</pubid>
                  <pubid idtype="pmpid" link="fulltext">16872483</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-359</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ebert</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Gillette</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Paulovich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pomeroy</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <issue>43</issue>
            <fpage>15545</fpage>
            <lpage>15550</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0506580102</pubid>
                  <pubid idtype="pmpid" link="fulltext">16199517</pubid>
                  <pubid idtype="pmcid">1239896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>