<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1755-8794-2-64</ui>
   <ji>1755-8794</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Accurate molecular classification of cancer using simple rules</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Wang</snm>
               <fnm>Xiaosheng</fnm>
               <insr iid="I1"/>
               <email>david@genome.ist.i.kyoto-u.ac.jp</email>
            </au>
            <au id="A2">
               <snm>Gotoh</snm>
               <fnm>Osamu</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>o.gotoh@i.kyoto-u.ac.jp</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan</p>
            </ins>
            <ins id="I2">
               <p>National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center, Tokyo 135-0064, Japan</p>
            </ins>
         </insg>
         <source>BMC Medical Genomics</source>
         <issn>1755-8794</issn>
         <pubdate>2009</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>64</fpage>
         <url>http://www.biomedcentral.com/1755-8794/2/64</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19874631</pubid>
               <pubid idtype="doi">10.1186/1755-8794-2-64</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>05</day>
               <month>2</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>30</day>
               <month>10</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>10</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Wang and Gotoh; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensional gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>We screened a small number of informative single genes and gene pairs on the basis of their depended degrees, a measure proposed in rough sets. Applying the decision rules induced by the selected genes or gene pairs, we constructed cancer classifiers. We tested the efficacy of the classifiers by leave-one-out cross-validation (LOOCV) of training sets and classification of independent test sets.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We applied our methods to five cancerous gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification outcomes were obtained by utilizing just one or two genes. Some genes that correlated closely with the pathogenesis of relevant cancers were identified. In terms of both classification performance and algorithm simplicity, our approach outperformed or at least matched existing methods.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>In cancer gene expression datasets, a small number of genes, even one or two if selected correctly, can be sufficient for accurate cancer classification. This finding also suggests that very simple rules may perform well for cancer class prediction.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Rapid advances in gene expression microarray technology have enabled the simultaneous measurement of the expression levels of tens of thousands of genes in a single experiment <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. By measuring gene expression levels across multiple individuals and multiple tissue or tumor samples, investigators can discover molecular markers to be used for cancer diagnosis, prognosis, and prediction. Many researchers have explored the use of microarray technology to build cancer diagnosis, prognosis, and prediction classifiers, since the pioneering work of Golub et al. in applying gene expression monitoring by DNA microarray to cancer classification <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. However, one intractable problem with using microarray data analysis to create cancer classifiers is how to reduce the exceedingly high-dimensional gene expression data, which contain a large amount of noise. Moreover, the number of samples is severely limited compared with the number of genes measured in each experiment. These properties pose two computational challenges: computational cost and classification accuracy. To achieve efficient and accurate classification, it is natural for researchers to investigate feature selection, i.e., gene filtering <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, one serious drawback of most existing methods is that too many genes are ultimately selected for the classification of cancer, thereby hampering the interpretability of the models. In fact, it is not easy to gauge which gene is essential in determining a cancerous class if accurate classification is obtained based on a large cluster of genes.</p>
         <p>In parallel with feature selection, classifier construction is an important topic in this field. In machine learning and data mining, the methods of generating classifiers include unsupervised and supervised approaches. The latter are further divided into two categories: "black-box" and "white-box" models. The "black-box" models, such as support vector machines (SVMs), discriminant analysis (DA), artificial neural networks (ANNs), genetic algorithms (GAs), na&#239;ve Bayes (NB), and <it>k</it>-nearest neighbors (<it>k</it>-NNs), address classification problems without any knowledge-based explanation rules. In contrast, the "white-box" models, such as Decision Trees <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, Rough Sets <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, and emerging patterns (EPs) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, often implement classification by giving "IF-THEN"-like rules. The "white-box" models are sometimes more readily accepted by biologists and clinicians because they are easily understood.</p>
         <p>Many investigators have utilized rule-based approaches (i.e., "white-box" models) to produce cancer classifiers <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. In general, these classifiers involve few genes yet achieve good prediction performance. In <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, the authors proposed one method of identifying good diagnostic gene groups from gene expression profiles using the concept of EPs. The authors sought to find the gene groups whose frequency of patterns changed significantly between two classes of cells. They then used the rules arising from these patterns to construct cancer classifiers. Their classifiers were uncomplicated, as they merely contained the rules involving a few genes. In <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, decision tree algorithms involving single C4.5, Bagging, and AdaBoost decision trees were applied to classify gene expression datasets. In essence, a decision tree is a rule-based classifier. The classifier screens the informative features to build decision trees based on the information entropy concept. Subsequently, rules are derived from the trees. Because decision tree algorithms commonly conduct pruning of the trees to remove unnecessary features, the derived rules generally involve only a small number of features. In <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, the authors proposed the use of high-ranked association rule groups to construct cancer classifiers instead of utilizing all of the mined association rules, which commonly include an excessive number of redundant rules.</p>
         <p>Some investigators have addressed the problem of using pairs of genes to conduct cancer classification. In <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, the authors classified gene expression profiles using a comparison-based approach, the "top-scoring pair(s)," called the <it>TSP </it>classifier. The authors attempted to predict classes by comparing the expression levels of a single pair of genes, chosen based on a simple measure of class discrimination. In <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, the authors investigated the use of gene pairs for classification. They screened the gene pairs that had marked differences in average expression levels between the tumor types in the training set. The gene pairs were then applied to classify test sets.</p>
         <p>Rough sets, a data-analysis method originally proposed by Pawlak in the early 1980s <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, has evolved into a widely accepted machine-learning and data-mining method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, rough sets was applied to cancer classification and prediction based on an attribute reduction approach. In <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, we proposed a rough sets-based soft computing method to conduct cancer classification using single genes or gene pairs. In this article, we also explore the use of single genes and gene pairs in constructing cancer classifiers; however, in contrast to <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, we first aimed to use the canonical depended degree, as defined in rough sets, for gene selection. In cases where this approach was unsuccessful, we considered utilizing the <it>&#945; </it>depended degree standard suggested in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> for gene selection. In this work, the <it>&#945; </it>depended degree was employed for a portion of the datasets. In addition, unlike the other rough sets-based methods, we did not carry out attribute reduction for gene selection. Instead, we first implemented feature ranking according to the depended degree or <it>&#945; </it>depended degree of attributes, and then selected the top-ranked genes to create classifiers so as to avoid expensive computation for attribute reduction. 
Moreover, we made use of the decision rules induced by the chosen genes to build classifiers, whereas existing rough sets-based methods only utilized rough sets for gene selection, and the classifier constructions depended upon other machine-learning algorithms such as SVMs, ANNs, GAs, NB, and <it>k-</it>NNs <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>We tested the methods on five publicly available gene expression datasets: Leukemia 1 (ALL vs. AML), Lung Cancer, Prostate Cancer, Breast Cancer, and Leukemia 2 (ALL vs. MLL vs. AML), which can be downloaded from the Kent Ridge Bio-medical Data Set Repository <url>http://datam.i2r.a-star.edu.sg/datasets/krbd/</url>. We compared our results with the findings of previous studies. Furthermore, we examined and analyzed the biological relevance of the selected genes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Rough sets</p>
            </st>
            <p>In rough sets, an equivalence relation on <it>U </it>is referred to as one <it>knowledge</it>, and a family of equivalence relations is referred to as a <it>knowledge base </it>on <it>U</it>. In reality, we are often faced with a large amount of ill-defined data, and we want to learn about them based on pre-existing knowledge. However, most of these data cannot be precisely defined based on pre-existing knowledge, as they incorporate both definite and vague components. In <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, Pawlak describes the definite parts using the concept of positive region.</p>
            <p><b>Definition 1 </b>Let <it>U </it>be a universe of discourse, <it>X </it>&#8838; <it>U</it>, and <it>R </it>an equivalence relation on <it>U</it>. <it>U/R </it>represents the set of equivalence classes of <it>U </it>induced by <it>R</it>. The <it>positive region </it>of <it>X </it>on <it>R </it>in <it>U </it>is defined as <it>pos(R, X) </it>= &#8746; {<it>Y </it>&#8712; <it>U/R </it>| <it>Y </it>&#8838; <it>X</it>}
<abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
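            <p>As an illustration of Definition 1, the positive region can be computed by partitioning <it>U </it>into equivalence classes and keeping only the classes contained in <it>X</it>. The following Python sketch is not taken from the original method description; the function names and the representation of <it>R </it>by a key function (two elements are equivalent when their keys are equal) are assumptions made for illustration.</p>
            <p><![CDATA[
```python
from collections import defaultdict


def equivalence_classes(universe, key):
    """Partition `universe` into U/R, where x R y iff key(x) == key(y)."""
    blocks = defaultdict(set)
    for x in universe:
        blocks[key(x)].add(x)
    return list(blocks.values())


def positive_region(universe, key, target):
    """pos(R, X): union of the classes Y in U/R that are subsets of X."""
    target = set(target)
    region = set()
    for block in equivalence_classes(universe, key):
        if block.issubset(target):
            region |= block
    return region


# Toy universe 0..5; two integers are equivalent when x // 2 agrees,
# so U/R = {{0, 1}, {2, 3}, {4, 5}}.
U = range(6)
X = {0, 1, 2}
print(sorted(positive_region(U, lambda x: x // 2, X)))  # [0, 1]
```
]]></p>
            <p>In this toy example only the class {0, 1} lies entirely inside <it>X</it>; the class {2, 3} overlaps <it>X </it>but is not contained in it, so it is excluded from the positive region.</p>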
            <p>The decision table is the data form studied by rough sets. One decision table can be represented as <it>S </it>= (<it>U</it>, <it>A </it>= <it>C </it>&#8746; <it>D</it>), where <it>U </it>is the set of samples, <it>C </it>is the condition attribute set, and <it>D </it>is the decision attribute set. Without loss of generality, hereafter we assume <it>D </it>is a single-element set, and we call <it>D </it>the <it>decision attribute</it>. <it>A </it>can be viewed as a knowledge base in <it>S</it>, as each attribute or attribute subset can induce an equivalence relation on <it>U</it>. In the decision table, if we designate <it>I</it><sub><it>a </it></sub>as the function mapping a member (sample) of <it>U </it>to the value of the member on the attribute <it>a </it>(<it>a </it>&#8712; <it>A</it>), then the equivalence relation <it>R(A') </it>induced by the attribute subset <it>A' </it>&#8838; <it>A </it>is defined as: for &#8704;<it>x</it>, <it>y </it>&#8712; <it>U</it>, <it>xR(A')y</it>, if and only if <it>I</it><sub><it>a</it></sub><it>(x) </it>= <it>I</it><sub><it>a</it></sub><it>(y) </it>for each <it>a </it>&#8712; <it>A'</it>.</p>
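            <p>The equivalence relation <it>R(A') </it>can be pictured as grouping samples by their tuple of values on the attributes in <it>A'</it>. The sketch below assumes a decision table stored as a Python dictionary from sample names to attribute values; the sample names, gene names, and discretized values are hypothetical and serve only to illustrate the definition.</p>
            <p><![CDATA[
```python
from collections import defaultdict


def partition_by_attributes(table, attributes):
    """Return U/R(A'): samples grouped by their values on every attribute in A'."""
    blocks = defaultdict(list)
    for sample, row in table.items():
        signature = tuple(row[a] for a in attributes)
        blocks[signature].append(sample)
    return dict(blocks)


# Hypothetical decision table with two discretized genes and one class label.
table = {
    "s1": {"gene1": "high", "gene2": "low", "class": "ALL"},
    "s2": {"gene1": "high", "gene2": "high", "class": "ALL"},
    "s3": {"gene1": "low", "gene2": "low", "class": "AML"},
    "s4": {"gene1": "low", "gene2": "low", "class": "AML"},
}

print(partition_by_attributes(table, ["gene1"]))
print(partition_by_attributes(table, ["gene1", "gene2"]))
```
]]></p>
            <p>Adding an attribute to <it>A' </it>can only split existing equivalence classes, never merge them: here the single-gene partition has two classes, whereas the gene-pair partition has three.</p>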
            <p>For the cancer classification problem, every collected set of microarray data can be represented as a decision table in the form of Table <tblr tid="T1">1</tblr>. In the microarray data decision table, there are <it>m </it>samples and <it>n </it>genes. Every sample is assigned to one class label. The expression level of gene <it>y </it>in sample <it>x </it>is represented by <it>g(x, y)</it>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Microarray data decision table</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Samples</b>
                        </p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <b>Condition attributes (genes)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Decision attributes (classes)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gene 1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gene 2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>...</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gene <it>n</it></b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Class label</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(1,1)</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(1,2)</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(1, <it>n</it>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Class (1)</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(2,1)</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(2,2)</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(2, <it>n</it>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Class (2)</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>m</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(<it>m</it>,1)</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(<it>m</it>,2)</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p><it>g</it>(<it>m</it>, <it>n</it>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Class (m)</it>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>In rough sets, the <it>degree of dependency </it>of a set of attributes <it>Q </it>on another set of attributes <it>P </it>is denoted by <it>&#947;</it><sub><it>P</it></sub>(<it>Q</it>) and is defined as</p>
            <p>
               <display-formula>
                  <graphic file="1755-8794-2-64-i1.gif"/>
               </display-formula>
            </p>
            <p>where <inline-formula><graphic file="1755-8794-2-64-i2.gif"/></inline-formula> represents the size of the union of the lower approximation of each equivalence class in <it>U</it>/<it>R</it>(<it>Q</it>) on <it>P </it>in <it>U</it>, and |<it>U</it>| represents the size of <it>U </it>(set of samples).</p>
            <p>If <it>Q </it>is the decision attribute <it>D</it>, and <it>P </it>is a subset of condition attributes, then <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>) represents the <it>depended degree </it>of the condition attribute subset <it>P </it>by the decision attribute <it>D</it>; that is, to what degree <it>P </it>can discriminate the distinct classes of <it>D</it>. In this sense, <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>) reflects the classification power of the subset <it>P </it>of attributes. The greater <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>) is, the stronger the classification ability of <it>P </it>tends to be. We chose the measure of the depended degree of condition attributes by class attributes as the basis for selecting informative genes.</p>
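            <p>A minimal sketch of how the depended degree could be computed for a single gene or a gene pair is given below. It assumes discretized expression values and counts the samples whose equivalence class under <it>P </it>falls entirely within one decision class, which is the size of the positive region divided by |<it>U</it>|. The function name, the row layout, and the toy data are illustrative assumptions rather than the implementation used in this study.</p>
            <p><![CDATA[
```python
from collections import defaultdict
from fractions import Fraction


def depended_degree(rows, genes, label="class"):
    """Fraction of samples whose block in U/R(P) lies inside one decision class."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[g] for g in genes)].append(row[label])
    # A block belongs to the positive region iff all of its samples share one label.
    consistent = sum(len(labels) for labels in blocks.values()
                     if len(set(labels)) == 1)
    return Fraction(consistent, len(rows))


# Hypothetical discretized data: neither gene separates the classes alone,
# but the pair does.
rows = [
    {"g1": "high", "g2": "low", "class": "ALL"},
    {"g1": "high", "g2": "high", "class": "ALL"},
    {"g1": "low", "g2": "high", "class": "AML"},
    {"g1": "low", "g2": "low", "class": "ALL"},
]

print(depended_degree(rows, ["g1"]))        # 1/2
print(depended_degree(rows, ["g2"]))        # 1/2
print(depended_degree(rows, ["g1", "g2"]))  # 1
```
]]></p>
            <p>Ranking genes or gene pairs by this value and keeping the top-ranked ones corresponds to the feature ranking step described in the text.</p>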
            <p>In contrast to other correlation-based feature selection standards such as t-score, the <it>depended degree </it>can be calculated only when the attribute values are discrete. Thus, for the studied microarray datasets, the discretization of gene expression values is an essential step. Discretization also brings two advantages. First, some unimportant genes will be found immediately after the discretization. When the discretized expression values of a gene are identical among all of the samples, we view the gene as being insignificant because distinct classes cannot be separated according to the gene's expression values. Second, when gene expression values are reduced to discrete states, the rules formed by the genes can be described naturally via the discretized data.</p>
            <p>However, for some datasets it is difficult to detect the discriminative features based on the depended degree because of its excessively rigid definition. In this case, we employed the <it>&#945; </it>depended degree proposed in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> as the basis for choosing genes. The <it>&#945; depended degree </it>of an attribute subset <it>P </it>by the decision attribute <it>D </it>is defined as <inline-formula><graphic file="1755-8794-2-64-i3.gif"/></inline-formula>, where 0 &#8804; <it>&#945; </it>&#8804; 1, <inline-formula><graphic file="1755-8794-2-64-i4.gif"/></inline-formula> and <it>pos(P, X, <it>&#945;</it>) </it>= &#8746;{<it>Y </it>&#8712; <it>U/R</it>(<it>P</it>) | |<it>Y </it>&#8745; <it>X</it>|/|<it>Y</it>|&#8805; <it>&#945;</it>} <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. In fact, as indicated in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, the depended degree is a specific case of the <it>&#945; </it>depended degree when <it>&#945; </it>= 1. In the case that the depended degree was largely ineffective as a basis on which to screen features, we employed the <it>&#945; </it>(0.7 &#8804; <it>&#945; </it>&lt; 1) depended degree.</p>
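            <p>The relaxation introduced by <it>&#945; </it>can be sketched as follows. Because the defining formulas appear as images in this article, the sketch assumes, by analogy with the depended degree, that the <it>&#945; </it>depended degree is the fraction of samples lying in the union of <it>pos(P, X, &#945;) </it>over all decision classes <it>X</it>. The function name and the toy data are hypothetical.</p>
            <p><![CDATA[
```python
from collections import Counter, defaultdict
from fractions import Fraction


def alpha_depended_degree(rows, genes, alpha, label="class"):
    """Fraction of samples in blocks Y with |Y & X| / |Y| >= alpha for some class X."""
    blocks = defaultdict(Counter)
    for row in rows:
        blocks[tuple(row[g] for g in genes)][row[label]] += 1
    covered = 0
    for counts in blocks.values():
        size = sum(counts.values())
        # Some class reaches the threshold iff the majority class does.
        if Fraction(max(counts.values()), size) >= alpha:
            covered += size
    return Fraction(covered, len(rows))


# Hypothetical data: the "high" block is 3/4 ALL, the "low" block is pure AML.
rows = [
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "AML"},
    {"g": "low", "class": "AML"},
]

print(alpha_depended_degree(rows, ["g"], Fraction(1)))     # 1/5
print(alpha_depended_degree(rows, ["g"], Fraction(3, 4)))  # 1
```
]]></p>
            <p>With <it>&#945; </it>= 1 the function reduces to the depended degree, so a single mislabeled or noisy sample in a block removes the whole block from the positive region; lowering <it>&#945; </it>tolerates such samples.</p>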
            <p>Inducing decision rules that are hidden in decision tables is one of the key tasks of rough sets, and is also an essential step in our classifier construction. One decision rule in the form of "A &#8658; B" indicates that "if <it>A</it>, then <it>B</it>," where <it>A </it>is the description of condition attributes and <it>B </it>the description of decision attributes. The <it>confidence </it>of a decision rule <it>A </it>&#8658; <it>B </it>is defined as follows: <inline-formula><graphic file="1755-8794-2-64-i5.gif"/></inline-formula>, where support(<it>A</it>) denotes the proportion of the samples satisfying <it>A </it>and where support(<it>A </it>&#8743; <it>B</it>) denotes the proportion of the samples satisfying <it>A </it>and <it>B </it>simultaneously. The confidence of a decision rule indicates the reliability of the rule. If a decision rule has 100% confidence, we call it a <it>consistent decision rule</it>. It is evident that if <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>) equals 1, <it>P </it>&#8658; <it>D </it>must be a consistent decision rule. In contrast, <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>, <it>&#945;</it>) = 1 does not mean that <it>P </it>&#8658; <it>D </it>must be a consistent decision rule.</p>
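            <p>To make the notion of confidence concrete, the sketch below assumes the usual ratio support(<it>A </it>&#8743; <it>B</it>)/support(<it>A</it>), which matches the description of the two support terms above (the formula itself is an image in this article). Conditions are represented as Python predicates on a sample; the function names and data are illustrative assumptions.</p>
            <p><![CDATA[
```python
from fractions import Fraction


def support(rows, predicate):
    """Proportion of samples that satisfy `predicate`."""
    return Fraction(sum(1 for row in rows if predicate(row)), len(rows))


def confidence(rows, antecedent, consequent):
    """support(A and B) / support(A) for the rule A => B."""
    supp_a = support(rows, antecedent)
    if supp_a == 0:
        raise ValueError("antecedent matches no sample")
    both = support(rows, lambda row: antecedent(row) and consequent(row))
    return both / supp_a


rows = [
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "AML"},
    {"g": "low", "class": "AML"},
]

# Rule: g = high => class = ALL (three of the four "high" samples are ALL).
print(confidence(rows, lambda r: r["g"] == "high", lambda r: r["class"] == "ALL"))  # 3/4
# Rule: g = low => class = AML is a consistent decision rule.
print(confidence(rows, lambda r: r["g"] == "low", lambda r: r["class"] == "AML"))   # 1
```
]]></p>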
            <p>To ensure the reliability of the classification rules, we chose only the genes or gene pairs with <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>) or <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>, <it>&#945;</it>) equal to 1 when forming decision rules. Suppose <it>g </it>is one of the selected genes and <it>U </it>is the sample set. <it>U/R</it>(<it>g</it>) = {c<sub>1</sub>(<it>g</it>), c<sub>2</sub>(<it>g</it>), ..., c<sub>n</sub>(<it>g</it>)} represents the set of equivalence classes of samples induced by <it>R</it>(<it>g</it>). Two samples, s<sub>1 </sub>and s<sub>2</sub>, belong to the same equivalence class of <it>U/R</it>(<it>g</it>) if and only if they have the same value on <it>g</it>. In addition, we represented the set of equivalence classes of samples induced by <it>R</it>(<it>D</it>) as <it>U/R</it>(<it>D</it>) = {d<sub>1</sub>(<it>D</it>), d<sub>2</sub>(<it>D</it>), ..., d<sub>m</sub>(<it>D</it>)}, where <it>D </it>is the decision attribute (here n and m denote the numbers of equivalence classes, not the numbers of genes and samples in Table <tblr tid="T1">1</tblr>). Likewise, two samples, s<sub>1 </sub>and s<sub>2</sub>, belong to the same equivalence class of <it>U/R</it>(<it>D</it>) if and only if they have the same value on <it>D</it>. For each c<sub>i</sub>(<it>g</it>) (i = 1, 2, ..., n), if there exists some d<sub>j</sub>(<it>D</it>) (j &#8712; {1, 2, ..., m}) satisfying c<sub>i</sub>(<it>g</it>) &#8838; d<sub>j</sub>(<it>D</it>) under the depended degree or |c<sub>i</sub>(<it>g</it>) &#8745; d<sub>j</sub>(<it>D</it>)|/|c<sub>i</sub>(<it>g</it>)|&#8805; <it>&#945; </it>under the <it>&#945; </it>depended degree, we then generated the following classification rule: <it>A</it>(c<sub>i</sub>(<it>g</it>)) &#8658; <it>B</it>(d<sub>j</sub>(<it>D</it>)), where <it>A</it>(c<sub>i</sub>(<it>g</it>)) is the formula describing the sample set c<sub>i</sub>(<it>g</it>) by the <it>g </it>value, and <it>B</it>(d<sub>j</sub>(<it>D</it>)) is the formula describing the sample set d<sub>j</sub>(<it>D</it>) by the class value. 
We used the same strategy to construct classification rules for gene pairs.</p>
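            <p>The rule-generation step described above can be sketched in Python as follows. For each equivalence class induced by the selected gene or gene pair, a rule is emitted when one decision class reaches the required share (<it>&#945; </it>= 1 corresponds to the depended degree). The function names, the dictionary representation of rules, and the choice to return no prediction for a sample matching no rule are assumptions made for this sketch and are not specified in the text.</p>
            <p><![CDATA[
```python
from collections import Counter, defaultdict
from fractions import Fraction


def induce_rules(rows, genes, alpha=1, label="class"):
    """Map each gene-value tuple to (class, confidence) when confidence >= alpha."""
    blocks = defaultdict(Counter)
    for row in rows:
        blocks[tuple(row[g] for g in genes)][row[label]] += 1
    rules = {}
    for values, counts in blocks.items():
        cls, hits = counts.most_common(1)[0]
        share = Fraction(hits, sum(counts.values()))
        if share >= alpha:
            rules[values] = (cls, share)
    return rules


def classify(rules, genes, row):
    """Apply the matching rule; return None if no rule covers the sample (assumption)."""
    match = rules.get(tuple(row[g] for g in genes))
    return match[0] if match else None


rows = [
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "ALL"},
    {"g": "high", "class": "AML"},
    {"g": "low", "class": "AML"},
]

rules = induce_rules(rows, ["g"], alpha=Fraction(3, 4))
print(rules)
print(classify(rules, ["g"], {"g": "high"}))  # ALL
```
]]></p>
            <p>Passing a list of two gene names instead of one yields rules for a gene pair, since the equivalence classes are then formed from pairs of discretized values.</p>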
            <p>In the case of the depended degree, each employed classification rule was a consistent decision rule. However, in the case of the <it>&#945; </it>depended degree, the classification rules may not have been consistent, yet the confidence of every classification rule must be no less than <it>&#945;</it>, as proven in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Hence, if we specify a large enough <it>&#945; </it>threshold, the confidence of the classification rules will be sufficiently high.</p>
         </sec>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <sec>
               <st>
                  <p>Leukemia dataset 1 (ALL vs. AML)</p>
               </st>
               <p>The first dataset we analyzed was the well-known leukemia data studied by Golub et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, which has been explored widely by many researchers. In this dataset, there are 72 observations, each of which is described by the gene expression levels of 7129 genes and a class attribute with two distinct labels: AML vs. ALL. The 72 observations are divided into a training set with 38 samples (27 ALL, 11 AML) and a test set with 34 samples (20 ALL, 14 AML).</p>
            </sec>
            <sec>
               <st>
                  <p>Lung Cancer dataset</p>
               </st>
               <p>The Lung Cancer dataset concerns the classification of malignant pleural mesothelioma (MPM) vs. adenocarcinoma (ADCA) of the lung <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and consists of 181 tissue samples (31 MPM, 150 ADCA). The training set contains 32 of the samples (16 MPM vs. 16 ADCA); the remaining 149 samples are used for testing. Each sample is described by 12,533 genes.</p>
            </sec>
            <sec>
               <st>
                  <p>Prostate Cancer dataset</p>
               </st>
               <p>The Prostate Cancer dataset is concerned with prostate tumor vs. normal classification. The training set contains 52 prostate tumor samples and 50 non-tumor prostate samples <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>; the total number of genes is 12,600. The two classes are denoted as "Tumor" and "Normal." The test set samples were from a different experiment and have a nearly 10-fold difference in overall microarray intensity compared with the training data. We made use of the test set provided by the Kent Ridge Bio-medical Data Set Repository, which includes 25 tumor and 9 normal samples.</p>
            </sec>
            <sec>
               <st>
                  <p>Breast Cancer dataset</p>
               </st>
               <p>This dataset is concerned with the prediction of patient outcome for breast cancer <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The training set contains 78 patient samples, 34 of which are from patients who had developed distant metastases within 5 years ("relapse"); the remaining 44 samples are from patients who remained free of the disease for an interval of at least 5 years after initial diagnosis ("non-relapse"). There are 12 relapse and 7 non-relapse samples in the test set, and the number of genes is 24,481.</p>
            </sec>
            <sec>
               <st>
                  <p>Leukemia dataset 2 (ALL vs. MLL vs. AML)</p>
               </st>
               <p>This dataset is about subtype prediction for leukemia <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The training set contains 57 samples (20 ALL, 17 MLL, and 20 AML), while the testing set contains 15 samples (4 ALL, 3 MLL, and 8 AML). The number of genes is 12,582.</p>
               <p>The number of genes, the classes, and the numbers of training and test samples for the five datasets are listed in Table <tblr tid="T2">2</tblr>.</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Summary of the five gene expression datasets</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Dataset</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Original genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Class</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Training samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Test samples</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Leukemia 1</p>
                        </c>
                        <c ca="center">
                           <p>7129</p>
                        </c>
                        <c ca="center">
                           <p>ALL/AML</p>
                        </c>
                        <c ca="center">
                           <p>38 (27/11)</p>
                        </c>
                        <c ca="center">
                           <p>34 (20/14)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Lung Cancer</p>
                        </c>
                        <c ca="center">
                           <p>12533</p>
                        </c>
                        <c ca="center">
                           <p>MPM/ADCA</p>
                        </c>
                        <c ca="center">
                           <p>32 (16/16)</p>
                        </c>
                        <c ca="center">
                           <p>149 (15/134)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Prostate Cancer</p>
                        </c>
                        <c ca="center">
                           <p>12600</p>
                        </c>
                        <c ca="center">
                           <p>Tumor/Normal</p>
                        </c>
                        <c ca="center">
                           <p>102 (52/50)</p>
                        </c>
                        <c ca="center">
                           <p>34 (25/9)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Breast Cancer</p>
                        </c>
                        <c ca="center">
                           <p>24481</p>
                        </c>
                        <c ca="center">
                           <p>relapse/non-relapse</p>
                        </c>
                        <c ca="center">
                           <p>78 (34/44)</p>
                        </c>
                        <c ca="center">
                           <p>19 (12/7)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Leukemia 2</p>
                        </c>
                        <c ca="center">
                           <p>12582</p>
                        </c>
                        <c ca="center">
                           <p>ALL/MLL/AML</p>
                        </c>
                        <c ca="center">
                           <p>57 (20/17/20)</p>
                        </c>
                        <c ca="center">
                           <p>15 (4/3/8)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Data preprocessing</p>
            </st>
            <sec>
               <st>
                  <p>Normalization of attribute values</p>
               </st>
               <p>Because the training set samples and the test set samples in the prostate cancer dataset are from two different experiments, and because discrepancies in microarray intensity exist between the two sets of samples, we normalized both the training set and the test set. Suppose that the original expression level of gene <it>y </it>in sample <it>x </it>is <it>g</it>(<it>x</it>, <it>y</it>). Then, the normalized value of <it>g</it>(<it>x</it>, <it>y</it>) is <inline-formula><graphic file="1755-8794-2-64-i6.gif"/></inline-formula>, where max g(&#8226;, <it>y</it>) and min g(&#8226;, <it>y</it>) represent the maximum and the minimum expression levels of gene <it>y </it>in all of the samples, respectively. After normalization, all of the expression levels of the genes lie within the interval [-1, 1]. As a result, we can apply the rules induced in the training set to the test set. Because the training set samples and the test set samples in the other datasets are from the same experiments, we chose not to normalize these data to avoid any loss of information.</p>
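               <p>The scaling step can be illustrated with a minimal sketch. The exact formula appears only in the inline graphic, so the linear form used here, (2<it>g </it>- max - min)/(max - min), is an assumption chosen because it maps the minimum to -1 and the maximum to 1, which agrees with the stated interval. The function name <it>normalize_gene </it>and the handling of constant genes are illustrative choices, not part of the original method.</p>

```python
def normalize_gene(values):
    """Linearly scale one gene's expression levels so min maps to -1 and max to 1.

    Assumed form: (2*g - max - min) / (max - min). A constant gene is mapped
    to 0.0 to avoid division by zero (an illustrative choice).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(2.0 * v - hi - lo) / (hi - lo) for v in values]


# Hypothetical expression levels of one gene across three samples.
print(normalize_gene([10, 20, 30]))
```

               <p>Under this assumed form, each gene is scaled independently, so a rule threshold learned on the scaled training data can be compared with scaled test data despite a global difference in intensity.</p>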
            </sec>
            <sec>
               <st>
                  <p>Discretization of decision tables</p>
               </st>
               <p>Because rough set theory is suited to discrete attributes, we first needed to discretize the training set decision tables. We used the entropy-based discretization method first proposed by Fayyad et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. This algorithm recursively applies an entropy minimization heuristic to discretize the continuous-valued attributes, and the recursion stops according to the minimum description length (MDL) principle. We performed the discretization using the Weka package <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. After the discretization, the majority of attributes contained at most two distinct values, while a small number of attributes contained three or four distinct values. We executed our learning algorithm on the discretized decision tables.</p>
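               <p>As a rough illustration of the entropy minimization heuristic, the sketch below finds the single cut point of one attribute that minimizes the weighted class entropy of the two resulting intervals. It is not the Weka implementation: the recursion and the MDL stopping test are omitted, and the function names and the sample values are hypothetical.</p>

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the best binary split, or None.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or best[1] > score:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, score)
    return best


# Hypothetical expression levels and class labels for one gene.
print(best_cut([100, 200, 900, 1000], ["ALL", "ALL", "AML", "AML"]))
```

               <p>In the full method, the same search would be applied recursively to each interval until the MDL criterion rejects a further cut, which is why most attributes end up with only two distinct values.</p>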
            </sec>
            <sec>
               <st>
                  <p>Feature selection, classifier construction, and validation</p>
               </st>
               <p>For the Leukemia 1 and Lung Cancer datasets, we conducted feature selection by the depended degree, while for the Prostate Cancer, Breast Cancer and Leukemia 2 datasets, we implemented feature selection by the <it>&#945; </it>depended degree. For each dataset, we employed the LOOCV approach for the training set to identify high class-discrimination genes or gene pairs. That is, in the training set containing <it>n </it>samples, each sample is left out in turn, and the learning algorithm is trained on the remaining <it>n-1 </it>samples. Then, the training result is tested on the left-out sample. The final estimate is the average of <it>n </it>test results. We emphasize that only the single genes or gene pairs chosen by all of the leave-one-out training sets are used for LOOCV. In other words, when the depended degree standard is utilized, only those genes or gene pairs with a 100% depended degree in all leave-one-out training sets are selected; when the <it>&#945; </it>depended degree standard is used, only the genes and gene pairs satisfying <it>&#947;</it><sub><it>P</it></sub>(<it>D</it>, <it>&#945;</it>) = 1 in all of the leave-one-out training sets are chosen. According to the results of LOOCV, we finally determined the informative genes or gene pairs. Applying the classification rules induced by the single genes or gene pairs in the entire training set to classify the independent test set, we further verified their classification performance.</p>
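               <p>The selection criterion can be sketched as follows, taking the depended degree to be the usual rough-set dependency degree: the fraction of samples whose attribute-value combination corresponds to a single class. The sketch assumes an already discretized decision table and covers only the plain depended degree (not the <it>&#945; </it>variant); all names and the toy data are hypothetical.</p>

```python
from collections import defaultdict


def depended_degree(rows, labels, attrs):
    """Fraction of samples whose values on attrs determine a single class."""
    classes = defaultdict(set)
    counts = defaultdict(int)
    for row, lab in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].add(lab)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(classes[key]) == 1)
    return consistent / len(rows)


def stable_attrs(rows, labels, candidates):
    """Keep candidates whose depended degree is 1 in every leave-one-out set."""
    kept = []
    for attrs in candidates:
        ok = True
        for i in range(len(rows)):
            sub_rows = rows[:i] + rows[i + 1:]
            sub_labels = labels[:i] + labels[i + 1:]
            if depended_degree(sub_rows, sub_labels, attrs) != 1.0:
                ok = False
                break
        if ok:
            kept.append(attrs)
    return kept


# Hypothetical discretized table: g1 separates the classes, g2 does not.
rows = [{"g1": 0, "g2": 0}, {"g1": 0, "g2": 1},
        {"g1": 1, "g2": 0}, {"g1": 1, "g2": 1}]
labels = ["ALL", "ALL", "AML", "AML"]
print(stable_attrs(rows, labels, [("g1",), ("g2",), ("g1", "g2")]))
```

               <p>Single genes are passed as one-element tuples and gene pairs as two-element tuples, so the same check serves both cases.</p>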
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Classification results</p>
            </st>
            <sec>
               <st>
                  <p>Leukemia dataset 1</p>
               </st>
               <p>In this dataset, we first selected informative single genes. Among the 7129 genes, only gene #4847 had a 100% depended degree in all leave-one-out training sets. We denote the expression level of gene x by g(x). The decision rules induced by gene #4847 in every leave-one-out training set are of the following form: if g(#4847) > t, then AML; if g(#4847) &#8804; t, then ALL, where t is equal to or close to 994. One can apply the decision rules to classify the left-out sample. The final LOOCV accuracy resulting from the gene was 97.4%, with 37 of the 38 samples classified correctly: all of the 27 ALL samples were classified correctly, and one AML sample was misclassified. Subsequently, we examined the depended degree of the gene in the whole training set of 38 samples. As expected, the gene had a 100% depended degree in the training set. The two consistent decision rules generated by this gene were as follows: if g(#4847) > 994, then AML; if g(#4847) &#8804; 994, then ALL. One can use the above rules to classify the independent test set with 91.2% classification accuracy. Among the 34 samples, 31 were classified correctly and 3 were classified incorrectly: 2 ALL samples were misclassified as AML, and 1 AML sample was misclassified as ALL.</p>
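               <p>The single-gene rule is simple enough to state directly in code. The sketch below applies the threshold of 994 reported above and computes an accuracy; the function names are hypothetical, and the expression values in the usage lines are invented for illustration and are not taken from the dataset.</p>

```python
def classify_by_gene_4847(expression, threshold=994.0):
    """Rule from the text: above the threshold is AML, otherwise ALL."""
    return "AML" if expression > threshold else "ALL"


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the actual class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Invented expression levels, for illustration only.
levels = [120.0, 994.0, 1500.0, 3000.0]
truth = ["ALL", "ALL", "AML", "AML"]
predicted = [classify_by_gene_4847(v) for v in levels]
print(predicted, accuracy(predicted, truth))
```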
               <p>Next, we searched for informative gene pairs. Because there are 7129 genes, the combination number would be huge if all were taken into account. Therefore, for each leave-one-out training set, only the genes with more than 18/37 depended degree were considered in forming gene pairs (excluding the aforementioned gene #4847). As a result, 350 gene pairs were found to possess a 100% depended degree in all leave-one-out training sets. Every gene pair was capable of inducing four consistent decision rules, which were used for classification. We set the threshold of LOOCV accuracy such that at least 35 of the 38 samples were classified correctly. Accordingly, 347 gene pairs satisfied the condition. Likewise, using the decision rules induced by the gene pairs in the whole training set to classify the test set, we detected 13 gene pairs with no less than 32 test samples classified correctly (at most, 2 errors). Table <tblr tid="T3">3</tblr> lists data for these 13 pairs of genes. In this table, the classification results regarding LOOCV and the test set are shown in terms of both the number of correctly classified samples and accuracy. The results with respect to every class are presented in parentheses, and the optimal results are formatted in boldface.</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Thirteen gene pairs with high classification accuracy in the Leukemia dataset 1</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="left">
                           <p>
                              <b>1st - 2nd Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U46499_at - M92287_at</p>
                        </c>
                        <c ca="center">
                           <p>35 (26/9)</p>
                        </c>
                        <c ca="center">
                           <p>92.11 (96.30/81.82)</p>
                        </c>
                        <c ca="center">
                           <p>33 (20/13)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (100/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U46499_at - M12959_s_at</p>
                        </c>
                        <c ca="center">
                           <p>36 (27/9)</p>
                        </c>
                        <c ca="center">
                           <p>94.74 (100/81.82)</p>
                        </c>
                        <c ca="center">
                           <p><b>34 </b>(20/14)</p>
                        </c>
                        <c ca="center">
                           <p><b>100 </b>(100/100)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U46499_at - D63880_at</p>
                        </c>
                        <c ca="center">
                           <p>36 (27/9)</p>
                        </c>
                        <c ca="center">
                           <p>94.74 (100/81.82)</p>
                        </c>
                        <c ca="center">
                           <p>33 (20/13)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (100/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U46499_at - S50223_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>33 (19/14)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (95/100)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U46499_at - Z15115_at</p>
                        </c>
                        <c ca="center">
                           <p>35 (26/9)</p>
                        </c>
                        <c ca="center">
                           <p>92.11 (96.30/81.82)</p>
                        </c>
                        <c ca="center">
                           <p>33 (20/13)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (100/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>L09209_s_at - M92287_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>33 (20/13)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (100/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>L09209_s_at - S50223_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>33 (19/14)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (95/100)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>X61587_at - M92287_at</p>
                        </c>
                        <c ca="center">
                           <p>36 (26/10)</p>
                        </c>
                        <c ca="center">
                           <p>94.74 (96.30/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>33 (20/13)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (100/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>X61587_at - M12959_s_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>33 (19/14)</p>
                        </c>
                        <c ca="center">
                           <p>97.06 (95/100)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>L09209_s_at - D63880_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>32 (19/13)</p>
                        </c>
                        <c ca="center">
                           <p>94.12 (95/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>U05259_rna1_at - M92287_at</p>
                        </c>
                        <c ca="center">
                           <p>36 (26/10)</p>
                        </c>
                        <c ca="center">
                           <p>94.74 (96.30/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>32 (20/12)</p>
                        </c>
                        <c ca="center">
                           <p>94.12 (100/85.71)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>L09209_s_at - X59417_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>32 (19/13)</p>
                        </c>
                        <c ca="center">
                           <p>94.12 (95/92.86)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>L09209_s_at - Z15115_at</p>
                        </c>
                        <c ca="center">
                           <p><b>37 </b>(27/10)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.37 </b>(100/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>32 (19/13)</p>
                        </c>
                        <c ca="center">
                           <p>94.12 (95/92.86)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>Among the 13 gene pairs, the combination #3252-#6167 possessed 100% classification accuracy on the test set. The decision rules produced by the gene pair were as follows:</p>
               <p indent="1">&#8226; if g(#3252) &#8804; 156.5 and g(#6167) > 820.5, then ALL;</p>
               <p indent="1">&#8226; if g(#3252) &#8804; 156.5 and g(#6167) &#8804; 820.5, then ALL;</p>
               <p indent="1">&#8226; if g(#3252) > 156.5 and g(#6167) > 820.5, then ALL;</p>
               <p indent="1">&#8226; if g(#3252) > 156.5 and g(#6167) &#8804; 820.5, then AML.</p>
               <p>The above rules were then simplified into three equivalent rules:</p>
               <p indent="1">&#8226; if g(#3252) &#8804; 156.5, then ALL;</p>
               <p indent="1">&#8226; if g(#6167) > 820.5, then ALL;</p>
               <p indent="1">&#8226; if g(#3252) > 156.5 and g(#6167) &#8804; 820.5, then AML.</p>
               <p>These three rules are fairly simple and easily understood. Using these rules, we classified the test set without any errors. The rules derived from the other 12 gene pairs are provided in Additional file <supplr sid="S1">1</supplr>, and information on the top 87 genes in the training set with depended degrees of no less than 0.5 is provided in Additional file <supplr sid="S2">2</supplr>.</p>
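               <p>The equivalence of the four original rules and the three simplified rules can be checked mechanically. The sketch below encodes both rule sets with the thresholds given above (156.5 for gene #3252 and 820.5 for gene #6167) and compares them on a small grid of values on either side of each threshold. The function names and grid values are hypothetical.</p>

```python
def classify_four_rules(g3252, g6167):
    """The four consistent rules induced by the gene pair #3252-#6167."""
    low_a = 156.5 >= g3252   # g(#3252) at or below its threshold
    high_b = g6167 > 820.5   # g(#6167) above its threshold
    if low_a and high_b:
        return "ALL"
    if low_a and not high_b:
        return "ALL"
    if not low_a and high_b:
        return "ALL"
    return "AML"


def classify_simplified(g3252, g6167):
    """The three simplified rules."""
    if 156.5 >= g3252:
        return "ALL"
    if g6167 > 820.5:
        return "ALL"
    return "AML"


# Grid of values below, at, and above each threshold.
grid_a = [0.0, 156.5, 157.0, 5000.0]
grid_b = [0.0, 820.5, 821.0, 5000.0]
same = all(classify_four_rules(a, b) == classify_simplified(a, b)
           for a in grid_a for b in grid_b)
print(same)
```

               <p>Only the combination of high #3252 expression and low #6167 expression yields AML, which is why the first three rules collapse into two single-gene conditions.</p>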
               <suppl id="S1">
                  <title>
                     <p>Additional file 1</p>
                  </title>
                  <text>
                     <p>
                        <b>The rules derived from each of the 12 gene pairs identified in the Leukemia dataset 1.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S1.txt">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S2">
                  <title>
                     <p>Additional file 2</p>
                  </title>
                  <text>
                     <p>
                        <b>The top 87 genes with depended degrees of no less than 0.5 in the training set of the Leukemia dataset 1.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S2.xls">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Lung Cancer dataset</p>
               </st>
               <p>This dataset contained 16 genes with a 100% depended degree in all of the 32 leave-one-out training sets. The LOOCV accuracy of the 16 genes was between 93.75% and 100%; that is, the number of correctly classified samples ranged from 30 to 32. In the whole training set, each of the 16 genes had a 100% depended degree. These observations indicate that each of the 16 genes was likely to have high class-discriminative power in the training set. Using the rules generated by these single genes, we examined the test set. As expected, these genes showed high classification performance, with classification accuracy ranging from 79% to 97%. The classification results are presented in Table <tblr tid="T4">4</tblr>, which shows that some of the genes in the Lung Cancer dataset, such as gene 37716_at, have impressive classification performance. The rules induced by gene 37716_at were the following: if g(37716_at) > 197.75, then mesothelioma; if g(37716_at) &#8804; 197.75, then ADCA. Using these two rules, we could classify the test set with 97% accuracy. The rules produced by the 16 genes are provided in Additional file <supplr sid="S3">3</supplr>. From these rules, we infer that 2047_s_at, 266_s_at, 32046_at, 33245_at, 41286_at, 41402_at, 575_s_at, and 988_at have higher expression levels in ADCA, while the others have higher expression levels in mesothelioma.</p>
               <suppl id="S3">
                  <title>
                     <p>Additional file 3</p>
                  </title>
                  <text>
                     <p>
                        <b>The rules produced by each of the 16 genes and 25 gene pairs identified in the Lung Cancer dataset.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S3.txt">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Sixteen genes with high classification accuracy in the Lung Cancer dataset</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>2047_s_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>122 (11/111)</p>
                        </c>
                        <c ca="center">
                           <p>81.88 (73.33/82.84)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>266_s_at</p>
                        </c>
                        <c ca="center">
                           <p><b>32 </b>(16/16)</p>
                        </c>
                        <c ca="center">
                           <p><b>100 </b>(100/100)</p>
                        </c>
                        <c ca="center">
                           <p>129 (13/116)</p>
                        </c>
                        <c ca="center">
                           <p>86.58 (86.67/86.57)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32046_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>133 (12/121)</p>
                        </c>
                        <c ca="center">
                           <p>89.26 (80/90.30)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32551_at</p>
                        </c>
                        <c ca="center">
                           <p>31 (15/16)</p>
                        </c>
                        <c ca="center">
                           <p>96.88 (93.75/100)</p>
                        </c>
                        <c ca="center">
                           <p>134 (14/120)</p>
                        </c>
                        <c ca="center">
                           <p>89.93 (93.33/89.55)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33245_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>137 (14/123)</p>
                        </c>
                        <c ca="center">
                           <p>91.95 (93.33/91.79)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33833_at</p>
                        </c>
                        <c ca="center">
                           <p><b>32 </b>(16/16)</p>
                        </c>
                        <c ca="center">
                           <p><b>100 </b>(100/100)</p>
                        </c>
                        <c ca="center">
                           <p>139 (13/126)</p>
                        </c>
                        <c ca="center">
                           <p>93.29 (86.67/94.03)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35330_at</p>
                        </c>
                        <c ca="center">
                           <p>31 (15/16)</p>
                        </c>
                        <c ca="center">
                           <p>96.88 (93.75/100)</p>
                        </c>
                        <c ca="center">
                           <p>118 (14/104)</p>
                        </c>
                        <c ca="center">
                           <p>79.19 (93.33/77.61)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36533_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>141 (13/128)</p>
                        </c>
                        <c ca="center">
                           <p>94.64 (86.67/95.52)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37205_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>135 (12/123)</p>
                        </c>
                        <c ca="center">
                           <p>90.60 (80/91.79)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37716_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p><b>145 </b>(11/134)</p>
                        </c>
                        <c ca="center">
                           <p><b>97.32 </b>(73.33/100)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>39795_at</p>
                        </c>
                        <c ca="center">
                           <p>31 (16/15)</p>
                        </c>
                        <c ca="center">
                           <p>96.88 (100/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>135 (14/121)</p>
                        </c>
                        <c ca="center">
                           <p>90.60 (93.33/90.30)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>40936_at</p>
                        </c>
                        <c ca="center">
                           <p>31 (15/16)</p>
                        </c>
                        <c ca="center">
                           <p>96.88 (93.75/100)</p>
                        </c>
                        <c ca="center">
                           <p>140 (12/128)</p>
                        </c>
                        <c ca="center">
                           <p>93.96 (80/95.52)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>41286_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>121 (13/108)</p>
                        </c>
                        <c ca="center">
                           <p>81.21 (86.67/80.60)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>41402_at</p>
                        </c>
                        <c ca="center">
                           <p>31 (16/15)</p>
                        </c>
                        <c ca="center">
                           <p>96.88 (100/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>123 (13/110)</p>
                        </c>
                        <c ca="center">
                           <p>82.55 (86.67/82.09)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>575_s_at</p>
                        </c>
                        <c ca="center">
                           <p><b>32 </b>(16/16)</p>
                        </c>
                        <c ca="center">
                           <p><b>100 </b>(100/100)</p>
                        </c>
                        <c ca="center">
                           <p>141 (14/127)</p>
                        </c>
                        <c ca="center">
                           <p>94.64 (93.33/94.78)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>988_at</p>
                        </c>
                        <c ca="center">
                           <p>30 (15/15)</p>
                        </c>
                        <c ca="center">
                           <p>93.75 (93.75/93.75)</p>
                        </c>
                        <c ca="center">
                           <p>132 (13/119)</p>
                        </c>
                        <c ca="center">
                           <p>88.59 (86.67/88.81)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
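               <p>The entries in the table above can be checked with a few lines of arithmetic. The sketch below assumes class sizes of 16 and 16 in the training set and 15 and 134 in the test set; these sizes are inferred from the per-class percentages in the table rather than stated in this section, and the function name is purely illustrative.</p>

```python
def accuracy_row(correct_a, correct_b, size_a, size_b):
    """Return overall and per-class accuracy (%) rounded to two decimals."""
    overall = 100.0 * (correct_a + correct_b) / (size_a + size_b)
    per_a = 100.0 * correct_a / size_a
    per_b = 100.0 * correct_b / size_b
    return round(overall, 2), round(per_a, 2), round(per_b, 2)


# Test-set row "122 (11/111)" with assumed class sizes 15 and 134.
print(accuracy_row(11, 111, 15, 134))
# LOOCV row "30 (15/15)" with assumed class sizes 16 and 16.
print(accuracy_row(15, 15, 16, 16))
```

               <p>Under these assumed class sizes, the two calls reproduce the percentages reported for probe 2047_s_at in the test set and in LOOCV, respectively.</p>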
               <p>Considering more than one gene when developing rules might be expected to yield higher classification accuracy. We therefore carried out further classification tests using gene pairs. As before, we searched for gene pairs with high LOOCV accuracy. To avoid a combinatorial explosion, we formed pairs only from genes whose depended degree was greater than 12/31 and less than 100% in all 32 leave-one-out training sets. Furthermore, to avoid intricate classification rules, we excluded genes with more than two distinct discretized values. With these restrictions, we found 82 gene pairs with a 100% depended degree in all 32 leave-one-out training sets. Among them, 25 pairs achieved 100% LOOCV accuracy. These pairs also had comparatively strong classification power in the test set: their classification accuracy ranged from 71.14% to 96.64%, with 21 pairs exceeding 80% and nine pairs exceeding 90%. Data for these 25 gene pairs are listed in Table <tblr tid="T5">5</tblr>. The classification rules induced by these pairs are presented in Additional file <supplr sid="S3">3</supplr>.</p>
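               <p>As a rough illustration of how a depended degree can be computed for a gene pair, the sketch below assumes the usual rough-set definition: the fraction of samples whose discretized values on the chosen genes determine the class unambiguously. The toy data, gene names, and function names are hypothetical and are not taken from the Lung Cancer dataset.</p>

```python
from collections import defaultdict
from itertools import combinations


def depended_degree(samples, labels, genes):
    """Fraction of samples whose values on `genes` determine the class label."""
    classes_seen = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(samples, labels):
        key = tuple(row[g] for g in genes)
        classes_seen[key].add(label)
        counts[key] += 1
    # Positive region: samples in value combinations covering a single class.
    positive = sum(n for key, n in counts.items() if len(classes_seen[key]) == 1)
    return positive / len(labels)


def consistent_pairs(samples, labels, candidate_genes):
    """Gene pairs whose depended degree is 100% on the given samples."""
    return [pair for pair in combinations(candidate_genes, 2)
            if depended_degree(samples, labels, pair) == 1.0]


# Toy discretized data: four samples, three genes taking values 0 or 1.
samples = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 1, "g3": 1},
    {"g1": 1, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 1},
]
labels = ["A", "B", "B", "A"]

print(depended_degree(samples, labels, ["g1"]))
print(consistent_pairs(samples, labels, ["g1", "g2", "g3"]))
```

               <p>In this toy example, neither g1 nor g2 separates the classes alone, yet the pair does, which is the kind of complementary behaviour a pairwise search is meant to detect.</p>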
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>Twenty-five gene pairs with 100% LOOCV accuracy in the Lung Cancer dataset</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="left">
                           <p>
                              <b>1st - 2nd Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33754_at - 36562_at</p>
                        </c>
                        <c ca="center">
                           <p><b>144 </b>(13/131)</p>
                        </c>
                        <c ca="center">
                           <p><b>96.64 </b>(86.67/97.76)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33754_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>143 (11/132)</p>
                        </c>
                        <c ca="center">
                           <p>95.97 (73.33/98.51)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>34105_f_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>141 (9/132)</p>
                        </c>
                        <c ca="center">
                           <p>94.64 (60/98.51)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>34105_f_at - 36562_at</p>
                        </c>
                        <c ca="center">
                           <p>140 (10/130)</p>
                        </c>
                        <c ca="center">
                           <p>93.96 (66.67/97.01)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37004_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>140 (11/129)</p>
                        </c>
                        <c ca="center">
                           <p>93.96 (73.33/96.27)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36562_at - 37004_at</p>
                        </c>
                        <c ca="center">
                           <p>139 (13/126)</p>
                        </c>
                        <c ca="center">
                           <p>93.29 (86.67/94.03)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>38827_at - 40445_at</p>
                        </c>
                        <c ca="center">
                           <p>138 (15/123)</p>
                        </c>
                        <c ca="center">
                           <p>92.62 (100/91.79)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1882_g_at - 36562_at</p>
                        </c>
                        <c ca="center">
                           <p>136 (11/125)</p>
                        </c>
                        <c ca="center">
                           <p>91.28 (73.33/93.28)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1882_g_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>136 (10/126)</p>
                        </c>
                        <c ca="center">
                           <p>91.28 (66.67/94.03)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33907_at - 36562_at</p>
                        </c>
                        <c ca="center">
                           <p>134 (10/124)</p>
                        </c>
                        <c ca="center">
                           <p>89.93 (66.67/92.54)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36562_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>134 (9/125)</p>
                        </c>
                        <c ca="center">
                           <p>89.93 (60/93.28)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1882_g_at - 33907_at</p>
                        </c>
                        <c ca="center">
                           <p>133 (11/122)</p>
                        </c>
                        <c ca="center">
                           <p>89.26 (73.33/91.04)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1882_g_at - 37004_at</p>
                        </c>
                        <c ca="center">
                           <p>132 (13/119)</p>
                        </c>
                        <c ca="center">
                           <p>88.59 (86.67/88.81)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35947_at - 36269_at</p>
                        </c>
                        <c ca="center">
                           <p>132 (12/120)</p>
                        </c>
                        <c ca="center">
                           <p>88.59 (80/89.55)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33907_at - 34105_f_at</p>
                        </c>
                        <c ca="center">
                           <p>131 (9/122)</p>
                        </c>
                        <c ca="center">
                           <p>87.92 (60/91.04)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36269_at - 40445_at</p>
                        </c>
                        <c ca="center">
                           <p>131 (14/117)</p>
                        </c>
                        <c ca="center">
                           <p>87.92 (93.33/87.31)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35947_at - 40445_at</p>
                        </c>
                        <c ca="center">
                           <p>130 (14/116)</p>
                        </c>
                        <c ca="center">
                           <p>87.25 (93.33/86.57)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>38074_at - 38827_at</p>
                        </c>
                        <c ca="center">
                           <p>129 (14/115)</p>
                        </c>
                        <c ca="center">
                           <p>86.58 (93.33/85.82)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33907_at - 40496_at</p>
                        </c>
                        <c ca="center">
                           <p>127 (8/119)</p>
                        </c>
                        <c ca="center">
                           <p>85.23 (53.33/88.81)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36269_at - 38074_at</p>
                        </c>
                        <c ca="center">
                           <p>125 (13/112)</p>
                        </c>
                        <c ca="center">
                           <p>83.89 (86.67/83.58)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>38074_at - 40445_at</p>
                        </c>
                        <c ca="center">
                           <p>122 (13/109)</p>
                        </c>
                        <c ca="center">
                           <p>81.88 (86.67/81.34)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1117_at - 38827_at</p>
                        </c>
                        <c ca="center">
                           <p>116 (15/101)</p>
                        </c>
                        <c ca="center">
                           <p>77.85 (100/75.37)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1117_at - 36269_at</p>
                        </c>
                        <c ca="center">
                           <p>113 (13/100)</p>
                        </c>
                        <c ca="center">
                           <p>75.84 (86.67/74.63)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1117_at - 35947_at</p>
                        </c>
                        <c ca="center">
                           <p>109 (12/97)</p>
                        </c>
                        <c ca="center">
                           <p>73.15 (80/72.39)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1117_at - 38074_at</p>
                        </c>
                        <c ca="center">
                           <p>106 (14/92)</p>
                        </c>
                        <c ca="center">
                           <p>71.14 (93.33/68.66)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>To examine the relationship between the depended degrees of single genes and the classification accuracy of gene pairs, we carried out another experiment. In the discretized training set, we first excluded the genes with depended degrees of 0 or 100%, as well as the genes with more than two distinct values. As a result, 1428 genes were left for pair combination. We set the threshold number of correctly classified samples to 148; that is, we searched for gene pairs that classify the test set with at most one error. In addition, we set another threshold <it>k</it> and required that the size of the positive region induced by each selected gene exceed <it>k</it>, with <it>k</it> varying from 13 to 0. When <it>k</it> was 13, 61 genes were selected, and 743 pair combinations had a 100% depended degree. Using the rules derived from each of the 743 gene pairs to classify the test set, we detected four combinations that classified 148 samples correctly. When <it>k</it> was 12, 11, and 10, only the same four combinations were found. When <it>k</it> decreased to 9 and 8, five and seven combinations were found, respectively. At lower values, no further combinations were found that classified 148 or more samples correctly, even when <it>k</it> was reduced to 0, at which point all 1428 genes were selected, giving 33,390 combinations with a 100% depended degree. The results indicate that combinations of genes with higher depended degrees are more likely to produce accurate classification.</p>
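               <p>The threshold experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the positive region of a gene is taken to be the set of training samples lying in class-pure value groups, each consistent pair yields one rule per observed value combination, and a test sample whose value combination was never seen in training is counted as an error. All data and names are hypothetical.</p>

```python
from collections import defaultdict
from itertools import combinations


def partition(samples, labels, genes):
    """Map each value combination on `genes` to the labels it covers
    and to the number of samples it contains."""
    seen, size = defaultdict(set), defaultdict(int)
    for row, label in zip(samples, labels):
        key = tuple(row[g] for g in genes)
        seen[key].add(label)
        size[key] += 1
    return seen, size


def positive_region_size(samples, labels, genes):
    """Number of samples lying in label-pure value combinations."""
    seen, size = partition(samples, labels, genes)
    return sum(n for key, n in size.items() if len(seen[key]) == 1)


def search_pairs(train, train_labels, test, test_labels, genes, k, min_correct):
    """Pairs built from genes with positive region size above k that are
    fully consistent on the training set and classify at least
    `min_correct` test samples correctly."""
    kept = [g for g in genes
            if positive_region_size(train, train_labels, [g]) > k]
    hits = []
    for pair in combinations(kept, 2):
        seen, _ = partition(train, train_labels, pair)
        if any(len(labs) != 1 for labs in seen.values()):
            continue  # depended degree below 100%
        # One rule per value combination observed in the training set.
        rules = {key: next(iter(labs)) for key, labs in seen.items()}
        correct = sum(
            rules.get(tuple(row[g] for g in pair)) == label
            for row, label in zip(test, test_labels)
        )
        if correct >= min_correct:
            hits.append((pair, correct))
    return hits


# Hypothetical discretized data: six training and three test samples.
train = [
    {"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 1},
]
train_labels = ["A", "A", "A", "B", "B", "B"]
test = [{"a": 0, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 0, "c": 0}]
test_labels = ["A", "B", "B"]

print(search_pairs(train, train_labels, test, test_labels, ["a", "b", "c"], 1, 2))
print(search_pairs(train, train_labels, test, test_labels, ["a", "b", "c"], 2, 2))
```

               <p>Raising <it>k</it> shrinks the pool of candidate genes and therefore the number of pairs to evaluate, which is the trade-off explored in the experiment above.</p>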
               <p>To explore whether combining the genes with 100% depended degrees with other genes of lower depended degree would yield more gene pairs that classify at least 148 samples correctly, we added the 16 genes with a 100% depended degree to the 1428 genes and repeated the above experiment. Surprisingly, the results were exactly the same as those of the first experiment; i.e., no new gene pair was found. This finding indicates that, although the class-discrimination ability of individual genes is important for obtaining perfect classification performance with combined genes, the complementarity of the information carried by the individual genes might also be crucial. Additional details regarding this experiment are provided in Table S1 of Additional file <supplr sid="S4">4</supplr>.</p>
               <suppl id="S4">
                  <title>
                     <p>Additional file 4</p>
                  </title>
                  <text>
                     <p>
                        <b>The experimental results and the seven gene pairs with high classification accuracy in the test set of the Lung Cancer dataset, identified without LOOCV.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S4.doc">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <p>Table S2 of Additional file <supplr sid="S4">4</supplr> shows the seven pair combinations found in the experiment. Each of the seven gene pairs generates four rules, which can be simplified into three equivalent rules. These rules correctly classify 148 of the 149 samples in the test set, with only one error (one mesothelioma was misclassified as ADCA). The detailed rules formed by the seven pairs of genes are presented in Additional file <supplr sid="S3">3</supplr>.</p>
            </sec>
            <sec>
               <st>
                  <p>Prostate Cancer dataset</p>
               </st>
               <p>Because microarray intensities differed between the training set and the test set, we first normalized the attribute values of both sets, scaling every attribute value to a number between -1 and 1. In this dataset, the depended degree criterion makes it difficult to find truly discriminative genes: no gene has a 100% depended degree, and the highest depended degree in the training set is 36%. Therefore, we used the <it>&#945; </it>depended degree as the criterion for gene selection. For <it>&#945; </it>&#8805; 0.9, no gene was common to all 102 leave-one-out training sets; when <it>&#945; </it>= 0.85, one gene (#10493) was found; when <it>&#945; </it>= 0.80, nine genes were found. Of these nine genes, we excluded gene #5261, which had three distinct values, and calculated the LOOCV accuracy of the other eight genes. The LOOCV accuracies were relatively high (79.41% to 90.20%). Applying the decision rules induced by each of the eight genes in the training set, we classified the test set and obtained accuracies ranging from 50.00% to 91.18% (see Table <tblr tid="T6">6</tblr>). The classification rules generated by the eight genes are presented in Additional file <supplr sid="S5">5</supplr>.</p>
               <suppl id="S5">
                  <title>
                     <p>Additional file 5</p>
                  </title>
                  <text>
                     <p>
                        <b>The classification rules generated by each of the eight genes and three gene pairs identified in the Prostate Cancer dataset.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S5.txt">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T6">
                  <title>
                     <p>Table 6</p>
                  </title>
                  <caption>
                     <p>Eight genes with high classification accuracy in the Prostate Cancer dataset</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#945;</it>
                              </b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32598_at</p>
                        </c>
                        <c ca="center">
                           <p><b>92 </b>(50/42)</p>
                        </c>
                        <c ca="center">
                           <p><b>90.20 </b>(96.15/84.00)</p>
                        </c>
                        <c ca="center">
                           <p>23 (17/6)</p>
                        </c>
                        <c ca="center">
                           <p>67.65 (68.00/66.67)</p>
                        </c>
                        <c ca="center">
                           <p>0.85</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36491_at</p>
                        </c>
                        <c ca="center">
                           <p>84 (41/43)</p>
                        </c>
                        <c ca="center">
                           <p>82.35 (78.85/86.00)</p>
                        </c>
                        <c ca="center">
                           <p>30 (23/7)</p>
                        </c>
                        <c ca="center">
                           <p>88.24 (92.00/77.78)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>40856_at</p>
                        </c>
                        <c ca="center">
                           <p>85 (46/39)</p>
                        </c>
                        <c ca="center">
                           <p>83.33 (88.46/78.00)</p>
                        </c>
                        <c ca="center">
                           <p>23 (15/8)</p>
                        </c>
                        <c ca="center">
                           <p>67.65 (60.00/88.89)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32243_g_at</p>
                        </c>
                        <c ca="center">
                           <p>84 (41/43)</p>
                        </c>
                        <c ca="center">
                           <p>82.35 (78.85/86.00)</p>
                        </c>
                        <c ca="center">
                           <p><b>31 </b>(22/9)</p>
                        </c>
                        <c ca="center">
                           <p><b>91.18 </b>(88.00/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36601_at</p>
                        </c>
                        <c ca="center">
                           <p>85 (46/39)</p>
                        </c>
                        <c ca="center">
                           <p>83.33 (88.46/78.00)</p>
                        </c>
                        <c ca="center">
                           <p>17 (8/9)</p>
                        </c>
                        <c ca="center">
                           <p>50.00 (32.00/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>38044_at</p>
                        </c>
                        <c ca="center">
                           <p>81 (41/40)</p>
                        </c>
                        <c ca="center">
                           <p>79.41 (78.85/80.00)</p>
                        </c>
                        <c ca="center">
                           <p>29 (21/8)</p>
                        </c>
                        <c ca="center">
                           <p>85.29 (84.00/88.89)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>41288_at</p>
                        </c>
                        <c ca="center">
                           <p>88 (41/47)</p>
                        </c>
                        <c ca="center">
                           <p>86.27 (78.85/94.00)</p>
                        </c>
                        <c ca="center">
                           <p><b>31 </b>(22/9)</p>
                        </c>
                        <c ca="center">
                           <p><b>91.18 </b>(88.00/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1767_s_at</p>
                        </c>
                        <c ca="center">
                           <p>83 (40/43)</p>
                        </c>
                        <c ca="center">
                           <p>81.37 (76.92/86.00)</p>
                        </c>
                        <c ca="center">
                           <p>24 (22/2)</p>
                        </c>
                        <c ca="center">
                           <p>70.59 (88.00/22.22)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
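               <p>The percentages in Table 6 follow directly from the sample counts. The short Python sketch below recomputes one LOOCV entry and one test-set entry. It assumes that the two numbers in parentheses refer to tumor and normal samples, in that order, and that the training set contains 52 tumor and 50 normal samples and the test set 25 tumor and 9 normal samples. These class sizes are inferred from the reported percentages rather than stated in this section, and the function name is hypothetical.</p>
               <p>
```python
def accuracy_cells(correct_tumor, correct_normal, n_tumor, n_normal):
    """Return (total correct, overall %, tumor %, normal %), rounded to 2 places."""
    def pct(hits, n):
        return round(100.0 * hits / n, 2)

    total = correct_tumor + correct_normal
    return (total,
            pct(total, n_tumor + n_normal),
            pct(correct_tumor, n_tumor),
            pct(correct_normal, n_normal))


# Assumed class sizes (inferred from the table, see the text above).
TRAIN = (52, 50)
TEST = (25, 9)

# 32598_at in LOOCV: table reports 92 (50/42) and 90.20 (96.15/84.00)
print(accuracy_cells(50, 42, *TRAIN))
# 32243_g_at in the test set: table reports 31 (22/9) and 91.18 (88.00/100)
print(accuracy_cells(22, 9, *TEST))
```
               </p>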
               <p>As for gene pairs, when <it>&#945; </it>= 0.75 and the threshold on the positive region size of single genes was 13, 16 gene pairs were shared by all 102 leave-one-out training sets. The LOOCV accuracy of the 16 gene pairs was between 81% and 86%, and three of these pairs showed relatively good classification performance in the test set, with accuracies of 76.47% to 79.41% (Table <tblr tid="T7">7</tblr>). The classification rules generated by the three pairs are presented in Additional file <supplr sid="S5">5</supplr>.</p>
               <tbl id="T7">
                  <title>
                     <p>Table 7</p>
                  </title>
                  <caption>
                     <p>Three gene pairs with good classification accuracy in the Prostate Cancer dataset</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>
                              <b>1st - 2nd Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#945;</it>
                              </b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35178_at - 35277_at</p>
                        </c>
                        <c ca="center">
                           <p>83 (33/50)</p>
                        </c>
                        <c ca="center">
                           <p>81.37 (63.46/100)</p>
                        </c>
                        <c ca="center">
                           <p>26 (20/6)</p>
                        </c>
                        <c ca="center">
                           <p>76.47 (80.00/66.67)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35178_at - 38087_s_at</p>
                        </c>
                        <c ca="center">
                           <p>83 (33/50)</p>
                        </c>
                        <c ca="center">
                           <p>81.37 (63.46/100)</p>
                        </c>
                        <c ca="center">
                           <p><b>27 </b>(21/6)</p>
                        </c>
                        <c ca="center">
                           <p><b>79.41 </b>(84.00/66.67)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>39331_at - 33121_g_at</p>
                        </c>
                        <c ca="center">
                           <p><b>86 </b>(38/48)</p>
                        </c>
                        <c ca="center">
                           <p><b>84.31 </b>(73.08/96.00)</p>
                        </c>
                        <c ca="center">
                           <p><b>27 </b>(18/9)</p>
                        </c>
                        <c ca="center">
                           <p><b>79.41 </b>(72.00/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>We also analyzed the training set based on the depended degree. We ranked all of the genes in the discretized training set by their depended degrees. The top two genes, 37639_at and 41755_at, had the highest depended degree of 36%. Gene 37639_at formed the following rules: if g(37639_at) > -0.491443, then Tumor (100% confidence); if g(37639_at) &#8804; -0.694377, then Normal (95% confidence). Both rules were highly reliable. Using the two rules, we correctly classified 33 of the 34 test samples. This result indicates that gene 37639_at has high class-discrimination power. The rules also indicate that this gene is relatively highly expressed in tumor samples. Gene 41755_at produced the following two rules: if g(41755_at) > 0.261438, then Tumor (100% confidence); if g(41755_at) &#8804; -0.477124, then Normal (100% confidence). Using these two rules, only 14 of the 34 test samples were classified correctly, although these included all 9 samples labeled "Normal". The rules imply that gene 41755_at is expressed at a low level in normal samples. Apart from 37639_at and 41755_at, gene 38087_s_at produced the following rule: if g(38087_s_at) > -0.281725, then Normal (100% confidence). This rule correctly classified six of the nine normal test samples, indicating that this gene is comparatively highly expressed in normal samples. Information on the top 20 genes ranked by depended degree is provided in Additional file <supplr sid="S6">6</supplr>.</p>
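               <p>The single-gene rules quoted above for 37639_at and 41755_at are simple threshold tests on the normalized expression value, and can be written out as the following Python sketch. The thresholds and labels are copied from the text; the function name, the table layout, and the choice to return no label for values that fall between the two thresholds are assumptions of this sketch. How such uncovered samples were scored in the reported test-set counts is not stated here.</p>
               <p>
```python
# Thresholds and labels copied from the rules quoted in the text.
# Each entry: (upper threshold, label above it, lower threshold, label at or below it)
RULES = {
    "37639_at": (-0.491443, "Tumor", -0.694377, "Normal"),
    "41755_at": (0.261438, "Tumor", -0.477124, "Normal"),
}


def classify(gene, value):
    """Apply the two single-gene rules; return None when neither rule fires.

    Returning None for values between the two thresholds is an assumption
    of this sketch, not something specified in the text.
    """
    upper, high_label, lower, low_label = RULES[gene]
    if value > upper:
        return high_label
    if lower >= value:
        return low_label
    return None


print(classify("37639_at", 0.10))
print(classify("37639_at", -0.80))
print(classify("37639_at", -0.60))
```
               </p>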
               <suppl id="S6">
                  <title>
                     <p>Additional file 6</p>
                  </title>
                  <text>
                     <p>
                        <b>The top 20 genes ranked based on depended degree in the training set of the Prostate Cancer dataset.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S6.xls">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Breast Cancer dataset</p>
               </st>
               <p>In this dataset, when <it>&#945; </it>&#8805; 0.8, no gene was shared by all 78 leave-one-out training sets; when <it>&#945; </it>= 0.75, four genes were found; when <it>&#945; </it>= 0.70, 46 genes were found. Most of these 46 genes had LOOCV accuracy ranging from 70% to 80%, while a few had LOOCV accuracy slightly below 70%. Using each of the 46 genes to classify the test set, we found eight genes that classified at least 13 of the 19 test samples correctly. Information on the eight genes is listed in Table <tblr tid="T8">8</tblr>. The classification rules generated by each of the eight genes are available in Additional file <supplr sid="S7">7</supplr>. We did not find any gene pairs with satisfactory classification performance in this dataset. The best result obtained by a gene pair was 12 of the 19 test samples classified correctly (63.16% accuracy).</p>
               <suppl id="S7">
                  <title>
                     <p>Additional file 7</p>
                  </title>
                  <text>
                     <p>
                        <b>The classification rules generated by each of the eight genes identified in the Breast Cancer dataset.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S7.txt">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T8">
                  <title>
                     <p>Table 8</p>
                  </title>
                  <caption>
                     <p>Eight genes with high classification accuracy in the Breast Cancer dataset</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="center">
                           <p>
                              <b>GenBank accession number</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#945;</it>
                              </b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="NM_012261">NM_012261</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p>57 (21/36)</p>
                        </c>
                        <c ca="center">
                           <p>73.08 (61.76/81.82)</p>
                        </c>
                        <c ca="center">
                           <p><b>16 </b>(10/6)</p>
                        </c>
                        <c ca="center">
                           <p><b>84.21 </b>(83.33/85.71)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="AW237580">AW237580</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p><b>58 </b>(18/40)</p>
                        </c>
                        <c ca="center">
                           <p><b>74.36 </b>(52.94/90.91)</p>
                        </c>
                        <c ca="center">
                           <p>13 (8/5)</p>
                        </c>
                        <c ca="center">
                           <p>68.42 (66.67/71.43)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="U45975">U45975</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p><b>58 </b>(22/36)</p>
                        </c>
                        <c ca="center">
                           <p><b>74.36 </b>(64.71/81.82)</p>
                        </c>
                        <c ca="center">
                           <p>13 (9/4)</p>
                        </c>
                        <c ca="center">
                           <p>68.42 (75.00/57.14)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="AI742029">AI742029</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p>55 (17/38)</p>
                        </c>
                        <c ca="center">
                           <p>70.51 (50.00/86.36)</p>
                        </c>
                        <c ca="center">
                           <p>13 (11/2)</p>
                        </c>
                        <c ca="center">
                           <p>68.42 (91.67/28.57)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="NM_001689">NM_001689</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p>57 (22/35)</p>
                        </c>
                        <c ca="center">
                           <p>73.08 (64.71/79.55)</p>
                        </c>
                        <c ca="center">
                           <p>15 (9/6)</p>
                        </c>
                        <c ca="center">
                           <p>78.95 (75.00/85.71)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="TSPYL5">TSPYL5</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p><b>58 </b>(24/34)</p>
                        </c>
                        <c ca="center">
                           <p><b>74.36 </b>(70.59/77.27)</p>
                        </c>
                        <c ca="center">
                           <p><b>16 </b>(10/6)</p>
                        </c>
                        <c ca="center">
                           <p><b>84.21 </b>(83.33/85.71)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="NM_000271">NM_000271</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p>57 (20/37)</p>
                        </c>
                        <c ca="center">
                           <p>73.08 (58.82/84.09)</p>
                        </c>
                        <c ca="center">
                           <p>13 (9/4)</p>
                        </c>
                        <c ca="center">
                           <p>68.42 (75.00/57.14)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <ext-link ext-link-type="gen" ext-link-id="AL049689">AL049689</ext-link>
                           </p>
                        </c>
                        <c ca="center">
                           <p>55 (22/33)</p>
                        </c>
                        <c ca="center">
                           <p>70.51 (64.71/75.00)</p>
                        </c>
                        <c ca="center">
                           <p>13 (10/3)</p>
                        </c>
                        <c ca="center">
                           <p>68.42 (83.33/42.86)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Leukemia dataset 2</p>
               </st>
               <p>This dataset contains three classes and therefore poses a multi-class classification problem. When <it>&#945; </it>&#8805; 0.95, no gene was shared across the 57 leave-one-out training sets; when <it>&#945; </it>= 0.90 or 0.85, a single gene was found; when <it>&#945; </it>= 0.80, five genes were found; when <it>&#945; </it>= 0.75, eight genes were found; when <it>&#945; </it>= 0.70, 21 genes were identified. Almost all of these 21 genes had high LOOCV accuracy and good classification performance in the test set. Their classification results are listed in Table <tblr tid="T9">9</tblr>. Gene 36239_at had the best LOOCV accuracy and the best classification accuracy in the test set. The classification rules induced by this gene were as follows: if g(36239_at) > 1796.5, then ALL; if g(36239_at) > 214 and g(36239_at) &#8804; 1796.5, then MLL; if g(36239_at) &#8804; 214, then AML; with 95.24%, 93.33%, and 90.48% confidence, respectively. Using these three rules, we correctly classified 14 of the 15 test samples, an accuracy of 93.33%. The other genes produced similar classification rules. The classification rules generated by each gene can be found in Additional file <supplr sid="S8">8</supplr>. We did not examine gene pairs for this dataset, because the rules induced by gene pairs tended to be complex.</p>
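               <p>As an illustrative aid, the three rules for probe 36239_at can be written as an ordered threshold check. The following Python sketch is not part of the original analysis: the function name classify_36239_at and the sample expression values are hypothetical, and only the two cut-offs (1796.5 and 214) are taken from the rules stated above.</p>

```python
# Cut-offs quoted in the text for probe 36239_at.
ALL_CUTOFF = 1796.5
AML_CUTOFF = 214.0


def classify_36239_at(expression: float) -> str:
    """Apply the three single-gene rules, checking the highest cut-off first."""
    if expression > ALL_CUTOFF:
        return "ALL"
    if expression > AML_CUTOFF:
        # Reached only when expression is at most ALL_CUTOFF.
        return "MLL"
    # Reached only when expression is at most AML_CUTOFF.
    return "AML"


if __name__ == "__main__":
    # Hypothetical expression values, chosen only to exercise each rule.
    for value in (2500.0, 1796.5, 800.0, 214.0, 50.0):
        print(value, classify_36239_at(value))
```

               <p>Because the rules are checked from the highest cut-off downwards, a value exactly equal to 1796.5 falls into the MLL interval and a value exactly equal to 214 falls into the AML interval, in agreement with the inequalities given above.</p>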
               <suppl id="S8">
                  <title>
                     <p>Additional file 8</p>
                  </title>
                  <text>
                     <p>
                        <b>The classification rules generated by each of the 21 genes identified in the Leukemia dataset 2.</b>
                     </p>
                  </text>
                  <file name="1755-8794-2-64-S8.txt">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T9">
                  <title>
                     <p>Table 9</p>
                  </title>
                  <caption>
                     <p>Twenty-one genes with high classification accuracy in the Leukemia dataset 2</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Probe ID</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in LOOCV</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>Classification results in the test set</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#945;</it>
                              </b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b># Correctly classified samples</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Accuracy (%)</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36239_at</p>
                        </c>
                        <c ca="center">
                           <p><b>51 </b>(20/12/19)</p>
                        </c>
                        <c ca="center">
                           <p><b>89.47 </b>(100/70.59/95)</p>
                        </c>
                        <c ca="center">
                           <p><b>14 </b>(4/2/8)</p>
                        </c>
                        <c ca="center">
                           <p><b>93.33 </b>(100/66.67/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.90</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>39318_at</p>
                        </c>
                        <c ca="center">
                           <p>47 (17/11/19)</p>
                        </c>
                        <c ca="center">
                           <p>82.46 (85/64.71/95)</p>
                        </c>
                        <c ca="center">
                           <p>13 (2/3/8)</p>
                        </c>
                        <c ca="center">
                           <p>86.67 (50/100/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>40191_s_at</p>
                        </c>
                        <c ca="center">
                           <p>48 (17/13/18)</p>
                        </c>
                        <c ca="center">
                           <p>84.21 (85/76.47/90)</p>
                        </c>
                        <c ca="center">
                           <p>12 (2/2/8)</p>
                        </c>
                        <c ca="center">
                           <p>80 (50/66.67/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>840_at</p>
                        </c>
                        <c ca="center">
                           <p>47 (19/10/18)</p>
                        </c>
                        <c ca="center">
                           <p>82.46 (95/58.82/90)</p>
                        </c>
                        <c ca="center">
                           <p>11 (3/1/7)</p>
                        </c>
                        <c ca="center">
                           <p>73.33 (75/33.33/87.50)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>266_s_at</p>
                        </c>
                        <c ca="center">
                           <p>46 (19/11/16)</p>
                        </c>
                        <c ca="center">
                           <p>80.70 (95/64.71/80)</p>
                        </c>
                        <c ca="center">
                           <p>13 (4/1/8)</p>
                        </c>
                        <c ca="center">
                           <p>86.67 (100/33.33/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37933_at</p>
                        </c>
                        <c ca="center">
                           <p>45 (20/7/18)</p>
                        </c>
                        <c ca="center">
                           <p>78.95 (100/41.18/90)</p>
                        </c>
                        <c ca="center">
                           <p>8 (2/0/6)</p>
                        </c>
                        <c ca="center">
                           <p>53.33 (50/0/75)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>38989_at</p>
                        </c>
                        <c ca="center">
                           <p>43 (19/6/18)</p>
                        </c>
                        <c ca="center">
                           <p>75.44 (95/35.29/90)</p>
                        </c>
                        <c ca="center">
                           <p>12 (3/1/8)</p>
                        </c>
                        <c ca="center">
                           <p>80 (75/33.33/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>33833_at</p>
                        </c>
                        <c ca="center">
                           <p>44 (16/10/18)</p>
                        </c>
                        <c ca="center">
                           <p>77.19 (80/58.82/90)</p>
                        </c>
                        <c ca="center">
                           <p>10 (2/0/8)</p>
                        </c>
                        <c ca="center">
                           <p>66.67 (50/0/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32874_at</p>
                        </c>
                        <c ca="center">
                           <p>43 (14/11/18)</p>
                        </c>
                        <c ca="center">
                           <p>75.44 (70/64.71/90)</p>
                        </c>
                        <c ca="center">
                           <p>10 (2/1/7)</p>
                        </c>
                        <c ca="center">
                           <p>66.67 (50/33.33/87.5)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37487_at</p>
                        </c>
                        <c ca="center">
                           <p>41 (14/7/20)</p>
                        </c>
                        <c ca="center">
                           <p>71.93 (70/41.18/100)</p>
                        </c>
                        <c ca="center">
                           <p>11 (3/0/8)</p>
                        </c>
                        <c ca="center">
                           <p>73.33 (75/0/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>31886_at</p>
                        </c>
                        <c ca="center">
                           <p>42 (16/8/18)</p>
                        </c>
                        <c ca="center">
                           <p>73.68 (80/47.06/90)</p>
                        </c>
                        <c ca="center">
                           <p>13 (3/2/8)</p>
                        </c>
                        <c ca="center">
                           <p>86.67 (75/66.67/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35164_at</p>
                        </c>
                        <c ca="center">
                           <p>48 (19/15/14)</p>
                        </c>
                        <c ca="center">
                           <p>84.21 (95/88.24/70)</p>
                        </c>
                        <c ca="center">
                           <p>13 (4/2/7)</p>
                        </c>
                        <c ca="center">
                           <p>86.67 (100/66.67/87.5)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>36905_at</p>
                        </c>
                        <c ca="center">
                           <p>46 (14/12/20)</p>
                        </c>
                        <c ca="center">
                           <p>80.70 (70/70.59/100)</p>
                        </c>
                        <c ca="center">
                           <p>9 (0/1/8)</p>
                        </c>
                        <c ca="center">
                           <p>60 (0/33.33/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37539_at</p>
                        </c>
                        <c ca="center">
                           <p>50 (16/16/18)</p>
                        </c>
                        <c ca="center">
                           <p>87.72 (80/94.12/90)</p>
                        </c>
                        <c ca="center">
                           <p>10 (3/3/4)</p>
                        </c>
                        <c ca="center">
                           <p>66.67 (75/100/50)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>37910_at</p>
                        </c>
                        <c ca="center">
                           <p>45 (18/9/18)</p>
                        </c>
                        <c ca="center">
                           <p>78.95 (90/52.94/90)</p>
                        </c>
                        <c ca="center">
                           <p>9 (1/1/7)</p>
                        </c>
                        <c ca="center">
                           <p>60 (25/33.33/87.5)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32847_at</p>
                        </c>
                        <c ca="center">
                           <p>44 (18/12/14)</p>
                        </c>
                        <c ca="center">
                           <p>77.19 (90/70.59/70)</p>
                        </c>
                        <c ca="center">
                           <p>11 (4/2/5)</p>
                        </c>
                        <c ca="center">
                           <p>73.33 (100/66.67/62.5)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>35260_at</p>
                        </c>
                        <c ca="center">
                           <p>42 (20/8/14)</p>
                        </c>
                        <c ca="center">
                           <p>73.68 (100/47.06/70)</p>
                        </c>
                        <c ca="center">
                           <p>9 (2/1/6)</p>
                        </c>
                        <c ca="center">
                           <p>60 (50/33.33/75)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>41790_at</p>
                        </c>
                        <c ca="center">
                           <p>47 (19/11/17)</p>
                        </c>
                        <c ca="center">
                           <p>82.46 (95/64.71/85)</p>
                        </c>
                        <c ca="center">
                           <p>13 (3/2/8)</p>
                        </c>
                        <c ca="center">
                           <p>86.67 (75/66.67/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>32579_at</p>
                        </c>
                        <c ca="center">
                           <p>48 (15/13/20)</p>
                        </c>
                        <c ca="center">
                           <p>84.21 (75/76.47/100)</p>
                        </c>
                        <c ca="center">
                           <p>11 (2/1/8)</p>
                        </c>
                        <c ca="center">
                           <p>73.33 (50/33.33/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1373_at</p>
                        </c>
                        <c ca="center">
                           <p>47 (16/12/19)</p>
                        </c>
                        <c ca="center">
                           <p>82.46 (80/70.59/95)</p>
                        </c>
                        <c ca="center">
                           <p>10 (1/1/8)</p>
                        </c>
                        <c ca="center">
                           <p>66.67 (25/33.33/100)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>1325_at</p>
                        </c>
                        <c ca="center">
                           <p>47 (19/14/14)</p>
                        </c>
                        <c ca="center">
                           <p>82.46 (95/82.35/70)</p>
                        </c>
                        <c ca="center">
                           <p>10 (3/3/4)</p>
                        </c>
                        <c ca="center">
                           <p>66.67 (75/100/50)</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
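               <p>The percentages in Table 9 follow directly from the per-class counts. The Python sketch below, which is not part of the original study, recomputes the entries for probe 36239_at. The class sizes (20/17/20 in the training set and 4/3/8 in the test set) are inferred from the counts and percentages reported in the table, and the helper names are hypothetical.</p>

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Percentage of correctly classified samples, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def summarize(correct_per_class, size_per_class):
    """Return (overall accuracy, list of per-class accuracies) in percent."""
    overall = accuracy_percent(sum(correct_per_class), sum(size_per_class))
    per_class = [
        accuracy_percent(correct, size)
        for correct, size in zip(correct_per_class, size_per_class)
    ]
    return overall, per_class


# Class sizes inferred from Table 9 (assumption, see lead-in).
LOOCV_SIZES = (20, 17, 20)
TEST_SIZES = (4, 3, 8)

if __name__ == "__main__":
    # Counts for probe 36239_at as listed in Table 9.
    print(summarize((20, 12, 19), LOOCV_SIZES))
    print(summarize((4, 2, 8), TEST_SIZES))
```

               <p>The same two calls can be repeated for any other row of the table to check that the overall and per-class percentages agree with the listed counts.</p>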
            </sec>
         </sec>
         <sec>
            <st>
               <p>Comparison and analysis of results</p>
            </st>
            <sec>
               <st>
                  <p>Leukemia dataset 1</p>
               </st>
               <p>Other researchers have applied rule-based machine-learning methods to the classification of this dataset. In <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the authors proposed gene selection by feature ranking (t-test) followed by rough-set attribute reduction. They ultimately identified one gene, which classified 31 samples correctly in the test set. This is the same gene identified in the present study, gene #4847. However, our method identified not only this gene but also other informative genes, including one gene pair with 100% classification accuracy. In <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, the authors also used rough sets for gene selection. They chose genes with maximum relevance with respect to the class variable and maximum positive interaction between different genes. We likewise selected single genes with maximum relevance with respect to the class variable (i.e., the depended degree of a single gene), but we chose gene pairs by maximum relevance with respect to the class variable rather than by maximum positive interaction between the genes, since the maximum positive interaction between two genes may counteract the depended degree of a single gene. Because that study assessed classification performance using LOOCV on all 72 samples instead of separating them into training and test sets, its results cannot be compared directly with those of the present study. Likewise, in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> the authors ranked all attributes by their degree of dependency. They selected the top <it>&#955; </it>attributes (<it>&#955; </it>= 2, 4, 6, 8, 10, 12, 14, 15) by the degree of dependency, and formed all possible combinations of these <it>&#955; </it>attributes as candidate subsets. The authors calculated the depended degree of every subset and chose those with a depended degree of 100%. 
Finally, they evaluated the classification performance of the selected subsets using <it>k</it>-NN classifiers. In essence, their method finds reducts of limited size and uses them for classification. As mentioned above, finding all of the reducts is computationally intensive, even for a small number of attributes. Moreover, being a reduct does not in itself guarantee high classification performance. Another difference between our method and that of <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> is that our classifier is based on rules, whereas theirs is not. Although they achieved a classification score of 97% with gene subsets of size two, they did not find any gene pair with a classification score of 100%, and they did not identify any important genes. In <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, a method combining rough sets with GAs was proposed to classify microarray gene expression patterns. A classification accuracy of 90.3% was obtained with a nine-gene classifier on this dataset.</p>
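               <p>To make the notion of dependency concrete, the following Python sketch computes the textbook rough-set dependency degree, i.e., the fraction of samples whose discretized attribute pattern occurs in only one class. It is an illustration under stated assumptions, and is neither the procedure of <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> nor necessarily the <it>&#945;</it>-based depended degree used in the present study; the function name and the toy data are hypothetical.</p>

```python
from collections import defaultdict


def dependency_degree(rows, labels):
    """Fraction of samples whose attribute pattern maps to exactly one class.

    rows   -- one tuple of discretized attribute values per sample
    labels -- the class label of each sample, in the same order
    """
    classes_by_pattern = defaultdict(set)
    count_by_pattern = defaultdict(int)
    for pattern, label in zip(rows, labels):
        key = tuple(pattern)
        classes_by_pattern[key].add(label)
        count_by_pattern[key] += 1
    # Samples in patterns seen with a single class form the positive region.
    consistent = sum(
        count
        for key, count in count_by_pattern.items()
        if len(classes_by_pattern[key]) == 1
    )
    return consistent / len(labels)


if __name__ == "__main__":
    # Toy data: one discretized gene, six samples, two classes.
    one_gene = [("high",), ("high",), ("low",), ("low",), ("mid",), ("mid",)]
    classes = ["ALL", "ALL", "AML", "AML", "ALL", "AML"]
    print(dependency_degree(one_gene, classes))

    # Adding a second (hypothetical) gene resolves the ambiguous "mid" pattern.
    two_genes = [("high", "a"), ("high", "a"), ("low", "b"),
                 ("low", "b"), ("mid", "a"), ("mid", "b")]
    print(dependency_degree(two_genes, classes))
```

               <p>A subset with a dependency degree of 1.0 separates the classes perfectly on the given samples, which, as noted above, does not by itself guarantee good performance on unseen samples.</p>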
               <p>In <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, the authors used the EPs approach to identify one important gene, Zyxin, which is our gene #4847. Using the two rules induced by this gene, the authors correctly classified 31 samples, the same result as ours. However, unlike the present study, they did not identify any gene pair with higher classification performance. In <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, the authors used decision trees (Single C4.5, Bagging C4.5, AdaBoost C4.5) to perform classification tasks on seven publicly available cancer microarray datasets, including the ALL-AML leukemia data. They first employed Fayyad and Irani's <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> discretization method to filter out noise. The remaining 1038 genes were used in the actual learning process. Their highest accuracy was 91.2% (31 samples classified correctly). Since the authors did not report the size of the pruned decision trees, we do not know how many genes they used to reach the highest accuracy. In <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, 91.2% classification accuracy was achieved using rule classifiers containing gene subsets with sizes ranging from 10 to 40. In <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, the authors used a single pair of genes to correctly classify 31 test set samples.</p>
               <p>In addition, a number of non-rule-based methods have been proposed for gene selection and cancer classification on this dataset. Golub et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> were the first to classify ALL-AML by gene expression data. The authors constructed the predictor using 50 informative genes, trained by weighted voting on the training set. In LOOCV, 36 samples were classified correctly and two were labeled "uncertain"; in the test set, 29 of the 34 samples were classified correctly, with no predictions made for the remaining five samples. In <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, the authors applied probabilistic neural networks (PNNs) to the class prediction of ALL-AML, and achieved 100% prediction accuracy in the test set using the 50-gene predictors derived from cross-validation tests of the training set by means of the signal-to-noise statistic feature selection method. In <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, the authors used a correlation-based feature (CBF) selector in conjunction with machine-learning algorithms such as decision trees (JP48), NB, and SVMs to analyze cancer microarray data. They reported one noteworthy gene, Zyxin, which classified 31 samples correctly. In <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the authors proposed a maximal margin linear programming (MAMA) method for the classification of tumor samples based on microarray data. This procedure detected groups of genes and constructed models that strongly correlated with particular tumor types. They achieved 100% prediction accuracy on the test set using gene subsets ranging in size from 132 to 549. In <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, the authors proposed dimension reduction using partial least squares (PLS) and classification using logistic discrimination (LD) and quadratic discriminant analysis (QDA). 
By using gene subsets with sizes between 50 and 1500, the authors obtained correct classification of the test samples ranging from 28 to 33. In <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, the authors used SVMs trained and gene subsets selected in the training set to classify samples in the test set, resultng in the correct classification of between 30 and 32 of the 34 samples. Other SVM-based methods report zero test error with gene subsets ranging in size from 8 to 30 <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>.</p>
               <p>Table <tblr tid="T10">10</tblr> compares our methods with those employed in previous studies. The table reveals that our classification results are superior to almost all of those obtained in previous studies.</p>
               <tbl id="T10">
                  <title>
                     <p>Table 10</p>
                  </title>
                  <caption>
                     <p>Comparison of best classification accuracy for the Leukemia dataset 1</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Methods (feature selection + classification)</b>
                              <sup>a</sup>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Selected genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Correctly classified samples (accuracy)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Rule-based classifier</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>depended degree + decision rules [this work]</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>34 (100%)</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>t-test, attribute reduction + decision rules <abbrgrp><abbr bid="B7">7</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>attribute reduction + <it>k</it>-NNs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>33 (97.06%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>rough sets, GAs + <it>k</it>-NNs <abbrgrp><abbr bid="B10">10</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>9</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>EPs <abbrgrp><abbr bid="B6">6</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + decision trees <abbrgrp><abbr bid="B11">11</abbr></abbrgrp><sup>b</sup></p>
                        </c>
                        <c ca="center">
                           <p>unknown<sup>c</sup></p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>CBF + decision trees <abbrgrp><abbr bid="B24">24</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>TSP <abbrgrp><abbr bid="B14">14</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>RCBT <abbrgrp><abbr bid="B13">13</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>10-40</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>neighborhood analysis + weighted voting <abbrgrp><abbr bid="B2">2</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                        <c ca="center">
                           <p>29 (85.29%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>signal to noise ratios + PNNs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                        <c ca="center">
                           <p>34 (100%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>MAMA <abbrgrp><abbr bid="B25">25</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>132-549</p>
                        </c>
                        <c ca="center">
                           <p>34 (100%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PLS + LD or QDA <abbrgrp><abbr bid="B26">26</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>50-1500</p>
                        </c>
                        <c ca="center">
                           <p>28-33 (82.35%-97.06%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>prediction strength + SVMs <abbrgrp><abbr bid="B27">27</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>25-1000</p>
                        </c>
                        <c ca="center">
                           <p>30-32 (88.24%-94.12%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SVMs <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>8-30</p>
                        </c>
                        <c ca="center">
                           <p>34 (100%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>a</sup>The text before "+" states the feature selection method, while that after it states the classification method. The absence of "+" means that the same method was used for both feature selection and classification.</p>
                     <p><sup>b</sup>The decision trees are also involved in feature selection.</p>
                     <p><sup>c</sup>"unknown" means that no related data are provided in the article.</p>
                     <p>These explanations apply to the other tables.</p>
                  </tblfn>
               </tbl>
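               <p>As a quick check on the accuracy column of Table 10, the following minimal Python sketch converts counts of correctly classified samples into percentages, assuming the 34-sample test set mentioned in the text. The function name <it>accuracy_percent</it> and the rounding to two decimals are illustrative choices, not part of the original analysis.</p>

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage rounded to two decimals."""
    # Reject inputs that cannot describe a valid classification result.
    if not (total > 0 and correct >= 0 and total >= correct):
        raise ValueError("need total > 0 and correct between 0 and total")
    return round(100.0 * correct / total, 2)


# Size of the ALL-AML test set as stated in the text.
TEST_SET_SIZE = 34

# Counts of correctly classified samples that appear in Table 10.
for correct in (29, 31, 33, 34):
    print(correct, accuracy_percent(correct, TEST_SET_SIZE))
```

               <p>The same helper can be applied to any of the other comparison tables by changing the total to the size of the corresponding test set.</p>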
               <p>In this dataset, we identified 11 genes that show good classification performance either alone or in combination with another gene: Zyxin, MGST1, TCRA, APLP2, CCND3, HKR-T1, KIAA0159, TOP2B, MB-1, ARHG, and IOTA. Of these, Zyxin, CCND3, HKR-T1, TOP2B, MB-1, and IOTA also appear in the list of 50 informative genes identified by Golub et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, in which Zyxin is highly expressed in AML and the rest are highly expressed in ALL. Our rules for these genes indicate that Zyxin, MGST1, APLP2, and ARHG are upregulated in AML, while TCRA, CCND3, HKR-T1, KIAA0159, TOP2B, MB-1, and IOTA are upregulated in ALL. This agreement with the findings of Golub et al. supports the reasonableness of our rules.</p>
               <p>Our method identified an outstanding gene, Zyxin, with which we classified the test set with 91.2% accuracy. This gene has also been reported by other researchers <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. Our results and those of related studies suggest that the expression level of Zyxin plays an important role in distinguishing ALL from AML. Zyxin is a focal-adhesion-associated phosphoprotein with one domain involved in the control of actin assembly and three protein-protein adapter domains implicated in the regulation of cell growth and differentiation. Zyxin may function as a messenger in the signal transduction pathway that mediates adhesion-stimulated changes in gene expression. As noted in <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, cell spreading, proliferation, and survival are modulated by focal adhesions linking extracellular matrix proteins, integrins, and the cytoskeleton. Consistent with the involvement of the microfilament network in tumor cell behavior, several actin-binding proteins, including Zyxin, a potential regulator of actin polymerization, may play a role in oncogenesis. The gene encoding Zyxin maps to 7q32, a chromosomal region affected in a variety of human cancers. 7q monosomy or partial deletion of this chromosome arm is frequently found in myelodysplastic syndrome, acute myeloid, juvenile myelomonocytic, and acute lymphocytic leukemias, as well as in breast carcinoma <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Valdes et al. revealed that the actin cytoskeleton-associated protein Zyxin acts as a tumor suppressor in Ewing tumor cells <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Yagi et al. also identified Zyxin as one of 35 genes associated with pediatric AML prognosis <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Taken together, these lines of evidence suggest that Zyxin plays an important role in leukemia pathogenesis.</p>
               <p>The aforementioned gene pair, MGST1 vs. TCRA, classifies the test set with zero error, and the biological roles of both genes are noteworthy. MGST1 is one of the three core genes screened by Banerjee et al. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, MGST1 is among the first 10 genes selected by the <it>&#967;</it><sup>2</sup>, InfoGain, ReliefF, and symmetrical uncertainty methods. In <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, MGST1 belonged to the set of top 50 genes selected by the signal-to-noise metric (10-fold cross-validation tests). Among our 13 gene pairs with the highest classification performance, MGST1 occurred five times. These facts indicate that MGST1 is significant in the classification of ALL-AML. Although it has not been identified by other algorithms, the gene TCRA is clearly important in the pathogenesis of leukemia <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>.</p>
               <p>APLP2 was one of the first 10 genes selected by Wang et al. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, and was identified by Huang et al. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. It was also identified by Yagi et al. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> as one of 35 genes associated with pediatric AML prognosis. CCND3 is also listed as one of the 50 genes selected by Huang et al. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. KIAA0159 is an essential component of the human condensin complex required for mitotic chromosome condensation. In a brief examination of related literature, we found that the gene has not been identified by other algorithms. However, past studies have indicated that nonrandom chromosomal translocations are characteristic of most human hematopoietic malignancies <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Because KIAA0159 is correlated with the structural maintenance of chromosomes, it may be associated with the pathogenesis of leukemia. TOP2B encodes the protein that is the principal target of the antileukemic drug etoposide <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. MB-1 encodes the Ig-alpha protein of the B-cell antigen component. Its dysregulation has been reported to be closely linked to leukemia and lymphoma <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. ARHG is a member of the RAS superfamily of genes, which encode GTP-binding proteins that act in the pathway of signal transduction and play a key role in the regulation of cellular functions <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>.</p>
               <p>In general, the genes identified in the present study are all directly or indirectly relevant to hematopoietic or cancerous pathogenesis. They are therefore likely to play key roles in the pathogenesis of ALL or AML, which may explain their high performance in distinguishing ALL from AML.</p>
            </sec>
            <sec>
               <st>
                  <p>Lung Cancer dataset</p>
               </st>
               <p>In <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, the authors applied rough sets to the same dataset as that considered in the present study. Their best result was 98% classification accuracy with a gene subset of size two. Because they employed a non-rule-based classifier, <it>k</it>-NN, no rule was given to explain the result. In <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, the authors compared the classification performance of prediction by collective likelihoods (PCLs), which is based on the concept of EPs, with that of other classification algorithms, including decision trees, SVMs, and <it>k-</it>NNs. On the Lung Cancer dataset, they obtained classification results containing between 1 and 27 errors. The classification accuracy of our method is higher than that of other rule-based classification algorithms, including PCLs and the decision trees mentioned in <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. The highest classification accuracies on the dataset, using the three different decision trees reported in <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, were about 93%. In <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, the best result was 98% classification accuracy. In the initial research article on the dataset <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, the authors reported 99% classification accuracy using six genes. Table <tblr tid="T11">11</tblr> compares our results with those of other studies, revealing that our outcomes matched or outperformed those obtained using other methods.</p>
               <tbl id="T11">
                  <title>
                     <p>Table 11</p>
                  </title>
                  <caption>
                     <p>Comparison of best classification accuracy for the Lung Cancer dataset</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Methods (feature selection + classification)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Selected genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Correctly classified samples (accuracy)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Rule-based classifier</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>depended degree + decision rules [this work]</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>145 (97.32%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>144 (96.64%)</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>attribute reduction + <it>k</it>-NNs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>146 (97.99%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PCLs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>146 (97.99%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>C4.5 <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>122 (81.88%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Bagging <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>131 (87.92%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Boosting <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>122 (81.88%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SVMs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>148 (99.33%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>k</it>-NNs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>148 (99.33%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + decision trees <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>139 (93.29%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>RCBT <abbrgrp><abbr bid="B13">13</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>10-40</p>
                        </c>
                        <c ca="center">
                           <p>146 (97.99%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>gene expression ratios <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c ca="center">
                           <p>148 (99.33%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>We now explain in more detail the results presented in <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. That article proposed using the expression levels of a small number of genes for the diagnosis of MPM and lung cancer. The authors screened out eight genes with marked differences in average expression levels between the tumor types in the training set. They then calculated 15 expression ratios for each sample by dividing the expression value of each of the five genes expressed at relatively higher levels in MPM by the expression value of each of the three genes expressed at relatively higher levels in ADCA. Next, they applied these ratios to the test set. Samples with ratio values > 1 were classified as MPM, and those with ratio values &lt; 1 were classified as ADCA. They achieved classification accuracies ranging from 91% to 98%. In essence, they also utilized gene pairs for classification. Nevertheless, under the same training and testing protocol, our results compare favorably with theirs: they used three ratios (i.e., six genes) to correctly classify 148 of 149 samples, whereas we obtained the same result using each of seven gene pairs selected directly from the training set without the LOOCV procedure. Of note, six of the eight genes selected in this earlier study were also identified in the present study. The six genes are PTGIS, CD200, TACSTD1, TTF1, ANXA8, and CALB2, the first three of which have a 100% depended degree.</p>
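               <p>The ratio-based scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene identifiers are placeholders (the text does not say which of the eight genes fall in each group), the expression values are invented, and leaving a ratio of exactly 1 undecided is our assumption because the text does not cover that case. The sketch also does not show how the earlier study combined several ratios into a final call.</p>

```python
from itertools import product

# Placeholder identifiers: the text does not name which genes belong to each group.
MPM_HIGH_GENES = ["mpm_gene_1", "mpm_gene_2", "mpm_gene_3", "mpm_gene_4", "mpm_gene_5"]
ADCA_HIGH_GENES = ["adca_gene_1", "adca_gene_2", "adca_gene_3"]


def expression_ratios(sample, mpm_genes, adca_genes):
    """Divide each MPM-high gene's expression by each ADCA-high gene's expression."""
    # Assumes strictly positive expression values, so no division by zero occurs.
    return {
        (m, a): sample[m] / sample[a]
        for m, a in product(mpm_genes, adca_genes)
    }


def classify_by_ratio(ratio):
    """Apply the cutoff described in the text: above 1 is MPM, below 1 is ADCA."""
    if ratio > 1.0:
        return "MPM"
    if 1.0 > ratio:
        return "ADCA"
    # A ratio of exactly 1 is not covered by the text; "undetermined" is an assumption.
    return "undetermined"


# Hypothetical expression values for a single sample (invented for illustration).
sample = {
    "mpm_gene_1": 800.0,
    "mpm_gene_2": 650.0,
    "mpm_gene_3": 500.0,
    "mpm_gene_4": 420.0,
    "mpm_gene_5": 90.0,
    "adca_gene_1": 100.0,
    "adca_gene_2": 150.0,
    "adca_gene_3": 200.0,
}

ratios = expression_ratios(sample, MPM_HIGH_GENES, ADCA_HIGH_GENES)
labels = {pair: classify_by_ratio(r) for pair, r in ratios.items()}

print(len(ratios))  # five MPM-high genes times three ADCA-high genes
print(labels[("mpm_gene_1", "adca_gene_1")])
```

               <p>Each ratio is a two-gene comparison, which is why the approach can be viewed as a form of gene-pair classification.</p>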
               <p>The genes selected by our method are associated primarily with the pathogenesis of MPM, ADCA, or other tumors. According to our rules, JUP, CD24, PRKCD, MAPK13, TACSTD2, DKFZP564O0823 protein, TACSTD1, CEACAM1, XBP1, TTF1, SFTPB, AGR2, ELF3, EVI1, and CDA are highly expressed in ADCA, while EGF, SPTAN1, FLNC, PTGIS, FBXL7, CD200, AP2M1, ANXA8, HAS1, CALB2, GFPT2, KIAA0427, C1S, EIF4G3, TGM1, Adamts3, hypothetical protein dJ465N24.2.1, and AP3S1 are highly expressed in mesothelioma. CALB2 encodes calretinin, which is a component of several immunohistochemical panels currently used in the diagnosis of MPM and lung cancer <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. HAS1 (hyaluronan synthase 1) is a member of the HA gene family, which has been correlated with tumor metastasis. In <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, HAS1 was identified as a prognostic gene for mesothelioma. In <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, HAS1 is listed among the genes with elevated expression levels in C1 MPM tumors. We have one rule arising from HAS1: if g(HAS1) > 7.3, then MPM. This rule is consistent with the results of <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>. ANXA8, PTGIS, and CALB2 are also marked as more highly expressed genes in C1 MPM tumors <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. These observations are supported by the following rules of the present study: if g(ANXA8) > 130.8, then MPM; if g(CALB2) > 490.5, then MPM; if g(PTGIS) > 193.25, then MPM. Other genes that we chose (e.g., CD24, TACSTD1, TACSTD2, CEACAM1, and PRKCD) are correlated with lung carcinoma or other tumors. TTF1 is a transcription factor that regulates the expression of multiple genes involved in lung development.
It is preferentially expressed in ADCAs of the lung and has been investigated as a potential prognostic parameter in patients with lung cancer <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>.</p>
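               <p>As an illustration of how the single-gene rules quoted above can be applied, the following Python sketch encodes the four thresholds given in the text. Assigning ADCA when a threshold is not exceeded is an assumption based on the two-class setting, and the expression values in the example are invented.</p>

```python
# Thresholds quoted in the text: expression above the threshold implies MPM.
MPM_RULES = {"HAS1": 7.3, "ANXA8": 130.8, "CALB2": 490.5, "PTGIS": 193.25}


def apply_rule(gene, expression, rules=MPM_RULES):
    """Classify one sample from the expression value g(gene) of a single gene."""
    threshold = rules[gene]
    # Assumption: in this two-class problem, a value that does not exceed
    # the threshold is assigned to ADCA.
    return "MPM" if expression > threshold else "ADCA"


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(1 for p, t in zip(predicted, actual) if p == t)
    return correct / len(actual)


# Hypothetical CALB2 expression values and true labels, for illustration only.
calb2_values = [812.0, 95.5, 640.2, 30.1]
true_labels = ["MPM", "ADCA", "MPM", "ADCA"]
predictions = [apply_rule("CALB2", value) for value in calb2_values]
```

               <p>Because each rule involves one gene and one threshold, the classifier can be read directly off the table of rules, which is the interpretability argument made for this method.</p>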
            </sec>
            <sec>
               <st>
                  <p>Prostate Cancer dataset</p>
               </st>
               <p>Regarding the Prostate Cancer dataset, a previous study reported a 95% prediction rate using a gene pair <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The best classification results on the dataset, based on three different decision tree approaches (Single C4.5, Bagging C4.5, and AdaBoost C4.5), are 67.65%, 73.53%, and 67.65%, respectively <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. In <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, a 97% classification result was reported, but the number of genes used was not provided. In <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, the authors built predictors using a <it>k</it>-NN algorithm, and achieved 77% and 86% prediction accuracy on the test set with 4 and 16 genes, respectively. Table <tblr tid="T12">12</tblr> summarizes the best classification results on the dataset.</p>
               <tbl id="T12">
                  <title>
                     <p>Table 12</p>
                  </title>
                  <caption>
                     <p>Comparison of best classification accuracy for the Prostate Cancer dataset</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Methods (feature selection + classification)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Selected genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Correctly classified samples (accuracy)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Rule-based classifier</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>depended degree + decision rules [this work]</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>31 (91.18%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>27 (79.41%)</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>TSP <abbrgrp><abbr bid="B14">14</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>32 (94.12%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PCLs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>33 (97.06%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + Single C4.5 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>23 (67.65%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + Bagging C4.5 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>25 (73.53%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + AdaBoost C4.5 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>23 (67.65%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>RCBT <abbrgrp><abbr bid="B13">13</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>33 (97.06%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SVMs <abbrgrp><abbr bid="B13">13</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>27 (79.41%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>signal to noise ratios + <it>k</it>-NNs <abbrgrp><abbr bid="B18">18</abbr></abbrgrp><sup>d</sup></p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>26 (77.2%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                        <c ca="center">
                           <p>29 (85.7%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>d</sup>In <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, as both raw and normalized datasets were used, two groups of prediction results were obtained. Here, we chose their results from the normalized dataset. Another small difference is that we obtained the dataset from the Kent Ridge Bio-medical Data Set Repository, where the prostate test set includes 25 tumor and 9 normal samples instead of the 27 tumor and 8 normal samples studied in <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>. To facilitate comparison, the correctly classified sample numbers were calculated according to the total of 34 samples.</p>
                  </tblfn>
               </tbl>
               <p>In the Prostate Cancer dataset, we identified 13 genes using the LOOCV approach. Seven of the eight single genes had relatively good classification performance, of which five genes had established names: NRP2, TMSB15A, PEDF, FAM107A and TGFB3. Our rules imply that TMSB15A, also named thymosin beta15, is highly expressed, while NRP2, PEDF, FAM107A and TGFB3 are expressed at low levels in tumor tissue. As revealed in <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, thymosin beta15 levels are elevated in human prostate cancer and correlate positively with the Gleason tumor grade. Thymosin beta15 may represent a potential new biochemical marker for the progression of human prostate cancer; our rules strengthen this perspective. Previous investigations have revealed that PEDF expression is negatively correlated with tumor malignancy <abbrgrp><abbr bid="B58">58</abbr><abbr bid="B59">59</abbr><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp>; our rules support this viewpoint. FAM107A has been consistently reported to be downregulated in human cancer <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>, which conforms to our rules. For the gene pairs, our rules indicate that KIAA0762 is downregulated, while TUBB and RGS10 are upregulated in tumor tissue; however, there is insufficient evidence to link these three genes directly with prostate cancer.</p>
               <p>The three genes that we identified directly from the training set are hepsin (37639_at), KIAA0977 (41755_at), and S100A4 (38087_s_at). Hepsin performs reasonably well in differentiating the two classes of samples, and the latter two genes are good indicators of normal samples. Hepsin (annotated as human hepatoma mRNA for serine protease) encodes a serine protease. Numerous studies have revealed that it is closely linked to prostate cancer. Hepsin is widely reported to be highly over-expressed in more than 90% of human prostate tumors, making it a significant marker and a target for prostate cancer <abbrgrp><abbr bid="B65">65</abbr><abbr bid="B66">66</abbr><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr><abbr bid="B71">71</abbr><abbr bid="B72">72</abbr></abbrgrp>. In <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, hepsin was identified as the first over-expressed gene in tumor samples and was selected as one of 16 genes used for creating a prediction model. All of these outcomes strongly support our rules involving hepsin. Another gene, KIAA0977, has also been listed as a highly expressed gene in tumor samples <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The third gene, S100A4, is associated with cancer pathogenesis; chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis <abbrgrp><abbr bid="B73">73</abbr><abbr bid="B74">74</abbr><abbr bid="B75">75</abbr></abbrgrp>. In <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, S100A4 was identified as one of the highly expressed genes in normal samples and chosen as one member of a 16-gene prediction model. In addition, <abbrgrp><abbr bid="B76">76</abbr></abbrgrp> noted that S100A4 protein was not expressed in benign or malignant prostatic epithelium or in LNCaP and Du145 cells. Our rules related to this gene support these previous findings.
Surprisingly, many studies have found that S100A4 is over-expressed in most other tumors <abbrgrp><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr><abbr bid="B79">79</abbr><abbr bid="B80">80</abbr><abbr bid="B81">81</abbr><abbr bid="B82">82</abbr></abbrgrp>; in <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>, the authors suggested that the mechanism underlying changes in the expression level of S100A4 may involve methylation of the S100A4 gene.</p>
            </sec>
            <sec>
               <st>
                  <p>Breast Cancer dataset</p>
               </st>
               <p>In the Breast Cancer dataset, our best LOOCV accuracy was 74.34%, and the highest classification accuracy in the test set was 84.21% with one gene. In <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, the authors reported 83.33% LOOCV accuracy and 89.47% accuracy in the test set using the 70-gene predictor. These prediction results are moderately superior to those attained in the present study, although they rely on a much larger number of genes. Likewise, Tan et al. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> obtained a slightly better classification outcome than that of the present study, although they used far more genes. Table <tblr tid="T13">13</tblr> lists some of the best classification results for this dataset, as obtained using a variety of methods.</p>
               <tbl id="T13">
                  <title>
                     <p>Table 13</p>
                  </title>
                  <caption>
                     <p>Comparison of best classification accuracy for the Breast Cancer dataset</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Methods (feature selection + classification)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Selected genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Correctly classified samples (accuracy)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Rule-based classifier</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>&#945; </it>depended degree + decision rules [this work]</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>16 (84.21%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>TSP <abbrgrp><abbr bid="B14">14</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>79.38%<sup>e</sup></p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>RBF <abbrgrp><abbr bid="B50">50</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                        <c ca="center">
                           <p>79.38%<sup>e</sup></p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>discretization + decision trees <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>unknown</p>
                        </c>
                        <c ca="center">
                           <p>17 (89.47%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>correlation coefficient <abbrgrp><abbr bid="B19">19</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>70</p>
                        </c>
                        <c ca="center">
                           <p>17 (89.47%)</p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>e</sup>LOOCV result in the total of 97 samples.</p>
                  </tblfn>
               </tbl>
               <p>In this dataset, we identified eight genes with relatively high individual classification performance. Our rules indicated that the overexpression of ATP5G3, TSPYL5, or NPC1 implies an unfavorable prognosis, while the overexpression of HS1119D91, Contig38726_RC, PIB5PA, Contig51517_RC, or LOC63923 implies a favorable prognosis. TSPYL5 had the best classification accuracy in our model; it was also chosen as one of 70 prognostic marker genes and ranked first according to the correlation coefficient with the two prognostic groups <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. This agreement suggests that our gene selection approach is reasonable. In <abbrgrp><abbr bid="B83">83</abbr></abbrgrp>, the authors proposed a prognostic predictor of breast cancer with multiple fuzzy neural models using the same dataset. Notably, although these methods are distinct from those of the present study, there is an overlap of 3 genes between the 10 highest-ranked genes they chose for prediction and our 8-gene group.</p>
            </sec>
            <sec>
               <st>
                  <p>Leukemia dataset 2</p>
               </st>
               <p>Although this dataset poses a multi-class classification problem, we still achieved relatively good classification outcomes. Our best results were a 93.33% prediction rate in the test set and 89.47% LOOCV accuracy in the training set, each obtained with one gene, compared with a 90% prediction rate in the test set using 100 genes and 95% LOOCV accuracy in the training set using 40 genes, as reported by Armstrong et al. <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>. In addition, Wang et al. <abbrgrp><abbr bid="B85">85</abbr></abbrgrp> reported 100% LOOCV accuracy in all 72 samples using 26 genes; however, their methods were not verified by an independent test set. These outcomes are presented in Table <tblr tid="T14">14</tblr>.</p>
               <tbl id="T14">
                  <title>
                     <p>Table 14</p>
                  </title>
                  <caption>
                     <p>Comparison of best classification accuracy for the Leukemia dataset 2</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Methods (feature selection + classification)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Selected genes</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>#Correctly classified samples (accuracy)</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Rule-based classifier</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>&#945; </it>depended degree + decision rules [this work]</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>14 (93.33%)</p>
                        </c>
                        <c ca="center">
                           <p>yes</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>HykGene + <it>k</it>-NNs, SVMs, C4.5, NB <abbrgrp><abbr bid="B85">85</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>26</p>
                        </c>
                        <c ca="center">
                           <p>100%<sup>f</sup></p>
                        </c>
                        <c ca="center">
                           <p>no<sup>i</sup></p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>signal to noise ratios + <it>k</it>-NNs <abbrgrp><abbr bid="B20">20</abbr></abbrgrp></p>
                        </c>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="center">
                           <p>95%<sup>g</sup></p>
                        </c>
                        <c ca="center">
                           <p>no</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>9 (90%)<sup>h</sup></p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>f</sup>LOOCV result in a total of 72 samples.</p>
                     <p><sup>g</sup>LOOCV result in a total of 57 training samples.</p>
                     <p><sup>h</sup>In <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, only 3 of 8 AML testing samples in the dataset were mentioned. Thus, their test set contained 10 rather than 15 samples.</p>
                     <p><sup>i</sup>Except for C4.5, all the others are not rule-based classifiers.</p>
                  </tblfn>
               </tbl>
               <p>Regarding the Leukemia dataset 2, each chosen gene induced 3 rules with the following form: if g(x) > a, then class 1; if b &lt; g(x) &#8804; a, then class 2; if g(x) &#8804; b, then class 3. That is, if the expression level of gene x in a sample is relatively high, then the sample is assigned to class 1; if the expression level is moderate, then the sample is assigned to class 2; if the expression level is relatively low, then the sample is assigned to class 3. According to these rules, we predicted the class of every sample based on its expression value for the chosen genes. In total, we identified 21 genes with comparatively strong prediction power. Of these genes, 36239_at (OBF-1) and 31886_at (human placental cDNA coding for 5' nucleotidase) are also contained in the best 26-gene prediction model proposed in <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>. It is noteworthy that OBF-1 was ranked at the top of these 26 genes, and it yields the best prediction outcome in our methods. This finding suggests that our decision-rule-based classification approach compares favorably with the clustering analysis-based classification approach of <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>, as we achieved a similar level of classification performance using just a single gene instead of 26. In addition, six of the genes identified using the present methods are mentioned as high-class discrimination genes in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. These six genes are OBF-1, CD24, MLCK, KIAA0867, SMARCA4, and cDNA wg66h09. Indeed, our rules induced by each of the six genes are well in accordance with the outcomes presented in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, demonstrating that these genes are highly expressed in ALL, moderately expressed in MLL, and expressed at a low level in AML.</p>
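               <p>The three-interval rule form described above can be written as a short function. The Python sketch below is illustrative only: the function name and the threshold values are hypothetical, and the ALL, MLL, AML ordering of the labels follows the expression pattern reported for the six genes named in the text.</p>

```python
def three_class_rule(expression, a, b, labels=("class 1", "class 2", "class 3")):
    """Apply the single-gene rule set with an upper cut a and a lower cut b.

    g(x) above a gives class 1; g(x) above b but not above a gives class 2;
    g(x) not above b gives class 3.
    """
    if b > a:
        raise ValueError("the lower cut b must not exceed the upper cut a")
    if expression > a:
        return labels[0]
    if expression > b:
        return labels[1]
    return labels[2]


# Hypothetical cut points for one gene; real values would be induced from
# the training set.
UPPER_CUT = 800.0
LOWER_CUT = 250.0
LEUKEMIA_LABELS = ("ALL", "MLL", "AML")  # high, moderate, low expression

example_calls = [
    three_class_rule(value, UPPER_CUT, LOWER_CUT, LEUKEMIA_LABELS)
    for value in (950.0, 400.0, 250.0)
]
```

               <p>A value exactly equal to a cut point falls into the lower class, matching the non-strict inequalities in the rule form.</p>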
               <p>In summary, in every dataset we have identified important genes that not only possess potent classification ability but also are closely associated with the pathogenesis of specific or general cancers. In the Leukemia dataset 1, significant genes such as Zyxin and MGST1, frequently identified by previous researchers, were also identified in the present study. At the same time, we selected some genes rarely identified by other methods (e.g., TCRA, KIAA0159, and MB-1), which have been shown to correlate directly or indirectly with AML-ALL class prediction. Our results indicate that the genes with excellent performance in AML-ALL classification are not only markers of hematopoietic lineage, but also related to general cancer pathogenesis. Therefore, the genes we have identified, which are useful for AML-ALL classification, are also indicators of cancer pathogenesis and pharmacology. This is consistent with the conclusion of Golub et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. In the Lung Cancer dataset, we succeeded in identifying highly discriminative genes (e.g., CALB2, HAS1, and ANXA8) implicated in the pathogenesis of MPM, ADCA, or other tumors. In the Prostate Cancer dataset, we identified some important genes with significant biological relevance, such as TMSB15A, PEDF, hepsin, KIAA0977, and S100A4. In particular, hepsin, which has the highest depended degree, has been reported to have significant involvement in the pathogenesis of prostate cancer. In the Breast Cancer dataset, TSPYL5 was regarded as the most valuable prognostic marker both by our methods and by the correlation-based approach used in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In the Leukemia dataset 2, we identified OBF-1 and other genes that separate ALL, MLL, and AML well.
Overall, the majority of genes relevant to tumors encode proteins functioning in cell growth, motility and differentiation, apoptosis, angiogenesis, metabolism, chromosomal rearrangement and translocation, and immune reactions.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Microarray-based cancer classification is a particular kind of classification problem: the number of features (genes) greatly exceeds the number of instances (samples). As the majority of features are redundant for the classification task, feature selection is of vital importance. At the same time, the discovery of important gene markers relevant to cancer remains a significant task. To this end, we proposed a feature selection method based on the depended degree of attributes by classes, with which we screened single informative genes or informative gene pairs for classification. We built classifiers from the decision rules induced by these genes or gene pairs. Using only a small number of features, we obtained high-quality solutions to classification problems in the analysis of high-dimensional gene expression data.</p>
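         <p>For orientation, the sketch below computes the classical rough-set dependency degree of the class attribute on one discretized gene, i.e., the fraction of samples whose discretized expression value occurs in only one class. This is an illustrative assumption based on the standard rough-set definition; the depended degree used in this study may be defined differently, and the function name and the toy data are hypothetical.</p>

```python
from collections import defaultdict


def depended_degree(discretized_values, class_labels):
    """Fraction of samples lying in value groups that contain a single class.

    Classical rough-set dependency degree (an assumption for illustration,
    not necessarily the exact measure used in the study).
    """
    if not class_labels or len(discretized_values) != len(class_labels):
        raise ValueError("need two non-empty sequences of equal length")
    classes_seen = defaultdict(set)   # discretized value -> classes observed
    group_size = defaultdict(int)     # discretized value -> number of samples
    for value, label in zip(discretized_values, class_labels):
        classes_seen[value].add(label)
        group_size[value] += 1
    consistent = sum(group_size[value]
                     for value, labels in classes_seen.items()
                     if len(labels) == 1)
    return consistent / len(class_labels)


# Toy data: the "mid" group mixes both classes, so 4 of 6 samples are consistent.
values = ["high", "high", "mid", "mid", "low", "low"]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
print(depended_degree(values, labels))
```

         <p>Under the same assumption, a gene pair can be scored by passing tuples of the two discretized values in place of single values.</p>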
         <p>In general, our approach has advantages over other methods. First, our methods are based on rules. In contrast to non-rule-based methods (e.g., SVMs, ANNs, GAs, <it>k</it>-NNs, and NB), rule-based methods are understandable and logical, so biologists and clinicians are more inclined to adopt them. More importantly, because we use very few genes (one or two) to construct classification rules, the derived classifiers are simple and easily understood. Hence, our rule-based method has an advantage over other rule-based methods that involve more complicated rules.</p>
         <p>Our work is consistent with the opinion expressed in <abbrgrp><abbr bid="B86">86</abbr><abbr bid="B87">87</abbr></abbrgrp>: simple approaches perform well in microarray-based cancer prediction. This view accords with the principle of Occam's razor. It is not surprising that single genes or gene pairs can yield accurate classification of cancer, as they might be potential biomarkers of cancer <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. In contrast, when complex prediction models achieve highly accurate predictions using a large number of genes, it is difficult to assess which genes are the significant biomarkers of cancer. In fact, molecular classification of cancer is a specific classification problem because it has two essential goals: classification and the identification of cancer biomarkers. Although accurate classification must be guaranteed, the detection of biomarkers is also important, sometimes even more so than accuracy; otherwise, the (accurate) classification results have only limited significance. Because simple classification models may be advantageous in finding important biomarkers while retaining high classification accuracy, it is worthwhile to apply simple prediction approaches rather than complex methods to the molecular classification of cancer. Furthermore, simple rule-based classification methods are preferable because of their interpretability.</p>
         <p>It should be noted that because we verified the classification accuracy using only one independent test set for each dataset, the stability of the classifiers was not assessed. That is, if different training and test sets were chosen, the classification results might vary, although they would not necessarily deviate significantly from our estimates. Therefore, the present classification accuracies only roughly reflect the quality of our classifiers. A less biased estimate would be the average of the results obtained by repeating the partition of samples into training and test sets many times, which is time-consuming for our methods.</p>
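         <p>The repeated-partition estimate mentioned above can be sketched as follows. This is a generic illustration, not the procedure used in this study: the function names, the number of repeats, the test fraction, and the toy midpoint learner are all assumptions introduced for the example.</p>

```python
import random
from statistics import mean


def repeated_holdout_accuracy(values, labels, fit, n_repeats=100,
                              test_fraction=0.3, seed=0):
    """Average test accuracy over repeated random training/test partitions."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    n_test = max(1, round(test_fraction * len(indices)))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # Learn a rule on the training part only, then score it on the rest.
        predict = fit([values[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(values[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return mean(accuracies)


def fit_midpoint_rule(train_values, train_labels):
    """Toy single-gene learner: cut point halfway between the two class means."""
    high = [v for v, y in zip(train_values, train_labels) if y == 1]
    low = [v for v, y in zip(train_values, train_labels) if y == 0]
    if not high or not low:
        # Only one class in the training part: fall back to the majority class.
        majority = 1 if len(high) > len(low) else 0
        return lambda value: majority
    cut = (mean(high) + mean(low)) / 2
    return lambda value: 1 if value > cut else 0


# Made-up expression values for one gene; label 1 marks the high-expression class.
expression = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
classes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(repeated_holdout_accuracy(expression, classes, fit_midpoint_rule))
```

         <p>The cost grows linearly with the number of repeats, since the rule is relearned on every partition, which is why such an estimate can be expensive when gene selection itself is slow.</p>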
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our microarray-based cancer classification methods are simple and interpretable relative to most other approaches, since our classifiers are based on decision rules, and the decision rules are based on single genes or gene pairs. We demonstrated the efficacy of our methods by applying them to several well-known gene expression datasets. In these datasets, our methods identified single genes or gene pairs that perform well in distinguishing different classes of cancer. Moreover, a large proportion of the genes screened by our methods may have biological relevance to malignancy or cell type, meaning that they can be regarded as candidate biomarkers of cancer.</p>
         <p>Generally speaking, simple classification models are capable of giving good performance in most classification problems, including the molecular classification of cancer, if a small number of features are correctly selected <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B88">88</abbr><abbr bid="B89">89</abbr></abbrgrp>. The present results lend support to this notion. One recommended follow-up study is to combine our methods with other established machine-learning algorithms to address the problem of molecular classification of cancer.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>LOOCV: Leave-One-Out Cross-Validation; CNS: Central Nervous System; SVMs: Support Vector Machines; DA: Discriminant Analysis; ANNs: Artificial Neural Networks; GAs: Genetic Algorithms; NB: Naive Bayes; <it>k</it>-NNs: <it>k</it>-Nearest Neighbors; EPs: Emerging Patterns; AML: Acute Myeloid Leukemia; ALL: Acute Lymphoblastic Leukemia; MLL: Mixed-Lineage Leukemia; MPM: Malignant Pleural Mesothelioma; ADCA: adenocarcinoma; MDL: Minimum Description Length; PNNs: Probabilistic Neural Networks; CBF: Correlation-Based Feature; MAMA: MAximal MArgin linear programming; RCBT: Refined Classification method Based on Top-<it>k </it>covering rule groups; PLS: Partial Least Squares; LD: Logistic Discrimination; QDA: Quadratic Discriminant Analysis; PCLs: Prediction by Collective Likelihoods.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>XW designed and performed the research, wrote the program code, analyzed the data, and wrote the paper. OG first proposed the idea of using the AML-ALL dataset to implement our algorithm, helped to draft the manuscript, provided helpful guidance on programming, and wrote part of the code. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank our colleagues, Dr. T. Yada, Dr. N. Ichinose, Dr. S. Park, and Ph.D. candidate R. Nakato, for their helpful advice. In particular, we would like to thank Dr. R. Menezes, Dr. S. Bilke, Dr. A. Sims, and Dr. J. Li for their invaluable comments. This work was supported in part by KAKENHI (Grant-in-Aid for Scientific Research) on Priority Areas "Comparative Genomics" awarded by the Ministry of Education, Culture, Sports, Science and Technology of Japan.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Quantitative monitoring of gene expression patterns with a complementary DNA microarray</p>
            </title>
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <issue>5235</issue>
            <fpage>467</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7569999</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p>
            </title>
            <aug>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caligiuri</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Bloomfield</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <issue>5439</issue>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10521349</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Feature selection for high-dimensional genomic microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Xing</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Karp</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Proceedings of the Eighteenth International Conference on Machine Learning: June 28 - July 1 2001; Williams</source>
            <publisher>San Francisco: Morgan Kaufmann Publishers Inc</publisher>
            <editor>Brodley CE, Danyluk AP</editor>
            <pubdate>2001</pubdate>
            <fpage>601</fpage>
            <lpage>608</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Induction of decision trees</p>
            </title>
            <aug>
               <au>
                  <snm>Quinlan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1986</pubdate>
            <volume>1</volume>
            <fpage>81</fpage>
            <lpage>106</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Rough sets</p>
            </title>
            <aug>
               <au>
                  <snm>Pawlak</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>International Journal of Computer and Information Sciences</source>
            <pubdate>1982</pubdate>
            <volume>11</volume>
            <fpage>341</fpage>
            <lpage>356</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>5</issue>
            <fpage>725</fpage>
            <lpage>734</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12050069</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Efficient gene selection with rough sets from gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Miao</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proceedings of the Third International Conference on Rough Sets and Knowledge Technology: 17-19 May 2008; Chengdu</source>
            <publisher>Berlin/Heidelberg: Springer</publisher>
            <editor>Wang G, Li T, Grzymala-Busse JW, Miao D, Skowron A, Yao Y</editor>
            <pubdate>2008</pubdate>
            <fpage>164</fpage>
            <lpage>171</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Gene selection using rough set theory</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proceedings of the First International Conference on Rough Sets and Knowledge Technology: 24-26 July 2006; Chongqing</source>
            <publisher>Berlin/Heidelberg: Springer</publisher>
            <editor>Wang G, Peters JF, Skowron A, Yao Y</editor>
            <pubdate>2006</pubdate>
            <fpage>778</fpage>
            <lpage>785</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Reduct generation and classification of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Momin</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Mitra</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proceedings of the First International Conference on Hybrid Information Technology: 9-11 November 2006; Jeju Island</source>
            <publisher>Berlin/Heidelberg: Springer</publisher>
            <editor>Szczuka MS, Howard D, Slezak D, Kim HK, Kim TH, Ko IS, Lee G, Sloot PMA</editor>
            <pubdate>2006</pubdate>
            <fpage>699</fpage>
            <lpage>708</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Evolutionary-rough feature selection in gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Banerjee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mitra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Banka</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews</source>
            <pubdate>2007</pubdate>
            <volume>37</volume>
            <fpage>622</fpage>
            <lpage>632</lpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Ensemble machine learning on gene expression data for cancer classification</p>
            </title>
            <aug>
               <au>
                  <snm>Tan</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Appl Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <issue>3 Suppl</issue>
            <fpage>S75</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15130820</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Yeoh</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>1</issue>
            <fpage>71</fpage>
            <lpage>78</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12499295</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Mining top-k covering rule groups for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Cong</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>K-L</fnm>
               </au>
               <au>
                  <snm>Tung</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>Proceedings of the 24th ACM SIGMOD International Conference on Management of Data: 14-16 June 2005, Baltimore</source>
            <publisher>Association for Computing Machinery</publisher>
            <editor>&#214;zcan F</editor>
            <pubdate>2005</pubdate>
            <fpage>670</fpage>
            <lpage>681</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Classifying gene expression profiles from pairwise mRNA comparisons</p>
            </title>
            <aug>
               <au>
                  <snm>Geman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>d'Avignon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Naiman</snm>
                  <fnm>DQ</fnm>
               </au>
               <au>
                  <snm>Winslow</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>Stat Appl Genet Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <fpage>Article 19</fpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma</p>
            </title>
            <aug>
               <au>
                  <snm>Gordon</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Gullans</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Blumenstock</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ramaswamy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Sugarbaker</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Bueno</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2002</pubdate>
            <volume>62</volume>
            <issue>17</issue>
            <fpage>4963</fpage>
            <lpage>4967</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12208747</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Rough sets-Theoretical aspects of reasoning about data</p>
            </title>
            <aug>
               <au>
                  <snm>Pawlak</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <publisher>Dordrecht; Boston: Kluwer Academic Publishers</publisher>
            <pubdate>1991</pubdate>
            <volume>9</volume>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Microarray-Based Cancer Prediction Using Soft Computing Approach</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Gotoh</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Cancer Informatics</source>
            <pubdate>2009</pubdate>
            <volume>7</volume>
            <fpage>123</fpage>
            <lpage>139</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2730177</pubid>
                  <pubid idtype="pmpid" link="fulltext">19718448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Gene expression correlates of clinical prostate cancer behavior</p>
            </title>
            <aug>
               <au>
                  <snm>Singh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Febbo</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Manola</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ladd</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Renshaw</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>D'Amico</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Richie</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Loda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kantoff</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Sellers</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Cancer Cell</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <issue>2</issue>
            <fpage>203</fpage>
            <lpage>209</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12086878</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Gene expression profiling predicts clinical outcome of breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>van 't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Vijver</snm>
                  <mnm>van de</mnm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peterse</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Kooy</snm>
                  <mnm>van der</mnm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Witteveen</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Kerkhoven</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Linsley</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Bernards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Friend</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <issue>6871</issue>
            <fpage>530</fpage>
            <lpage>536</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11823860</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia</p>
            </title>
            <aug>
               <au>
                  <snm>Armstrong</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Staunton</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Silverman</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Pieters</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>den Boer</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Minden</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Sallan</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Korsmeyer</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>1</issue>
            <fpage>41</fpage>
            <lpage>47</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11731795</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Multi-interval discretization of continuous-valued attributes for classification learning</p>
            </title>
            <aug>
               <au>
                  <snm>Fayyad</snm>
                  <fnm>UM</fnm>
               </au>
               <au>
                  <snm>Irani</snm>
                  <fnm>KB</fnm>
               </au>
            </aug>
            <source>Proceedings of the 13th International Joint Conference of Artificial Intelligence: August 28-September 3 1993; Chamb&#233;ry</source>
            <publisher>Morgan Kaufmann</publisher>
            <editor>Bajcsy R</editor>
            <pubdate>1993</pubdate>
            <fpage>1022</fpage>
            <lpage>1027</lpage>
         </bibl>
         <bibl id="B22">
            <aug>
               <au>
                  <snm>Witten</snm>
                  <fnm>IH</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Data mining: practical machine learning tools and techniques</source>
            <publisher>San Francisco: Morgan Kaufmann</publisher>
            <edition>second</edition>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Application of probabilistic neural networks to the class prediction of leukemia and embryonal tumor of central nervous system</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>WC</fnm>
               </au>
            </aug>
            <source>Neural Processing Letters</source>
            <pubdate>2004</pubdate>
            <volume>19</volume>
            <fpage>211</fpage>
            <lpage>226</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Gene selection from microarray data for cancer classification--a machine learning approach</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tetko</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Facius</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
            </aug>
            <source>Comput Biol Chem</source>
            <pubdate>2005</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>37</fpage>
            <lpage>46</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15680584</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Optimization models for cancer classification: extracting gene interaction information from microarray expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Antonov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Tetko</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Mader</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Budczies</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>5</issue>
            <fpage>644</fpage>
            <lpage>652</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15033871</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Tumor classification by partial least squares using microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>1</issue>
            <fpage>39</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11836210</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Support vector machine classification and validation of cancer tissue samples using microarray expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Cristianini</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Duffy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bednarski</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Schummer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>10</issue>
            <fpage>906</fpage>
            <lpage>914</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11120680</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Choosing multiple parameters for support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Chapelle</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bousquet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>2002</pubdate>
            <volume>46</volume>
            <fpage>131</fpage>
            <lpage>159</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Feature selection for SVMs</p>
            </title>
            <aug>
               <au>
                  <snm>Weston</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chapelle</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Pontil</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Poggio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Advances in Neural Information Processing Systems</source>
            <pubdate>2002</pubdate>
            <volume>13</volume>
            <fpage>668</fpage>
            <lpage>674</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gene selection for cancer classification using support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Guyon</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Weston</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barnhill</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>2002</pubdate>
            <volume>46</volume>
            <fpage>389</fpage>
            <lpage>422</lpage>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Identification of a gene expression signature associated with pediatric AML prognosis</p>
            </title>
            <aug>
               <au>
                  <snm>Yagi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Morimoto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eguchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hibi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sako</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mizutani</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Imashuku</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ohki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ichikawa</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>2003</pubdate>
            <volume>102</volume>
            <issue>5</issue>
            <fpage>1849</fpage>
            <lpage>1856</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12738660</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The actin cytoskeleton-associated protein zyxin acts as a tumor suppressor in Ewing tumor cells</p>
            </title>
            <aug>
               <au>
                  <snm>Amsellem</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kryszke</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Hervy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Subra</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Athman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Leh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brachet-Ducos</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Auclair</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Experimental Cell Research</source>
            <pubdate>2005</pubdate>
            <volume>304</volume>
            <fpage>443</fpage>
            <lpage>456</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15748890</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Reliable classification of two class cancer data using evolutionary algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Deb</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Reddy</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>BioSystems</source>
            <pubdate>2003</pubdate>
            <fpage>111</fpage>
            <lpage>129</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14642662</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Classifying gene expression data of cancer using classifier ensemble with mutually exclusive features</p>
            </title>
            <aug>
               <au>
                  <snm>Cho</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Ryu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the IEEE</source>
            <pubdate>2002</pubdate>
            <volume>90</volume>
            <issue>11</issue>
            <fpage>1744</fpage>
            <lpage>1753</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Gene discovery in leukemia revisited: a computational intelligence perspective</p>
            </title>
            <aug>
               <au>
                  <snm>Valdes</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Proceedings of the 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems: 17-20 May 2004; Ottawa</source>
            <publisher>Berlin/Heidelberg: Springer</publisher>
            <editor>Orchard R, Yang C, Ali M</editor>
            <pubdate>2004</pubdate>
            <fpage>118</fpage>
            <lpage>127</lpage>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Role of Zyxin in differential cell spreading and proliferation of melanoma cells and melanocytes</p>
            </title>
            <aug>
               <au>
                  <snm>van der Gaag</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Leccia</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Dekker</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Jalbert</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Amodeo</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Byers</snm>
                  <fnm>HR</fnm>
               </au>
            </aug>
            <source>Journal of Investigative Dermatology</source>
            <pubdate>2002</pubdate>
            <volume>118</volume>
            <fpage>246</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11841540</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Myelodysplastic syndrome, juvenile myelomonocytic leukemia, and acute myeloid leukemia associated with complete or partial monosomy 7. European Working Group on MDS in Childhood (EWOG-MDS)</p>
            </title>
            <aug>
               <au>
                  <snm>Hasle</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Arico</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Basso</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Biondi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cantu Rajnoldi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Creutzig</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Fenu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fonatsch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>OA</fnm>
               </au>
               <au>
                  <snm>Harbott</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kardos</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kerndrup</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Niemeyer</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Ptoszkova</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ritter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Slater</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stary</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stollmann-Gibbels</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Testi</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>van Wering</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Zimmermann</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Leukemia</source>
            <pubdate>1999</pubdate>
            <volume>13</volume>
            <issue>3</issue>
            <fpage>376</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10086728</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Cytogenetic deletion maps of hematologic neoplasms: circumstantial evidence for tumor suppressor loci</p>
            </title>
            <aug>
               <au>
                  <snm>Johansson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mertens</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Mitelman</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genes Chromosomes Cancer</source>
            <pubdate>1993</pubdate>
            <volume>8</volume>
            <issue>4</issue>
            <fpage>205</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7512363</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Abnormalities at 14q32.1 in T cell malignancies involve two oncogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Pekarsky</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hallas</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Isobe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Croce</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>6</issue>
            <fpage>2949</fpage>
            <lpage>2951</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15875</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077617</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The role of TCL1 in human T-cell leukemia</p>
            </title>
            <aug>
               <au>
                  <snm>Pekarsky</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hallas</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Croce</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2001</pubdate>
            <volume>20</volume>
            <issue>40</issue>
            <fpage>5638</fpage>
            <lpage>5643</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11607815</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Identification of the TCL1 gene involved in T-cell malignancies</p>
            </title>
            <aug>
               <au>
                  <snm>Virgilio</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Narducci</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Isobe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Billips</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Croce</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <issue>26</issue>
            <fpage>12530</fpage>
            <lpage>12534</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">45472</pubid>
                  <pubid idtype="pmpid">7809072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Oncogene activation by chromosome translocation in human malignancy</p>
            </title>
            <aug>
               <au>
                  <snm>Haluska</snm>
                  <fnm>FG</fnm>
               </au>
               <au>
                  <snm>Tsujimoto</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Croce</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>1987</pubdate>
            <volume>21</volume>
            <fpage>321</fpage>
            <lpage>345</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3327468</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Human LPP gene is fused to MLL in a secondary acute leukemia with a t(3;11)(q28;q23)</p>
            </title>
            <aug>
               <au>
                  <snm>Daheron</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Veinstein</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brizard</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Drabkin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lacotte</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Guilhot</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Brizard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Roche</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genes Chromosomes Cancer</source>
            <pubdate>2001</pubdate>
            <volume>31</volume>
            <issue>4</issue>
            <fpage>382</fpage>
            <lpage>389</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11433529</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Role of topoisomerase II in mediating epipodophyllotoxin-induced DNA cleavage</p>
            </title>
            <aug>
               <au>
                  <snm>Ross</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rowe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Glisson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Yalowich</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>1984</pubdate>
            <volume>44</volume>
            <issue>12 Pt 1</issue>
            <fpage>5857</fpage>
            <lpage>5860</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">6094001</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Somatic hypermutation of the B cell receptor genes B29 (Igbeta, CD79b) and mb1 (Igalpha, CD79a)</p>
            </title>
            <aug>
               <au>
                  <snm>Gordon</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Kanegai</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Doerr</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Wall</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>7</issue>
            <fpage>4126</fpage>
            <lpage>4131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153059</pubid>
                  <pubid idtype="pmpid" link="fulltext">12651942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>CytCD79a expression in acute leukemia with t(8;21): biphenotypic or myeloid leukemia?</p>
            </title>
            <aug>
               <au>
                  <snm>He</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Xue</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Qiu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Miao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Cancer Genet Cytogenet</source>
            <pubdate>2007</pubdate>
            <volume>174</volume>
            <issue>1</issue>
            <fpage>76</fpage>
            <lpage>77</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17350472</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Lower levels of surface B-cell-receptor expression in chronic lymphocytic leukemia are associated with glycosylation and folding defects of the mu and CD79a chains</p>
            </title>
            <aug>
               <au>
                  <snm>Vuillier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Dumas</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Magnac</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Prevost</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Lalanne</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Oppezzo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Melanitou</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dighiero</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Payelle-Brogard</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>2005</pubdate>
            <volume>105</volume>
            <issue>7</issue>
            <fpage>2933</fpage>
            <lpage>2940</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15591116</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>[Prognostic effect of cytoplasmic CD79a expression in acute myeloid leukemia with t(8;21)]</p>
            </title>
            <aug>
               <au>
                  <snm>Chung</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Chi</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>YU</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Jang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Seo</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Korean J Lab Med</source>
            <pubdate>2007</pubdate>
            <volume>27</volume>
            <issue>6</issue>
            <fpage>388</fpage>
            <lpage>393</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18160827</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Activation of Rac1 by RhoG regulates cell migration</p>
            </title>
            <aug>
               <au>
                  <snm>Katoh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hiramoto</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Negishi</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Cell Sci</source>
            <pubdate>2006</pubdate>
            <volume>119</volume>
            <issue>Pt 1</issue>
            <fpage>56</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16339170</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Using rules to analyse bio-medical data: a comparison between C4.5 and PCL</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Proceedings of the Fourth International Conference on Web-Age Information Management: 17-19 August 2003; Chengdu</source>
            <publisher>Berlin/Heidelberg: Springer</publisher>
            <editor>Dong G, Tang C, Wang W</editor>
            <pubdate>2003</pubdate>
            <fpage>254</fpage>
            <lpage>265</lpage>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Using gene expression ratios to predict outcome among patients with mesothelioma</p>
            </title>
            <aug>
               <au>
                  <snm>Gordon</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Gullans</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Blumenstock</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Jaklitsch</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Sugarbaker</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Bueno</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Natl Cancer Inst</source>
            <pubdate>2003</pubdate>
            <volume>95</volume>
            <issue>8</issue>
            <fpage>598</fpage>
            <lpage>605</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12697852</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Identification of novel candidate oncogenes and tumor suppressors in malignant pleural mesothelioma using large-scale transcriptional profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Gordon</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Rockwell</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Rheinwald</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Glickman</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Aronson</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Pottorf</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Nitz</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Sugarbaker</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Bueno</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Am J Pathol</source>
            <pubdate>2005</pubdate>
            <volume>166</volume>
            <issue>6</issue>
            <fpage>1827</fpage>
            <lpage>1840</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1363736</pubid>
                  <pubid idtype="pmpid" link="fulltext">15920167</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Thyroid transcription factor 1 is an independent prognostic factor for patients with stage I lung adenocarcinoma</p>
            </title>
            <aug>
               <au>
                  <snm>Anagnostou</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Syrigos</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Bepler</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Homer</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Rimm</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>J Clin Oncol</source>
            <pubdate>2009</pubdate>
            <volume>27</volume>
            <issue>2</issue>
            <fpage>271</fpage>
            <lpage>278</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">19064983</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Expression of thyroid transcription factor-1 in the spectrum of neuroendocrine cell lung proliferations with special interest in carcinoids</p>
            </title>
            <aug>
               <au>
                  <snm>Sturm</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Rossi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lantuejoul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Papotti</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frachon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Claraz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brichon</snm>
                  <fnm>PY</fnm>
               </au>
               <au>
                  <snm>Brambilla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brambilla</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Hum Pathol</source>
            <pubdate>2002</pubdate>
            <volume>33</volume>
            <issue>2</issue>
            <fpage>175</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11957142</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Mutational analysis of thyroid transcription factor-1 gene (TTF-1) in lung carcinomas</p>
            </title>
            <aug>
               <au>
                  <snm>Bai</snm>
                  <fnm>XY</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>In Vitro Cell Dev Biol Anim</source>
            <pubdate>2008</pubdate>
            <volume>44</volume>
            <issue>1-2</issue>
            <fpage>17</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18071837</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Genomic profiling identifies TITF1 as a lineage-specific oncogene amplified in lung cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Kwei</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Girard</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pacyna-Gengelbach</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salari</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hernandez-Boussard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gazdar</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Petersen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Minna</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Pollack</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2008</pubdate>
            <volume>27</volume>
            <issue>25</issue>
            <fpage>3635</fpage>
            <lpage>3640</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18212743</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Thymosin beta 15: a novel regulator of tumor cell motility upregulated in metastatic prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Bao</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Loda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Janmey</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Anand-Apte</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zetter</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Nat Med</source>
            <pubdate>1996</pubdate>
            <volume>2</volume>
            <issue>12</issue>
            <fpage>1322</fpage>
            <lpage>1328</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8946830</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Expression of pigment epithelium-derived factor decreases liver metastasis and correlates with favorable prognosis for patients with ductal pancreatic adenocarcinoma</p>
            </title>
            <aug>
               <au>
                  <snm>Uehara</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Miyamoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ebihara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kaneko</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hashimoto</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Murakami</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hase</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Takahashi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mega</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shichinohe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kawarada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Okushiba</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Katoh</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2004</pubdate>
            <volume>64</volume>
            <issue>10</issue>
            <fpage>3533</fpage>
            <lpage>3537</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15150108</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>In vivo gene transfer of pigment epithelium-derived factor inhibits tumor growth in syngeneic murine models of thoracic malignancies</p>
            </title>
            <aug>
               <au>
                  <snm>Mahtabifard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Merritt</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Crystal</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Korst</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>J Thorac Cardiovasc Surg</source>
            <pubdate>2003</pubdate>
            <volume>126</volume>
            <issue>1</issue>
            <fpage>28</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12878936</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Expression of pigment epithelial derived factor is reduced in non-small cell lung cancer and is linked to clinical outcome</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ke</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mansel</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>Int J Mol Med</source>
            <pubdate>2006</pubdate>
            <volume>17</volume>
            <issue>5</issue>
            <fpage>937</fpage>
            <lpage>944</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16596284</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Pigment epithelium-derived factor overexpression inhibits orthotopic osteosarcoma growth, angiogenesis and metastasis</p>
            </title>
            <aug>
               <au>
                  <snm>Ek</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Dass</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Contreras</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Choong</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Cancer Gene Ther</source>
            <pubdate>2007</pubdate>
            <volume>14</volume>
            <issue>7</issue>
            <fpage>616</fpage>
            <lpage>626</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17479108</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Involvement of the collagen I-binding motif in the anti-angiogenic activity of pigment epithelium-derived factor</p>
            </title>
            <aug>
               <au>
                  <snm>Hosomichi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yasui</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Koide</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Soma</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Morita</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Biochem Biophys Res Commun</source>
            <pubdate>2005</pubdate>
            <volume>335</volume>
            <issue>3</issue>
            <fpage>756</fpage>
            <lpage>761</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16102727</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Methylation-associated silencing of TU3A in human cancers</p>
            </title>
            <aug>
               <au>
                  <snm>Awakura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ito</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kamoto</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ogawa</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Int J Oncol</source>
            <pubdate>2008</pubdate>
            <volume>33</volume>
            <issue>4</issue>
            <fpage>893</fpage>
            <lpage>899</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18813805</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Down regulation of 3p genes, LTF, SLC38A3 and DRR1, upon growth of human chromosome 3-mouse fibrosarcoma hybrids in severe combined immunodeficiency mice</p>
            </title>
            <aug>
               <au>
                  <snm>Kholodnyuk</snm>
                  <fnm>ID</fnm>
               </au>
               <au>
                  <snm>Kozireva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kost-Alimova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kashuba</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Imreh</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Int J Cancer</source>
            <pubdate>2006</pubdate>
            <volume>119</volume>
            <issue>1</issue>
            <fpage>99</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16432833</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Laminin-332 is a substrate for hepsin, a protease associated with prostate cancer progression</p>
            </title>
            <aug>
               <au>
                  <snm>Tripathi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nandana</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ganesan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kirchhofer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Quaranta</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2008</pubdate>
            <volume>283</volume>
            <issue>45</issue>
            <fpage>30576</fpage>
            <lpage>30584</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2576550</pubid>
                  <pubid idtype="pmpid" link="fulltext">18784072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Delineation of prognostic biomarkers in prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Dhanasekaran</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Barrette</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Varambally</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kurachi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pienta</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Chinnaiyan</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>412</volume>
            <issue>6849</issue>
            <fpage>822</fpage>
            <lpage>826</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11518967</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Molecular genetic profiling of Gleason grade 4/5 prostate cancers compared to benign prostatic hyperplasia</p>
            </title>
            <aug>
               <au>
                  <snm>Stamey</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Warrington</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Caldwell</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Mahadevappa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McNeal</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Nolley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Urol</source>
            <pubdate>2001</pubdate>
            <volume>166</volume>
            <issue>6</issue>
            <fpage>2171</fpage>
            <lpage>2177</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11696729</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Expression profiling reveals hepsin overexpression in prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Magee</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Araki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Patil</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ehrig</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>True</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Humphrey</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Catalona</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Watson</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Milbrandt</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2001</pubdate>
            <volume>61</volume>
            <issue>15</issue>
            <fpage>5692</fpage>
            <lpage>5696</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11479199</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Welsh</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Sapinoso</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Kern</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Wang-Rodriguez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moskaluk</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Frierson</snm>
                  <fnm>HF</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Hampton</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2001</pubdate>
            <volume>61</volume>
            <issue>16</issue>
            <fpage>5974</fpage>
            <lpage>5978</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11507037</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Hepsin is highly over expressed in and a new candidate for a prognostic indicator in prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Stephan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yousef</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Scorilas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jung</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jung</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kristiansen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hauptmann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kishi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Loening</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Diamandis</snm>
                  <fnm>EP</fnm>
               </au>
            </aug>
            <source>J Urol</source>
            <pubdate>2004</pubdate>
            <volume>171</volume>
            <issue>1</issue>
            <fpage>187</fpage>
            <lpage>191</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14665873</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Use of multiple biomarkers for a molecular diagnosis of prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Landers</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Burger</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Tebay</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Purdie</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Scells</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Samaratunga</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lavin</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Gardiner</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Int J Cancer</source>
            <pubdate>2005</pubdate>
            <volume>114</volume>
            <issue>6</issue>
            <fpage>950</fpage>
            <lpage>956</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15609297</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Hepsin promotes prostate cancer progression and metastasis</p>
            </title>
            <aug>
               <au>
                  <snm>Klezovitch</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Chevillet</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mirosevich</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Matusik</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Vasioukhin</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cancer Cell</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <issue>2</issue>
            <fpage>185</fpage>
            <lpage>195</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15324701</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Activation of NF-kappaB by extracellular S100A4: analysis of signal transduction mechanisms and identification of target genes</p>
            </title>
            <aug>
               <au>
                  <snm>Boye</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Grotterod</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Aasheim</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Hovig</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Maelandsmo</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Int J Cancer</source>
            <pubdate>2008</pubdate>
            <volume>123</volume>
            <issue>6</issue>
            <fpage>1301</fpage>
            <lpage>1310</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18548584</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>S100A4, a mediator of metastasis</p>
            </title>
            <aug>
               <au>
                  <snm>Garrett</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Varney</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Bresnick</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2006</pubdate>
            <volume>281</volume>
            <issue>2</issue>
            <fpage>677</fpage>
            <lpage>680</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16243835</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Methionine aminopeptidase 2 is a new target for the metastasis-associated protein, S100A4</p>
            </title>
            <aug>
               <au>
                  <snm>Endo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Takenaga</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kanno</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Satoh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mori</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2002</pubdate>
            <volume>277</volume>
            <issue>29</issue>
            <fpage>26396</fpage>
            <lpage>26402</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11994292</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>DNA methylation and immunohistochemical analysis of the S100A4 calcium binding protein in human prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Rehman</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Goodarzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cross</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Leiblich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Catto</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Hamdy</snm>
                  <fnm>FC</fnm>
               </au>
            </aug>
            <source>Prostate</source>
            <pubdate>2007</pubdate>
            <volume>67</volume>
            <issue>4</issue>
            <fpage>341</fpage>
            <lpage>347</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17219414</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Expression of calcium-binding proteins S100A2 and S100A4 in Barrett's adenocarcinomas</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>OJ</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Razvi</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Peng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Smolkin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moskaluk</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>El-Rifai</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Neoplasia</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <issue>10</issue>
            <fpage>843</fpage>
            <lpage>850</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1715926</pubid>
                  <pubid idtype="pmpid" link="fulltext">17032501</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Overexpression of S100A4 in pancreatic ductal adenocarcinomas is associated with poor differentiation and DNA hypomethylation</p>
            </title>
            <aug>
               <au>
                  <snm>Rosty</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ueki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Argani</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yeo</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hruban</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Goggins</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Am J Pathol</source>
            <pubdate>2002</pubdate>
            <volume>160</volume>
            <issue>1</issue>
            <fpage>45</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1867115</pubid>
                  <pubid idtype="pmpid" link="fulltext">11786397</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>S100A4 contributes to the suppression of BNIP3 expression, chemoresistance, and inhibition of apoptosis in pancreatic cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Mahon</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Baril</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bhakta</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chelala</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Caulee</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Harada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lemoine</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2007</pubdate>
            <volume>67</volume>
            <issue>14</issue>
            <fpage>6786</fpage>
            <lpage>6795</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17638890</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>Overexpression of S100A4 is closely associated with progression of colorectal cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Cho</snm>
                  <fnm>YG</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Nam</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Yoon</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>WS</fnm>
               </au>
            </aug>
            <source>World J Gastroenterol</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <issue>31</issue>
            <fpage>4852</fpage>
            <lpage>4856</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16097057</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>Differential expression of S100A2 and S100A4 in lung adenocarcinomas: clinicopathological significance, relationship to p53 and identification of their target genes</p>
            </title>
            <aug>
               <au>
                  <snm>Matsubara</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Niki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ohara</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Yokomizo</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Heizmann</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Aburatani</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Moriyama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Moriyama</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nishimura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Funata</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Fukayama</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cancer Sci</source>
            <pubdate>2005</pubdate>
            <volume>96</volume>
            <issue>12</issue>
            <fpage>844</fpage>
            <lpage>857</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16367903</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>Poor outcome of patients with pulmonary adenocarcinoma showing decreased E-cadherin combined with increased S100A4 expression</p>
            </title>
            <aug>
               <au>
                  <snm>Miyazaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Abe</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Oida</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Suemizu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yamazaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Iwasaki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Inoue</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ueyama</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Int J Oncol</source>
            <pubdate>2006</pubdate>
            <volume>28</volume>
            <issue>6</issue>
            <fpage>1369</fpage>
            <lpage>1374</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16685438</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Prognostic predictor with multiple fuzzy neural models using expression profiles from DNA microarray for metastases of breast cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Takahashi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Masuda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ando</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Honda</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Biosci Bioeng</source>
            <pubdate>2004</pubdate>
            <volume>98</volume>
            <issue>3</issue>
            <fpage>193</fpage>
            <lpage>199</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16233689</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Alon</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Barkai</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Notterman</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ybarra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mack</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>12</issue>
            <fpage>6745</fpage>
            <lpage>6750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">21986</pubid>
                  <pubid idtype="pmpid" link="fulltext">10359783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Makedon</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Ford</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Pearlman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>8</issue>
            <fpage>1530</fpage>
            <lpage>1537</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15585531</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p>A protocol for building and evaluating predictors of disease state based on microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Wessels</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Reinders</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Veenman</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>van't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>19</issue>
            <fpage>3755</fpage>
            <lpage>3762</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15817694</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B87">
            <title>
               <p>Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification</p>
            </title>
            <aug>
               <au>
                  <snm>Simon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Radmacher</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Dobbin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>McShane</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>J Natl Cancer Inst</source>
            <pubdate>2003</pubdate>
            <volume>95</volume>
            <issue>1</issue>
            <fpage>14</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12509396</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B88">
            <title>
               <p>Supervised analysis when the number of candidate features (p) greatly exceeds the number of cases (n)</p>
            </title>
            <aug>
               <au>
                  <snm>Simon</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>ACM SIGKDD Explorations Newsletter</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>31</fpage>
            <lpage>36</lpage>
         </bibl>
         <bibl id="B89">
            <title>
               <p>Very simple classification rules perform well on most commonly used datasets</p>
            </title>
            <aug>
               <au>
                  <snm>Holte</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1993</pubdate>
            <fpage>63</fpage>
            <lpage>91</lpage>
         </bibl>
      </refgrp>
      <sec>
         <st>
            <p>Pre-publication history</p>
         </st>
         <p>The pre-publication history for this paper can be accessed here:</p>
         <p>
            <url>http://www.biomedcentral.com/1755-8794/2/64/prepub</url>
         </p>
      </sec>
   </bm>
</art>

