<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1755-8794-6-S1-S3</ui><ji>1755-8794</ji><fm>
<dochead>Research</dochead>
<bibl>
<title>
<p>TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection</p>
</title>
<aug>
<au ce="yes" id="A1"><snm>Wang</snm><fnm>Haiyan</fnm><insr iid="I1"/><email>hwang@ksu.edu</email></au>
<au ce="yes" id="A2"><snm>Zhang</snm><fnm>Hongyan</fnm><insr iid="I2"/><insr iid="I3"/><insr iid="I4"/><email>hongyan_zhang6@yahoo.com.cn</email></au>
<au id="A3"><snm>Dai</snm><fnm>Zhijun</fnm><insr iid="I2"/><insr iid="I4"/><email>daizhijun@foxmail.com</email></au>
<au id="A4"><snm>Chen</snm><fnm>Ming-shun</fnm><insr iid="I5"/><email>mchen@ksu.edu</email></au>
<au ca="yes" id="A5"><snm>Yuan</snm><fnm>Zheming</fnm><insr iid="I2"/><insr iid="I4"/><email>zhmyuan@sina.com</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Statistics, Kansas State University, Manhattan, KS 66506, USA; this work was done while Haiyan Wang was on sabbatical leave at Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha 410128, China</p></ins>
<ins id="I2"><p>Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha 410128, China</p></ins>
<ins id="I3"><p>College of Information Science and Technology, Hunan Agricultural University, Changsha 410128, China</p></ins>
<ins id="I4"><p>College of Bio-safety Science and Technology, Hunan Agricultural University, Changsha 410128, China</p></ins>
<ins id="I5"><p>USDA-ARS and Department of Entomology, Kansas State University, Manhattan, KS 66506, USA</p></ins>
</insg>
<source>BMC Medical Genomics</source>


<supplement><title><p>Proceedings of the 2011 International Conference on Bioinformatics and Computational Biology (BIOCOMP'11)</p></title><editor>Ke K Zhang, Hamid R Arabnia and Mehdi Pirooznia</editor><sponsor><note>Publication of this supplement has been supported by the International Society of Intelligent Biological Medicine.</note></sponsor><note>Research</note></supplement><conference><title><p>The 2011 International Conference on Bioinformatics and Computational Biology (BIOCOMP'11)</p></title><location>Las Vegas, NV, USA</location><date-range>18-21 July 2011</date-range><url>http://www.world-academy-of-science.org/worldcomp11/ws/conferences/biocomp11</url></conference><issn>1755-8794</issn>
<pubdate>2013</pubdate>
<volume>6</volume>
<issue>Suppl 1</issue>
<fpage>S3</fpage>
<url>http://www.biomedcentral.com/1755-8794/6/S1/S3</url>
<xrefbib><pubidlist><pubid idtype="pmpid">23445528</pubid><pubid idtype="doi">10.1186/1755-8794-6-S1-S3</pubid></pubidlist></xrefbib>
</bibl>
<history><pub><date><day>23</day><month>1</month><year>2013</year></date></pub></history>
<cpyrt><year>2013</year><collab>Yuan; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>One of the challenges in classifying cancer tissue samples from gene expression data is to establish an effective method that selects a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and Prediction Analysis of Microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, k-TSP selects its distinct gene pairs by simply combining the top-ranking pairs, without considering that the gene set with the best discriminating power may not be a combination of such pairs. The k-TSP algorithm also requires the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm that addresses these problems. The algorithm is named the Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated as TSG.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>The TSG classifier starts with the top two genes and sequentially adds genes to the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms the TSP family classifiers by a large margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers: easy interpretation, invariance to monotone transformations, selection of a small number of informative genes that allows follow-up studies, and resistance to sampling variation owing to within-sample comparisons.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Redefining the scores for gene sets and the classification rules in TSP family classifiers to incorporate the sample size information can lead to better selection of informative genes and higher classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>With the availability of high-throughput genomics data, methods for cancer classification and prediction based on molecular information have been vigorously pursued in recent years. The objective of such studies is to find important molecular markers and/or build a classifier that, with the selected markers as independent variables, accurately classifies the diagnostic disease status of a sample from its expression data. Popular methods for this problem include Prediction Analysis of Microarrays (PAM, <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>), Top Scoring Pair (TSP, <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>), k-Top Scoring Pair (k-TSP, <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>), Support Vector Machine (SVM, <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>). Many other approaches have also been proposed, such as ranking individual genes by their power to discriminate between classes (see <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp> and the references therein), gene filtering through relevance and correlation analyses <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp>, gene selection for classification based on the Bayes error <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>, comparing the distributions of within-class correlations with between-class correlations via Kullback-Leibler divergence <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>, recursive feature addition with Lagging Prediction Peephole Optimization to choose the final optimal marker set <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>, SVM based recursive feature elimination <abbrgrp>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>, random forests <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp> and random subspace search <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>, among others.</p>
<p>Such studies face several challenges. One is that the number of independent variables (markers) is typically much larger than the number of available samples, a situation often referred to as the curse of dimensionality. To identify possibly nonlinear effects of many variables and their interactions, it is often necessary to estimate a large number of model parameters. A direct consequence of the curse of dimensionality is that the number of parameters the data can estimate is restricted by the number of samples. When the number of parameters greatly exceeds the number of samples, overfitting occurs: the classifier predicts the phenotype well on the training data but performs poorly on independent test samples. Moreover, finding the globally best marker set, that is, the set with the best discriminating power for the different disease categories, would require modeling every possible combination of markers, which is impractical when the number of markers is large. Such a set may or may not contain the primary biological and pathological factors driving disease progression. Hence, an effective practice is to first reduce the dimensionality of the marker space.</p>
<p>The TSP and k-TSP classifiers are two simple algorithms that select gene pairs with top scores to build classifiers. They were shown to perform well for binary classification with gene expression data <abbrgrp>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
</abbrgrp>. The gene pairs are selected based on simple pairwise comparisons between the expression levels of two markers within the same sample. Specifically, for markers <it>i </it>and <it>j</it>, let <it>p<sub>ij</sub>(C<sub>1</sub>) </it>be the proportion of training samples in class 1 in which the expression of marker <it>i </it>is less than that of marker <it>j</it>, and let <it>p<sub>ij</sub>(C<sub>2</sub>) </it>be defined similarly for class 2. The score for a gene pair is the absolute difference between the two estimated proportions, |<it>p<sub>ij</sub>(C<sub>1</sub>) - p<sub>ij</sub>(C<sub>2</sub>)</it>|. The gene pair with the highest score is selected as the marker set for the TSP classifier, and the k gene pairs with the highest scores are used for the k-TSP classifier. Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> extended the two classifiers to multi-class classification through one-vs-others, one-vs-one, and hierarchical classification (HC) schemes. They reported that the HC schemes for TSP and k-TSP gave better performance than the other two schemes.</p>
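<p>The pairwise score described above can be illustrated with a minimal sketch. It assumes a samples-by-markers array and exactly two classes, and it takes the absolute difference of the two proportions. The function name tsp_score and the toy data are hypothetical and are not taken from the original TSP software.</p>

```python
import numpy as np

def tsp_score(X, y, i, j):
    """Absolute difference, between the two classes, in the proportion of
    samples whose marker i value is below their marker j value."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # within-sample comparison: one True/False per sample
    below = np.less(X[:, i], X[:, j])
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("tsp_score expects exactly two classes")
    p1 = below[y == classes[0]].mean()  # proportion in the first class
    p2 = below[y == classes[1]].mean()  # proportion in the second class
    return abs(p1 - p2)

# hypothetical toy data: rows are samples, columns are markers
X_toy = np.array([[1.0, 2.0], [1.5, 3.0], [2.0, 1.0], [3.0, 0.5]])
y_toy = np.array([1, 1, 2, 2])
score = tsp_score(X_toy, y_toy, 0, 1)
print(score)
```

<p>Because only the within-sample ordering of the two markers enters the score, applying any strictly increasing transformation to the expression values leaves it unchanged.</p>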
<p>The TSP and k-TSP classifiers have both advantages and disadvantages. Among the advantages, they are simple to implement and the resulting classifiers are easy to interpret. They are also invariant to monotone transformations, as they depend only on the relative rankings of gene expressions within the same sample. Overfitting is largely avoided because only simple comparisons are used. In addition, they differ from most algorithms, in which comparisons are mostly made between expressions from different samples. Comparing expressions within the same sample, as in TSP and k-TSP, helps to eliminate the influence of sampling variability due to different subjects.</p>
<p>A disadvantage is related to how the scores for gene pairs are defined. Because the scores are calculated from proportions, the sample size information is not fully utilized in TSP and k-TSP. For example, suppose 4 out of 10 samples in class 1 and 6 out of 10 samples in class 2 satisfy the condition that marker 1 has a smaller expression value than marker 2. The score for the pair of markers 1 and 2 is 0.2, the absolute difference between the two proportions. Now suppose all the counts are multiplied by 10, i.e. 40 out of 100 samples in class 1 and 60 out of 100 samples in class 2 satisfy the condition. The score for the marker pair is then identical to that in the previous case, so the additional information carried by the larger sample size is completely ignored by the TSP and k-TSP classifiers.</p>
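<p>The two cases in this example can be checked numerically. The sketch below is illustrative only. It arranges each case as a 2 x 2 table (rows are classes; the first column counts samples in which marker 1 is smaller than marker 2) and compares the proportion-based score with the Pearson chi-square statistic, the quantity that the Methods section adopts as the score. The function names are hypothetical.</p>

```python
import numpy as np

def tsp_score_from_counts(table):
    """Absolute difference of the two class proportions in column 0."""
    table = np.asarray(table, dtype=float)
    props = table[:, 0] / table.sum(axis=1)
    return abs(props[0] - props[1])

def chi_square_from_counts(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return ((table - expected) ** 2 / expected).sum()

# rows: class 1, class 2; columns: condition holds, condition fails
small = [[4, 6], [6, 4]]      # 4 of 10 and 6 of 10
large = [[40, 60], [60, 40]]  # 40 of 100 and 60 of 100

for name, table in (("small", small), ("large", large)):
    print(name, tsp_score_from_counts(table), chi_square_from_counts(table))
```

<p>Under these assumptions the proportion-based score is 0.2 in both cases, whereas the chi-square statistic grows tenfold with the tenfold increase in sample size (0.8 versus 8).</p>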
<p>The multi-class classifiers HC-TSP and HC-k-TSP are two versions that showed best performance among all TSP family classifiers <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. They are derived from a scheme that performs sequential binary classification. At the root node, the training samples are partitioned into two classes: the largest class and the composite class. The largest class, which contains the largest number of samples, is treated as a leaf node for final classification of the phenotype. The composite class is then partitioned further in the same way as at the root. This scheme is intended to balance the two classes at each binary partition. However, the markers selected at each binary partition with TSP or k-TSP are not necessarily the best marker set for separating all the classes, since they are selected for their ability to separate the largest class from the composite class at that node. In addition, marker selection at each partition has no mechanism to control redundancy in the candidate marker set. For example, in prostate cancer LNCaP cells, the forkhead transcription factor FOXO3a, a downstream substrate of the phosphatidylinositol 3-kinase (PI3K/Akt) pathway, is a positive regulator of androgen receptor (AR) gene expression. Blocking AR function with AR interfering RNA leads to dramatic LNCaP cell death. Hence inhibition of the PI3K/Akt pathway may activate the FOXO3a transcription factor, which may then induce AR gene expression and protect LNCaP cells from apoptosis. PI3K/Akt and FOXO3a could both be selected by the marker selection algorithm of HC-TSP or HC-k-TSP even though they are highly correlated.</p>
<p>In this article, we propose a new algorithm to overcome the above problems of the TSP family classifiers. We introduce a new definition of the score for each marker set so that the sample size information is fully utilized. In addition, it is unrealistic to assume that the number of informative genes is always even, as the TSP family classifiers do. We present a new algorithm that performs a sequential search and does not restrict the number of informative markers to be even. The binary and multi-class cases are unified into a single framework. The algorithm was applied to 9 binary and 10 multi-class cancer genomics datasets. The TSG classifier achieved better leave-one-out cross validation accuracy for binary classification than the TSP or k-TSP classifiers. For the multi-class problems, the TSG classifier gives comparable performance to, or outperforms by a large margin, the TSP family and other popular classifiers in independent test accuracy for several cancer datasets. Beyond high accuracy, the new algorithm selects a small set of informative markers and retains all the advantages of the TSP family classifiers.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<p>For generality, we describe the method in terms of markers, which could represent genes, probe sets, or other molecular units whose intensity is measured with high throughput instruments. Consider expression data from <it>P </it>markers and suppose there are <it>N </it>samples. The data can be expressed as a matrix <b>
<it>X </it>
</b>of dimension <it>N</it>x<it>P</it>. The (<it>i, j</it>) element <it>x<sub>ij </sub>
</it>of the matrix gives the expression value of the j<sup>th </sup>marker in the <it>i<sup>th </sup>
</it>sample. Let (<it>y<sub>1</sub>,..., y<sub>N</sub>
</it>) be the class labels for the N samples, where <it>y<sub>k </sub>
</it>takes one of the values in the set of all possible classes {<it>C<sub>1</sub>,..., C<sub>M </sub>
</it>}. C<sub>i </sub>represents the class phenotype that may be cancerous tumor, normal, or cancer subclasses such as different stages of a cancer. Denote <it>x<sub>i </sub>= </it>(<it>x<sub>i1</sub>,..., x<sub>iP</sub>
</it>) to be the <it>P </it>expression values from the <it>i<sup>th </sup>
</it>sample. <it>P </it>is typically much larger than <it>N</it>, and could be in the neighborhood of exponential order O(<it>e<sup>N</sup>
</it>) with high density microarrays. The objective is to use &#937; = { (<it>x<sub>i </sub>, y<sub>i</sub>
</it>), <it>i = 1,..., N</it>} to select a parsimonious set of informative markers and build a classifier with these markers such that the diagnosis status of a test sample can be accurately classified by modeling the expression data of the selected markers.</p>
<sec>
<st>
<p>Score of a marker set</p>
</st>
<p>To assess the differentiating power of a set of k markers, we first define the score of the marker set. A normal sample contains normal proto-oncogenes that promote cell growth and mitosis and tumor suppressor genes that discourage cell growth. During cancer development, proto-oncogenes can be mutated by carcinogenic agents to become oncogenes, which produce excessive levels of growth-promoting proteins. Cancer results from cumulative mutations of proto-oncogenes and suppressor genes, which together allow the unregulated growth of cells. Hence, cancer development involves uncontrolled cell division resulting from a series (progression) of gene mutations that typically involve two categories of function: promotion of cell division and inactivation of cell cycle suppression. The expression values of growth-promoting genes in a cancer sample tend to be much higher than those of the same genes in a normal sample. Similarly, the expression level of tumor suppressor genes in a cancer sample tends to be much lower than in a normal sample. It is relatively rare that a single marker alone offers sufficient power to differentiate all the classes well in multi-class data. We therefore consider marker sets with at least two markers, keeping in mind that over-expression of some genes and inactivation of other genes often happen together in cancer cases. Later stages of cancer involve tissue invasiveness, during which malignant cells travel among tissues via the circulatory and/or lymphatic system and grow and thrive in their new locations. Therefore the relative amounts of two or more markers in a sample could be an indication of the cancer stage.</p>
</sec>
<sec>
<st>
<p>Score of marker pairs</p>
</st>
<p>For markers <it>i </it>and <it>j</it>, we use the following notation. Let <it>f</it>
<sub>1mij </sub>, <it>m </it>= 1,... <it>M</it>, represent the frequency count of samples in class <it>C<sub>m </sub>
</it>that satisfy the condition: the expression value for marker <it>i </it>is less than the expression value of marker <it>j</it>. Similarly, let <it>f</it>
<sub>2mij </sub>, <it>m </it>= 1,... <it>M</it>, be the frequency count of samples that satisfy the condition: the expression value for marker <it>i </it>is greater than or equal to the expression value of marker <it>j</it>. These counts can be presented in a cross-tabulation table as shown in Table <tblr tid="T1">1</tblr>, where <it>f</it>
<sub>1mij </sub>, m = 1,... M, are the entries in the first column and <it>f</it>
<sub>2mij </sub>, m = 1,... M, are the entries in the second column.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Frequency counts of samples in each class for marker pairs.</p></caption><tblbdy cols="4">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b><it>E</it><sub>i </sub>&lt; <it>E</it><sub>j</sub></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b><it>E</it><sub>i </sub>&#8805; <it>E</it><sub>j</sub></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>Class <it>C</it><sub>1</sub></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>f</it>
               <sub>11ij</sub>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>f</it>
               <sub>21ij</sub>
            </p>
         </c>
         <c ca="center">
            <p>
               <b><it>n</it><sub>1 </sub>= <it>f</it><sub>11ij </sub><it>+f</it><sub>21ij</sub></b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>Class <it>C</it><sub>M</sub></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>f</it>
               <sub>1Mij</sub>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>f</it>
               <sub>2Mij</sub>
            </p>
         </c>
         <c ca="center">
            <p>
               <b><it>n</it><sub>M </sub>=<it>f</it><sub>1Mij </sub><it>+f</it><sub>2Mij</sub></b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i1">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mi>m</m:mi>
                              <m:mi>i</m:mi>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i2">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mi>m</m:mi>
                              <m:mi>i</m:mi>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <it>N</it>
               </b>
            </p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The <it>E<sub>i </sub></it>and <it>E<sub>j </sub></it>represent the population of expression values for markers i and j respectively.</p>
      <p>The n<sub>i </sub>is the total number of samples in class <it>C</it><sub>i</sub>, <it>i = 1,..., M</it>.</p>
   </tblfn></tbl>
<p>Because cancer involves both excessive growth in tumor cells and inactivation of suppressor genes, the most informative gene set would consist of some over-expressed genes and some down-regulated genes. In particular, a marker pair consisting of genes <it>i </it>and <it>j </it>becomes increasingly informative of cancer status as the difference between their expression values diverges from the corresponding difference for the same marker pair in a normal sample. Consequently, for two markers representing genes or proteins that are important for differentiating cancer status, the relative magnitudes of their expressions are interrelated, and whether the expression value of marker <it>i </it>is less than that of marker <it>j </it>is not independent of the class status. To incorporate the sample size information, the chi-square statistic defined in equation (1) can be used to assess whether the pair of markers <it>i </it>and <it>j </it>is informative for classification of cancer status:</p>
<p>
<display-formula id="M1">
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i3"><m:mrow>
   <m:msup>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfenced close=")" open="(" separators="">
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>q</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>m</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>M</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mfrac>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mrow>
                        <m:mo class="MathClass-open">(</m:mo>
                        <m:mrow>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>f</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>q</m:mi>
                                 <m:mi>m</m:mi>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo class="MathClass-bin">-</m:mo>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>m</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>T</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>q</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo class="MathClass-bin">/</m:mo>
                           <m:mi>N</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-close">)</m:mo>
                     </m:mrow>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>n</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>m</m:mi>
                  </m:mrow>
               </m:msub>
               <m:msub>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">/</m:mo>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>N</m:mi>
   <m:mfenced close=")" open="(" separators="">
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>q</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>m</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>M</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mfrac>
            <m:mrow>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>q</m:mi>
                     <m:mi>m</m:mi>
                     <m:mi>i</m:mi>
                     <m:mi>j</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msubsup>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>n</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>m</m:mi>
                  </m:mrow>
               </m:msub>
               <m:msub>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
         </m:mfrac>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-punc">,</m:mo>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where <it>n<sub>m</sub></it> and <it>T<sub>q</sub></it> are the row and column totals of the <it>m<sup>th</sup></it> row and <it>q<sup>th</sup></it> column, respectively. If the counts in Table <tblr tid="T1">1</tblr> are large, with all cell counts at least five, a traditional way to declare significance for the pair is to compare the calculated statistic with the chi-squared distribution with <it>M</it> - 1 degrees of freedom. However, the significance level of a single test needs to be adjusted for multiple comparisons. There are various approaches, including family-wise error rate control and false discovery rate (FDR) control, among others. Family-wise error rate control tends to be conservative, while FDR control could lead to high false positive rates. In this work, we do not use the chi-squared distribution and do not decide how many pairs are significant. Instead, we use the chi-squared statistic in equation (1) only as an indication of the departure from independence between the class and the chance of observing an expression value of marker <it>i</it> that is less than that of marker <it>j</it>. As the departure from independence increases, the value of the chi-squared statistic increases. We select the top pair of markers, that is, the pair that yields the highest value of the chi-squared statistic. Additional marker selection follows the algorithm in section 2.2.</p>
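<p>As an illustration only, the pair score described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names <it>pair_score</it> and <it>top_pair</it>, the layout of the expression matrix (samples in rows, markers in columns) and the handling of an empty column (score set to zero) are all hypothetical choices made for this example.</p>
<p>
```python
import numpy as np
from itertools import combinations


def pair_score(expr_i, expr_j, labels):
    """Chi-squared style score for one marker pair, in the form of equation (1).

    expr_i, expr_j: expression values of markers i and j, one per sample.
    labels: class label of each sample (M distinct classes).
    """
    expr_i = np.asarray(expr_i, dtype=float)
    expr_j = np.asarray(expr_j, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # True where marker i is expressed below marker j in that sample.
    below = np.less(expr_i, expr_j)

    # counts[q, m]: q = 0 counts "i below j", q = 1 counts the complement,
    # m runs over the classes.
    counts = np.array(
        [[np.sum(below[labels == c]) for c in classes],
         [np.sum(~below[labels == c]) for c in classes]],
        dtype=float,
    )
    n_m = counts.sum(axis=0)  # class sizes (row totals of the table)
    T_q = counts.sum(axis=1)  # column totals of the table
    N = counts.sum()          # total number of samples

    # Assumption for this sketch: a pair with an empty column carries no
    # information about the class, so it is scored as zero.
    if np.any(T_q == 0):
        return 0.0

    return N * (np.sum(counts ** 2 / np.outer(T_q, n_m)) - 1.0)


def top_pair(X, labels):
    """Return (score, i, j) for the highest-scoring pair of columns of X."""
    X = np.asarray(X, dtype=float)
    best = (-1.0, None, None)
    for i, j in combinations(range(X.shape[1]), 2):
        s = pair_score(X[:, i], X[:, j], labels)
        if s > best[0]:
            best = (s, i, j)
    return best
```
</p>
<p>In this sketch, a pair whose ordering separates two classes perfectly receives a score equal to the total sample size, and a pair whose ordering is unrelated to the class receives a score of zero. No significance threshold is applied, in keeping with the use of the statistic as a ranking criterion only.</p>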
</sec>
<sec>
<st>
<p>Score of a <it>k</it>-marker set</p>
</st>
<p>For a set that contains <it>k</it> markers, we consider the cross-tabulated Table <tblr tid="T2">2</tblr>, which contains frequency counts of all unique pairwise comparisons among the <it>k</it> markers. There are <it>k</it>(<it>k</it> - 1) columns of counts. The sum of counts in each column remains the same as in the two-marker case. The row totals are now <it>k</it> times the sample sizes of the corresponding classes. We calculate the chi-squared statistic as in equation (2).</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Frequency counts of samples in each class for a set of <it>k</it> markers.</p></caption><tblbdy cols="7">
      <r>
         <c ca="center">
            <p>
               <b>Class</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i4">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">&lt;</m:mo>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i5">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">&#8805;</m:mo>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>...</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i6">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">&lt;</m:mo>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i7">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">&#8805;</m:mo>
                        <m:msub>
                           <m:mrow>
                              <m:mi>E</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>
                  <it>C</it>
                  <sub>1</sub>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i8">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>11</m:mn>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i9">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>21</m:mn>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>...</p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i10">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>11</m:mn>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i11">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>21</m:mn>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <it>kn<sub>1</sub></it>
               </b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
         <c ca="center">
            <p>&#8942;</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>
                  <it>C</it>
                  <sub>M</sub>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i12">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mi>M</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i13">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mi>M</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>...</p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i14">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mi>M</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i15">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mi>M</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <it>kn<sub>M</sub></it>
               </b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i16">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mi>m</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i17">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mi>m</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>...</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i18">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mstyle class="text">
                                       <m:mtext class="textsf">k</m:mtext>
                                    </m:mstyle>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                              <m:mstyle class="text">
                                 <m:mtext class="textsf">k</m:mtext>
                              </m:mstyle>
                              <m:mi>/</m:mi>
                              <m:mn>2</m:mn>
                              <m:mo class="MathClass-bin">-</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mi>m</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <inline-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i19">
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mi>T</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mstyle class="text">
                                       <m:mtext class="textsf">k</m:mtext>
                                    </m:mstyle>
                                    <m:mo class="MathClass-bin">-</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                              <m:mstyle class="text">
                                 <m:mtext class="textsf">k</m:mtext>
                              </m:mstyle>
                              <m:mi>/</m:mi>
                              <m:mn>2</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo class="MathClass-rel">=</m:mo>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#931;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msub>
                           <m:mrow>
                              <m:mi>f</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mi>m</m:mi>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:msub>
                     </m:mrow>
                  </m:math>
               </inline-formula>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <it>K</it>
               </b>
               <it>N</it>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
<p>
<display-formula id="M2">
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i20"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>q</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mfrac>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mfenced close=")" open="(" separators="">
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>f</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>q</m:mi>
                           <m:mi>m</m:mi>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>a</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>b</m:mi>
                              </m:mrow>
                           </m:msub>
                        </m:mrow>
                     </m:msub>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:mfrac>
                        <m:mrow>
                           <m:mi>k</m:mi>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>m</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>T</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>q</m:mi>
                              </m:mrow>
                           </m:msub>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>N</m:mi>
                        </m:mrow>
                     </m:mfrac>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>n</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">/</m:mo>
         <m:mi>N</m:mi>
      </m:mrow>
   </m:mfrac>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>N</m:mi>
   <m:mfenced close=")" open="(" separators="">
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>a</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mi>a</m:mi>
               <m:mo class="MathClass-bin">+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>m</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>M</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#931;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>q</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msubsup>
         <m:mfrac>
            <m:mrow>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>q</m:mi>
                     <m:mi>m</m:mi>
                     <m:msub>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>a</m:mi>
                        </m:mrow>
                     </m:msub>
                     <m:msub>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>b</m:mi>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msubsup>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>n</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>m</m:mi>
                  </m:mrow>
               </m:msub>
               <m:msub>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
         </m:mfrac>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-punc">,</m:mo>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>Note that the <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i21"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> only differs from <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i22"><m:mrow>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>a</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> by the division factor <it>k</it> in the first term. Therefore, comparing <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i23"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> and <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i24"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> for two sets of <it>k</it>-markers {<it>i<sub>1</sub>,..., i<sub>k</sub>
</it>} and {<it>j<sub>1</sub>,..., j<sub>k</sub>
</it>} is equivalent to comparing <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i25"><m:mrow>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>a</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> and <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i26"><m:mrow>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>a</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>. The latter can be obtained at little additional cost once the statistics for all marker pairs have been computed. The statistic in equation (2) should only be used to compare marker sets with the same number of markers. All <it>k</it>-marker sets can then be ranked by their Chisquare statistic values, and the <it>k</it>-marker set with the highest Chisquare value is the most informative among all <it>k</it>-marker sets.</p>
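<p>As a minimal sketch of this ranking, the Python code below precomputes the statistic for every marker pair once and then scores a marker set by summing over its distinct pairs. It assumes that the pair statistic is the Pearson Chisquare statistic of the 2 x M table of order counts (first marker expressed below the second, or not) by class. The function names and the 0-based integer class labels are illustrative and are not taken from the original implementation.</p>

```python
from itertools import combinations

import numpy as np


def pair_chisq(xa, xb, labels, num_classes):
    """Pearson chi-square of the 2 x M table: order of (xa, xb) versus class."""
    labels = np.asarray(labels)
    below = np.less(xa, xb)  # True where marker a is expressed below marker b
    table = np.zeros((2, num_classes))
    for m in range(num_classes):
        in_class = labels == m
        table[0, m] = np.sum(below[in_class])
        table[1, m] = np.sum(~below[in_class])
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    mask = expected > 0  # skip empty rows or columns to avoid 0/0
    return float(np.sum((table[mask] - expected[mask]) ** 2 / expected[mask]))


def pairwise_chisq_matrix(X, labels, num_classes):
    """Symmetric matrix of pair statistics; X has one column per marker."""
    p = X.shape[1]
    stats = np.zeros((p, p))
    for a, b in combinations(range(p), 2):
        stats[a, b] = stats[b, a] = pair_chisq(X[:, a], X[:, b], labels, num_classes)
    return stats


def set_score(stats, markers):
    """Sum of the pair statistics over all distinct pairs in the marker set."""
    return sum(stats[a, b] for a, b in combinations(markers, 2))
```

<p>Under these assumptions, two marker sets of the same size can be compared by calling <it>set_score</it> on each, without revisiting the expression data.</p>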
</sec>
<sec>
<st>
<p>Comparing marker sets of different sizes or with identical Chisquare statistics</p>
</st>
<p>To compare marker sets with different numbers of markers, the Chisquare statistics given earlier cannot be used because they accumulate different numbers of terms. In such cases, we use as the objective function the leave-one-out cross validation (LOOCV) accuracy within the training data, obtained with the procedure below.</p>
<p>Suppose the training data contains <it>N<sub>tr </sub>
</it>samples. Without loss of generality, we use &#937;<sub>tr </sub>= {<it>S<sub>1</sub>
</it>,..., <it>S<sub>Ntr</sub>
</it>} to denote the collection of these samples. For a marker set {<it>i<sub>1</sub>,..., i<sub>k</sub>
</it>}, the LOOCV is performed within this training sample. In particular, we</p>
<p indent="1">1. Leave out one training sample <it>S<sub>l </sub>
</it>to be used as the test data and use the rest of the training samples &#937;<sub>tr</sub>\ <it>S<sub>l </sub>
</it>as training data.</p>
<p indent="1">2. For each class <it>m</it> (<it>1 </it>&#8804; <it>m </it>&#8804; <it>M</it>), assign <it>S<sub>l </sub>
</it>to this class and calculate the Chisquare statistic for the marker set {<it>i<sub>1</sub>,..., i<sub>k</sub>
</it>}. We obtain M Chisquare statistics {<inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i27"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>m</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>, <it>m = 1,..., M</it>}. The predicted class <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i28"><m:mrow>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>S</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>l</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</inline-formula>for <it>S<sub>l </sub>
</it>is the class with the largest Chisquare statistic, i.e., <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i29"><m:mrow>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>S</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>l</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf">arg</m:mtext>
   </m:mstyle>
   <m:mspace class="thinspace" width="0.3em"/>
   <m:munder class="msub">
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf">max</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-rel">&#8804;</m:mo>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-rel">&#8804;</m:mo>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:munder>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>m</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>.</p>
<p indent="1">3. Repeat steps 1 and 2 for every sample in &#937;<sub>tr </sub>to obtain the predictions for all samples in &#937;<sub>tr</sub>: <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i28">
<m:mrow>
<m:mover accent="true">
<m:mrow>
<m:mi>m</m:mi>
</m:mrow>
<m:mo class="MathClass-op">^</m:mo>
</m:mover>
<m:mrow>
<m:mo class="MathClass-open">(</m:mo>
<m:mrow>
<m:msub>
<m:mrow>
<m:mi>S</m:mi>
</m:mrow>
<m:mrow>
<m:mi>l</m:mi>
</m:mrow>
</m:msub>
</m:mrow>
<m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:mrow>
</m:math>
</inline-formula>, <it>l </it>= <it>1</it>,..., <it>N<sub>tr</sub>
</it>.</p>
<p indent="1">4. The LOOCV accuracy for marker set {<it>i<sub>1</sub>,..., i<sub>k</sub>
</it>} is LOOCV(<it>i<sub>1</sub>,..., i<sub>k</sub>
</it>) = the proportion of correctly classified samples in &#937;<sub>tr</sub>.</p>
<p>Marker sets of different sizes can thus be compared by their LOOCV accuracy within the training data: the marker set with the highest LOOCV accuracy is considered the most informative. For marker sets with the same number of markers and identical Chisquare statistic values, we also use the LOOCV accuracy as a measure of their power to differentiate the cancer classes.</p>
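<p>Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: <it>score_fn</it> is a hypothetical callable that returns the Chisquare statistic of the marker set for a given data matrix and label vector, the data matrix is assumed to be already restricted to the columns of the marker set, class labels are assumed to be 0-based integers, and ties between classes go to the lowest class index.</p>

```python
import numpy as np


def predict_by_max_score(X_train, y_train, x_new, num_classes, score_fn):
    """Try x_new in every class and return the class with the largest statistic."""
    X_aug = np.vstack([X_train, x_new])
    scores = [
        score_fn(X_aug, np.append(y_train, m), num_classes)
        for m in range(num_classes)
    ]
    return int(np.argmax(scores))  # ties resolve to the lowest class index


def loocv_accuracy(X, y, num_classes, score_fn):
    """Proportion of training samples whose held-out prediction is correct."""
    n = X.shape[0]
    correct = 0
    for left_out in range(n):
        keep = np.arange(n) != left_out  # all samples except the held-out one
        pred = predict_by_max_score(
            X[keep], y[keep], X[left_out], num_classes, score_fn
        )
        correct += int(pred == y[left_out])
    return correct / n
```

<p>The helper <it>predict_by_max_score</it> applies the same assign-and-score rule to a single held-out sample, so it could in principle also be used for a sample that is not part of the training data.</p>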
</sec>
<sec>
<st>
<p>Marker selection algorithm</p>
</st>
<p>For a given upper bound <it>B </it>on the cardinality of the marker set, the marker selection process first selects the top scoring pairs and then sequentially adds markers to the active set until the number of markers in the active set reaches <it>B</it>. Denote the set of remaining markers as &#7838;, whose initial value is the list of all markers. The algorithm proceeds as follows:</p>
<p indent="1">1. Calculate and record the Chisquare statistics for all marker pairs using the training data.</p>
<p indent="1">2. If the highest value of the Chisquare statistics is achieved by a unique marker pair, select this pair and denote it as TS<sub>2</sub>. Calculate the LOOCV for this pair of markers and denote it as LOOCV<sub>2</sub>. Update the remaining marker set &#7838; by removing the marker pair selected.</p>
<p indent="1">3. If multiple marker pairs share the maximum Chisquare statistic value, calculate the LOOCV accuracy of these marker pairs using the training data. Keep the marker pairs that have the highest LOOCV accuracy. If the highest accuracy is achieved by more than one pair, denote the different pairs as TS<sub>2,1</sub>, TS<sub>2,2</sub>, etc.</p>
<p indent="1">4. Find the top scoring triplets by adding one more marker to each top scoring pair. For each of the top scoring pairs resulting from steps 2 and 3, find the marker in the list of remaining markers &#7838; such that the resulting triplet has the highest Chisquare statistic value. If multiple triplets share the maximum Chisquare value, calculate the LOOCV accuracy of these triplets and keep those that yield the highest LOOCV accuracy. Denote the top scoring triplet as TS<sub>3 </sub>if it is unique, and as TS<sub>3,1</sub>, TS<sub>3,2</sub>, etc. if multiple triplets achieve identical accuracy. Record the LOOCV accuracy of the top scoring triplets.</p>
<p indent="1">5. For <it>k </it>= <it>4, 5</it>,..., <it>B</it>, find TS<sub>k </sub>in the same way, together with the corresponding LOOCV accuracy LOOCV<sub>k</sub>. As <it>k </it>increases, the set TS<sub>k </sub>tends to be unique.</p>
<p indent="1">6. Select the smallest k-marker set that maximizes the LOOCV accuracy over all TS<sub>k</sub>, <it>k = 1</it>,..., <it>B</it>. If this marker set is not unique, randomly select one of the candidates as the final set. Denote the final selected informative marker set as TS<sub>G</sub>, where G = arg max<sub>1 &#8804; k &#8804; B </sub>LOOCV<sub>k</sub>.</p>
<p>As discussed in section 2.1.2, the comparison of the Chisquare statistics <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i30"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> can be simplified to comparing the sums of the Chisquare statistics over all distinct marker pairs, <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i31"><m:mrow>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big">&#931;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mi>a</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:msubsup>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>a</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>.</p>
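<p>A simplified Python sketch of the selection procedure, using this pairwise-sum shortcut, is given below. It is an illustration only and departs from the full algorithm in one respect: ties in the Chisquare value are broken by marker index instead of by LOOCV accuracy, so a single chain of nested sets is built. The names <it>stats</it> (a symmetric matrix of precomputed pair statistics), <it>upper_bound</it>, and <it>loocv_fn</it> (a callable returning the LOOCV accuracy of a marker set) are hypothetical.</p>

```python
import numpy as np


def greedy_marker_sets(stats, upper_bound):
    """Build nested sets TS_2, TS_3, ... from a symmetric matrix of pair statistics."""
    p = stats.shape[0]
    rows, cols = np.triu_indices(p, k=1)  # every distinct marker pair once
    best = int(np.argmax(stats[rows, cols]))
    active = [int(rows[best]), int(cols[best])]  # top scoring pair
    sets = [list(active)]
    while upper_bound > len(active) and p > len(active):
        remaining = [c for c in range(p) if c not in active]
        # Adding marker c to a fixed set raises the pairwise sum by sum_a stats[a, c].
        gains = [stats[active, c].sum() for c in remaining]
        active.append(remaining[int(np.argmax(gains))])
        sets.append(list(active))
    return sets


def select_final_set(sets, loocv_fn):
    """Smallest set attaining the maximum LOOCV accuracy."""
    accuracies = [loocv_fn(s) for s in sets]
    # The sets are ordered by size and argmax returns the first maximum.
    return sets[int(np.argmax(accuracies))]
```

<p>Because each candidate set shares all but one marker with the current active set, only the pair statistics involving the new marker need to be summed at each step.</p>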
<p>As an illustration, Figure <figr fid="F1">1</figr> shows the accuracy on the training and test data for <it>k </it>= <it>2</it>,..., <it>100 </it>for the 6-class DLBCL cancer microarray data <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. The training accuracy reaches its maximum at <it>k </it>= <it>16</it>, so the selected marker set is TS<sub>16</sub>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Accuracy of TS<sub>k </sub>for training and test data from DLBCL cancer (Alizadeh et al., 2000)</p></caption><text>
   <p><b>Accuracy of TS<sub>k </sub>for training and test data from DLBCL cancer (Alizadeh et al., 2000)</b>.</p>
</text><graphic file="1755-8794-6-S1-S3-1"/></fig>
<p>Our experience suggests that it is sufficient to set the upper bound B to be 50 if the total number of classes M &#8804; 4, 100 if 5 &#8804; M &#8804; 8, and 150 if M &#8805; 9.</p>
</sec>
<sec>
<st>
<p>Prediction with TSG classifier</p>
</st>
<p>To predict the class of a sample in the test data, we use the selected marker set to calculate a score for assigning this sample to each class. A large score for a class suggests that placing the sample in that class increases the separation between classes. The predicted class is the one with the largest score. In particular, suppose the selected marker set consists of markers <it>m<sub>1</sub>, m<sub>2</sub>,..., m<sub>k</sub>
</it>, the training data is &#937;<sub>tr</sub>, and the sample to be predicted is <it>x</it>
<sub>new</sub>. Let<inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i32"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>C</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> be the chi-square statistic obtained when the sample is placed in class <it>C<sub>i</sub>, i = 1,..., M</it>. This gives M chi-square values, and we assign the sample to the class with the largest one:</p>
<p>Class of <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i33"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf">new</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf">arg</m:mtext>
   </m:mstyle>
   <m:munder class="msub">
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf">max</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:munder>
   <m:msub>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mi>&#967;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:mi>.</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>C</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>.</p>
<p>If multiple classes reach the same maximum chi-square value, we further calculate the LOOCV accuracy for these classes. The final prediction is the class that achieves the highest LOOCV accuracy.</p>
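<p>The decision rule above can be sketched in a few lines. This is an illustrative sketch only: the two callables are hypothetical placeholders for the chi-square statistic and the LOOCV accuracy obtained when the new sample is placed in a given class, and neither computation is reproduced here.</p>

```python
from typing import Callable, Hashable, Sequence


def predict_class(
    classes: Sequence[Hashable],
    chi_square_if_assigned: Callable[[Hashable], float],
    loocv_accuracy_if_assigned: Callable[[Hashable], float],
) -> Hashable:
    """Pick the class whose assignment yields the largest chi-square value."""
    if not classes:
        raise ValueError("at least one candidate class is required")
    # One chi-square value per candidate class (M values in total).
    scores = {c: chi_square_if_assigned(c) for c in classes}
    top = max(scores.values())
    tied = [c for c in classes if scores[c] == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to the LOOCV accuracy, evaluated only for the tied classes.
    return max(tied, key=loocv_accuracy_if_assigned)
```

<p>Ties are detected here by exact equality of the scores; with floating-point statistics a small tolerance may be preferable.</p>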
</sec>
</sec>
<sec>
<st>
<p>Results and discussion</p>
</st>
<sec>
<st>
<p>Microarray data and method of comparison</p>
</st>
<p>The performance of the proposed TSG marker selection and classifier is evaluated on both binary and multi-class microarray expression data. We consider the 19 datasets that were used to evaluate TSP, k-TSP and their multi-class versions in Tan et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. There are 9 binary and 10 multi-class datasets. These datasets are related to human cancers including colon cancer, leukemia, central nervous system cancer, diffuse large B-cell lymphoma, breast cancer, lung cancer, and prostate cancer. The reference, sample size, number of genes in each dataset, and the number of samples in each class are summarized in Tables <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr>. The number of classes ranges from 2 to 14. The number of markers (genes) ranges from 2000 to 16063. The average number of samples per class ranges from 13 to 140. The ratio between the number of samples per class and the number of markers ranges from 0.000845 to 0.0155.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Binary class gene expression datasets</p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b>Dataset</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Platform</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of</b>
            </p>
            <p>
               <b>Genes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of samples</b>
            </p>
            <p>
               <b>in class I</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of samples</b>
            </p>
            <p>
               <b>in class II</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Source</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Colon</p>
         </c>
         <c ca="center">
            <p>cDNA</p>
         </c>
         <c ca="center">
            <p>2000</p>
         </c>
         <c ca="center">
            <p>40(T)</p>
         </c>
         <c ca="center">
            <p>22(N)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B18">18</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Leukemia</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>7129</p>
         </c>
         <c ca="center">
            <p>25(AML)</p>
         </c>
         <c ca="center">
            <p>47(ALL)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B19">19</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>CNS</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>7129</p>
         </c>
         <c ca="center">
            <p>25(C)</p>
         </c>
         <c ca="center">
            <p>9(D)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B20">20</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>DLBCL</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>7129</p>
         </c>
         <c ca="center">
            <p>58(D)</p>
         </c>
         <c ca="center">
            <p>19(F)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B21">21</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Lung</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>12533</p>
         </c>
         <c ca="center">
            <p>150(A)</p>
         </c>
         <c ca="center">
            <p>31(M)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B22">22</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Prostate1</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>12600</p>
         </c>
         <c ca="center">
            <p>52(T)</p>
         </c>
         <c ca="center">
            <p>50(N)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B23">23</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Prostate2</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>12625</p>
         </c>
         <c ca="center">
            <p>38(T)</p>
         </c>
         <c ca="center">
            <p>50(N)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B24">24</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Prostate3</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>12626</p>
         </c>
         <c ca="center">
            <p>24(T)</p>
         </c>
         <c ca="center">
            <p>9(N)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B25">25</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>GCM</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>16063</p>
         </c>
         <c ca="center">
            <p>190(C)</p>
         </c>
         <c ca="center">
            <p>90(N)</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B26">26</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Multi-class gene expression datasets</p></caption><tblbdy cols="7">
      <r>
         <c ca="center">
            <p>
               <b>Dataset</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Platform</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of</b>
            </p>
            <p>
               <b>Classes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of</b>
            </p>
            <p>
               <b>Genes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of samples</b>
            </p>
            <p>
               <b>in training</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>No. of samples</b>
            </p>
            <p>
               <b>in test</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Source</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Leukemia1</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>7129</p>
         </c>
         <c ca="center">
            <p>38</p>
         </c>
         <c ca="center">
            <p>34</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B19">19</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Lung1</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>7129</p>
         </c>
         <c ca="center">
            <p>64</p>
         </c>
         <c ca="center">
            <p>32</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B27">27</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Leukemia2</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>12582</p>
         </c>
         <c ca="center">
            <p>57</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B28">28</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>SRBCT</p>
         </c>
         <c ca="center">
            <p>cDNA</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>2308</p>
         </c>
         <c ca="center">
            <p>63</p>
         </c>
         <c ca="center">
            <p>20</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B29">29</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Breast</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>9216</p>
         </c>
         <c ca="center">
            <p>54</p>
         </c>
         <c ca="center">
            <p>30</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B30">30</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Lung2</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>12600</p>
         </c>
         <c ca="center">
            <p>136</p>
         </c>
         <c ca="center">
            <p>67</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B31">31</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>DLBCL</p>
         </c>
         <c ca="center">
            <p>cDNA</p>
         </c>
         <c ca="center">
            <p>6</p>
         </c>
         <c ca="center">
            <p>4026</p>
         </c>
         <c ca="center">
            <p>58</p>
         </c>
         <c ca="center">
            <p>30</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B17">17</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Leukemia3</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>12558</p>
         </c>
         <c ca="center">
            <p>215</p>
         </c>
         <c ca="center">
            <p>112</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B32">32</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Cancers</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>12533</p>
         </c>
         <c ca="center">
            <p>100</p>
         </c>
         <c ca="center">
            <p>74</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B33">33</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>GCM</p>
         </c>
         <c ca="center">
            <p>Affy</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="center">
            <p>16063</p>
         </c>
         <c ca="center">
            <p>144</p>
         </c>
         <c ca="center">
            <p>46</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B26">26</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
<p>First, we compare the TSG and k-TSP classifiers on the binary datasets using 5-fold cross validation. The subjects in each class are randomly partitioned into 5 parts; 4 parts form the training data and the remaining part constitutes the test data. Feature selection and model building are conducted on the training data, and a prediction is made for each subject in the test data. For the results to be comparable with those of the TSP family classifiers, we also follow the same comparison methods as in Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. In particular, we perform LOOCV for the binary datasets and an independent test for the multi-class datasets. In LOOCV, each sample is left out in turn and the remaining N-1 samples are used to train the classifier, which is then used to predict the class label of the left-out sample. The LOOCV accuracy is the proportion of correctly classified samples. Each of the multi-class datasets is partitioned into training and test data, following exactly the same partition scheme as in Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>.</p>
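<p>For concreteness, the LOOCV procedure can be sketched as below. The sketch is generic and not tied to TSG: <it>train</it> and <it>predict</it> are hypothetical callables standing in for any classifier, and feature selection is assumed to happen inside <it>train</it> so that the left-out sample never influences it.</p>

```python
from typing import Any, Callable, Sequence


def loocv_accuracy(
    samples: Sequence[Any],
    labels: Sequence[Any],
    train: Callable[[list, list], Any],
    predict: Callable[[Any, Any], Any],
) -> float:
    """Proportion of samples classified correctly when each is left out in turn."""
    xs, ys = list(samples), list(labels)
    if len(xs) != len(ys) or not len(xs) >= 2:
        raise ValueError("need at least two samples with matching labels")
    correct = 0
    for i in range(len(xs)):
        # Train on the remaining N-1 samples (feature selection included).
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Predict the label of the single left-out sample.
        if predict(model, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```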
<p>Since an objective of this research is to improve the TSP family classifiers, we present the percent increase in classification accuracy in bar plots. The percent increase for TSG over TSP is defined as (accuracy of TSG - accuracy of TSP)/accuracy of TSP &#215; 100%.</p>
<p>The percent increase for any other pair of classifiers is defined similarly. Since the TSP classifier uses two genes while k-TSP and TSG use at least two genes, we are particularly interested in comparing the improvement in accuracy of TSG over TSP with that of k-TSP over TSP in binary classification. Similarly, in the multi-class cases, we are interested in comparing the increase in accuracy of TSG over HC-TSP with that of HC-k-TSP over HC-TSP.</p>
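<p>As a small worked example of this definition, the snippet below computes the percent increase from two accuracies. The function name is illustrative only; the two inputs are the average LOOCV accuracies listed for TSG and TSP in Table <tblr tid="T6">6</tblr> (95.21 and 88.26).</p>

```python
def percent_increase(accuracy_new: float, accuracy_ref: float) -> float:
    """(accuracy_new - accuracy_ref) / accuracy_ref x 100, in percent."""
    if not accuracy_ref > 0:
        raise ValueError("reference accuracy must be positive")
    return (accuracy_new - accuracy_ref) / accuracy_ref * 100.0


# Average LOOCV accuracy of TSG relative to TSP on the binary datasets.
tsg_over_tsp = percent_increase(95.21, 88.26)
print(f"{tsg_over_tsp:.2f}%")  # prints 7.87%
```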
<p>For reference, we also include the classification accuracy of decision trees (DT), Naive Bayes (NB), k-nearest neighbor (k-NN), Support Vector Machines (SVM) and prediction analysis of microarrays (PAM) in our comparison tables when they are available from the literature. These results were reported in Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> for leave-one-out cross validation on the binary data and independent tests on the multi-class data. We include them only for convenience of discussion. DT and PAM have built-in feature selection, while NB, k-NN and SVM perform classification using the entire set of features. Since DT, k-NN, and NB in general have lower accuracy than the other classifiers, we focus our discussion on the remaining classifiers.</p>
</sec>
<sec>
<st>
<p>Accuracy for binary cancer data</p>
</st>
<p>In this section we present the comparison of the TSG marker selection algorithm with other algorithms using 9 benchmark binary class cancer expression datasets. These data have been analyzed extensively by many authors with wrapper, filtering and ensemble methods.</p>
<sec>
<st>
<p>Comparison of TSG and k-TSP classifiers based on 5-fold cross validation</p>
</st>
<p>Here we present our comparison of TSG and k-TSP classifiers based on 5-fold cross validation. For each binary cancer dataset, we randomly split the subjects in each class into 5 parts. One part is left out as the test data and the remaining 4 parts are used as the training data. We then train the TSG, k-TSP and TSP classifiers on the training data to select features and build models. TSP is included for reference purposes since both TSG and k-TSP extend the TSP classifier. The resulting features and models are then used to predict the class of each subject in the test data. Each of the 5 parts in turn serves as the test data. We record the accuracy of the prediction, calculated as the proportion of correctly classified subjects among all subjects. The procedure is repeated 10 times. The average and standard error of the accuracy are reported in Table <tblr tid="T5">5</tblr>.</p>
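<p>The evaluation protocol just described can be sketched as follows. This is a hedged illustration, not the code used for the reported results: <it>train</it> and <it>predict</it> are hypothetical callables for any of the classifiers, the round-robin assignment of shuffled subjects to folds is one possible way to split each class into 5 parts, and the standard error is assumed to be the sample standard deviation of the per-run accuracies divided by the square root of the number of runs.</p>

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, n_folds=5, rng=None):
    """Split sample indices into n_folds parts, class by class."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled subjects of this class out to the folds in turn.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def repeated_cv_accuracy(samples, labels, train, predict,
                         n_folds=5, n_runs=10, seed=0):
    """Mean and standard error of the pooled CV accuracy over n_runs runs."""
    rng = random.Random(seed)
    run_accuracies = []
    for _ in range(n_runs):
        correct = 0
        for test_idx in stratified_folds(labels, n_folds, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # Feature selection and model building use training data only.
            model = train([samples[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            correct += sum(predict(model, samples[i]) == labels[i]
                           for i in test_idx)
        # Accuracy of one run: correct predictions pooled over all folds.
        run_accuracies.append(correct / len(labels))
    mean = statistics.mean(run_accuracies)
    # ASSUMPTION: standard error = sample SD / sqrt(number of runs).
    std_err = statistics.stdev(run_accuracies) / math.sqrt(n_runs)
    return mean, std_err
```

<p>Because every subject appears in exactly one test fold per run, the per-run accuracy is the proportion of correctly classified subjects among all subjects, as in the description above.</p>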
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Average and standard error of accuracy from 5-fold cross validation based on 10 runs.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>TSG</b>
            </p>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>k-TSP</b>
            </p>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>TSP</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Dataset</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Average</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Standard error</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Average</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Standard error</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Average</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Standard error</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CNS</p>
         </c>
         <c ca="left">
            <p>81.1</p>
         </c>
         <c ca="left">
            <p>1.61</p>
         </c>
         <c ca="left">
            <p>77.0</p>
         </c>
         <c ca="left">
            <p>1.48</p>
         </c>
         <c ca="left">
            <p>67.0</p>
         </c>
         <c ca="left">
            <p>2.48</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Colon</p>
         </c>
         <c ca="left">
            <p>84.8</p>
         </c>
         <c ca="left">
            <p>0.85</p>
         </c>
         <c ca="left">
            <p>87.4</p>
         </c>
         <c ca="left">
            <p>0.31</p>
         </c>
         <c ca="left">
            <p>86.7</p>
         </c>
         <c ca="left">
            <p>1.85</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DLBCL</p>
         </c>
         <c ca="left">
            <p>95.2</p>
         </c>
         <c ca="left">
            <p>0.69</p>
         </c>
         <c ca="left">
            <p>94.6</p>
         </c>
         <c ca="left">
            <p>0.88</p>
         </c>
         <c ca="left">
            <p>92.9</p>
         </c>
         <c ca="left">
            <p>2.25</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GCM</p>
         </c>
         <c ca="left">
            <p>81.7</p>
         </c>
         <c ca="left">
            <p>0.66</p>
         </c>
         <c ca="left">
            <p>81.9</p>
         </c>
         <c ca="left">
            <p>0.41</p>
         </c>
         <c ca="left">
            <p>76.0</p>
         </c>
         <c ca="left">
            <p>1.04</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Leukemia</p>
         </c>
         <c ca="left">
            <p>93.7</p>
         </c>
         <c ca="left">
            <p>0.75</p>
         </c>
         <c ca="left">
            <p>91.8</p>
         </c>
         <c ca="left">
            <p>1.13</p>
         </c>
         <c ca="left">
            <p>89.4</p>
         </c>
         <c ca="left">
            <p>1.56</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Lung</p>
         </c>
         <c ca="left">
            <p>100</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>98.6</p>
         </c>
         <c ca="left">
            <p>0.15</p>
         </c>
         <c ca="left">
            <p>96.3</p>
         </c>
         <c ca="left">
            <p>0.69</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Prostate1</p>
         </c>
         <c ca="left">
            <p>90.2</p>
         </c>
         <c ca="left">
            <p>0.72</p>
         </c>
         <c ca="left">
            <p>91</p>
         </c>
         <c ca="left">
            <p>0.75</p>
         </c>
         <c ca="left">
            <p>89.3</p>
         </c>
         <c ca="left">
            <p>1.27</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Prostate2</p>
         </c>
         <c ca="left">
            <p>74.4</p>
         </c>
         <c ca="left">
            <p>0.6</p>
         </c>
         <c ca="left">
            <p>77.9</p>
         </c>
         <c ca="left">
            <p>0.69</p>
         </c>
         <c ca="left">
            <p>70.4</p>
         </c>
         <c ca="left">
            <p>2.25</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Prostate3</p>
         </c>
         <c ca="left">
            <p>100</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>95.4</p>
         </c>
         <c ca="left">
            <p>0.79</p>
         </c>
         <c ca="left">
            <p>95.1</p>
         </c>
         <c ca="left">
            <p>1.44</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Average</p>
         </c>
         <c ca="left">
            <p>89.0</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>88.4</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>84.8</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy></tbl>
<p>In five of the nine datasets (CNS, DLBCL, Leukemia, Lung, Prostate3), TSG performs better than k-TSP. In the remaining four datasets (Colon, GCM, Prostate1, Prostate2), k-TSP performs slightly better. On average, TSG and k-TSP have comparable 5-fold cross validation performance (89.0% versus 88.4%). The improvement of k-TSP over TSP is also marginal in this 5-fold cross validation setting for most of the datasets.</p>
</sec>
<sec>
<st>
<p>LOOCV accuracy</p>
</st>
<p>Among the LOOCV accuracies reported in the literature, we find that TSP, k-TSP, PAM and SVM are often the top performing classifiers.</p>
<p>The LOOCV accuracies of the proposed TSG and the aforementioned competing classifiers for the 9 datasets are presented in Table <tblr tid="T6">6</tblr>. In terms of accuracy, TSG consistently gives the best performance for all but the leukemia dataset, for which NB yields an accuracy of 100% while TSG gives 98.61%. For the CNS data, TSG and k-TSP are tied at about 97%, which is much better than the rest of the classifiers (all below 83%). For the Prostate1 data, TSG and TSP tie for the best performance. For the Prostate3 data, TSG and SVM both reach 100% accuracy.</p>
<tbl id="T6"><title><p>Table 6</p></title><caption><p>LOOCV accuracy and the number of genes used in classifiers (in parentheses) for binary class expression datasets</p></caption><tblbdy cols="11">
      <r>
         <c ca="left">
            <p>
               <b>Method</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Colon</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Leuk</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>CNS</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>DLBCL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Lung</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Pros1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Pros2</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Pros3</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>GCM</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Aver</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="11">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TSG<sup>&#8224;</sup></p>
         </c>
         <c ca="left">
            <p>
               <b>93.55</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>98.61</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>97.06</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>98.7</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>95.1</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>86.36</b>
            </p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>87.5</p>
            <p>(7)</p>
         </c>
         <c ca="left">
            <p>
               <b>95.21</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TSP*</p>
         </c>
         <c ca="left">
            <p>91.10</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>93.80</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>77.90</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>98.10</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>98.30</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>
               <b>95.10</b>
            </p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>67.60</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>97.00</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>75.40</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>88.26</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>k-TSP*</p>
         </c>
         <c ca="left">
            <p>90.30</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>95.83</p>
            <p>(18)</p>
         </c>
         <c ca="left">
            <p>
               <b>97.10**</b>
            </p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>97.40</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>98.90</p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>91.18</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>75.00</p>
            <p>(18)</p>
         </c>
         <c ca="left">
            <p>97.00</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>85.40</p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>92.01</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DT*</p>
         </c>
         <c ca="left">
            <p>77.42</p>
            <p>(3)</p>
         </c>
         <c ca="left">
            <p>73.61</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>67.65</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>80.52</p>
            <p>(3)</p>
         </c>
         <c ca="left">
            <p>96.13</p>
            <p>(3)</p>
         </c>
         <c ca="left">
            <p>87.25</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>64.77</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>84.85</p>
            <p>(1)</p>
         </c>
         <c ca="left">
            <p>77.86</p>
            <p>(14)</p>
         </c>
         <c ca="left">
            <p>78.90</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NB*<sup>&#8225;</sup></p>
         </c>
         <c ca="left">
            <p>56.45</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>82.35</p>
         </c>
         <c ca="left">
            <p>80.52</p>
         </c>
         <c ca="left">
            <p>97.79</p>
         </c>
         <c ca="left">
            <p>62.75</p>
         </c>
         <c ca="left">
            <p>73.86</p>
         </c>
         <c ca="left">
            <p>90.91</p>
         </c>
         <c ca="left">
            <p>84.29</p>
         </c>
         <c ca="left">
            <p>80.99</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>k-NN*<sup>&#8225;</sup></p>
         </c>
         <c ca="left">
            <p>74.19</p>
         </c>
         <c ca="left">
            <p>84.72</p>
         </c>
         <c ca="left">
            <p>82.35</p>
         </c>
         <c ca="left">
            <p>89.61</p>
         </c>
         <c ca="left">
            <p>98.34</p>
         </c>
         <c ca="left">
            <p>74.51</p>
         </c>
         <c ca="left">
            <p>73.86</p>
         </c>
         <c ca="left">
            <p>93.94</p>
         </c>
         <c ca="left">
            <p>86.79</p>
         </c>
         <c ca="left">
            <p>84.26</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM*<sup>&#8225;</sup></p>
         </c>
         <c ca="left">
            <p>82.26</p>
         </c>
         <c ca="left">
            <p>98.61</p>
         </c>
         <c ca="left">
            <p>82.35</p>
         </c>
         <c ca="left">
            <p>97.40</p>
         </c>
         <c ca="left">
            <p>99.45</p>
         </c>
         <c ca="left">
            <p>91.18</p>
         </c>
         <c ca="left">
            <p>76.14</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>93.21</b>
            </p>
         </c>
         <c ca="left">
            <p>91.18</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PAM*</p>
         </c>
         <c ca="left">
            <p>89.52</p>
            <p>(15)</p>
         </c>
         <c ca="left">
            <p>94.03</p>
            <p>(2296)</p>
         </c>
         <c ca="left">
            <p>82.35</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>85.45</p>
            <p>(17)</p>
         </c>
         <c ca="left">
            <p>97.90</p>
            <p>(9)</p>
         </c>
         <c ca="left">
            <p>90.89</p>
            <p>(47)</p>
         </c>
         <c ca="left">
            <p>81.25</p>
            <p>(13)</p>
         </c>
         <c ca="left">
            <p>94.24</p>
            <p>(701)</p>
         </c>
         <c ca="left">
            <p>82.32</p>
            <p>(47)</p>
         </c>
         <c ca="left">
            <p>88.66</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>*Results reported in Tan et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. <sup>&#8224;</sup>Results obtained with our method (TSG). <sup>&#8225;</sup>NB, k-NN, and SVM used the entire set of genes.</p>
      <p>**The 97.10 reported in Tan et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> may be a result of rounding 97.06, which is the accuracy of correctly classifying 33 of the 34 samples.</p>
   </tblfn></tbl>
<p>To assess how much the TSG classifier gains by considering an arbitrary number of genes rather than only top scoring pairs, as TSP and k-TSP do, we give the percent increase in LOOCV accuracy in Figure <figr fid="F2">2</figr>. The first bar in each panel is the percent improvement in accuracy of k-TSP over TSP, the second is that of TSG over k-TSP, and the third is that of TSG over TSP. Comparing the heights of the first and third bars in each panel shows whether k-TSP and TSG improve upon TSP by similar amounts, while the second bar shows directly how much TSG improves upon k-TSP. For the CNS data, TSG and k-TSP achieve the same improvement over TSP. In all other panels the third bar is much taller than the first, which suggests that TSG gained much more accuracy than k-TSP. There are two reasons for this accuracy gain: (1) the set of informative genes may contain an odd number of genes, but k-TSP can only use an even number; (2) when selecting additional genes after the top pair, the TSG classifier considers the joint effect of all selected genes on differentiating the cancer classes, whereas the k-TSP classifier only considers the marginal effect of each pair. Without considering the joint effect, the collection of top scoring pairs in k-TSP can easily accumulate redundant genes. This explains why TSG performs better than k-TSP on these datasets.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Percent of improvement in accuracy for TSG over TSP family classifiers in binary classification</p></caption><text>
   <p><b>Percent of improvement in accuracy for TSG over TSP family classifiers in binary classification</b>. 1: k-TSP improves upon TSP; 2: TSG improves upon k-TSP; 3: TSG improves upon TSP.</p>
</text><graphic file="1755-8794-6-S1-S3-2"/></fig>
<p>To ease the comparison of the TSG, k-TSP, HC-k-TSP, SVM, and PAM classifiers, we plot the LOOCV accuracy of TSG, k-TSP, and SVM relative to that of PAM in the left panel of Figure <figr fid="F3">3</figr>. TSG, k-TSP, and SVM are in general better than PAM for binary data, since most of their accuracy values lie above the straight line. TSG has the highest accuracy among the three, or ties for it, in every dataset except GCM, where SVM is more accurate (Table <tblr tid="T6">6</tblr>). SVM and k-TSP have similar accuracy in four datasets. SVM is clearly worse than k-TSP in two datasets (Colon and CNS), and k-TSP is clearly less accurate than SVM in the Leuk, GCM, and Pros3 datasets. On average, k-TSP and SVM are comparable across the 9 binary datasets. In summary, the TSG classifier outperforms k-TSP and SVM in average LOOCV accuracy on the 9 binary classification problems; the latter two have comparable performance, and both are more accurate than PAM.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>The accuracy of TSG, k-TSP, HC-k-TSP, and SVM compared to PAM</p></caption><text>
   <p><b>The accuracy of TSG, k-TSP, HC-k-TSP, and SVM compared to PAM</b>. Left: LOOCV accuracy of TSG (T), k-TSP (k) and SVM (S) compared to PAM in binary datasets; Right: Accuracy for independent test samples for TSG (T), HC-k-TSP (H), and SVM (S) compared to PAM in multi-class datasets. The straight lines in the plots have slope 1 and intercept 0 such that points on the lines have equal horizontal and vertical values.</p>
</text><graphic file="1755-8794-6-S1-S3-3"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Accuracy of independent test for multi-class cancer data</p>
</st>
<p>For the multi-class datasets, the accuracy of classifiers on independent test data is presented in Table <tblr tid="T7">7</tblr>.</p>
<tbl id="T7"><title><p>Table 7</p></title><caption><p>Accuracy of classifiers and the number of genes used in classifiers (in parentheses) for the independent test set for multi-class expression datasets</p></caption><tblbdy cols="12">
      <r>
         <c ca="left">
            <p>
               <b>Method</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Leuk1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Lung1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Leuk2</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>SRBCT</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Breast</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Lung2</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>DLBCL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Leuk3</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Cancers</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>GCM</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Aver</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="12">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TSG<sup>&#8224;</sup></p>
         </c>
         <c ca="left">
            <p>
               <b>97.06</b>
            </p>
            <p>(6)</p>
         </c>
         <c ca="left">
            <p>81.25</p>
            <p>(20)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(44)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(13)</p>
         </c>
         <c ca="left">
            <p>86.67</p>
            <p>(63)</p>
         </c>
         <c ca="left">
            <p>95.52</p>
            <p>(60)</p>
         </c>
         <c ca="left">
            <p>93.33</p>
            <p>(16)</p>
         </c>
         <c ca="left">
            <p>91.07</p>
            <p>(95)</p>
         </c>
         <c ca="left">
            <p>79.73</p>
            <p>(81)</p>
         </c>
         <c ca="left">
            <p>
               <b>67.39</b>
            </p>
            <p>(112)</p>
         </c>
         <c ca="left">
            <p>
               <b>89.20</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>HC-TSP*</p>
         </c>
         <c ca="left">
            <p>
               <b>97.06</b>
            </p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>71.88</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>80</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>95</p>
            <p>(6)</p>
         </c>
         <c ca="left">
            <p>66.67</p>
            <p>(8)</p>
         </c>
         <c ca="left">
            <p>83.58</p>
            <p>(8)</p>
         </c>
         <c ca="left">
            <p>83.33</p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>77.68</p>
            <p>(12)</p>
         </c>
         <c ca="left">
            <p>74.32</p>
            <p>(20)</p>
         </c>
         <c ca="left">
            <p>52.17</p>
            <p>(26)</p>
         </c>
         <c ca="left">
            <p>78.17</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>HC-k-TSP*</p>
         </c>
         <c ca="left">
            <p>
               <b>97.06</b>
            </p>
            <p>(36)</p>
         </c>
         <c ca="left">
            <p>78.13</p>
            <p>(20)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(24)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(30)</p>
         </c>
         <c ca="left">
            <p>66.67</p>
            <p>(24)</p>
         </c>
         <c ca="left">
            <p>94.03</p>
            <p>(28)</p>
         </c>
         <c ca="left">
            <p>83.33</p>
            <p>(46)</p>
         </c>
         <c ca="left">
            <p>82.14</p>
            <p>(64)</p>
         </c>
         <c ca="left">
            <p>82.43</p>
            <p>(128)</p>
         </c>
         <c ca="left">
            <p>
               <b>67.39</b>
            </p>
            <p>(134)</p>
         </c>
         <c ca="left">
            <p>85.12</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DT*</p>
         </c>
         <c ca="left">
            <p>85.29</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>78.13</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>80</p>
            <p>(2)</p>
         </c>
         <c ca="left">
            <p>75</p>
            <p>(3)</p>
         </c>
         <c ca="left">
            <p>73.33</p>
            <p>(4)</p>
         </c>
         <c ca="left">
            <p>88.06</p>
            <p>(5)</p>
         </c>
         <c ca="left">
            <p>86.67</p>
            <p>(5)</p>
         </c>
         <c ca="left">
            <p>75.89</p>
            <p>(16)</p>
         </c>
         <c ca="left">
            <p>68.92</p>
            <p>(10)</p>
         </c>
         <c ca="left">
            <p>52.17</p>
            <p>(18)</p>
         </c>
         <c ca="left">
            <p>76.35</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NB<sup>&#8225;</sup>*</p>
         </c>
         <c ca="left">
            <p>85.29</p>
         </c>
         <c ca="left">
            <p>81.25</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>60</p>
         </c>
         <c ca="left">
            <p>66.67</p>
         </c>
         <c ca="left">
            <p>88.06</p>
         </c>
         <c ca="left">
            <p>86.67</p>
         </c>
         <c ca="left">
            <p>32.14</p>
         </c>
         <c ca="left">
            <p>79.73</p>
         </c>
         <c ca="left">
            <p>52.17</p>
         </c>
         <c ca="left">
            <p>73.2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>k-NN<sup>&#8225;</sup>*</p>
         </c>
         <c ca="left">
            <p>67.65</p>
         </c>
         <c ca="left">
            <p>75</p>
         </c>
         <c ca="left">
            <p>86.67</p>
         </c>
         <c ca="left">
            <p>30</p>
         </c>
         <c ca="left">
            <p>63.33</p>
         </c>
         <c ca="left">
            <p>88.06</p>
         </c>
         <c ca="left">
            <p>93.33</p>
         </c>
         <c ca="left">
            <p>75.89</p>
         </c>
         <c ca="left">
            <p>64.86</p>
         </c>
         <c ca="left">
            <p>34.78</p>
         </c>
         <c ca="left">
            <p>67.96</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PAM*</p>
         </c>
         <c ca="left">
            <p>
               <b>97.06</b>
            </p>
            <p>(44)</p>
         </c>
         <c ca="left">
            <p>78.13</p>
            <p>(13)</p>
         </c>
         <c ca="left">
            <p>93.33</p>
            <p>(62)</p>
         </c>
         <c ca="left">
            <p>95</p>
            <p>(285)</p>
         </c>
         <c ca="left">
            <p>
               <b>93.33</b>
            </p>
            <p>(4822)</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
            <p>(614)</p>
         </c>
         <c ca="left">
            <p>90</p>
            <p>(3949)</p>
         </c>
         <c ca="left">
            <p>
               <b>93.75</b>
            </p>
            <p>(3338)</p>
         </c>
         <c ca="left">
            <p>
               <b>87.84</b>
            </p>
            <p>(2008)</p>
         </c>
         <c ca="left">
            <p>56.52</p>
            <p>(1253)</p>
         </c>
         <c ca="left">
            <p>88.5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1-vs-1-SVM<sup>&#8225;</sup>*</p>
         </c>
         <c ca="left">
            <p>79.41</p>
         </c>
         <c ca="left">
            <p>
               <b>87.5</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>83.33</p>
         </c>
         <c ca="left">
            <p>97.01</p>
         </c>
         <c ca="left">
            <p>
               <b>100</b>
            </p>
         </c>
         <c ca="left">
            <p>84.82</p>
         </c>
         <c ca="left">
            <p>83.78</p>
         </c>
         <c ca="left">
            <p>65.22</p>
         </c>
         <c ca="left">
            <p>88.11</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>*Results reported in Tan et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. <sup>&#8224;</sup>Results obtained with our method (TSG). <sup>&#8225;</sup>NB, k-NN, and 1-vs-1-SVM used the entire set of genes in classification, that is, 7129, 7129, 12582, 2308, 9216, 12600, 4026, 12558, 12533, and 16063 genes for the ten datasets, respectively.</p>
   </tblfn></tbl>
<p>The percent increase in accuracy of HC-k-TSP and TSG over HC-TSP is shown in Figure <figr fid="F4">4</figr>. The similar heights of the first and third bars for the Leuk1, Leuk2, SRBCT, and GCM datasets indicate that HC-k-TSP and TSG improve upon HC-TSP by similar amounts for these datasets. For the Breast, DLBCL, and Leuk3 datasets, TSG achieved a much larger accuracy gain over HC-TSP than HC-k-TSP did; for these three datasets, the relative gain in accuracy of TSG over HC-k-TSP is 30%, 12%, and 10.87%, respectively. For Lung1, Lung2, and on average, TSG is also more accurate than HC-k-TSP, but the gain is no more than 5%. For the Cancers dataset, TSG is 3.27% less accurate than HC-k-TSP. In summary, TSG is at least as accurate as HC-k-TSP in all but one dataset. We remark that there are three schemes for extending the binary TSP classifier to multi-class problems (1-vs-1, 1-vs-others, and hierarchical classification). The HC-k-TSP and HC-TSP results reported in Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> use the hierarchical scheme, which performs best of the three. Therefore, the TSG classifier also performs better than the 1-vs-1 and 1-vs-others multi-class extensions of the TSP family classifiers.</p>
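<p>The gains quoted above are consistent with a relative change computed from the accuracies in Table 7, that is, the difference between the two accuracies divided by the accuracy of the baseline classifier. The following is a minimal sketch under that assumption; the function name percent_gain and the variable names are illustrative and do not come from the original analysis.</p>

```python
# Independent-test accuracies (percent) copied from Table 7.
DATASETS = ["Leuk1", "Lung1", "Leuk2", "SRBCT", "Breast",
            "Lung2", "DLBCL", "Leuk3", "Cancers", "GCM"]
TSG = [97.06, 81.25, 100, 100, 86.67, 95.52, 93.33, 91.07, 79.73, 67.39]
HC_K_TSP = [97.06, 78.13, 100, 100, 66.67, 94.03, 83.33, 82.14, 82.43, 67.39]


def percent_gain(new, base):
    """Relative change of `new` over `base`, expressed in percent of `base`.

    Assumed definition: it reproduces the gains quoted in the text.
    """
    return 100.0 * (new - base) / base


# Gain of TSG over HC-k-TSP for each multi-class dataset.
gains = {name: percent_gain(t, h)
         for name, t, h in zip(DATASETS, TSG, HC_K_TSP)}

for name, gain in gains.items():
    print(f"{name:8s} {gain:+.2f}%")
```

<p>With the rounded table values, the gains for Breast, DLBCL, and Leuk3 come out close to 30%, 12%, and 10.87%, the gain for Cancers is negative (roughly 3.3% in magnitude), and the four datasets with identical accuracies give a gain of zero.</p>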
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Percent of improvement in accuracy for TSG over TSP family classifiers in multi-class classification</p></caption><text>
   <p><b>Percent of improvement in accuracy for TSG over TSP family classifiers in multi-class classification</b>.</p>
</text><graphic file="1755-8794-6-S1-S3-4"/></fig>
<p>The accuracy on independent test samples for TSG, HC-k-TSP, and SVM relative to the accuracy of PAM is presented in the right panel of Figure <figr fid="F3">3</figr>. SVM is more accurate than PAM in half of the datasets and less accurate in the other half, so on average SVM and PAM are comparable. The accuracy of HC-k-TSP is below that of PAM in five datasets and above it in only three, and on average HC-k-TSP is somewhat less accurate than PAM. TSG is more accurate than PAM in five of the ten datasets and less accurate in four. On average, TSG, PAM, and SVM have comparable performance, with TSG only slightly better. In summary, for multi-class problems, TSG, SVM, and PAM are on average more accurate than HC-k-TSP.</p>
</sec>
<sec>
<st>
<p>The number of genes used in classifiers</p>
</st>
<p>When classifiers are compared on a finite number of samples, the number of genes used by each classifier is an important factor. Classifiers that use more genes tend to overfit the data, so classifiers with a small number of genes are preferred. Since SVM, k-NN, and NB use the entire set of genes in their classification algorithms, we exclude them from further discussion of the number of genes used and restrict the rest of this subsection to TSP, HC-TSP, DT, k-TSP, HC-k-TSP, TSG, and PAM. For these classifiers, we plot the number of genes used by each classifier in each dataset. For many datasets, PAM used hundreds or thousands of genes in the final classifier. We therefore set the upper limit of the vertical axis in Figure <figr fid="F5">5</figr> to 50 in the binary cases and 140 in the multi-class cases so that the numbers used by the other classifiers can be seen. The numbers for the same classifier across different datasets are connected for ease of viewing.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Plot of the number of genes for TSP(or HC-TSP), TSG, k-TSP(or HC-k-TSP), DT, and PAM</p></caption><text>
   <p><b>Plot of the number of genes for TSP (or HC-TSP), TSG, k-TSP (or HC-k-TSP), DT, and PAM</b>. Labels 1,..., 5 in the plots represent: 1: TSP in binary and HC-TSP in multi-class cases; 2: TSG; 3: k-TSP in binary and HC-k-TSP in multi-class cases; 4: DT; 5: PAM. The number of genes used by PAM is not shown in the graph when it exceeds the upper limit of the vertical axis. The numbers not shown for PAM are 2296 and 701 for the Leuk and Pros3 data, respectively, in the binary cases, and 285, 4822, 614, 3949, 3338, 2008, and 1253 genes for the SRBCT, Breast, Lung2, DLBCL, Leuk3, Cancers, and GCM data, respectively, in the multi-class cases.</p>
</text><graphic file="1755-8794-6-S1-S3-5"/></fig>
<p>It can be seen from Figure <figr fid="F5">5</figr> that the TSP classifier uses the fewest genes in binary data classification. PAM used many more genes than the other classifiers. TSG uses the second smallest number of genes in binary data classification, followed by DT with the third smallest. The k-TSP classifier in general used more genes than TSP, TSG, and DT.</p>
<p>For multi-class data, DT uses the fewest genes, followed by HC-TSP. However, as discussed in Section 3.1, DT and HC-TSP are not as accurate as SVM, PAM, HC-k-TSP, and TSG. TSG used more genes than HC-k-TSP in 4 of the 10 datasets and fewer genes in 5 datasets; the two used the same number of genes in the remaining dataset (Lung1). On average across all 10 datasets, TSG uses 51 genes and HC-k-TSP uses 53.4 genes. PAM consistently uses many more genes than the other classifiers, except for the Lung1 data. In fact, for most datasets the number of genes used by PAM is on the order of thousands in order to reach accuracy comparable to that of TSG. Recall that on average TSG, PAM, and SVM have comparable accuracy on independent test data and HC-k-TSP has lower accuracy. Combining the accuracy and the number of genes used, TSG outperforms the rest in that it uses a smaller number of genes to reach high accuracy. A smaller number of genes makes it feasible to perform follow-up studies and further experimental verification after the informative genes have been selected.</p>
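<p>The gene-count comparison between TSG and HC-k-TSP can be reproduced directly from the numbers in parentheses in Table 7. The short sketch below only restates that arithmetic; the variable names are illustrative.</p>

```python
from statistics import mean

# Number of genes used on the ten multi-class datasets (Table 7).
DATASETS = ["Leuk1", "Lung1", "Leuk2", "SRBCT", "Breast",
            "Lung2", "DLBCL", "Leuk3", "Cancers", "GCM"]
TSG_GENES = [6, 20, 44, 13, 63, 60, 16, 95, 81, 112]
HC_K_TSP_GENES = [36, 20, 24, 30, 24, 28, 46, 64, 128, 134]

rows = list(zip(DATASETS, TSG_GENES, HC_K_TSP_GENES))

# Split the datasets by which classifier used more genes.
tsg_more = [d for d, t, h in rows if t > h]
tsg_fewer = [d for d, t, h in rows if h > t]
tied = [d for d, t, h in rows if t == h]

print("TSG uses more genes in:", tsg_more)
print("TSG uses fewer genes in:", tsg_fewer)
print("Same number of genes in:", tied)
print("Mean genes, TSG:", mean(TSG_GENES))
print("Mean genes, HC-k-TSP:", mean(HC_K_TSP_GENES))
```

<p>The counts are 4 datasets with more genes for TSG, 5 with fewer, and 1 tie, and the two means are 51 and 53.4, in agreement with the figures quoted in the text.</p>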
</sec>
<sec>
<st>
<p>Interpretation of the TSG classifier</p>
</st>
<p>The TSG classifier has an easy interpretation. Recall that the main rule for classifying a sample is based on which class gives the highest chi-square statistic when the sample is assigned to that class (see Section 2.3). Note that there is a one-to-one correspondence between the value of the chi-square statistic and the degree of departure from independence between the class label and the column variables in Table <tblr tid="T2">2</tblr>. The larger the chi-square statistic, the further the departure from independence, and vice versa. So the class label assignment tries to maximize the dependence between the classes and the selected variables for a given set of training data and a test sample.</p>
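<p>The rule described above can be illustrated with a small sketch. It assumes a contingency table whose rows are the classes and whose columns are categories derived from the selected genes; the exact layout of Table 2, the handling of ties, and the counts used below are assumptions made only for illustration, and the function names chi_square and classify are hypothetical rather than part of the TSG software.</p>

```python
def chi_square(table):
    """Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # cells in empty rows or columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def classify(train_table, sample_column):
    """Tentatively add the test sample to each class row in turn and
    return the class whose augmented table has the largest statistic."""
    scores = []
    for k in range(len(train_table)):
        augmented = [list(row) for row in train_table]  # copy the counts
        augmented[k][sample_column] += 1
        scores.append(chi_square(augmented))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical training counts: rows = classes, columns = categories.
train = [[18, 2],   # class 0: mostly category 0
         [3, 17]]   # class 1: mostly category 1

# A test sample that falls in category 1.
label, scores = classify(train, sample_column=1)
print("assigned class:", label)
print("chi-square per candidate class:", scores)
```

<p>In this toy table the test sample falls in the category that is typical of class 1, so assigning it to class 1 strengthens the association between rows and columns and yields the larger statistic.</p>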
<p>When only two genes are selected, the TSG classifier has the same interpretation as the TSP classifier. In particular, if the expression values of these two genes are E<sub>1</sub> and E<sub>2</sub> for a sample, the prediction of the class label of this sample depends on whether E<sub>1</sub> &lt; E<sub>2</sub>. For the Leuk, CNS, and Pros3 data, TSG achieved LOOCV accuracies of 98.61, 97.06, and 100, respectively, using 2 genes. Also using 2 genes, TSP achieved only 93.80, 77.90, and 97.00 for these three datasets. Therefore, TSG finds gene pairs that are more informative than those found by the TSP classifier in these three datasets. When more than 2 informative genes are selected, the class prediction of a sample depends on all pairwise comparisons of the expression values <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i34"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-punc">,</m:mo>
   <m:mo class="MathClass-rel">&#8943;</m:mo>
   <m:mspace class="thinspace" width="0.3em"/>
   <m:mo class="MathClass-punc">,</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> for the selected genes i<sub>1</sub>, i<sub>2</sub>,..., i<sub>k </sub>from this sample, i.e., which of the inequalities <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i35"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">&lt;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>, <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i36"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">&#8805;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>, ..., <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i37"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">&lt;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula>, <inline-formula>
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1755-8794-6-S1-S3-i38"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">&#8805;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>E</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>k</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</inline-formula> are true. Because the interpretation is similar to that of the TSP family classifiers, we do not repeat it for TSG and refer readers to Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> for details.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>In this article, we presented the TSG classifier, an improved version of the TSP family classifiers for both binary and multi-class cancer classification. The TSP family classifiers consider only an even number of genes, and their gene selection is based on marginal comparisons of pairwise expression values. This ignores the fact that some marginally important genes may have effects similar to those of other genes and may therefore be redundant. We addressed these shortcomings by allowing both even and odd numbers of genes through newly defined score functions and a new selection algorithm. After some genes have been selected, our gene selection process assesses the importance of additional genes by considering the overall contribution of all the genes included in the informative set. Because the joint effects of multiple genes are evaluated together, we expect the final list of genes selected by TSG to be more parsimonious than the lists selected by the k-TSP and HC-k-TSP classifiers.</p>
<p>The TSG classifier has a simple, unified form for both binary and multi-class cases. In contrast, the TSP family classifiers rely on three binary-to-multi-class extension schemes (1-vs-1, 1-vs-others, and hierarchical classification), which lead to three different classifiers. As reported in Tan et al. <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>, the hierarchical classification extensions HC-TSP and HC-k-TSP perform best among the three schemes. Our TSG classifier has a single form and, in general, matches or outperforms k-TSP, HC-TSP and HC-k-TSP on the 19 datasets in terms of accuracy and number of genes used. We also compared the performance of TSG with PAM and SVM. In binary classification problems, TSG has better LOOCV accuracy than PAM and SVM; in multi-class problems, TSG, PAM, and SVM give comparable accuracy on independent test data. All three classifiers are more accurate than the TSP family classifiers. TSG also uses far fewer genes than PAM and SVM: PAM often selects thousands of genes in its final classifier, and SVM uses the entire set of genes.</p>
<p>An obvious advantage of TSG, as well as of the TSP family classifiers, is that they are based on simple pairwise comparisons of expression values between genes within the same sample. Such comparisons are invariant to monotone increasing transformations of the expression values within a sample, which eliminates the concern about variation among patients and platforms and about bias introduced by preprocessing different samples. Therefore, we expect the results from TSG to be more reliable and robust than those of many other methods that pool data from different samples to filter genes.</p>
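<p>The invariance of within-sample comparisons to monotone transformations can be illustrated with a minimal sketch. The expression values, the helper name pairwise_order, and the three transformations below are hypothetical and chosen only for illustration; the sketch is not part of the TSG software and does not implement the TSG score functions.</p>
<p>
```python
import numpy as np


def pairwise_order(sample):
    """Return a boolean matrix whose (i, j) entry is True when gene i is expressed below gene j."""
    column = sample[:, None]
    row = sample[None, :]
    return np.less(column, row)


# Hypothetical expression values for five genes measured in one sample.
raw = np.array([120.0, 35.5, 980.0, 4.2, 260.0])

# Three strictly increasing transformations of the same sample (illustrative choices).
logged = np.log2(raw)
scaled = 3.0 * raw + 50.0
ranked = np.argsort(np.argsort(raw)).astype(float)

# The pairwise ordering of the raw values is the reference pattern.
reference = pairwise_order(raw)
same_after_log = np.array_equal(reference, pairwise_order(logged))
same_after_scale = np.array_equal(reference, pairwise_order(scaled))
same_after_rank = np.array_equal(reference, pairwise_order(ranked))

print(same_after_log, same_after_scale, same_after_rank)
```
</p>
<p>Each flag is True because a strictly increasing transformation preserves the ordering of every gene pair within the sample, so any rule built only on such orderings gives the same answer before and after the transformation.</p>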
</sec>
<sec>
<st>
<p>List of abbreviations used</p>
</st>
<p>TSP: top scoring pair; k-TSP: k top scoring pairs; HC-TSP: multi-class extension of TSP with a hierarchical classification scheme; HC-k-TSP: multi-class extension of k-TSP with a hierarchical classification scheme; SVM: support vector machine; PAM: Prediction Analysis of Microarray; LOOCV: leave-one-out cross-validation.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>ZY and HZ developed the algorithm; HZ performed gene selection and classification for the microarray data; HW conducted some of the comparisons, summarized the results and drafted the manuscript; HZ and ZD designed the software; ZY and MC provided discussion and revised the manuscript for this study. All authors have approved the final version of the manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>The research was supported by a grant from the Science Foundation for Distinguished Young Scholars of Hunan Province, China (No. 10JJ1005) to ZY, the Scientific Research Fund of Hunan Provincial Education Department (No. 11C0654) to HZ, and a grant from the Simons Foundation to HW (#246077).</p>
<p>This article has been published as part of <it>BMC Medical Genomics</it> Volume 6 Supplement 1, 2013: Proceedings of the 2011 International Conference on Bioinformatics and Computational Biology (BIOCOMP'11). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcmedgenomics/supplements/6/S1</url>. Publication of this supplement has been supported by the International Society of Intelligent Biological Medicine.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Diagnosis of multiple cancer types by shrunken centroids of gene expression</p></title><aug><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Narasimhan</snm><fnm>B</fnm></au><au><snm>Chu</snm><fnm>G</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2002</pubdate><volume>99</volume><fpage>6567</fpage><lpage>6572</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.082099299</pubid><pubid idtype="pmcid">124443</pubid><pubid idtype="pmpid" link="fulltext">12011421</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Classifying Gene Expression Profiles from Pairwise mRNA Comparisons</p></title><aug><au><snm>Geman</snm><fnm>D</fnm></au><au><snm>d'Avignon</snm><fnm>C</fnm></au><au><snm>Naiman</snm><fnm>D</fnm></au><au><snm>Winslow</snm><fnm>R</fnm></au></aug><source>Statistical Applications in Genetics and Molecular Biology</source><pubdate>2004</pubdate><note>doi: 10.2202/1544-6115.1071</note></bibl><bibl id="B3"><title><p>Simple decision rules for classifying human cancers from gene expression profiles</p></title><aug><au><snm>Tan</snm><fnm>AC</fnm></au><au><snm>Naiman</snm><fnm>DQ</fnm></au><au><snm>Xu</snm><fnm>L</fnm></au><au><snm>Winslow</snm><fnm>RL</fnm></au><au><snm>Geman</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>3896</fpage><lpage>3904</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti631</pubid><pubid idtype="pmcid">1987374</pubid><pubid idtype="pmpid" link="fulltext">16105897</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>LIBSVM: a library for support vector machines</p></title><aug><au><snm>Chang</snm><fnm>CC</fnm></au><au><snm>Lin</snm><fnm>CJ</fnm></au></aug><source>ACM Transactions on Intelligent Systems and Technology</source><pubdate>2011</pubdate><volume>2</volume><issue>27</issue><fpage>1</fpage><lpage>27</lpage></bibl><bibl id="B5"><title><p>A 
comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression</p></title><aug><au><snm>Li</snm><fnm>T</fnm></au><au><snm>Zhang</snm><fnm>C</fnm></au><au><snm>Ogihara</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>2429</fpage><lpage>2437</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth267</pubid><pubid idtype="pmpid" link="fulltext">15087314</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>A stable gene selection in microarray data analysis</p></title><aug><au><snm>Yang</snm><fnm>K</fnm></au><au><snm>Cai</snm><fnm>Z</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Lin</snm><fnm>G</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>228</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-228</pubid><pubid idtype="pmcid">1524991</pubid><pubid idtype="pmpid" link="fulltext">16643657</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Minimum redundancy feature selection from microarray gene expression data</p></title><aug><au><snm>Ding</snm><fnm>C</fnm></au><au><snm>Peng</snm><fnm>H</fnm></au></aug><source>J Bioinform Comput Biol</source><pubdate>2005</pubdate><volume>3</volume><issue>2</issue><fpage>185</fpage><lpage>205</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1142/S0219720005001004</pubid><pubid idtype="pmpid">15852500</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data</p></title><aug><au><snm>Ooi</snm><fnm>CH</fnm></au><au><snm>Chetty</snm><fnm>M</fnm></au><au><snm>Teng</snm><fnm>SW</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>320</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-320</pubid><pubid 
idtype="pmcid">1569877</pubid><pubid idtype="pmpid" link="fulltext">16796748</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Gene selection for classification of microarray data based on the bayes error</p></title><aug><au><snm>Zhang</snm><fnm>JG</fnm></au><au><snm>Deng</snm><fnm>HW</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><issue>1</issue><fpage>37</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-37</pubid><pubid idtype="pmcid">1802094</pubid><pubid idtype="pmpid" link="fulltext">17270045</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Exploring the within and between class correlation distributions for tumor classification</p></title><aug><au><snm>Wei</snm><fnm>X</fnm></au><au><snm>Li</snm><fnm>K</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2010</pubdate><volume>107</volume><issue>15</issue><fpage>6737</fpage><lpage>6742</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0910140107</pubid><pubid idtype="pmcid">2872377</pubid><pubid idtype="pmpid" link="fulltext">20339085</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Gene selection and classification for cancer microarray data based on machine learning and similarity measures</p></title><aug><au><snm>Liu</snm><fnm>Q</fnm></au><au><snm>Sung</snm><fnm>A</fnm></au><au><snm>Chen</snm><fnm>Z</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Qiao</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>Z</fnm></au><au><snm>Huang</snm><fnm>X</fnm></au><au><snm>Deng</snm><fnm>Y</fnm></au></aug><source>BMC Genomics</source><pubdate>2011</pubdate><volume>12</volume><issue>Suppl 5</issue><fpage>S1</fpage><note>doi:10.1186/1471-2164-12-S5-S1</note><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-S5-S1</pubid><pubid idtype="pmcid">3287578</pubid><pubid idtype="pmpid" link="fulltext">22369514</pubid></pubidlist></xrefbib></bibl><bibl 
id="B12"><title><p>Gene selection for cancer classification using support vector machines</p></title><aug><au><snm>Guyon</snm><fnm>I</fnm></au><au><snm>Weston</snm><fnm>J</fnm></au><au><snm>Barnhill</snm><fnm>S</fnm></au><au><snm>Vapnik</snm><fnm>V</fnm></au></aug><source>Machine Learning</source><pubdate>2002</pubdate><volume>46</volume><fpage>389</fpage><lpage>422</lpage><xrefbib><pubid idtype="doi">10.1023/A:1012487302797</pubid></xrefbib></bibl><bibl id="B13"><title><p>Variable selection using SVM-based criteria</p></title><aug><au><snm>Rakotomamonjy</snm><fnm>A</fnm></au></aug><source>J Mach Learn Res</source><pubdate>2003</pubdate><volume>3</volume><fpage>1357</fpage><lpage>1370</lpage></bibl><bibl id="B14"><title><p>Gene selection and classification of microarray data using random forest</p></title><aug><au><snm>D&#237;az-Uriarte</snm><fnm>R</fnm></au><au><snm>Alvarez de Andr&#233;s</snm><fnm>S</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>3</fpage><note>doi:10.1186/1471-2105-7-3</note><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-3</pubid><pubid idtype="pmcid">1363357</pubid><pubid idtype="pmpid" link="fulltext">16398926</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>The random subspace method for constructing decision forests</p></title><aug><au><snm>Ho</snm><fnm>TK</fnm></au></aug><source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source><pubdate>1998</pubdate><volume>20</volume><issue>8</issue><fpage>832</fpage><lpage>844</lpage><xrefbib><pubid idtype="doi">10.1109/34.709601</pubid></xrefbib></bibl><bibl id="B16"><title><p>Weighted random subspace method for high dimensional data classification</p></title><aug><au><snm>Li</snm><fnm>X</fnm></au><au><snm>Zhao</snm><fnm>H</fnm></au></aug><source>Statistics and its Interface</source><pubdate>2009</pubdate><volume>2</volume><fpage>153</fpage><lpage>159</lpage><xrefbib><pubidlist><pubid 
idtype="pmcid">3170928</pubid><pubid idtype="pmpid">21918713</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling</p></title><aug><au><snm>Alizadeh</snm><fnm>AA</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Davis</snm><fnm>RE</fnm></au><au><snm>Ma</snm><fnm>C</fnm></au><au><snm>Lossos</snm><fnm>IS</fnm></au><au><snm>Rosenwald</snm><fnm>A</fnm></au><au><snm>Boldrick</snm><fnm>JC</fnm></au><au><snm>Sabet</snm><fnm>H</fnm></au><au><snm>Tran</snm><fnm>T</fnm></au><au><snm>Yu</snm><fnm>X</fnm></au><au><snm>Powell</snm><fnm>JI</fnm></au><au><snm>Yang</snm><fnm>L</fnm></au><au><snm>Marti</snm><fnm>GE</fnm></au><au><snm>Moore</snm><fnm>T</fnm></au><au><snm>Hudson</snm><fnm>J</fnm><suf>Jr</suf></au><au><snm>Lu</snm><fnm>L</fnm></au><au><snm>Lewis</snm><fnm>DB</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Chan</snm><fnm>WC</fnm></au><au><snm>Greiner</snm><fnm>TC</fnm></au><au><snm>Weisenburger</snm><fnm>DD</fnm></au><au><snm>Armitage</snm><fnm>JO</fnm></au><au><snm>Warnke</snm><fnm>R</fnm></au><au><snm>Levy</snm><fnm>R</fnm></au><au><snm>Wilson</snm><fnm>W</fnm></au><au><snm>Grever</snm><fnm>MR</fnm></au><au><snm>Byrd</snm><fnm>JC</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Staudt</snm><fnm>LM</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>403</volume><fpage>503</fpage><lpage>511</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35000501</pubid><pubid idtype="pmpid" link="fulltext">10676951</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide 
arrays</p></title><aug><au><snm>Alon</snm><fnm>U</fnm></au><au><snm>Barkai</snm><fnm>N</fnm></au><au><snm>Notterman</snm><fnm>DA</fnm></au><au><snm>Gish</snm><fnm>K</fnm></au><au><snm>Ybarra</snm><fnm>S</fnm></au><au><snm>Mack</snm><fnm>D</fnm></au><au><snm>Levine</snm><fnm>AJ</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1999</pubdate><volume>96</volume><fpage>6745</fpage><lpage>6750</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.96.12.6745</pubid><pubid idtype="pmcid">21986</pubid><pubid idtype="pmpid" link="fulltext">10359783</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p></title><aug><au><snm>Golub</snm><fnm>TR</fnm></au><au><snm>Slonim</snm><fnm>DK</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Huard</snm><fnm>C</fnm></au><au><snm>Gaasenbeek</snm><fnm>M</fnm></au><au><snm>Mesirov</snm><fnm>JP</fnm></au><au><snm>Coller</snm><fnm>H</fnm></au><au><snm>Loh</snm><fnm>ML</fnm></au><au><snm>Downing</snm><fnm>JR</fnm></au><au><snm>Caligiuri</snm><fnm>MA</fnm></au><au><snm>Bloomfield</snm><fnm>CD</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Science</source><pubdate>1999</pubdate><volume>286</volume><issue>5439</issue><fpage>531</fpage><lpage>537</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.286.5439.531</pubid><pubid idtype="pmpid" link="fulltext">10521349</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Prediction of central nervous system embryonal tumour outcome based on gene 
expression</p></title><aug><au><snm>Pomeroy</snm><fnm>SL</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Gaasenbeek</snm><fnm>M</fnm></au><au><snm>Sturla</snm><fnm>LM</fnm></au><au><snm>Angelo</snm><fnm>M</fnm></au><au><snm>McLaughlin</snm><fnm>ME</fnm></au><au><snm>Kim</snm><fnm>JY</fnm></au><au><snm>Goumnerova</snm><fnm>LC</fnm></au><au><snm>Black</snm><fnm>PM</fnm></au><au><snm>Lau</snm><fnm>C</fnm></au><au><snm>Allen</snm><fnm>JC</fnm></au><au><snm>Zagzag</snm><fnm>D</fnm></au><au><snm>Olson</snm><fnm>JM</fnm></au><au><snm>Curran</snm><fnm>T</fnm></au><au><snm>Wetmore</snm><fnm>C</fnm></au><au><snm>Biegel</snm><fnm>JA</fnm></au><au><snm>Poggio</snm><fnm>T</fnm></au><au><snm>Mukherjee</snm><fnm>S</fnm></au><au><snm>Rifkin</snm><fnm>R</fnm></au><au><snm>Califano</snm><fnm>A</fnm></au><au><snm>Stolovitzky</snm><fnm>G</fnm></au><au><snm>Louis</snm><fnm>DN</fnm></au><au><snm>Mesirov</snm><fnm>JP</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>415</volume><fpage>436</fpage><lpage>442</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/415436a</pubid><pubid idtype="pmpid" link="fulltext">11807556</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine 
learning</p></title><aug><au><snm>Shipp</snm><fnm>MA</fnm></au><au><snm>Ross</snm><fnm>KN</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Weng</snm><fnm>AP</fnm></au><au><snm>Kutok</snm><fnm>JL</fnm></au><au><snm>Aguiar</snm><fnm>RC</fnm></au><au><snm>Gaasenbeek</snm><fnm>M</fnm></au><au><snm>Angelo</snm><fnm>M</fnm></au><au><snm>Reich</snm><fnm>M</fnm></au><au><snm>Pinkus</snm><fnm>GS</fnm></au><au><snm>Ray</snm><fnm>TS</fnm></au><au><snm>Koval</snm><fnm>MA</fnm></au><au><snm>Last</snm><fnm>KW</fnm></au><au><snm>Norton</snm><fnm>A</fnm></au><au><snm>Lister</snm><fnm>TA</fnm></au><au><snm>Mesirov</snm><fnm>J</fnm></au><au><snm>Neuberg</snm><fnm>DS</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Aster</snm><fnm>JC</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au></aug><source>Nat Med</source><pubdate>2002</pubdate><volume>8</volume><fpage>68</fpage><lpage>74</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nm0102-68</pubid><pubid idtype="pmpid" link="fulltext">11786909</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma</p></title><aug><au><snm>Gordon</snm><fnm>GJ</fnm></au><au><snm>Jensen</snm><fnm>RV</fnm></au><au><snm>Hsiao</snm><fnm>LL</fnm></au><au><snm>Gullans</snm><fnm>SR</fnm></au><au><snm>Blumenstock</snm><fnm>JE</fnm></au><au><snm>Ramaswami</snm><fnm>S</fnm></au><au><snm>Richards</snm><fnm>WG</fnm></au><au><snm>Sugarbaker</snm><fnm>DJ</fnm></au><au><snm>Bueno</snm><fnm>R</fnm></au></aug><source>Cancer Res</source><pubdate>2002</pubdate><volume>62</volume><fpage>4963</fpage><lpage>4967</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12208747</pubid></xrefbib></bibl><bibl id="B23"><title><p>Gene expression correlates of clinical prostate cancer 
behaviour</p></title><aug><au><snm>Singh</snm><fnm>D</fnm></au><au><snm>Febbo</snm><fnm>PG</fnm></au><au><snm>Ross</snm><fnm>K</fnm></au><au><snm>Jackson</snm><fnm>DG</fnm></au><au><snm>Manola</snm><fnm>J</fnm></au><au><snm>Ladd</snm><fnm>C</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Renshaw</snm><fnm>AA</fnm></au><au><snm>D'Amico</snm><fnm>AV</fnm></au><au><snm>Richie</snm><fnm>JP</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Loda</snm><fnm>M</fnm></au><au><snm>Kantoff</snm><fnm>PW</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au><au><snm>Sellers</snm><fnm>WR</fnm></au></aug><source>Cancer Cell</source><pubdate>2002</pubdate><volume>1</volume><fpage>203</fpage><lpage>209</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1535-6108(02)00030-2</pubid><pubid idtype="pmpid" link="fulltext">12086878</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>In silico dissection of cell-type-associated patterns of gene expression in prostate cancer</p></title><aug><au><snm>Stuart</snm><fnm>RO</fnm></au><au><snm>Wachsman</snm><fnm>W</fnm></au><au><snm>Berry</snm><fnm>CC</fnm></au><au><snm>Wang-Rodriguez</snm><fnm>J</fnm></au><au><snm>Wasserman</snm><fnm>L</fnm></au><au><snm>Klacansky</snm><fnm>I</fnm></au><au><snm>Masys</snm><fnm>D</fnm></au><au><snm>Arden</snm><fnm>K</fnm></au><au><snm>Goodison</snm><fnm>S</fnm></au><au><snm>McClelland</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Sawyers</snm><fnm>A</fnm></au><au><snm>Kalcheva</snm><fnm>I</fnm></au><au><snm>Tarin</snm><fnm>D</fnm></au><au><snm>Mercola</snm><fnm>D</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><fpage>615</fpage><lpage>620</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.2536479100</pubid><pubid idtype="pmcid">327196</pubid><pubid idtype="pmpid" link="fulltext">14722351</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Analysis of Gene Expression Identifies Candidate Markers and 
Pharmacological Targets in Prostate Cancer</p></title><aug><au><snm>Welsh</snm><fnm>JB</fnm></au><au><snm>Sapinoso</snm><fnm>LM</fnm></au><au><snm>Su</snm><fnm>AI</fnm></au><au><snm>Kern</snm><fnm>SG</fnm></au><au><snm>Wang-Rodriguez</snm><fnm>J</fnm></au><au><snm>Moskaluk</snm><fnm>CA</fnm></au><au><snm>Frierson</snm><fnm>HF</fnm><suf>Jr</suf></au><au><snm>Hampton</snm><fnm>GM</fnm></au></aug><source>Cancer Res</source><pubdate>2001</pubdate><volume>61</volume><fpage>5974</fpage><lpage>5978</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11507037</pubid></xrefbib></bibl><bibl id="B26"><title><p>Multiclass cancer diagnosis using tumor gene expression signatures</p></title><aug><au><snm>Ramaswamy</snm><fnm>S</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Rifkin</snm><fnm>R</fnm></au><au><snm>Mukherjee</snm><fnm>S</fnm></au><au><snm>Yeang</snm><fnm>CH</fnm></au><au><snm>Angelo</snm><fnm>M</fnm></au><au><snm>Ladd</snm><fnm>C</fnm></au><au><snm>Reich</snm><fnm>M</fnm></au><au><snm>Latulippe</snm><fnm>E</fnm></au><au><snm>Mesirov</snm><fnm>JP</fnm></au><au><snm>Poggio</snm><fnm>T</fnm></au><au><snm>Gerald</snm><fnm>W</fnm></au><au><snm>Loda</snm><fnm>M</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2001</pubdate><volume>98</volume><fpage>15149</fpage><lpage>15154</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.211566398</pubid><pubid idtype="pmcid">64998</pubid><pubid idtype="pmpid" link="fulltext">11742071</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Gene-expression profiles predict survival of patients with lung 
adenocarcinoma</p></title><aug><au><snm>Beer</snm><fnm>DG</fnm></au><au><snm>Kardia</snm><fnm>SL</fnm></au><au><snm>Huang</snm><fnm>CC</fnm></au><au><snm>Giordano</snm><fnm>TJ</fnm></au><au><snm>Levin</snm><fnm>AM</fnm></au><au><snm>Misek</snm><fnm>DE</fnm></au><au><snm>Lin</snm><fnm>L</fnm></au><au><snm>Chen</snm><fnm>G</fnm></au><au><snm>Gharib</snm><fnm>TG</fnm></au><au><snm>Thomas</snm><fnm>DG</fnm></au><au><snm>Lizyness</snm><fnm>ML</fnm></au><au><snm>Kuick</snm><fnm>R</fnm></au><au><snm>Hayasaka</snm><fnm>S</fnm></au><au><snm>Taylor</snm><fnm>JM</fnm></au><au><snm>Iannettoni</snm><fnm>MD</fnm></au><au><snm>Orringer</snm><fnm>MB</fnm></au><au><snm>Hanash</snm><fnm>S</fnm></au></aug><source>Nat Med</source><pubdate>2002</pubdate><volume>8</volume><fpage>816</fpage><lpage>824</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12118244</pubid></xrefbib></bibl><bibl id="B28"><title><p>MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia</p></title><aug><au><snm>Armstrong</snm><fnm>SA</fnm></au><au><snm>Staunton</snm><fnm>JE</fnm></au><au><snm>Silverman</snm><fnm>LB</fnm></au><au><snm>Pieters</snm><fnm>R</fnm></au><au><snm>den Boer</snm><fnm>ML</fnm></au><au><snm>Minden</snm><fnm>MD</fnm></au><au><snm>Sallan</snm><fnm>SE</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au><au><snm>Korsmeyer</snm><fnm>SJ</fnm></au></aug><source>Nat Genet</source><pubdate>2002</pubdate><volume>30</volume><fpage>41</fpage><lpage>47</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng765</pubid><pubid idtype="pmpid" link="fulltext">11731795</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural 
networks</p></title><aug><au><snm>Khan</snm><fnm>J</fnm></au><au><snm>Wei</snm><fnm>JS</fnm></au><au><snm>Ringn&#233;r</snm><fnm>M</fnm></au><au><snm>Saal</snm><fnm>LH</fnm></au><au><snm>Ladanyi</snm><fnm>M</fnm></au><au><snm>Westermann</snm><fnm>F</fnm></au><au><snm>Berthold</snm><fnm>F</fnm></au><au><snm>Schwab</snm><fnm>M</fnm></au><au><snm>Antonescu</snm><fnm>CR</fnm></au><au><snm>Peterson</snm><fnm>C</fnm></au><au><snm>Meltzer</snm><fnm>PS</fnm></au></aug><source>Nat Med</source><pubdate>2001</pubdate><volume>7</volume><fpage>673</fpage><lpage>679</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/89044</pubid><pubid idtype="pmcid">1282521</pubid><pubid idtype="pmpid" link="fulltext">11385503</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Molecular portraits of human breast tumours</p></title><aug><au><snm>Perou</snm><fnm>CM</fnm></au><au><snm>S&#248;rlie</snm><fnm>T</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>van de Rijn</snm><fnm>M</fnm></au><au><snm>Jeffrey</snm><fnm>SS</fnm></au><au><snm>Rees</snm><fnm>CA</fnm></au><au><snm>Pollack</snm><fnm>JR</fnm></au><au><snm>Ross</snm><fnm>DT</fnm></au><au><snm>Johnsen</snm><fnm>H</fnm></au><au><snm>Akslen</snm><fnm>LA</fnm></au><au><snm>Fluge</snm><fnm>O</fnm></au><au><snm>Pergamenschikov</snm><fnm>A</fnm></au><au><snm>Williams</snm><fnm>C</fnm></au><au><snm>Zhu</snm><fnm>SX</fnm></au><au><snm>L&#248;nning</snm><fnm>PE</fnm></au><au><snm>B&#248;rresen-Dale</snm><fnm>AL</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>406</volume><fpage>747</fpage><lpage>752</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35021093</pubid><pubid idtype="pmpid" link="fulltext">10963602</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma 
subclasses</p></title><aug><au><snm>Bhattacharjee</snm><fnm>A</fnm></au><au><snm>Richards</snm><fnm>WG</fnm></au><au><snm>Staunton</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Monti</snm><fnm>S</fnm></au><au><snm>Vasa</snm><fnm>P</fnm></au><au><snm>Ladd</snm><fnm>C</fnm></au><au><snm>Beheshti</snm><fnm>J</fnm></au><au><snm>Bueno</snm><fnm>R</fnm></au><au><snm>Gillette</snm><fnm>M</fnm></au><au><snm>Loda</snm><fnm>M</fnm></au><au><snm>Weber</snm><fnm>G</fnm></au><au><snm>Mark</snm><fnm>EJ</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Wong</snm><fnm>W</fnm></au><au><snm>Johnson</snm><fnm>BE</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au><au><snm>Sugarbaker</snm><fnm>DJ</fnm></au><au><snm>Meyerson</snm><fnm>M</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2001</pubdate><volume>98</volume><fpage>13790</fpage><lpage>13795</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.191502998</pubid><pubid idtype="pmcid">61120</pubid><pubid idtype="pmpid" link="fulltext">11707567</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression 
profiling</p></title><aug><au><snm>Yeoh</snm><fnm>EJ</fnm></au><au><snm>Ross</snm><fnm>ME</fnm></au><au><snm>Shurtleff</snm><fnm>SA</fnm></au><au><snm>Williams</snm><fnm>WK</fnm></au><au><snm>Patel</snm><fnm>D</fnm></au><au><snm>Mahfouz</snm><fnm>R</fnm></au><au><snm>Behm</snm><fnm>FG</fnm></au><au><snm>Raimondi</snm><fnm>SC</fnm></au><au><snm>Relling</snm><fnm>MV</fnm></au><au><snm>Patel</snm><fnm>A</fnm></au><au><snm>Cheng</snm><fnm>C</fnm></au><au><snm>Campana</snm><fnm>D</fnm></au><au><snm>Wilkins</snm><fnm>D</fnm></au><au><snm>Zhou</snm><fnm>X</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Liu</snm><fnm>H</fnm></au><au><snm>Pui</snm><fnm>CH</fnm></au><au><snm>Evans</snm><fnm>WE</fnm></au><au><snm>Naeve</snm><fnm>C</fnm></au><au><snm>Wong</snm><fnm>L</fnm></au><au><snm>Downing</snm><fnm>JR</fnm></au></aug><source>Cancer Cell</source><pubdate>2002</pubdate><volume>1</volume><fpage>133</fpage><lpage>143</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1535-6108(02)00032-6</pubid><pubid idtype="pmpid" link="fulltext">12086872</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Molecular classification of human carcinomas by use of gene expression signatures</p></title><aug><au><snm>Su</snm><fnm>AI</fnm></au><au><snm>Welsh</snm><fnm>JB</fnm></au><au><snm>Sapinoso</snm><fnm>LM</fnm></au><au><snm>Kern</snm><fnm>SG</fnm></au><au><snm>Dimitrov</snm><fnm>P</fnm></au><au><snm>Lapp</snm><fnm>H</fnm></au><au><snm>Schultz</snm><fnm>PG</fnm></au><au><snm>Powell</snm><fnm>SM</fnm></au><au><snm>Moskaluk</snm><fnm>CA</fnm></au><au><snm>Frierson</snm><fnm>HF</fnm><suf>Jr</suf></au><au><snm>Hampton</snm><fnm>GM</fnm></au></aug><source>Cancer Res</source><pubdate>2001</pubdate><volume>61</volume><fpage>7388</fpage><lpage>7393</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11606367</pubid></xrefbib></bibl></refgrp>
</bm></art>