<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-347</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A unified framework for finding differentially expressed genes from microarray experiments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Shaik</snm>
               <mi>S</mi>
               <fnm>Jahangheer</fnm>
               <insr iid="I1"/>
               <email>jshaik@memphis.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Yeasin</snm>
               <fnm>Mohammed</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <insr iid="I5"/>
               <email>myeasin@memphis.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Electrical and Computer Engineering, CVPIA Lab, University of Memphis, Memphis, TN-38152, USA</p>
            </ins>
            <ins id="I2">
               <p>Bioinformatics Program, CVPIA Lab, University of Memphis, Memphis, TN-38152, USA</p>
            </ins>
            <ins id="I3">
               <p>Biomedical Engineering, CVPIA Lab, University of Memphis, Memphis, TN-38152, USA</p>
            </ins>
            <ins id="I4">
               <p>Center for Advanced Robotics, CVPIA Lab, University of Memphis, Memphis, TN-38152, USA</p>
            </ins>
            <ins id="I5">
               <p>Software Testing and Excellence Program, University of Memphis, Memphis, TN-38152, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>347</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/347</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17877806</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-347</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>18</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>18</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Shaik and Yeasin; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The proposed framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes and (iii) validation. The first module uses two gene selection algorithms, namely (a) two-way clustering and (b) combined adaptive ranking, to rank the genes. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion. The DEGs are then selected using false discovery rate (FDR) analysis. The third module performs a three-fold validation of the obtained DEGs. The robustness of the proposed unified framework in gene selection is first illustrated using FDR analysis. In addition, clustering-based validation of the DEGs is performed by employing an adaptive subspace-based clustering algorithm on the training and test datasets. Finally, a projection-based visualization is performed to validate the DEGs obtained using the unified framework.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The performance of the unified framework is compared with well-known ranking algorithms such as t-statistics, Significance Analysis of Microarrays (SAM), Adaptive Ranking, Combined Adaptive Ranking and Two-way Clustering. The performance curves obtained using 50 simulated microarray datasets, each following two different distributions, indicate the superiority of the unified framework over the other reported algorithms. Further analyses on 3 real cancer datasets and 3 Parkinson's datasets show a similar improvement in performance. First, a three-fold validation process is provided for the two-sample cancer datasets. In addition, the analysis of 3 sets of Parkinson's data is performed to demonstrate the scalability of the proposed method to multi-sample microarray datasets.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This paper presents a unified framework for the robust selection of genes from two-sample as well as multi-sample microarray experiments. The two different ranking methods used in module 1 bring diversity to the selection of genes. The conversion of ranks to p-values, the fusion of p-values and the FDR analysis aid in the identification of significant genes, which cannot be judged based on gene ranking alone. The three-fold validation, namely robustness in selection of genes using FDR analysis, clustering, and visualization, demonstrates the relevance of the DEGs. Empirical analyses on 50 artificial datasets and 6 real microarray datasets illustrate the efficacy of the proposed approach. The analyses on 3 cancer datasets demonstrate the utility of the proposed approach on microarray datasets with two classes of samples. The scalability of the proposed unified approach to multi-sample (more than two sample classes) microarray datasets is addressed using three sets of Parkinson's data. Empirical analyses show that the unified framework outperformed other gene selection methods in selecting differentially expressed genes from microarray data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>High-throughput experiments such as DNA microarrays have become one of the most popular biotechnologies for monitoring the expression levels of thousands of genes simultaneously. Microarray experiments produce expression profiles measured under given experimental conditions, and these profiles are normally labeled on the basis of external information such as the clinical identification of samples or the expression of genes with respect to time <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. By analyzing microarray expression profiles, one can deduce information that provides significant understanding of the mechanism of the disease under study. Sophisticated statistical techniques are required to extract relevant genes from the enormous amount of microarray data. Gene selection can be challenging because microarray data is skewed, with a large number of genes in one dimension and only a few samples in the other. There is also a large volume of biological and technical noise that must be normalized to generate a more uniform measure.</p>
         <p>Gene selection is typically performed using one of the following criteria: (i) finding differential expression of genes individually (statistics-based gene selection) or (ii) finding co-expressed genes that offer high discrimination between two classes of samples (clustering-based gene selection). These two criteria lead to different computational procedures for the selection of differentially expressed genes (DEGs). A plethora of mathematical techniques have been developed for finding DEGs in microarray data, for example, <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. The performances of these methods are hard to quantify and compare because they yield significantly different results on the same dataset. This problem can be attributed to the assumptions behind the methods employed for ranking as well as to the unique characteristics of microarray data. It is widely acknowledged that no single method is adequate to produce the desired result. The fusion of algorithms that are diverse in nature may lead to the desired result <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. This paper proposes a gene selection method that blends clustering-based and statistics-based ranking. The genes are first ranked by employing two-way clustering and statistics-based ranking. These ranks are converted into p-values using the R-test and fused using Fisher's omnibus criterion. The significance of the genes is then analyzed by performing false discovery rate (FDR) analysis.</p>
         <p>The clustering-based ranking is performed using two-way clustering. The two-way clustering framework involves clustering the genes into relevant groups and then clustering the samples using the gene groups. Many different frameworks have been proposed for two-way clustering of microarray data. For example, Getz et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> proposed a procedure called coupled two-way clustering by iteratively applying one way clustering within the subgroups of gene and tissue clusters from the previous iteration. Tang et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> reported an inter-related clustering framework based on an iterative process that uses heuristics to define the number of clusters. McLachlan et al. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> assume a model of distribution to cluster the genes. The unique characteristics of microarray data limit the utility of some of these methods.</p>
         <p>The performance of a two-way clustering framework also depends on the underlying clustering algorithm. A plethora of clustering methods have been proposed for mining microarray data <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. They include but are not limited to hierarchical methods <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, self organizing maps <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, k-means clustering <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and their variations. This paper employs an adaptive subspace iteration (ASI) based algorithm for clustering microarray data (see Methods). This algorithm is well suited to handling a large number of data points. The centroids of the clusters are available as one of the outputs; hence, new data points may be assigned to relevant clusters with ease (dynamic data clustering). As the results suggest, this computationally fast algorithm complements the two-way clustering framework employed in this paper.</p>
         <p>The performance of the statistics-based algorithms, on the other hand, depends on the number of available samples. If the samples are few, which is true for microarray data, it is difficult to assume a distribution for the data. The ranking functions based on mean and sample variance yield inaccurate results due to the high level of noise. The statistical methods followed for finding DEGs were initially based on the two-sample t-test <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> obtained by pooling the variances from two cases <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The estimates used there are based on the assumption that a large number of samples is available for statistical analysis. Tusher et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> pointed out that small sample variance estimates (not much variation among the samples) yield false alarms for DEGs. They introduced an additive constant to the sample variance to reduce the false detection rate. This parameter estimation was later proposed by Jeffery et al. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> as the 90<sup>th </sup>percentile of the sum of gene specific global standard errors. Mukherjee et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> proposed the notion of reproducibility to minimize expected loss in determination of test statistics. The mean is often not a good representative of all the samples and may be corrupted by outliers. Therefore, Shaik et al. incorporate the Hausdorff distance into the combined adaptive ranking function to cope with the unique characteristics of the data sets and to improve the robustness of the ranking algorithm <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Most of these methods provide the user with only the ranks of the genes, and the significance of the genes cannot be determined from the ranks alone.</p>
         <p>The p-values are an indication of the significance of the genes based on differential expression. This is important for feature selection studies because p-values quantify how likely the observed differential expression is to have arisen by chance. There are several significant studies that focus on this important issue <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. However, most of the gene selection methods provide the user with only the ranking of the genes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr></abbrgrp>. The ranks may be used to sort the genes based on differential expression from highest to lowest, but they do not indicate the significance of the genes. The non-availability of p-values therefore poses problems in gene selection. The availability of p-values enables controlling the false discovery rate (FDR), that is, accepting only a small number of false positives relative to the number of rejected hypotheses. An R-test is performed in this paper to convert ranks to p-values <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
         <p>The validation of DEGs is a challenging research issue. This paper uses 3 different methods, namely FDR analysis, clustering and visualization based methods to validate the DEGs. The ASI-based clustering algorithm <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> is employed for the clustering based validation. The steps involved in clustering-based validation can be summarized as follows:</p>
         <p>&#8226; Find the differentially expressed genes between sample classes using the training set.</p>
         <p>&#8226; Apply the ASI algorithm to cluster the training samples using the DEGs as features and verify whether the clusters are consistent with the different classes.</p>
         <p>&#8226; Apply the ASI algorithm to cluster the test samples with the DEGs as features.</p>
         <p>&#8226; Compare the obtained clusters with the class label information of the test classes.</p>
         <p>&#8226; Repeat the process using all the ranking functions such as t-statistics <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, SAM, Adaptive <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, combined adaptive <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, two-way clustering <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and the proposed unified ranking.</p>
         <p>The projection-based techniques applied by Shaik et al. <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp> for the visualization of microarray data were found to be well suited to multi-class microarray datasets. In this paper the 3D star coordinate projection (3DSCP) algorithm originally proposed in <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> is used for the visual validation of the DEGs. The key idea behind the application of visualization algorithms for the validation of DEGs is that if the DEGs are used as features to project the samples, the samples of different classes should be projected to distinct locations in the projected space; otherwise, a random projection pattern is observed <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The 3DSCP algorithm is provided in Additional file <supplr sid="S1">1</supplr>.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p>3D star coordinate projection. The basic concepts of star coordinate projection are illustrated.</p>
            </text>
            <file name="1471-2105-8-347-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>This section discusses the basic formulation of the individual modules of the unified framework. An overview of the unified framework is presented first.</p>
         <sec>
            <st>
               <p>Unified framework</p>
            </st>
            <p>The proposed unified framework, shown in Fig. <figr fid="F1">1</figr>, consists of three modules: (i) gene ranking, (ii) significance analysis of ranking and (iii) validation. The genes are first scored by employing the two-way clustering framework and combined adaptive ranking. For both methods, the gene with the highest score is given rank 1, the gene with the next highest score is given rank 2, and so on. The ranks are converted into p-values (<it>P</it><sub>1 </sub>and <it>P</it><sub>2</sub>) using the R-test, which is discussed later. The p-values <it>P</it><sub>1 </sub>and <it>P</it><sub>2 </sub>are combined using Fisher's omnibus procedure to obtain the unified score (<it>U</it>).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Unified Framework to find DEGs from Microarray Data</p>
               </caption>
               <text>
                  <p>Unified Framework to find DEGs from Microarray Data.</p>
               </text>
               <graphic file="1471-2105-8-347-1"/>
            </fig>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-347-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>U</m:mi>
                           <m:mo>=</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>2</m:mn>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>log</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:msub>
                                    <m:mi>P</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGvbqvcqGH9aqpcqGHsislcqaIYaGmdaaeWbqaaiGbcYgaSjabc+gaVjabcEgaNjabdcfaqnaaBaaaleaacqWGRbWAaeqaaaqaaiabdUgaRjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aOGaeiOla4caaa@3F35@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Here, '<it>N</it>' is the number of p-value sets (2 in this case) and '<it>P</it><sub><it>k</it></sub>' is the set of p-values obtained using ranking procedure '<it>k</it>'. The resultant score '<it>U</it>' follows a <it>&#967;</it><sup>2 </sup>distribution with 2N degrees of freedom. The scores are hence compared with the <it>&#967;</it><sup>2 </sup>distribution to obtain their significance at an appropriate significance level <it>&#945;</it>. The significance level <it>&#945; </it>is decided based on false discovery rate (FDR) analysis such that the expected percentage of false positives is minimized. The selected genes are further validated using several validation measures.</p>
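            <p>As an illustration of Eq. 1, the following minimal Python sketch combines the p-values of a single gene and converts the score <it>U</it> into a unified p-value. It is not the authors' implementation: the function names and the example p-values are hypothetical, and the comparison with the <it>&#967;</it><sup>2 </sup>distribution assumes that the combined p-values are independent. Because the degrees of freedom (2N) are always even, the upper tail of the <it>&#967;</it><sup>2 </sup>distribution is evaluated in closed form using only the standard library.</p>
            <p><![CDATA[
```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Return (U, unified p-value) for one gene, with U as in Eq. 1."""
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    u = -2.0 * sum(math.log(p) for p in p_values)
    # U is compared with a chi-square distribution with 2N degrees of freedom.
    return u, chi2_sf_even_df(u, 2 * len(p_values))


# Hypothetical p-values for one gene from the two ranking methods (N = 2).
u, p_unified = fisher_combine([0.01, 0.04])
```
]]></p>
            <p>With N = 2 the unified p-value reduces to <it>P</it><sub>1</sub><it>P</it><sub>2</sub>(1 &#8722; log(<it>P</it><sub>1</sub><it>P</it><sub>2</sub>)), which gives a quick check of the sketch.</p>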
            <p>The proposed framework for selecting and validating DEGs can be succinctly summarized as follows:</p>
            <p>&#8226; Rank the genes using the two-way clustering framework.</p>
            <p>&#8226; Rank the genes using the statistics-based ranking method.</p>
            <p>&#8226; Convert the ranks to p-values using the R-test <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>&#8226; Combine the p-values of both gene selection methods using Fisher's omnibus criterion to obtain the unified score as shown in Eq. 1.</p>
            <p>&#8226; Select the DEGs based on FDR analysis.</p>
            <p>&#8226; Validate the selected DEGs.</p>
         </sec>
         <sec>
            <st>
               <p>Module 1: Gene ranking</p>
            </st>
            <p>The marker genes are generally ranked based on one of two criteria: (i) finding differential expression of genes individually or (ii) finding co-expressed genes that offer high discrimination between two classes of tissues. These two criteria lead to different computational procedures for selecting DEGs, as shown below:</p>
            <sec>
               <st>
                  <p>Two-way clustering</p>
               </st>
               <p>This paper employs a progressive framework, shown in Fig. <figr fid="F2">2</figr>, for clustering the genes. Unlike the traditional two-way methods, which cluster the genes into a specified number of clusters, the progressive framework clusters the genes at all possible resolutions. A resolution is a measure of the compactness of clusters: the higher the resolution, the more compact the clusters, and vice versa. If the discriminative ability of clusters is to be studied, it must be studied at various levels of granularity. A progressive clustering algorithm provides a flexible way to achieve this goal. For example, as shown in Fig. <figr fid="F2">2</figr>, resolution 2 has more granularity than resolution 1 and so on. The clustering of the data progresses through various levels of granularity ranging from macro to micro clusters. The resolution level at which the progressive framework terminates is determined using Davies-Bouldin indices <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The two-way clustering framework employs the gene clusters at each resolution to cluster the samples as shown in Fig. <figr fid="F3">3</figr>.</p>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>Progressive Clustering Framework to Cluster the Genes</p>
                  </caption>
                  <text>
                     <p>Progressive Clustering Framework to Cluster the Genes.</p>
                  </text>
                  <graphic file="1471-2105-8-347-2"/>
               </fig>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Two-Way Clustering Framework to Find DEGs from Microarray Data</p>
                  </caption>
                  <text>
                     <p>Two-Way Clustering Framework to Find DEGs from Microarray Data.</p>
                  </text>
                  <graphic file="1471-2105-8-347-3"/>
               </fig>
               <p>The sample cluster groups are compared with the sample class label information. The score for each gene cluster '<it>CSC</it><sub><it>k</it></sub>', as shown in Eq. 2, is determined by counting the number of correctly identified samples.</p>
               <p>
                  <display-formula id="M2">
                     <m:math name="1471-2105-8-347-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:munderover>
                                 <m:mrow>
                                    <m:mi>C</m:mi>
                                    <m:mi>S</m:mi>
                                    <m:msub>
                                       <m:mi>C</m:mi>
                                       <m:mi>k</m:mi>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>M</m:mi>
                              </m:munderover>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>t</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:munderover>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>L</m:mi>
                                       </m:munderover>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>C</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo>&#8745;</m:mo>
                                                <m:msub>
                                                   <m:mi>G</m:mi>
                                                   <m:mi>j</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWfWaqaaiabdoeadjabdofatjabdoeadnaaBaaaleaacqWGRbWAaeqaaaqaaiabdUgaRjabg2da9iabigdaXaqaaiabd2eanbaakiabg2da9maaqahabaWaaabCaeaadaqadaqaaiabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaaakiaawIcacaGLPaaaaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdYeambqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemiDaqhaniabggHiLdaaaa@4D84@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, '<it>M</it>' is the total number of gene clusters, '<it>L</it>' is the number of different labels according to the ground truth, <it>C</it><sub><it>i </it></sub>are the samples that are part of cluster '<it>i</it>', <it>G</it><sub><it>j </it></sub>is the group of samples having label '<it>j</it>' according to the ground truth and (<it>C</it><sub><it>i </it></sub>&#8745; <it>G</it><sub><it>j</it></sub>) is the maximum consistency between any of the sample clusters <it>C</it><sub><it>i </it></sub>and the samples '<it>G</it>' with labels '<it>j</it>' according to the ground truth.</p>
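               <p>The scoring step can be sketched in Python as follows. This is one reading of Eq. 2 and not the authors' code: each sample cluster is matched to the ground-truth label it overlaps most, and the matched overlaps are summed to count the correctly identified samples. The function name and the toy cluster assignments are hypothetical.</p>
               <p><![CDATA[
```python
from collections import Counter


def cluster_score(cluster_ids, labels):
    """Count the samples that carry the majority ground-truth label of their
    sample cluster (assumed reading of the consistency term in Eq. 2)."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    per_cluster = {}
    for cluster, label in zip(cluster_ids, labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # For each sample cluster, keep its largest overlap with any label group.
    return sum(max(counts.values()) for counts in per_cluster.values())


# Hypothetical example: six samples grouped into two sample clusters using
# one gene cluster, compared with tumor (T) / normal (N) ground truth.
score = cluster_score([0, 0, 0, 1, 1, 1], ["T", "T", "N", "N", "N", "N"])
```
]]></p>
               <p>In this toy example five of the six samples agree with the majority label of their cluster, so the gene cluster receives a score of 5; gene clusters with higher scores discriminate the sample classes better.</p>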
            </sec>
            <sec>
               <st>
                  <p>Adaptive subspace iteration algorithm</p>
               </st>
               <p>The adaptive subspace iteration (ASI) is a subspace based method to cluster the data. It involves an optimization process that iteratively identifies the subspace structure. The following notations are used in the algorithm:</p>
               <p>&#8226; <it>D</it><sub><it>nxm </it></sub>is the data matrix that contains the microarray data with '<it>n</it>' genes and '<it>m</it>' samples. Also, assume that each macro cluster is divided into '<it>k</it>' number of micro clusters at each resolution level (cf. Fig. <figr fid="F2">2</figr>).</p>
               <p>&#8226; The matrix <it>M</it><sub><it>nxk </it></sub>is the membership matrix. Each gene has '<it>k</it>' memberships corresponding to the '<it>k</it>' clusters. The cluster to which the gene belongs has membership of 1 and rest of the memberships are 0. This enables hard clustering of the genes.</p>
               <p>&#8226; Let <it>S</it><sub><it>mxk </it></sub>be the subspace structure associated with each gene cluster. This subspace has adequate information about the most informative genes in the cluster. The columns of <it>S </it>determine the relevance of each sample in the formation of a cluster. Hence, (<it>DS</it>)<sub><it>nxk </it></sub>represents the projection of the data onto the subspaces.</p>
               <p>&#8226; Let '<it>C</it>' be the projection of the centroid of each gene cluster onto the subspaces given by <it>S</it><sub><it>mxk</it></sub>. This enables calculating the distance from each gene to each of the centroids in the subspace, which determines the relevance of each gene to each of the clusters. The relationship between '<it>C</it>', '<it>S</it>' and '<it>M</it>' is given by,</p>
               <p>
                  <display-formula id="M3"><it>C </it>= (<it>M</it><sup><it>T </it></sup><it>M</it>)<sup>-1 </sup><it>M</it><sup><it>T </it></sup><it>DS</it>.</display-formula>
               </p>
               <p>Here, (<it>M</it><sup><it>T </it></sup><it>M</it>) is a '<it>k</it>' by '<it>k</it>' diagonal matrix: its diagonal elements give the size of each cluster (the number of genes in the cluster) and its off-diagonal elements are zero. Hence, (<it>M</it><sup><it>T </it></sup><it>M</it>)<sup>-1 </sup><it>M</it><sup><it>T </it></sup><it>D </it>averages the genes of each cluster to estimate the centroids, and multiplying by <it>S </it>projects these centroids onto the subspace, as shown in Eq. 3.</p>
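               <p>As a minimal illustration of Eq. 3, the following NumPy sketch builds a hard membership matrix and computes the projected centroids. The toy values, the label vector and the helper name projected_centroids are assumptions made for this example only and are not part of the original method description.</p>
               <p>
```python
import numpy as np

def projected_centroids(D, M, S):
    """Return C = (M^T M)^-1 M^T D S (Eq. 3): one projected centroid per row."""
    sizes = M.T @ M                      # k x k diagonal matrix of cluster sizes
    return np.linalg.solve(sizes, M.T @ (D @ S))

# Toy example (assumed values): n = 4 genes, m = 3 samples, k = 2 clusters.
D = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [0.0, 1.0, 0.0],
              [2.0, 1.0, 2.0]])
labels = np.array([0, 0, 1, 1])          # hypothetical hard assignment
M = np.eye(2)[labels]                    # n x k membership matrix of 0s and 1s
S = np.array([[0.5, 0.2],
              [0.3, 0.3],
              [0.2, 0.5]])               # m x k, each column sums to 1

C = projected_centroids(D, M, S)         # row j is the mean of (DS) over cluster j
```
               </p>
               <p>The sketch assumes that every cluster contains at least one gene; otherwise (<it>M</it><sup><it>T </it></sup><it>M</it>) is singular and cannot be inverted.</p>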
               <p>The objective function '<it>O</it>' is given by,</p>
               <p>
                  <display-formula id="M4">
                     <m:math name="1471-2105-8-347-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>O</m:mi>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mn>1</m:mn>
                                 <m:mn>2</m:mn>
                              </m:mfrac>
                              <m:msub>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>&#8214;</m:mo>
                                       <m:mrow>
                                          <m:mi>D</m:mi>
                                          <m:mi>S</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>M</m:mi>
                                          <m:mi>C</m:mi>
                                       </m:mrow>
                                       <m:mo>&#8214;</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mi>F</m:mi>
                              </m:msub>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGpbWtcqGH9aqpdaWcaaqaaiabigdaXaqaaiabikdaYaaadaqbdaqaaiabdseaejabdofatjabgkHiTiabd2eanjabdoeadbGaayzcSlaawQa7amaaBaaaleaacqWGgbGraeqaaOGaeiOla4caaa@3B80@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, ||.||<sub><it>F </it></sub>is the Frobenius norm, which satisfies ||<it>A</it>||<sub><it>F</it></sub><sup>2 </sup>= <it>tr</it>(<it>AA</it><sup><it>T</it></sup>), where 'tr' is the trace of a matrix. The objective in Eq. 4 is evaluated with the squared norm, as the expansion in Eq. 5 shows. Minimizing the objective function makes the distance between each gene and the centroid of its cluster as small as possible, thereby making the clusters compact. Therefore,</p>
               <p>
                  <display-formula id="M5">
                     <m:math name="1471-2105-8-347-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>O</m:mi>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mn>1</m:mn>
                                 <m:mn>2</m:mn>
                              </m:mfrac>
                              <m:mi>t</m:mi>
                              <m:mi>r</m:mi>
                              <m:mrow>
                                 <m:mo>[</m:mo>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>D</m:mi>
                                    <m:mi>S</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>M</m:mi>
                                    <m:mi>C</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:msup>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>D</m:mi>
                                          <m:mi>S</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>M</m:mi>
                                          <m:mi>C</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mi>T</m:mi>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mo>]</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGpbWtcqGH9aqpdaWcaaqaaiabigdaXaqaaiabikdaYaaacqWG0baDcqWGYbGCdaWadaqaaiabcIcaOiabdseaejabdofatjabgkHiTiabd2eanjabdoeadjabcMcaPiabcIcaOiabdseaejabdofatjabgkHiTiabd2eanjabdoeadjabcMcaPmaaCaaaleqabaGaemivaqfaaaGccaGLBbGaayzxaaaaaa@4525@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>
                  <display-formula id="M6">
                     <m:math name="1471-2105-8-347-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mn>1</m:mn>
                                 <m:mn>2</m:mn>
                              </m:mfrac>
                              <m:mrow>
                                 <m:mo>[</m:mo>
                                 <m:mrow>
                                    <m:mi>t</m:mi>
                                    <m:mi>r</m:mi>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mi>D</m:mi>
                                          <m:mi>S</m:mi>
                                          <m:msup>
                                             <m:mi>S</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                          <m:msup>
                                             <m:mi>D</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>2</m:mn>
                                    <m:mi>t</m:mi>
                                    <m:mi>r</m:mi>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mi>D</m:mi>
                                          <m:mi>S</m:mi>
                                          <m:msup>
                                             <m:mi>C</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                          <m:msup>
                                             <m:mi>M</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                    <m:mo>+</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mi>r</m:mi>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mi>M</m:mi>
                                          <m:mi>C</m:mi>
                                          <m:msup>
                                             <m:mi>C</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                          <m:msup>
                                             <m:mi>M</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mo>]</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGH9aqpdaWcaaqaaiabigdaXaqaaiabikdaYaaadaWadaqaaiabdsha0jabdkhaYnaabmaabaGaemiraqKaem4uamLaem4uam1aaWbaaSqabeaacqWGubavaaGccqWGebardaahaaWcbeqaaiabdsfaubaaaOGaayjkaiaawMcaaiabgkHiTiabikdaYiabdsha0jabdkhaYnaabmaabaGaemiraqKaem4uamLaem4qam0aaWbaaSqabeaacqWGubavaaGccqWGnbqtdaahaaWcbeqaaiabdsfaubaaaOGaayjkaiaawMcaaiabgUcaRiabdsha0jabdkhaYnaabmaabaGaemyta0Kaem4qamKaem4qam0aaWbaaSqabeaacqWGubavaaGccqWGnbqtdaahaaWcbeqaaiabdsfaubaaaOGaayjkaiaawMcaaaGaay5waiaaw2faaaaa@5752@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>
                  <display-formula id="M7">= (1/2)[<it>tr</it>(<it>DSS</it><sup><it>T </it></sup><it>D</it><sup><it>T</it></sup>) - <it>tr</it>(<it>S</it><sup><it>T </it></sup><it>D</it><sup><it>T </it></sup><it>MC</it>)]</display-formula>
               </p>
               <p>Eq. 7 follows from Eq. 6 because, with '<it>C</it>' given by Eq. 3, <it>tr</it>(<it>MCC</it><sup><it>T </it></sup><it>M</it><sup><it>T</it></sup>) = <it>tr</it>(<it>DSC</it><sup><it>T </it></sup><it>M</it><sup><it>T</it></sup>) = <it>tr</it>(<it>S</it><sup><it>T </it></sup><it>D</it><sup><it>T </it></sup><it>MC</it>). Here, the first component (<it>DSS</it><sup><it>T </it></sup><it>D</it><sup><it>T</it></sup>) = (<it>DS</it>)(<it>DS</it>)<sup><it>T </it></sup>gives the total deviance of the data in the subspace. The second component (<it>S</it><sup><it>T </it></sup><it>D</it><sup><it>T </it></sup><it>MC</it>) = (<it>DS</it>)<sup><it>T</it></sup>(<it>MC</it>) combines the data projected onto the subspace, (<it>DS</it>), with the projected centroids of the clusters to which the genes belong, (<it>MC</it>), and so estimates the spread of the cluster centroids in the subspace. The objective function shown in Eq. 7 is therefore minimized by maximizing the distances between the centroids of the individual clusters.</p>
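               <p>The algebra leading from Eq. 4 to the two-term trace form can be checked numerically. The sketch below is only a sanity check on assumed random toy data: it evaluates the objective as half the squared Frobenius norm, as the three-term expansion of Eq. 6, and as the reduced form (1/2)[tr(DS(DS)<sup>T</sup>) - tr((DS)<sup>T</sup>MC)] that results once '<it>C</it>' is substituted from Eq. 3. The dimensions, seed and variable names are illustrative choices.</p>
               <p>
```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
n, m, k = 12, 5, 3                       # assumed toy dimensions
D = rng.normal(size=(n, m))
S = rng.random(size=(m, k))
labels = np.arange(n) % k                # guarantees that no cluster is empty
M = np.eye(k)[labels]

DS = D @ S
C = np.linalg.solve(M.T @ M, M.T @ DS)   # Eq. 3

# Half the squared Frobenius norm of the residual (Eq. 4 and Eq. 5).
o_norm = 0.5 * np.linalg.norm(DS - M @ C, "fro") ** 2

# Three-term expansion (Eq. 6).
o_expanded = 0.5 * (np.trace(DS @ DS.T)
                    - 2.0 * np.trace(DS @ C.T @ M.T)
                    + np.trace(M @ C @ C.T @ M.T))

# Reduced two-term form, valid because M^T D S = (M^T M) C.
o_reduced = 0.5 * (np.trace(DS @ DS.T) - np.trace(DS.T @ M @ C))
```
               </p>
               <p>All three values agree up to floating-point round-off, which holds only when '<it>C</it>' is computed from the current '<it>M</it>' and '<it>S</it>' through Eq. 3.</p>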
               <p>The objective function in Eq. 7 is minimized by taking the first '<it>k</it>' eigenvectors of (<it>D</it><sup><it>T</it></sup>(<it>M</it>(<it>M</it><sup><it>T </it></sup><it>M</it>)<sup>-1 </sup><it>M</it><sup><it>T </it></sup>- <it>I</it>)<it>D</it>)<sub>1:<it>k </it></sub><abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Therefore, '<it>S</it>' is updated using Eq. 8,</p>
               <p>
                  <display-formula id="M8"><it>S </it>= (<it>D</it><sup><it>T</it></sup>(<it>M</it>(<it>M</it><sup><it>T </it></sup><it>M</it>)<sup>-1 </sup><it>M</it><sup><it>T </it></sup>- <it>I</it>)<it>D</it>)<sub><it>k</it></sub>.</display-formula>
               </p>
               <p>Note that this step provides dimensionality reduction, and all further computations are performed in the reduced subspace. The outputs of the algorithm are '<it>M</it>' and '<it>S</it>'. Here, '<it>M</it>' gives the cluster memberships and '<it>S</it>' gives the weights of the samples forming the clusters defined by the matrix '<it>M</it>'. The relevance of a gene to a cluster is read from its membership: a membership of 1 indicates that the gene belongs to that cluster, and a membership of 0 indicates that it does not.</p>
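               <p>A possible NumPy rendering of the update in Eq. 8 is sketched below. It rests on one assumption that the text does not state explicitly: the "first '<it>k</it>' eigenvectors" are taken to be those with the '<it>k</it>' largest eigenvalues of the symmetric matrix in Eq. 8, which is the choice that minimizes the objective among orthonormal '<it>S</it>'. The function name update_subspace and the toy data are hypothetical.</p>
               <p>
```python
import numpy as np

def update_subspace(D, M, k):
    """Columns are k eigenvectors of D^T (M (M^T M)^-1 M^T - I) D (Eq. 8)."""
    n = D.shape[0]
    P = M @ np.linalg.solve(M.T @ M, M.T)    # n x n projector onto cluster means
    A = D.T @ (P - np.eye(n)) @ D            # m x m, symmetric
    A = 0.5 * (A + A.T)                      # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues in ascending order
    # Assumption: "first k" means the k largest eigenvalues of A.
    return eigvecs[:, ::-1][:, :k]

# Toy usage with assumed sizes: 8 genes, 4 samples, 2 non-empty clusters.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 4))
M = np.eye(2)[np.arange(8) % 2]
S = update_subspace(D, M, k=2)               # m x k with orthonormal columns
```
               </p>
               <p>The sketch requires '<it>k</it>' to be no larger than the number of samples '<it>m</it>' and every cluster to be non-empty.</p>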
            </sec>
            <sec>
               <st>
                  <p>ASI algorithm</p>
               </st>
               <p>
                  <it>Begin clustering</it>
               </p>
               <p>
                  <it>Step 1: Begin Initialization</it>
               </p>
               <p><it>Initialize 'M' </it>with zeros and place a single 1 at a random position in each row;</p>
               <p>
                  <it>Initialize 'S' with random values such that each column adds up to 1;</it>
               </p>
               <p>
                  <it>End Initialization</it>
               </p>
               <p>Step 2: Project the centroids of each cluster onto the subspaces using Eq. (3);</p>
               <p>Step 3: Compute the initial optimization value '<it>O</it><sub>0</sub>' using the objective function of Eq. (7);</p>
               <p>Step 4: Perform optimization by iteratively updating '<it>M</it>', '<it>S</it>' and '<it>C</it>';</p>
               <p>
                  <it>Begin Optimization</it>
               </p>
               <p>While (<it>O</it><sub>1 </sub>&lt;<it>O</it><sub>0</sub>)&#160;&#160;&#160;//Continue as long as the cluster compactness increases</p>
               <p>Step 4-1: Update '<it>M</it>' using the formula in equation (9)</p>
               <p>Begin Loop&#160;&#160;&#160;//Iterate over all the genes i = 1...n</p>
               <p>//Update the memberships by finding the distance of gene i to each of the updated cluster centroids; the cluster with the smallest distance receives membership 1 and all others 0, as shown in Eq. 9</p>
               <p><it>M</it>(<it>i</it>, <it>j</it>) = ||(<it>DS</it>)<sub><it>i</it>,: </sub>- <it>C</it><sub><it>j</it>,:</sub>||;&#160;&#160;&#160;j = 1...k</p>
               <p>
                  <display-formula id="M9">Min(<it>M</it>(<it>i</it>, <it>j</it>)) = 1;&#160;&#160;&#160;j = 1...k</display-formula>
               </p>
               <p>End loop</p>
               <p>Step 4-3: Update '<it>S</it>' using the formula in equation (8);</p>
               <p>
                  <it>Step 4-4: Repeat Step 2 to recompute 'C';</it>
               </p>
               <p>Step 4-5: Compute '<it>O</it><sub>1</sub>' using equation (7);</p>
               <p>Step 4-6: If (<it>O</it><sub>1 </sub>&lt;<it>O</it><sub>0</sub>);&#160;&#160;&#160;//Check for the terminating condition//.</p>
               <p><it>O</it><sub>0 </sub>= <it>O</it><sub>1</sub>;</p>
               <p>
                  <it>End optimization</it>
               </p>
               <p>
                  <it>End Clustering</it>
               </p>
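               <p>The steps above can be sketched in NumPy as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. The function name asi_cluster, the iteration cap max_iter, the guard for empty clusters and the choice of the eigenvectors with the largest eigenvalues in Step 4-3 are all assumptions added here.</p>
               <p>
```python
import numpy as np

def asi_cluster(D, k, max_iter=100, seed=0):
    """Sketch of the ASI loop above; returns (M, S, O) after the last update."""
    rng = np.random.default_rng(seed)
    n, m = D.shape

    def centroids(M, S):
        # Eq. 3; empty clusters get size 1 to avoid division by zero (assumption).
        sizes = np.maximum(M.sum(axis=0), 1.0)
        return (M.T @ (D @ S)) / sizes[:, None]

    def objective(M, S, C):
        return 0.5 * np.linalg.norm(D @ S - M @ C, "fro") ** 2

    # Step 1: a single 1 per row of M; each column of S sums to 1.
    M = np.eye(k)[rng.integers(0, k, size=n)]
    S = rng.random((m, k))
    S = S / S.sum(axis=0)
    C = centroids(M, S)                       # Step 2
    O0 = objective(M, S, C)                   # Step 3

    for _ in range(max_iter):                 # Step 4 (max_iter is an added safeguard)
        # Step 4-1: nearest projected centroid gets membership 1 (Eq. 9).
        dist = np.linalg.norm((D @ S)[:, None, :] - C[None, :, :], axis=2)
        M = np.eye(k)[dist.argmin(axis=1)]
        # Step 4-3: S from eigenvectors of D^T (M (M^T M)^-1 M^T - I) D (Eq. 8);
        # "first k" is taken to mean the k largest eigenvalues (assumption).
        sizes = np.maximum(M.sum(axis=0), 1.0)
        P = (M / sizes) @ M.T
        A = D.T @ (P - np.eye(n)) @ D
        _, vecs = np.linalg.eigh(0.5 * (A + A.T))
        S = vecs[:, ::-1][:, :k]
        C = centroids(M, S)                   # Step 4-4
        O1 = objective(M, S, C)               # Step 4-5
        if O1 >= O0:                          # Step 4-6: stop when no improvement
            break
        O0 = O1
    return M, S, O1

# Toy usage on assumed data: two groups of 10 genes measured on 4 samples.
rng = np.random.default_rng(1)
D = np.vstack([rng.normal(0.0, 0.1, (10, 4)), rng.normal(3.0, 0.1, (10, 4))])
M, S, O = asi_cluster(D, k=2)
```
               </p>
               <p>Because the random initial '<it>S</it>' (columns summing to 1) and the eigenvector-based '<it>S</it>' (unit-length columns) are scaled differently, the first comparison of objective values in this sketch is only indicative; the result also depends on the random initialization.</p>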
            </sec>
            <sec>
               <st>
                  <p>Davies-Bouldin index</p>
               </st>
               <p>The Davies-Bouldin index is a cluster validation metric <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. It measures the homogeneity of the clusters through the ratio of the sum of intra-cluster scatters to the inter-cluster scatter. The intra-cluster scatter measures the spread of a cluster, whereas the inter-cluster scatter measures how distinct the clusters are from each other. Therefore, the lower the ratio of intra-cluster scatter to inter-cluster scatter, the better the clustering.</p>
               <p>The intra-cluster scatter is given by</p>
               <p>
                  <display-formula id="M10">
                     <m:math name="1471-2105-8-347-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>S</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>q</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:msup>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mn>1</m:mn>
                                             <m:mrow>
                                                <m:mrow>
                                                   <m:mo>|</m:mo>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>A</m:mi>
                                                         <m:mi>i</m:mi>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mo>|</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                          </m:mfrac>
                                          <m:mstyle displaystyle="true">
                                             <m:munder>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mrow>
                                                   <m:mi>x</m:mi>
                                                   <m:mo>&#8712;</m:mo>
                                                   <m:msub>
                                                      <m:mi>A</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:munder>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mrow>
                                                      <m:mrow>
                                                         <m:mo>&#8214;</m:mo>
                                                         <m:mrow>
                                                            <m:mi>x</m:mi>
                                                            <m:mo>&#8722;</m:mo>
                                                            <m:msub>
                                                               <m:mi>v</m:mi>
                                                               <m:mi>i</m:mi>
                                                            </m:msub>
                                                         </m:mrow>
                                                         <m:mo>&#8214;</m:mo>
                                                      </m:mrow>
                                                   </m:mrow>
                                                   <m:mn>2</m:mn>
                                                   <m:mi>q</m:mi>
                                                </m:msubsup>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mi>q</m:mi>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:msup>
                              <m:mo>,</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWudaWgaaWcbaGaemyAaKMaeiilaWIaemyCaehabeaakiabg2da9maabmaabaWaaSaaaeaacqaIXaqmaeaadaabdaqaaiabdgeabnaaBaaaleaacqWGPbqAaeqaaaGccaGLhWUaayjcSdaaamaaqafabaWaauWaaeaacqWG4baEcqGHsislcqWG2bGDdaWgaaWcbaGaemyAaKgabeaaaOGaayzcSlaawQa7amaaDaaaleaacqaIYaGmaeaacqWGXbqCaaaabaGaemiEaGNaeyicI4Saemyqae0aaSbaaWqaaiabdMgaPbqabaaaleqaniabggHiLdaakiaawIcacaGLPaaadaahaaWcbeqaamaalaaabaGaeGymaedabaGaemyCaehaaaaakiabcYcaSaaa@5160@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>and the inter-cluster scatter is given by,</p>
               <p>
                  <display-formula id="M11">
                     <m:math name="1471-2105-8-347-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>d</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>j</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>t</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:msup>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>{</m:mo>
                                       <m:mrow>
                                          <m:msup>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>s</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:mi>p</m:mi>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mrow>
                                                         <m:mo>|</m:mo>
                                                         <m:mrow>
                                                            <m:msub>
                                                               <m:mi>v</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>s</m:mi>
                                                                  <m:mi>i</m:mi>
                                                               </m:mrow>
                                                            </m:msub>
                                                            <m:mo>&#8722;</m:mo>
                                                            <m:msub>
                                                               <m:mi>v</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>s</m:mi>
                                                                  <m:mi>j</m:mi>
                                                               </m:mrow>
                                                            </m:msub>
                                                         </m:mrow>
                                                         <m:mo>|</m:mo>
                                                      </m:mrow>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                             <m:mi>t</m:mi>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mi>t</m:mi>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:msup>
                              <m:mo>=</m:mo>
                              <m:msub>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>&#8214;</m:mo>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>v</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msub>
                                             <m:mi>v</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mo>&#8214;</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mi>t</m:mi>
                              </m:msub>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGKbazdaWgaaWcbaGaemyAaKMaemOAaOMaeiilaWIaemiDaqhabeaakiabg2da9maacmqabaWaaabCaeaadaabdaqaaiabdAha2naaBaaaleaacqWGZbWCcqWGPbqAaeqaaOGaeyOeI0IaemODay3aaSbaaSqaaiabdohaZjabdQgaQbqabaaakiaawEa7caGLiWoaaSqaaiabdohaZjabg2da9iabigdaXaqaaiabdchaWbqdcqGHris5aOWaaWbaaSqabeaacqWG0baDaaaakiaawUhacaGL9baadaahaaWcbeqaamaalaaabaGaeGymaedabaGaemiDaqhaaaaakiabg2da9maafmaabaGaemODay3aaSbaaSqaaiabdMgaPbqabaGccqGHsislcqWG2bGDdaWgaaWcbaGaemOAaOgabeaaaOGaayzcSlaawQa7amaaBaaaleaacqWG0baDaeqaaOGaeiOla4caaa@5C8C@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>where <it>v</it><sub><it>i </it></sub>is the centroid of the i<sup>th </sup>cluster, <it>v</it><sub><it>si </it></sub>is its s<sup>th </sup>component, and <it>q</it>,<it>t </it>&#8805; 1 can be selected independently of each other. For example, when t = 2, <it>d</it><sub><it>ij</it>,<it>t </it></sub>is the Euclidean distance between '<it>v</it><sub><it>i</it></sub>' and '<it>v</it><sub><it>j</it></sub>'. |<it>A</it><sub><it>i</it></sub>| is the number of elements in the cluster <it>A</it><sub><it>i </it></sub>and the '<it>x</it>'s are the elements of cluster <it>A</it><sub><it>i</it></sub>.</p>
               <p>
                  <display-formula id="M12">
                     <m:math name="1471-2105-8-347-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtext>Define&#160;</m:mtext>
                              <m:msub>
                                 <m:mi>R</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mi>t</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:munder>
                                 <m:mrow>
                                    <m:mi>max</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>c</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>q</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mrow>
                                                <m:mi>j</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>q</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>d</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>t</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGebarcqqGLbqzcqqGMbGzcqqGPbqAcqqGUbGBcqqGLbqzcqqGGaaicqWGsbGudaWgaaWcbaGaemyAaKMaeiilaWIaemyCaeNaemiDaqhabeaakiabg2da9maaxababaGagiyBa0MaeiyyaeMaeiiEaGhaleaacqWGQbGAcqGHiiIZcqWGJbWycqGGSaalcqWGQbGAcqGHGjsUcqWGPbqAaeqaaOWaaiWabeaadaWcaaqaaiabdofatnaaBaaaleaacqWGPbqAcqGGSaalcqWGXbqCaeqaaOGaey4kaSIaem4uam1aaSbaaSqaaiabdQgaQjabcYcaSiabdghaXbqabaaakeaacqWGKbazdaWgaaWcbaGaemyAaKMaemOAaOMaeiilaWIaemiDaqhabeaaaaaakiaawUhacaGL9baacqGGUaGlaaa@5F5B@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, 'R' measures the compactness and distinctiveness of the clusters, formulated as the ratio of intra-cluster scatter to inter-cluster scatter.</p>
               <p>
                  <display-formula id="M13">
                     <m:math name="1471-2105-8-347-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtext>Davies-Bouldin&#160;index&#160;is&#160;defined&#160;as&#160;</m:mtext>
                              <m:mi>D</m:mi>
                              <m:mi>B</m:mi>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mn>1</m:mn>
                                 <m:mi>c</m:mi>
                              </m:mfrac>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>c</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>R</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mi>t</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:mstyle>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGebarcqqGHbqycqqG2bGDcqqGPbqAcqqGLbqzcqqGZbWCcqqGTaqlcqqGcbGqcqqGVbWBcqqG1bqDcqqGSbaBcqqGKbazcqqGPbqAcqqGUbGBcqqGGaaicqqGPbqAcqqGUbGBcqqGKbazcqqGLbqzcqqG4baEcqqGGaaicqqGPbqAcqqGZbWCcqqGGaaicqqGKbazcqqGLbqzcqqGMbGzcqqGPbqAcqqGUbGBcqqGLbqzcqqGKbazcqqGGaaicqqGHbqycqqGZbWCcqqGGaaicqWGebarcqWGcbGqcqGH9aqpdaWcaaqaaiabigdaXaqaaiabdogaJbaadaaeWbqaaiabdkfasnaaBaaaleaacqWGPbqAcqGGSaalcqWGXbqCcqWG0baDaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabdogaJbqdcqGHris5aOGaeiOla4caaa@6BE3@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, '<it>c</it>' is the number of clusters and '<it>DB</it>' is the measure of homogeneity of the clusters. The lower the '<it>DB</it>', the more homogeneous the clusters.</p>
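               <p>The computation of the Davies-Bouldin index can be sketched as follows. This is a minimal illustration and not the implementation used in this paper: it assumes <it>q</it> = 1 (scatter taken as the mean distance of the cluster members to their centroid) and <it>t</it> = 2 (Euclidean distance between centroids), and the function names are hypothetical.</p>
               <p>
```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of vectors).

    Assumes q = 1 (mean distance to the centroid as the scatter S_i) and
    t = 2 (Euclidean distance d_ij between centroids).
    """
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    c = len(clusters)
    total = 0.0
    for i in range(c):
        # R_i: largest scatter-to-distance ratio of cluster i against any other cluster
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(c)
            if j != i
        )
    return total / c
```
               </p>
               <p>For two tight, well separated clusters the ratio inside the maximum is small, so the index is small, which agrees with the interpretation that a lower '<it>DB</it>' indicates more homogeneous clusters.</p>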
            </sec>
            <sec>
               <st>
                  <p>Combined adaptive ranking</p>
               </st>
               <p>The adaptive ranking method adopts a modification of the classical t-statistic based ranking function <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Let '<it>D</it><sub><it>n</it>,(<it>m</it>+<it>k</it>)</sub>' be the data matrix with '<it>n</it>' genes, '<it>m</it>' samples under one condition (say, the tumor class) and '<it>k</it>' samples under the other condition (say, the non-tumor class). A bootstrapping procedure is applied to the original dataset <it>D</it><sub><it>n</it>,(<it>m</it>+<it>k</it>)</sub>: <it>j </it>&lt; <it>min(m,k) </it>samples are randomly selected from each condition and pooled to form the dataset <it>'D1</it><sub><it>n</it>,2<it>j</it></sub><it>' </it>('<it>j</it>' samples from each condition). The process is repeated to construct a second dataset <it>'D2</it><sub><it>n</it>,2<it>j</it></sub><it>'</it>. Readers are referred to <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> for further details of the bootstrapping procedure. The ranking function of Eq. 14 is applied independently to these two datasets to obtain scores that describe the differential expression of each gene. The genes are sorted by score from highest to lowest, giving the rankings <it>R</it><sub>1 </sub>and <it>R</it><sub>2 </sub>used in Eq. 15. Since both datasets are subsets of the original dataset '<it>D</it><sub><it>n</it>,(<it>m</it>+<it>k</it>)</sub>', they are expected to produce similar rankings. The set of parameters <it>&#952;</it> that provides the highest consistency (Eq. 16) between the rankings is obtained using Monte Carlo simulation <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Since it is adequate to test the consistency using a few high-ranked genes '<it>h</it>', <it>h </it>= 100 is employed in this paper. The first '<it>h</it>' high-ranked genes are taken from each of the two rankings, resulting in the two sets given by Eq. 15. The ranking function (Eq. 14) is</p>
               <p>
                  <display-formula id="M14">
                     <m:math name="1471-2105-8-347-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>R</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>&#952;</m:mi>
                                 <m:mn>1</m:mn>
                              </m:msub>
                              <m:mo>,</m:mo>
                              <m:msub>
                                 <m:mi>&#952;</m:mi>
                                 <m:mn>2</m:mn>
                              </m:msub>
                              <m:mo>,</m:mo>
                              <m:msub>
                                 <m:mi>&#952;</m:mi>
                                 <m:mn>3</m:mn>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mi>d</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:msub>
                                       <m:mi>&#952;</m:mi>
                                       <m:mn>1</m:mn>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>&#952;</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#963;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mo>+</m:mo>
                                    <m:msub>
                                       <m:mi>&#952;</m:mi>
                                       <m:mn>3</m:mn>
                                    </m:msub>
                                 </m:mrow>
                              </m:mfrac>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGsbGucqGGOaakiiGacqWF4oqCdaWgaaWcbaGaeGymaedabeaakiabcYcaSiab=H7aXnaaBaaaleaacqaIYaGmaeqaaOGaeiilaWIae8hUde3aaSbaaSqaaiabiodaZaqabaGccqGGPaqkcqGH9aqpdaWcaaqaaiabdsgaKjabgUcaRiab=H7aXnaaBaaaleaacqaIXaqmaeqaaaGcbaGae8hUde3aaSbaaSqaaiabikdaYaqabaGccuWFdpWCgaqcaiabgUcaRiab=H7aXnaaBaaaleaacqaIZaWmaeqaaaaakiabc6caUaaa@494A@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, '<it>d</it>' is the difference of means for the 'mean method' and the Hausdorff distance between different samples for the 'Hausdorff distance method'. The <inline-formula><m:math name="1471-2105-8-347-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#963;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWFdpWCgaqcaaaa@2E86@</m:annotation></m:semantics></m:math></inline-formula> is the square root of the sample variance.</p>
               <p>
                  <display-formula id="M15"><it>S</it><sub>1 </sub>= <it>R</it><sub>1</sub>(1: <it>h</it>) and <it>S</it><sub>2 </sub>= <it>R</it><sub>2</sub>(1: <it>h</it>).</display-formula>
               </p>
               <p>A consistency measure is obtained by comparing these two sets:</p>
               <p>
                  <display-formula id="M16"><it>Co </it>= <it>S</it><sub>1 </sub>&#8745; <it>S</it><sub>2</sub>.</display-formula>
               </p>
               <p>The ranking <it>'R' </it>that produces the highest consistency <it>'Co</it>' is considered the best ranking. The distance measure in this ranking function was initially based on the absolute difference of means <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, the mean is not a robust representative of the sample expressions and may be driven by outliers. Shaik et al. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> therefore supplemented the mean method with a robust distance measure, the <it>K</it><sup>th </sup>Hausdorff distance. The rankings <it>R</it><sub><it>M </it></sub>and <it>R</it><sub><it>H </it></sub>are obtained for the mean method and the Hausdorff distance method, respectively, using Eq. 14 and are combined into a fused ranking <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> as shown in Eq. 17,</p>
               <p>
                  <display-formula id="M17">
                     <m:math name="1471-2105-8-347-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>S</m:mi>
                              <m:msub>
                                 <m:mi>C</m:mi>
                                 <m:mn>2</m:mn>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>W</m:mi>
                                       <m:mn>1</m:mn>
                                    </m:msub>
                                    <m:msub>
                                       <m:mi>R</m:mi>
                                       <m:mi>M</m:mi>
                                    </m:msub>
                                    <m:mo>+</m:mo>
                                    <m:msub>
                                       <m:mi>W</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msub>
                                    <m:msub>
                                       <m:mi>R</m:mi>
                                       <m:mi>H</m:mi>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>W</m:mi>
                                       <m:mn>1</m:mn>
                                    </m:msub>
                                    <m:mo>+</m:mo>
                                    <m:msub>
                                       <m:mi>W</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msub>
                                 </m:mrow>
                              </m:mfrac>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWucqWGdbWqdaWgaaWcbaGaeGOmaidabeaakiabg2da9maalaaabaGaem4vaC1aaSbaaSqaaiabigdaXaqabaGccqWGsbGudaWgaaWcbaGaemyta0eabeaakiabgUcaRiabdEfaxnaaBaaaleaacqaIYaGmaeqaaOGaemOuai1aaSbaaSqaaiabdIeaibqabaaakeaacqWGxbWvdaWgaaWcbaGaeGymaedabeaakiabgUcaRiabdEfaxnaaBaaaleaacqaIYaGmaeqaaaaakiabc6caUaaa@424A@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>Here, <it>W</it><sub>1 </sub>and <it>W</it><sub>2 </sub>represent the confidence in the rankings <it>R</it><sub><it>M </it></sub>and <it>R</it><sub><it>H</it></sub>, respectively, derived from the consistency '<it>Co</it>' of Eq. 16.</p>
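               <p>Equations 14 to 17 can be illustrated with the short sketch below. It is not the authors' implementation: the function names are hypothetical, the consistency of Eq. 16 is taken as the number of genes shared by the two top-<it>h</it> sets, and the weights passed to the fusion step are assumed to be supplied by the caller (for example, the consistency obtained for each method).</p>
               <p>
```python
def ranking_score(d, sigma, theta1, theta2, theta3):
    """Modified t-like score of Eq. 14: (d + theta1) / (theta2 * sigma + theta3)."""
    return (d + theta1) / (theta2 * sigma + theta3)


def top_h(scores, h):
    """Indices of the h genes with the highest scores (a set S of Eq. 15)."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:h])


def consistency(scores_1, scores_2, h):
    """Overlap of the two top-h sets (Eq. 16), reported as a count (assumption)."""
    return len(top_h(scores_1, h).intersection(top_h(scores_2, h)))


def fuse(r_m, r_h, w1, w2):
    """Weighted average of two per-gene rankings (Eq. 17)."""
    return [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(r_m, r_h)]
```
               </p>
               <p>With <it>&#952;</it><sub>1 </sub>= 0, <it>&#952;</it><sub>2 </sub>= 1 and <it>&#952;</it><sub>3 </sub>= 0 the score reduces to the ratio of the distance to the standard deviation, which is the form of the classical t-like statistic that the adaptive ranking generalizes.</p>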
            </sec>
         </sec>
         <sec>
            <st>
               <p>Module 2: Significance analysis of ranking</p>
            </st>
            <p>The ranking algorithms of module 1 provide only relative ranks, which do not indicate statistical significance. The ranks must therefore be converted to p-values.</p>
            <sec>
               <st>
                  <p>Convert the scores to p-values</p>
               </st>
               <p>The R-test described in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is used in this paper to convert scores to p-values. The conversion is formulated as a hypothesis testing problem. Let '<it>I</it>' be the set of informative genes and '<it>UI</it>' the set of non-informative genes. The null hypothesis is that a gene is not informative and the alternative hypothesis is that it is informative. The distribution of the statistic under the null hypothesis is obtained as follows (see <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for more details):</p>
               <p>&#8226; Obtain the ranks of the genes using the scores for 'I' iterations using bootstrapping. The value I = 25 is often adequate as indicated in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
               <p>&#8226; Construct the distribution of statistics under null hypothesis from consistently high ranked (insignificant) genes.</p>
               <p>&#8226; The median rank (<it>r</it>) of each gene is obtained (in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> mean rank was computed).</p>
               <p>&#8226; The p-value of each gene is obtained as <it>p</it>(<it>r</it><sub><it>i</it></sub>| <it>g </it>&#8712; <it>UI</it>), i.e. the probability of the rank '<it>r</it><sub><it>i</it></sub>' given that the gene belongs to the null distribution. Because the null hypothesis is that the gene is uninformative, the lower this probability, the more significant the gene.</p>
            </sec>
            <sec>
               <st>
                  <p>False discovery rate (FDR) analysis</p>
               </st>
               <p>FDR analysis selects the DEGs such that the expected number of false positives among them is as small as possible. Let '<it>S</it><sub><it>g</it></sub>' be the number of selected DEGs at significance level <it>&#945; </it>and let <it>V </it>be the number of false positives among the selected DEGs. The FDR as proposed by Storey and Tibshirani <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> is given by,</p>
               <p>
                  <display-formula id="M18">
                     <m:math name="1471-2105-8-347-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>F</m:mi>
                              <m:mi>D</m:mi>
                              <m:mi>R</m:mi>
                              <m:mo>=</m:mo>
                              <m:mi>E</m:mi>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mi>V</m:mi>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mi>g</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                    <m:mo>|</m:mo>
                                    <m:msub>
                                       <m:mi>S</m:mi>
                                       <m:mi>g</m:mi>
                                    </m:msub>
                                    <m:mo>></m:mo>
                                    <m:mn>0</m:mn>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                              <m:mi>p</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>S</m:mi>
                                 <m:mi>g</m:mi>
                              </m:msub>
                              <m:mo>></m:mo>
                              <m:mn>0</m:mn>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrcqWGebarcqWGsbGucqGH9aqpcqWGfbqrdaqadaqaamaalaaabaGaemOvayfabaGaem4uam1aaSbaaSqaaiabdEgaNbqabaaaaOGaeiiFaWNaem4uam1aaSbaaSqaaiabdEgaNbqabaGccqGH+aGpcqaIWaamaiaawIcacaGLPaaacqWGWbaCcqGGOaakcqWGtbWudaWgaaWcbaGaem4zaCgabeaakiabg6da+iabicdaWiabcMcaPiabc6caUaaa@4685@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>The FDR gives the expected proportion of false positives among the selected DEGs when the number of selected genes is greater than 0. In this paper, <it>&#945; </it>is selected such that the FDR is minimized. If ground truth information about the DEGs is available, the performance of the ranking algorithms may be compared using performance analysis curves.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Performance analysis curves</p>
            </st>
            <p>Performance analysis curves are employed to study the performance of the different ranking algorithms. The problem at hand is a binary classification problem in which each gene is either differentially expressed or not differentially expressed. The classifier yields four possible outcomes, viz. true positives (TPs), false positives (FPs), true negatives (TNs) and false negatives (FNs). The TPs are the true DEGs among the selected DEGs <it>S</it><sub><it>g</it></sub>, and the FPs are the true non-DEGs among <it>S</it><sub><it>g</it></sub>. Similarly, the TNs are the true non-DEGs among the genes deemed insignificant by the algorithm, whereas the FNs are the true DEGs among the genes deemed insignificant. If the labels for the genes (differentially expressed/non-differentially expressed) are available, which is the case for the artificial microarray datasets employed in this study, the TPs, FPs, TNs and FNs can be counted exactly. Each of the 50 artificial datasets employed in this paper is used as an instance for building the performance curves, and the TP, FP, TN and FN counts are accumulated over the 50 artificial datasets. The true positive fraction (TPF) is obtained as TPF = TP/(TP+FN) and the false positive fraction (FPF) as FPF = FP/(FP+TN). Plotting the TPFs against the FPFs gives the performance analysis curves and enables the comparison of the various classifiers employed in the study.</p>
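            <p>As an illustration, the counts and fractions defined above can be computed as in the sketch below. The function names are hypothetical, genes are assumed to be identified by integer indices, and the counts are assumed to be summed over the datasets before the fractions are formed.</p>
            <p>
```python
def confusion_counts(selected, true_degs, n_genes):
    """Return (TP, FP, TN, FN) for one dataset.

    selected  : set of gene indices called differentially expressed (S_g)
    true_degs : set of gene indices that are truly differentially expressed
    n_genes   : total number of genes in the dataset
    """
    tp = len(selected.intersection(true_degs))
    fp = len(selected) - tp
    fn = len(true_degs) - tp
    tn = n_genes - tp - fp - fn
    return tp, fp, tn, fn


def pooled_rates(per_dataset_counts):
    """Sum the counts over all datasets, then return (TPF, FPF)."""
    tp, fp, tn, fn = (sum(col) for col in zip(*per_dataset_counts))
    return tp / (tp + fn), fp / (fp + tn)
```
            </p>
            <p>For example, if 50 genes are selected from a dataset of 2050 genes with 50 true DEGs and 45 of the selected genes are true DEGs, then TP = 45, FP = 5, FN = 5 and TN = 1995, giving TPF = 0.9 and FPF = 0.0025. One point of the curve is obtained for each significance level at which the genes are selected.</p>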
         </sec>
         <sec>
            <st>
               <p>Artificial microarray datasets</p>
            </st>
            <p>Two different models are employed to generate artificial microarray datasets, viz. i) a lognormal model <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and ii) an asymmetric Laplace distribution model <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Each artificial dataset is created to have 2050 genes with 10 samples under each of the two conditions. The first 50 genes are rendered differentially expressed and the remaining 2000 are rendered non-DEGs. This process provides class labels for the genes (DEGs or non-DEGs) of each artificially generated microarray dataset, which can be used as ground truth to quantitatively assess the performance of the different algorithms used in this study.</p>
            <sec>
               <st>
                  <p>Lognormal model</p>
               </st>
               <p>The artificial microarray datasets are generated from a multivariate lognormal model as proposed in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. For the non-DEGs, the means under both conditions are set to a fixed value (zero); for the DEGs, the mean under one condition is set to zero and the mean under the other condition is drawn from <it>N (3, 1)</it>. Unequal variances following a Gamma distribution are used for the DEGs as reported in the literature <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B27">27</abbr></abbrgrp>. The parameters used for generating the artificial microarray datasets are shown in Table <tblr tid="T1">1</tblr>.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Parameters for Generating Artificial Microarray Datasets</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="left">
                           <p>Tissue Type</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>Normal tissues (condition 1)</p>
                        </c>
                        <c ca="left">
                           <p>Abnormal tissues (condition 2)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Non-DEGs</p>
                        </c>
                        <c ca="left">
                           <p>mean</p>
                        </c>
                        <c ca="left">
                           <p>0</p>
                        </c>
                        <c ca="left">
                           <p>0</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>variance</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Gamma distribution with mean 2, variance 2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>DEGs</p>
                        </c>
                        <c ca="left">
                           <p>mean</p>
                        </c>
                        <c ca="left">
                           <p>0</p>
                        </c>
                        <c ca="left">
                           <p>Normal distribution with mean 3, variance 1</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>variance</p>
                        </c>
                        <c ca="left">
                           <p>Gamma distribution with mean 3, variance 2</p>
                        </c>
                        <c ca="left">
                           <p>Gamma distribution with mean 2, variance 2</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>DEGs, Differentially Expressed Genes; Non-DEGs, Non-Differentially Expressed Genes.</p>
                  </tblfn>
               </tbl>
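               <p>A dataset with the parameters of Table 1 could be simulated along the lines of the following sketch. Several details are assumptions made for illustration only and are not taken from <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>: the means and variances are applied on the log scale, genes are drawn independently, non-DEGs share a single variance draw across both conditions, and the function names are hypothetical.</p>
               <p>
```python
import random


def gamma_from_mean_var(rng, mean, var):
    """Draw from a Gamma distribution specified by its mean and variance."""
    shape = mean * mean / var  # shape * scale = mean
    scale = var / mean         # shape * scale**2 = var
    return rng.gammavariate(shape, scale)


def make_dataset(n_deg=50, n_non_deg=2000, n_samples=10, seed=0):
    """Return (cond1, cond2, labels); each condition is a genes x samples list."""
    rng = random.Random(seed)
    cond1, cond2, labels = [], [], []
    for g in range(n_deg + n_non_deg):
        is_deg = n_deg > g  # the first n_deg genes are the DEGs
        if is_deg:
            # Table 1: mean 0 in condition 1, N(3, 1) in condition 2
            mu1, mu2 = 0.0, rng.gauss(3.0, 1.0)
            var1 = gamma_from_mean_var(rng, 3.0, 2.0)
            var2 = gamma_from_mean_var(rng, 2.0, 2.0)
        else:
            mu1 = mu2 = 0.0
            # Assumption: one variance draw shared by both conditions
            var1 = var2 = gamma_from_mean_var(rng, 2.0, 2.0)
        cond1.append([rng.lognormvariate(mu1, var1 ** 0.5) for _ in range(n_samples)])
        cond2.append([rng.lognormvariate(mu2, var2 ** 0.5) for _ in range(n_samples)])
        labels.append(is_deg)
    return cond1, cond2, labels
```
               </p>
               <p>The returned labels mark the first 50 genes as DEGs and serve as the ground truth when the TPF and FPF are computed.</p>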
            </sec>
            <sec>
               <st>
                  <p>Asymmetric Laplace distribution model</p>
               </st>
               <p>The artificial microarray datasets are created using the same procedure as for the lognormal model, but with an asymmetric Laplace distribution as reported in <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. The mean and variance of the DEGs and non-DEGs are approximated using the parameters in Table <tblr tid="T1">1</tblr>. The sample size was set to 12.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>The performance of the proposed unified framework for finding DEGs from microarray datasets is evaluated using two models of simulated microarray datasets (50 artificially generated microarray datasets each <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B27">27</abbr></abbrgrp>) as well as six real cancer and Parkinson's microarray datasets <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. The artificial datasets, for which ground truth information is available, are used to compare the performance of the unified framework with that of other gene selection methods. The various gene selection algorithms <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr></abbrgrp> are further compared with the proposed method in the selection of DEGs from the real microarray datasets.</p>
         <sec>
            <st>
               <p>Artificial microarray datasets</p>
            </st>
            <p>A lognormal distribution and asymmetric Laplace distribution model are used to generate artificial microarray datasets as proposed in <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The steps involved in the analysis of artificial microarray datasets can be succinctly summarized as follows:</p>
            <p>1. Generate an artificial microarray dataset such that the first 50 genes are rendered differentially expressed and the next 2000 are non-differentially expressed (see Methods).</p>
            <p>2. Find the ranks using module 1 of the unified framework and convert them to p-values using the R-test.</p>
            <p>3. Merge the p-values and obtain the unified p-value using Fisher's omnibus criterion.</p>
            <p>4. Perform FDR analysis.</p>
            <p>5. Compare the DEGs with the ground truth to obtain the TPF and FPF (see Methods).</p>
            <p>6. Repeat steps 1&#8211;5 for all 50 artificial microarray datasets to obtain the performance curves shown in Fig. <figr fid="F4">4</figr>.</p>
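            <p>The merging of p-values in step 3 can be sketched as follows. Fisher's method combines <it>k</it> p-values through the statistic -2 times the sum of their natural logarithms, which follows a chi-square distribution with 2<it>k</it> degrees of freedom under the null hypothesis. The sketch assumes that the p-values being combined are independent and strictly positive; the function name is hypothetical.</p>
            <p>
```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        tail += term
    return math.exp(-half_x) * tail
```
            </p>
            <p>For a single p-value the function returns that p-value unchanged. Two p-values of 0.05 combine to roughly 0.0175, which shows how agreement between methods strengthens the evidence for a gene.</p>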
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The Performance curves for various Gene Selection Methods using Artificial Microarray Datasets</p>
               </caption>
               <text>
                  <p>The Performance curves for various Gene Selection Methods using Artificial Microarray Datasets.</p>
               </text>
               <graphic file="1471-2105-8-347-4"/>
            </fig>
            <p>The unified framework, its individual modules and other well known techniques such as the t-statistic <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B32">32</abbr></abbrgrp>, significance analysis of microarrays <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, adaptive ranking <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, combined adaptive ranking <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and two-way clustering using ASI <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> are employed to find the DEGs of the 50 artificially generated microarray datasets. The R-test <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is employed to convert ranks to p-values for the gene selection methods that do not offer p-values. Fig. <figr fid="F4">4</figr> shows the performance curves (see Methods) of the various well known methods and the proposed unified approach for the lognormal model and the asymmetric Laplace model. From the performance plots it can be inferred that the proposed unified approach outperforms the other gene selection methods in finding the DEGs from the artificial microarray data.</p>
         </sec>
         <sec>
            <st>
               <p>Leukemia microarray dataset</p>
            </st>
            <p>Expression levels of approximately 6817 genes are used to classify two types of acute leukemia, viz. Acute Lymphoid Leukemia (ALL) and Acute Myeloid Leukemia (AML). The data consist of 47 ALL samples (38 B-cell and 9 T-cell) and 25 AML samples. The data are divided into a training set containing 38 samples (27 ALL and 11 AML) and a test set containing 34 samples (20 ALL and 14 AML). The class labels for the training and test samples are available from Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. The pre-processing steps proposed by Golub et al. resulted in 3571 genes; the rest of the genes are considered noise and are therefore eliminated.</p>
            <sec>
               <st>
                  <p>Gene selection and statistical validation</p>
               </st>
               <p>The well-known gene selection methods are applied to the training set and the p-values of the genes are obtained. For the gene selection methods that offer only a ranking, the R-test is employed to obtain the p-values. Table <tblr tid="T2">2</tblr> shows the FDR analysis for the leukemia training dataset. As shown in Table <tblr tid="T2">2</tblr>, the unified framework recorded a lower percentage of false positives than the other methods at all three tested levels of <it>&#945;</it> (6.8%&#8211;16.92%). This indicates the improved performance of the unified framework over the other gene selection methods. The 300 best-ranked genes obtained using the unified framework are provided as a supplementary document (see additional file <supplr sid="S2">2</supplr>). The first 52 genes are selected at significance level 0.001, as shown in Table <tblr tid="T2">2</tblr>, because this level offered the minimum expected percentage of false positives (6.8%).</p>
               <suppl id="S2">
                  <title>
                     <p>Additional file 2</p>
                  </title>
                  <text>
                     <p>Differentially expressed genes for Leukemia data. The genes selected by the unified framework for the Leukemia data <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-8-347-S2.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>FDR Analysis of Leukemia Dataset</p>
                  </caption>
                  <tblbdy cols="13">
                     <r>
                        <c ca="left">
                           <p>Alpha value</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>SAM</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Combined Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Two-Way</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Unified</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.01</p>
                        </c>
                        <c ca="left">
                           <p>154</p>
                        </c>
                        <c ca="left">
                           <p>23.19</p>
                        </c>
                        <c ca="left">
                           <p>171</p>
                        </c>
                        <c ca="left">
                           <p>20.88</p>
                        </c>
                        <c ca="left">
                           <p>183</p>
                        </c>
                        <c ca="left">
                           <p>19.51</p>
                        </c>
                        <c ca="left">
                           <p>189</p>
                        </c>
                        <c ca="left">
                           <p>18.89</p>
                        </c>
                        <c ca="left">
                           <p>191</p>
                        </c>
                        <c ca="left">
                           <p>18.7</p>
                        </c>
                        <c ca="left">
                           <p>211</p>
                        </c>
                        <c ca="left">
                           <p>16.92</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.005</p>
                        </c>
                        <c ca="left">
                           <p>94</p>
                        </c>
                        <c ca="left">
                           <p>18.99</p>
                        </c>
                        <c ca="left">
                           <p>103</p>
                        </c>
                        <c ca="left">
                           <p>17.33</p>
                        </c>
                        <c ca="left">
                           <p>119</p>
                        </c>
                        <c ca="left">
                           <p>15</p>
                        </c>
                        <c ca="left">
                           <p>121</p>
                        </c>
                        <c ca="left">
                           <p>14.76</p>
                        </c>
                        <c ca="left">
                           <p>117</p>
                        </c>
                        <c ca="left">
                           <p>15.26</p>
                        </c>
                        <c ca="left">
                           <p>147</p>
                        </c>
                        <c ca="left">
                           <p>12.15</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.001</p>
                        </c>
                        <c ca="left">
                           <p>29</p>
                        </c>
                        <c ca="left">
                           <p>12.31</p>
                        </c>
                        <c ca="left">
                           <p>34</p>
                        </c>
                        <c ca="left">
                           <p>10.5</p>
                        </c>
                        <c ca="left">
                           <p>41</p>
                        </c>
                        <c ca="left">
                           <p>8.71</p>
                        </c>
                        <c ca="left">
                           <p>41</p>
                        </c>
                        <c ca="left">
                           <p>8.71</p>
                        </c>
                        <c ca="left">
                           <p>43</p>
                        </c>
                        <c ca="left">
                           <p>8.3</p>
                        </c>
                        <c ca="left">
                           <p>52</p>
                        </c>
                        <c ca="left">
                           <p>6.8</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>GS, number of genes selected; %FP, expected percentage of false positives.</p>
                  </tblfn>
               </tbl>
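               <p>The %FP entries in Table <tblr tid="T2">2</tblr> are consistent with dividing the expected number of false positives, <it>&#945;</it> times the number of genes tested (3571 after pre-processing), by the number of genes selected. This relation is inferred from the tabulated values rather than stated in this section, so the following Python sketch is only an illustration under that assumption; the function name expected_fp_percent is hypothetical.</p>
               <p><![CDATA[
```python
def expected_fp_percent(alpha, n_genes_tested, n_genes_selected):
    """Expected percentage of false positives among the selected genes.

    Assumption (inferred from Table 2, not stated in the text):
    expected false positives = alpha * number of genes tested.
    """
    if n_genes_selected <= 0:
        raise ValueError("at least one gene must be selected")
    expected_false_positives = alpha * n_genes_tested
    return 100.0 * expected_false_positives / n_genes_selected


# Genes remaining after pre-processing of the leukemia data.
N_GENES = 3571

# "Unified" column of Table 2: alpha level -> number of genes selected (GS).
unified_column = {0.01: 211, 0.005: 147, 0.001: 52}

for alpha, n_selected in unified_column.items():
    percent_fp = expected_fp_percent(alpha, N_GENES, n_selected)
    print(alpha, n_selected, round(percent_fp, 2))
```
]]></p>
               <p>Under this assumption the sketch gives 16.92, 12.15 and 6.87 for the unified framework; the last value appears as 6.8 in Table <tblr tid="T2">2</tblr>.</p>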
            </sec>
            <sec>
               <st>
                  <p>Comparison of the obtained DEGs with DEGs obtained by Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp></p>
               </st>
               <p>The 52 significantly expressed genes obtained using the unified framework are compared to the DEGs obtained by Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> (see additional file <supplr sid="S3">3</supplr>). Of these, 24 genes are also among the genes found by Golub et al., which indicates that the two selections agree to a considerable extent. The remaining 28 genes selected by the unified framework were not considered significant by Golub et al. Because the FDR analysis above showed that the unified framework offers a lower expected percentage of false positives, the genes selected using the unified framework are considered to be relevant.</p>
               <suppl id="S3">
                  <title>
                     <p>Additional file 3</p>
                  </title>
                  <text>
                     <p>Common genes for Leukemia data. The genes found using the unified framework that are also among the genes found by Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-8-347-S3.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
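               <p>The comparison of two gene lists described above amounts to a set intersection. The following Python sketch illustrates the bookkeeping with hypothetical gene identifiers; only the counts of 52 selected genes and 24 common genes are taken from the text, while the identifiers, the reference list size of 50 and the function name overlap_summary are assumptions made for illustration.</p>
               <p><![CDATA[
```python
def overlap_summary(selected_genes, reference_genes):
    """Summarise how a selected gene list overlaps a reference gene list."""
    selected = set(selected_genes)
    reference = set(reference_genes)
    common = selected.intersection(reference)   # found by both methods
    novel = selected.difference(reference)      # found only by the new method
    percent_common = 100.0 * len(common) / len(selected) if selected else 0.0
    return {
        "common": sorted(common),
        "novel": sorted(novel),
        "percent_common": percent_common,
    }


# Hypothetical identifiers: 52 selected genes, 24 of which also appear
# in a reference list of arbitrary size (50 here).
selected = [f"gene_{i:03d}" for i in range(52)]
reference = [f"gene_{i:03d}" for i in range(24)]
reference += [f"ref_only_{i:03d}" for i in range(26)]

summary = overlap_summary(selected, reference)
print(len(summary["common"]), len(summary["novel"]),
      round(summary["percent_common"], 1))  # 24 28 46.2
```
]]></p>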
            </sec>
            <sec>
               <st>
                  <p>Clustering-based validation</p>
               </st>
               <p>The ASI-based clustering algorithm <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and the steps outlined in the background section are employed for clustering-based validation. The training samples are clustered using the DEGs obtained by each method, and the resulting sample clusters are compared with the class labels of the samples. The second row (Training) of Table <tblr tid="T3">3</tblr> shows the number of correctly identified samples. As shown in Table <tblr tid="T3">3</tblr>, the DEGs obtained by the unified framework offered 100% accuracy (38 of 38) in the identification of the training sample classes. Two-way clustering also offered 100% accuracy in the identification of the labels of the training samples. Additional validations are therefore performed to assess the performance of the individual methods.</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>ASI Classification of Leukemia Samples using the DEGs</p>
                  </caption>
                  <tblbdy cols="8">
                     <r>
                        <c ca="left">
                           <p>Gene Selection Method</p>
                        </c>
                        <c ca="left">
                           <p>Samples</p>
                        </c>
                        <c ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c ca="left">
                           <p>SAM</p>
                        </c>
                        <c ca="left">
                           <p>Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Combined Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Two-way Clustering</p>
                        </c>
                        <c ca="left">
                           <p>Unified Ranking</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="8">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Training</p>
                        </c>
                        <c ca="left">
                           <p>38</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                        <c ca="left">
                           <p>35</p>
                        </c>
                        <c ca="left">
                           <p>36</p>
                        </c>
                        <c ca="left">
                           <p>36</p>
                        </c>
                        <c ca="left">
                           <p>38</p>
                        </c>
                        <c ca="left">
                           <p>38</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Testing</p>
                        </c>
                        <c ca="left">
                           <p>34</p>
                        </c>
                        <c ca="left">
                           <p>25</p>
                        </c>
                        <c ca="left">
                           <p>28</p>
                        </c>
                        <c ca="left">
                           <p>29</p>
                        </c>
                        <c ca="left">
                           <p>30</p>
                        </c>
                        <c ca="left">
                           <p>30</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Samples, total number of samples in the training and test sets; other columns, number of samples classified correctly using the DEGs of the respective method. For example, 33 of the 38 training samples are classified correctly using t-statistics. The same layout is used for Tables 5, 7, 9 and 11.</p>
                  </tblfn>
               </tbl>
               <p>The ASI algorithm is further applied to cluster the test samples using the DEGs obtained by the respective methods. The third row (Testing) of Table <tblr tid="T3">3</tblr> shows that the DEGs obtained using the unified framework classified the AML and ALL samples more accurately (33 of 34, 97.06%) than the DEGs obtained using the other methods, which again indicates the improved performance of the unified framework.</p>
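               <p>The accuracies quoted for Table <tblr tid="T3">3</tblr> follow from comparing cluster memberships with the known class labels. The text does not state how clusters are matched to classes; the following Python sketch assumes that the cluster-to-class assignment giving the most agreements is used. It does not implement ASI clustering itself, and the function name clustering_accuracy and the cluster memberships are hypothetical.</p>
               <p><![CDATA[
```python
from itertools import permutations


def clustering_accuracy(cluster_ids, class_labels):
    """Return (number correct, accuracy) under the best cluster-to-class mapping."""
    if len(cluster_ids) != len(class_labels) or not class_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best = 0
    # Try every one-to-one assignment of clusters to classes.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(mapping[c] == y for c, y in zip(cluster_ids, class_labels))
        best = max(best, correct)
    return best, best / len(class_labels)


# Test-set composition from the text: 20 ALL and 14 AML samples.
labels = ["ALL"] * 20 + ["AML"] * 14
# Hypothetical clustering in which one AML sample joins the ALL cluster.
clusters = [0] * 20 + [1] * 13 + [0]

n_correct, accuracy = clustering_accuracy(clusters, labels)
print(n_correct, round(100 * accuracy, 2))  # 33 97.06
```
]]></p>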
            </sec>
            <sec>
               <st>
                  <p>Visualization-based validation</p>
               </st>
               <p>The idea behind visualization-based cross validation is that if the genes obtained using a gene selection method are truly differentially expressed, they should separate the samples of different classes in the projected space <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Fig. <figr fid="F5">5</figr> shows the samples projected with a 3D star coordinate projection algorithm (3DSCP), using the DEGs as features. Comparing Figs. <figr fid="F5">5(a)</figr> to <figr fid="F5">5(f)</figr>, it may be seen that the unified framework offers a clear differentiation between the sample classes. Although both two-way clustering and the unified approach identified all the training samples correctly, as shown in Table <tblr tid="T3">3</tblr>, a comparison of Figs. <figr fid="F5">5(e)</figr> and <figr fid="F5">5(f)</figr> shows that the unified framework offers a much clearer separation between the samples of the two classes. This evidence suggests that better gene selection is achieved using the unified framework.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>The 3DSCP Projection of ALL and AML Samples of the Leukemia Dataset using the DEGs from Various Gene Selection Methods as Features</p>
                  </caption>
                  <text>
                     <p><b>The 3DSCP Projection of ALL and AML Samples of the Leukemia Dataset using the DEGs from Various Gene Selection Methods as Features</b>. a) t-statistics, b) SAM, c) Adaptive Ranking, d) Combined Adaptive Ranking, e) Two-Way Clustering and f) Unified Framework</p>
                  </text>
                  <graphic file="1471-2105-8-347-5"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Gastric cancer microarray dataset</p>
            </st>
            <p>The objective of this study is to identify genes that distinguish primary gastric cancers and metastatic gastric cancers from neoplastic gastric cancers, which are otherwise morphologically indistinguishable. Expression patterns of approximately 30300 genes are studied in 90 primary gastric cancers and 22 neoplastic gastric cancers. The preprocessing steps indicated by Chen et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> are applied, leaving 5200 genes for further study.</p>
            <p>Two bootstrapped datasets are created from the original dataset, one for training and one for testing. The training data consist of 60 randomly selected primary samples and 12 randomly selected neoplastic samples, whereas the test data consist of 30 randomly selected primary samples and 10 randomly selected neoplastic samples. The experimental design used for the Leukemia dataset is followed for the analysis of the Gastric cancer dataset.</p>
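            <p>The sample counts given above (60 + 30 = 90 primary and 12 + 10 = 22 neoplastic) are consistent with a random partition of each class without replacement. The following Python sketch shows one way such a class-wise split could be drawn; the partition without replacement, the sample identifiers, the seed and the function name split_class are assumptions made for illustration and are not taken from the original procedure.</p>
            <p><![CDATA[
```python
import random


def split_class(sample_ids, n_train, rng):
    """Randomly partition one class into training and test samples."""
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


rng = random.Random(0)  # fixed seed so the split is reproducible

# Hypothetical identifiers for the 90 primary and 22 neoplastic samples.
primary = [f"P{i:02d}" for i in range(1, 91)]
neoplastic = [f"N{i:02d}" for i in range(1, 23)]

train_primary, test_primary = split_class(primary, 60, rng)
train_neoplastic, test_neoplastic = split_class(neoplastic, 12, rng)

train = train_primary + train_neoplastic
test = test_primary + test_neoplastic
print(len(train), len(test))  # 72 40
```
]]></p>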
            <sec>
               <st>
                  <p>Gene selection and statistical validation</p>
               </st>
               <p>Gene selection is performed such that the expected percentage of false positives is minimized. As shown in Table <tblr tid="T4">4</tblr>, the unified framework recorded a lower percentage of false positives (2.48%&#8211;11.98%) than the other gene selection methods at all three tested levels of <it>&#945;</it>. The full list of top-ranked genes can be accessed from additional file <supplr sid="S4">4</supplr>.</p>
               <suppl id="S4">
                  <title>
                     <p>Additional file 4</p>
                  </title>
                  <text>
                     <p>Differentially expressed genes for Gastric cancer data. The genes selected by the unified framework for the Gastric cancer data <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-8-347-S4.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>FDR Analysis of Gastric Cancer Dataset</p>
                  </caption>
                  <tblbdy cols="13">
                     <r>
                        <c ca="left">
                           <p>Alpha value</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>SAM</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Combined Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Two-Way</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Unified</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.01</p>
                        </c>
                        <c ca="left">
                           <p>417</p>
                        </c>
                        <c ca="left">
                           <p>12.47</p>
                        </c>
                        <c ca="left">
                           <p>398</p>
                        </c>
                        <c ca="left">
                           <p>13.07</p>
                        </c>
                        <c ca="left">
                           <p>397</p>
                        </c>
                        <c ca="left">
                           <p>13.10</p>
                        </c>
                        <c ca="left">
                           <p>406</p>
                        </c>
                        <c ca="left">
                           <p>12.81</p>
                        </c>
                        <c ca="left">
                           <p>414</p>
                        </c>
                        <c ca="left">
                           <p>12.56</p>
                        </c>
                        <c ca="left">
                           <p>434</p>
                        </c>
                        <c ca="left">
                           <p>11.98</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.005</p>
                        </c>
                        <c ca="left">
                           <p>299</p>
                        </c>
                        <c ca="left">
                           <p>8.70</p>
                        </c>
                        <c ca="left">
                           <p>283</p>
                        </c>
                        <c ca="left">
                           <p>9.19</p>
                        </c>
                        <c ca="left">
                           <p>279</p>
                        </c>
                        <c ca="left">
                           <p>9.32</p>
                        </c>
                        <c ca="left">
                           <p>288</p>
                        </c>
                        <c ca="left">
                           <p>9.03</p>
                        </c>
                        <c ca="left">
                           <p>294</p>
                        </c>
                        <c ca="left">
                           <p>8.84</p>
                        </c>
                        <c ca="left">
                           <p>325</p>
                        </c>
                        <c ca="left">
                           <p>8</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.001</p>
                        </c>
                        <c ca="left">
                           <p>173</p>
                        </c>
                        <c ca="left">
                           <p>3.01</p>
                        </c>
                        <c ca="left">
                           <p>189</p>
                        </c>
                        <c ca="left">
                           <p>2.75</p>
                        </c>
                        <c ca="left">
                           <p>166</p>
                        </c>
                        <c ca="left">
                           <p>3.13</p>
                        </c>
                        <c ca="left">
                           <p>175</p>
                        </c>
                        <c ca="left">
                           <p>2.97</p>
                        </c>
                        <c ca="left">
                           <p>187</p>
                        </c>
                        <c ca="left">
                           <p>2.78</p>
                        </c>
                        <c ca="left">
                           <p>210</p>
                        </c>
                        <c ca="left">
                           <p>2.48</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>GS, number of genes selected; %FP, expected percentage of false positives.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Comparison of the DEGs to the significant genes obtained by Chen et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp></p>
               </st>
               <p>The DEGs found using the unified framework are compared against the DEGs found by Chen et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Of the 210 genes found by the unified algorithm, 204 are common to the DEGs found by Chen et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The list of common genes may be accessed through additional file <supplr sid="S5">5</supplr>. Thus most of the genes found using the unified framework are present in the list of 3000 genes found significant by Chen et al. The improved performance of the unified framework may be attributed to its rejection of most of the genes deemed significant by Chen et al. This is one of the advantages of FDR analysis, which focuses not only on the selection of DEGs but also on the rejection of insignificant genes.</p>
               <suppl id="S5">
                  <title>
                     <p>Additional file 5</p>
                  </title>
                  <text>
                     <p>Common genes for Gastric cancer data. The genes found using the unified framework that are also among the genes found by Chen et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-8-347-S5.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Clustering-based validation</p>
               </st>
               <p>A procedure similar to the clustering-based validation of the leukemia dataset is followed for the gastric cancer dataset. As shown in the second row (Training) of Table <tblr tid="T5">5</tblr>, the DEGs obtained by the unified framework offered 100% accuracy (72 of 72) in the identification of the training samples, which was not achieved by the DEGs obtained by the other methods. The third row (Testing) of Table <tblr tid="T5">5</tblr> shows that the DEGs obtained using the unified framework also identified the primary and neoplastic samples of the test set more accurately (39 of 40, 97.5%) than the DEGs obtained using the other gene selection methods.</p>
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>ASI Classification of Gastric Cancer Samples using the DEGs</p>
                  </caption>
                  <tblbdy cols="8">
                     <r>
                        <c ca="left">
                           <p>Gene Selection Method</p>
                        </c>
                        <c ca="left">
                           <p>Samples</p>
                        </c>
                        <c ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c ca="left">
                           <p>SAM</p>
                        </c>
                        <c ca="left">
                           <p>Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Combined Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Two-way Clustering</p>
                        </c>
                        <c ca="left">
                           <p>Unified Ranking</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="8">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Training</p>
                        </c>
                        <c ca="left">
                           <p>72</p>
                        </c>
                        <c ca="left">
                           <p>64</p>
                        </c>
                        <c ca="left">
                           <p>67</p>
                        </c>
                        <c ca="left">
                           <p>67</p>
                        </c>
                        <c ca="left">
                           <p>69</p>
                        </c>
                        <c ca="left">
                           <p>69</p>
                        </c>
                        <c ca="left">
                           <p>72</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Testing</p>
                        </c>
                        <c ca="left">
                           <p>40</p>
                        </c>
                        <c ca="left">
                           <p>28</p>
                        </c>
                        <c ca="left">
                           <p>34</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                        <c ca="left">
                           <p>35</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                        <c ca="left">
                           <p>39</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Visualization-based validation</p>
               </st>
               <p>For visual validation, the training samples from the gastric cancer dataset are projected using the DEGs as features. Fig. <figr fid="F6">6</figr> shows that the unified framework gives a much clearer separation between the samples of the two classes, with no overlap between them. This suggests that the unified framework selects a better set of genes.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>The 3DSCP Projection of Primary and Neoplastic Samples using DEGs as Features using Unified Framework</p>
                  </caption>
                  <text>
                     <p>The 3DSCP Projection of Primary and Neoplastic Samples using DEGs as Features using Unified Framework.</p>
                  </text>
                  <graphic file="1471-2105-8-347-6"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Colon cancer microarray dataset</p>
            </st>
            <p>An Affymetrix oligonucleotide array complementary to more than 6500 human genes is used in this study. Gene expression is studied in 40 tumor samples and 22 normal samples. Preprocessing of this dataset yielded 2000 genes of interest, which are used as input to the gene selection algorithms.</p>
            <p>The data are first divided into training and test sets. The training set has 25 tumor samples and 12 normal samples, whereas the test set has 15 tumor samples and 10 normal samples; both sets are selected randomly. The analysis then follows steps similar to the experimental design used for the Leukemia dataset.</p>
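            <p>A random split with these class sizes can be sketched as follows. This is only an illustration under stated assumptions: the function name stratified_split, the label strings and the fixed seed are hypothetical, and the actual randomization procedure used for the study is not described here.</p>

```python
import random


def stratified_split(labels, n_train_per_class, seed=0):
    """Randomly pick a fixed number of training samples from each class.

    labels            -- list of class labels, one per sample
    n_train_per_class -- dict mapping class label to training-set size
    seed              -- hypothetical fixed seed, used only for reproducibility
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls, n_train in n_train_per_class.items():
        # Indices of all samples belonging to this class.
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Class sizes from the text: 40 tumor and 22 normal samples,
# with 25 tumor and 12 normal samples used for training.
labels = ["tumor"] * 40 + ["normal"] * 22
train_idx, test_idx = stratified_split(labels, {"tumor": 25, "normal": 12})
print(len(train_idx), len(test_idx))
```

            <p>Sampling within each class separately keeps the tumor and normal counts of the training and test sets fixed at the values reported above.</p>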
            <sec>
               <st>
                  <p>Gene selection and statistical validation</p>
               </st>
               <p>Table <tblr tid="T6">6</tblr> shows the percentage of expected false positives for the various gene selection methods at different values of <it>&#945;</it>. The unified framework selects the most genes and gives the lowest percentage of expected false positives at every value of <it>&#945;</it>. The full list of selected genes is available in additional file <supplr sid="S6">6</supplr>. These genes are obtained at <it>&#945; </it>= 0.001, which gives the minimum expected percentage of false positives (3.03%).</p>
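               <p>The %FP entries in Table <tblr tid="T6">6</tblr> appear consistent with dividing the expected number of false positives (<it>&#945;</it> times the 2000 genes tested) by the number of genes selected. The sketch below works through that arithmetic; the formula is inferred from the tabulated values and the function name expected_fp_percent is hypothetical, so it should be read as an assumption rather than as the definition used in the analysis.</p>

```python
def expected_fp_percent(alpha, n_tested, n_selected):
    """Expected false positives (alpha * n_tested) as a percentage of the selected genes.

    Assumed formula, inferred from the tabulated GS and %FP values.
    """
    if n_selected == 0:
        raise ValueError("no genes selected")
    return 100.0 * alpha * n_tested / n_selected


# Unified framework on the colon cancer data: 2000 genes tested,
# GS values taken from Table 6.
for alpha, genes_selected in [(0.01, 236), (0.005, 134), (0.001, 66)]:
    print(alpha, genes_selected, round(expected_fp_percent(alpha, 2000, genes_selected), 2))
```

               <p>Under this assumption the three rows give 8.47, 7.46 and 3.03, matching the unified framework entries in the table.</p>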
               <suppl id="S6">
                  <title>
                     <p>Additional file 6</p>
                  </title>
                  <text>
                     <p>Differentially expressed genes for Colon cancer data. The genes selected by the unified framework for the Colon cancer data <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-8-347-S6.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <tbl id="T6">
                  <title>
                     <p>Table 6</p>
                  </title>
                  <caption>
                     <p>FDR Analysis of Colon Cancer Dataset</p>
                  </caption>
                  <tblbdy cols="13">
                     <r>
                        <c ca="left">
                           <p>AlphaValue</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>SAM</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Combined Adaptive</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Two-Way</p>
                        </c>
                        <c cspan="2" ca="left">
                           <p>Unified</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                        <c ca="left">
                           <p>GS</p>
                        </c>
                        <c ca="left">
                           <p>%FP</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="13">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.01</p>
                        </c>
                        <c ca="left">
                           <p>211</p>
                        </c>
                        <c ca="left">
                           <p>9.48</p>
                        </c>
                        <c ca="left">
                           <p>233</p>
                        </c>
                        <c ca="left">
                           <p>8.58</p>
                        </c>
                        <c ca="left">
                           <p>221</p>
                        </c>
                        <c ca="left">
                           <p>9.05</p>
                        </c>
                        <c ca="left">
                           <p>218</p>
                        </c>
                        <c ca="left">
                           <p>9.17</p>
                        </c>
                        <c ca="left">
                           <p>211</p>
                        </c>
                        <c ca="left">
                           <p>9.48</p>
                        </c>
                        <c ca="left">
                           <p>236</p>
                        </c>
                        <c ca="left">
                           <p>8.47</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.005</p>
                        </c>
                        <c ca="left">
                           <p>121</p>
                        </c>
                        <c ca="left">
                           <p>8.2</p>
                        </c>
                        <c ca="left">
                           <p>119</p>
                        </c>
                        <c ca="left">
                           <p>8.4</p>
                        </c>
                        <c ca="left">
                           <p>113</p>
                        </c>
                        <c ca="left">
                           <p>8.85</p>
                        </c>
                        <c ca="left">
                           <p>117</p>
                        </c>
                        <c ca="left">
                           <p>8.55</p>
                        </c>
                        <c ca="left">
                           <p>124</p>
                        </c>
                        <c ca="left">
                           <p>8.06</p>
                        </c>
                        <c ca="left">
                           <p>134</p>
                        </c>
                        <c ca="left">
                           <p>7.46</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>0.001</p>
                        </c>
                        <c ca="left">
                           <p>48</p>
                        </c>
                        <c ca="left">
                           <p>4.17</p>
                        </c>
                        <c ca="left">
                           <p>54</p>
                        </c>
                        <c ca="left">
                           <p>3.7</p>
                        </c>
                        <c ca="left">
                           <p>38</p>
                        </c>
                        <c ca="left">
                           <p>5.26</p>
                        </c>
                        <c ca="left">
                           <p>42</p>
                        </c>
                        <c ca="left">
                           <p>4.76</p>
                        </c>
                        <c ca="left">
                           <p>57</p>
                        </c>
                        <c ca="left">
                           <p>3.51</p>
                        </c>
                        <c ca="left">
                           <p>66</p>
                        </c>
                        <c ca="left">
                           <p>3.03</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>GS, Genes Selected; %FP, Percentage of False Positives.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Comparison of the DEGs with earlier works</p>
               </st>
               <p>A list of significantly differentially expressed genes is not available from Alon et al. for comparison. However, a comprehensive analysis of this dataset was performed by Su et al. <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Their procedure ranks the genes using 8 different measures, viz. t-test, information gain, sum of variances, twoing rule, Gini index, sum minority, max minority and 1D SVM. The rankings are then fused to obtain a list of the 100 top-ranked genes <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. This list is compared with the 66 genes obtained using the unified framework: 51 of the 66 genes are among the top 100 genes obtained using the 'rankgene' method. The rankgene method does not employ any FDR analysis for gene selection; it merely lists the top 100 genes.</p>
            </sec>
            <sec>
               <st>
                  <p>Clustering-based validation</p>
               </st>
               <p>Table <tblr tid="T7">7</tblr>, row 2 shows the number of training samples identified correctly using the DEGs from each gene selection method. The unified framework performed best, identifying 35 of the 37 training samples correctly. The gene selection methods are also compared on the test set. Table <tblr tid="T7">7</tblr>, row 3 shows that the DEGs obtained using the unified framework identified 24 of the 25 test samples correctly, more than the DEGs obtained using any other gene selection method.</p>
               <tbl id="T7">
                  <title>
                     <p>Table 7</p>
                  </title>
                  <caption>
                     <p>ASI Classification of Colon Cancer Samples using the DEGs</p>
                  </caption>
                  <tblbdy cols="8">
                     <r>
                        <c ca="left">
                           <p>Gene Selection Method</p>
                        </c>
                        <c ca="left">
                           <p>Samples</p>
                        </c>
                        <c ca="left">
                           <p>t-statistics</p>
                        </c>
                        <c ca="left">
                           <p>SAM</p>
                        </c>
                        <c ca="left">
                           <p>Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Combined Adaptive Ranking</p>
                        </c>
                        <c ca="left">
                           <p>Two-way Clustering</p>
                        </c>
                        <c ca="left">
                           <p>Unified Ranking</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="8">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Training</p>
                        </c>
                        <c ca="left">
                           <p>37</p>
                        </c>
                        <c ca="left">
                           <p>31</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                        <c ca="left">
                           <p>31</p>
                        </c>
                        <c ca="left">
                           <p>33</p>
                        </c>
                        <c ca="left">
                           <p>32</p>
                        </c>
                        <c ca="left">
                           <p>35</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Testing</p>
                        </c>
                        <c ca="left">
                           <p>25</p>
                        </c>
                        <c ca="left">
                           <p>19</p>
                        </c>
                        <c ca="left">
                           <p>22</p>
                        </c>
                        <c ca="left">
                           <p>22</p>
                        </c>
                        <c ca="left">
                           <p>21</p>
                        </c>
                        <c ca="left">
                           <p>21</p>
                        </c>
                        <c ca="left">
                           <p>24</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Visualization-based validation</p>
               </st>
               <p>For visual validation, the training samples from the colon cancer dataset are projected using the DEGs as features. Fig. <figr fid="F7">7</figr> shows that the unified framework performed better in separating the colon cancer samples when its DEGs are used as features. The 3DSCP validation thus supports the robust selection of DEGs by the unified framework.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>The 3DSCP Projection of Normal and Tumor Samples using DEGs as Features using Unified Framework</p>
                  </caption>
                  <text>
                     <p>The 3DSCP Projection of Normal and Tumor Samples using DEGs as Features using Unified Framework.</p>
                  </text>
                  <graphic file="1471-2105-8-347-7"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Parkinson's dataset</p>
            </st>
            <p>The Parkinson's dataset is employed to extend the application of two-sample gene selection methods to multi-sample experiments. Three sets of microarray data are available for this model from Miller et al. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The first dataset is obtained using Codelink Mouse UniSet I bioarrays. The other two are obtained from Affymetrix array data analyzed using the Affymetrix Microarray Suite software (MAS 5) and the Model Based Expression Index (MBEI) in the dChip software, respectively. The data consist of three treatment groups: MC (saline treated mouse control), MME (mouse MPTP early) and MML (mouse MPTP late). Each group has four sets of samples obtained at different times after MPTP administration, covering 12588 genes. The performance of the different gene selection methods is evaluated on the comparisons of MC with MME and of MC with MML for all three datasets. This pattern of comparison provides the DEGs at different times. It also identifies the DEGs at the early stage that stayed differentially expressed at the late stage.</p>
            <sec>
               <st>
                  <p>Codelink Mouse UniSet I bioarrays</p>
               </st>
               <sec>
                  <st>
                     <p>Experimental design</p>
                  </st>
                  <p>&#8226; Find the p-values for the genes based on differential expression between the MC and MME groups and MC and MML groups.</p>
                  <p>&#8226; Merge the two sets of p-values using Fisher's Omnibus criterion <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
                  <p>&#8226; Perform FDR analysis and select the genes with significant differential expression such that the expected percentage of false positives is minimized.</p>
                  <p>&#8226; Apply the ASI algorithm to cross-validate the DEGs.</p>
                  <p>&#8226; Repeat the process for each gene selection method.</p>
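                  <p>The merging step listed above can be sketched as follows. This is a minimal illustration of Fisher's method for combining p-values, not the implementation used in the study: the function name fisher_combine and the example gene names and p-values are hypothetical. Fisher's method assumes that the combined p-values are independent, which may hold only approximately here because both comparisons involve the MC group.</p>

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis (k = number of
    p-values). For even degrees of freedom the upper tail has a closed
    form, so only the standard library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    for p in p_values:
        if not (p > 0.0 and 1.0 >= p):
            raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Upper tail of chi-square with 2k d.o.f.:
    #   exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-gene p-values for the two comparisons.
p_mc_mme = {"gene_a": 0.05, "gene_b": 0.50}
p_mc_mml = {"gene_a": 0.05, "gene_b": 0.80}
merged = {g: fisher_combine([p_mc_mme[g], p_mc_mml[g]]) for g in p_mc_mme}
print(merged)
```

                  <p>For two p-values the combined value reduces to p1 p2 (1 - ln(p1 p2)), so a gene that is moderately significant in both comparisons receives a smaller merged p-value than either comparison alone would give.</p>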
               </sec>
               <sec>
                  <st>
                     <p>Gene selection and statistical validation</p>
                  </st>
                  <p>Applying the thresholding process of Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> to the Codelink Mouse UniSet I bioarray data left 2347 genes for further analysis. The gene selection methods are used to find the p-values for the genes based on differential expression between the MC and MME groups and between the MC and MML groups. The FDR analysis is performed on the merged p-values. Table <tblr tid="T8">8</tblr> shows the FDR analysis for the different gene selection methods. At <it>&#945; </it>= 0.001, the unified framework gives the lowest percentage of expected false positives (13.81%) in selecting the DEGs from the Parkinson's dataset. The DEGs obtained using the unified framework at <it>&#945; </it>= 0.001 are provided in additional file <supplr sid="S7">7</supplr>.</p>
                  <suppl id="S7">
                     <title>
                        <p>Additional file 7</p>
                     </title>
                     <text>
                        <p>Differentially expressed genes for Codelink data. The genes selected by the unified framework for the Codelink data <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
                     </text>
                     <file name="1471-2105-8-347-S7.pdf">
                        <p>Click here for file</p>
                     </file>
                  </suppl>
                  <tbl id="T8">
                     <title>
                        <p>Table 8</p>
                     </title>
                     <caption>
                        <p>FDR Analysis of Parkinson's Dataset using CodeLink BioArrays</p>
                     </caption>
                     <tblbdy cols="13">
                        <r>
                           <c ca="left">
                              <p>AlphaValue</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>t-statistics</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>SAM</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Combined Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Two-Way</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Unified</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c>
                              <p/>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.01</p>
                           </c>
                           <c ca="left">
                              <p>51</p>
                           </c>
                           <c ca="left">
                              <p>46.07</p>
                           </c>
                           <c ca="left">
                              <p>52</p>
                           </c>
                           <c ca="left">
                              <p>45.13</p>
                           </c>
                           <c ca="left">
                              <p>57</p>
                           </c>
                           <c ca="left">
                              <p>41.18</p>
                           </c>
                           <c ca="left">
                              <p>57</p>
                           </c>
                           <c ca="left">
                              <p>41.18</p>
                           </c>
                           <c ca="left">
                              <p>51</p>
                           </c>
                           <c ca="left">
                              <p>46.07</p>
                           </c>
                           <c ca="left">
                              <p>56</p>
                           </c>
                           <c ca="left">
                              <p>41.91</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.005</p>
                           </c>
                           <c ca="left">
                              <p>25</p>
                           </c>
                           <c ca="left">
                              <p>46.9</p>
                           </c>
                           <c ca="left">
                              <p>29</p>
                           </c>
                           <c ca="left">
                              <p>40.4</p>
                           </c>
                           <c ca="left">
                              <p>28</p>
                           </c>
                           <c ca="left">
                              <p>41.8</p>
                           </c>
                           <c ca="left">
                              <p>28</p>
                           </c>
                           <c ca="left">
                              <p>41.8</p>
                           </c>
                           <c ca="left">
                              <p>29</p>
                           </c>
                           <c ca="left">
                              <p>40.4</p>
                           </c>
                           <c ca="left">
                              <p>31</p>
                           </c>
                           <c ca="left">
                              <p>37.8</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.001</p>
                           </c>
                           <c ca="left">
                              <p>14</p>
                           </c>
                           <c ca="left">
                              <p>16.76</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>15.65</p>
                           </c>
                           <c ca="left">
                              <p>14</p>
                           </c>
                           <c ca="left">
                              <p>16.76</p>
                           </c>
                           <c ca="left">
                              <p>16</p>
                           </c>
                           <c ca="left">
                              <p>14.67</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>15.65</p>
                           </c>
                           <c ca="left">
                              <p>17</p>
                           </c>
                           <c ca="left">
                              <p>13.81</p>
                           </c>
                        </r>
                     </tblbdy>
                     <tblfn>
                        <p>GS, Genes Selected; %FP, Percentage of False Positives.</p>
                     </tblfn>
                  </tbl>
               </sec>
               <sec>
                  <st>
                     <p>Clustering-based validation of codelink data</p>
                  </st>
                  <p>Table <tblr tid="T9">9</tblr>, row 2 shows the number of samples identified correctly using the DEGs from each gene selection method. The unified framework identified all 12 samples correctly.</p>
                  <tbl id="T9">
                     <title>
                        <p>Table 9</p>
                     </title>
                     <caption>
                        <p>Cross Validation of Parkinson's Datasets using Training Samples</p>
                     </caption>
                     <tblbdy cols="8">
                        <r>
                           <c ca="left">
                              <p>Gene Selection Method</p>
                           </c>
                           <c ca="left">
                              <p>Samples</p>
                           </c>
                           <c ca="left">
                              <p>t-statistics</p>
                           </c>
                           <c ca="left">
                              <p>SAM</p>
                           </c>
                           <c ca="left">
                              <p>Adaptive Ranking</p>
                           </c>
                           <c ca="left">
                              <p>Combined Adaptive Ranking</p>
                           </c>
                           <c ca="left">
                              <p>Two-way Clustering</p>
                           </c>
                           <c ca="left">
                              <p>Unified Ranking</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="8">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>Codelink</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>9</p>
                           </c>
                           <c ca="left">
                              <p>11</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>MAS05</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>11</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>dChip</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                           <c ca="left">
                              <p>9</p>
                           </c>
                           <c ca="left">
                              <p>9</p>
                           </c>
                           <c ca="left">
                              <p>9</p>
                           </c>
                           <c ca="left">
                              <p>10</p>
                           </c>
                           <c ca="left">
                              <p>11</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                        </r>
                     </tblbdy>
                  </tbl>
               </sec>
               <sec>
                  <st>
                     <p>Visualization-based validation</p>
                  </st>
                  <p>The DEGs obtained using the various gene selection methods are projected with the 3DSCP algorithm. Fig. <figr fid="F8">8</figr> shows the projection of the MC, MME and MML samples using the DEGs obtained by the different gene selection methods as features. The DEGs obtained from the unified framework yield a clear separation between the sample classes, which supports the validity of the selected genes.</p>
                  <fig id="F8">
                     <title>
                        <p>Figure 8</p>
                     </title>
                     <caption>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for CodeLink Parkinson's Dataset</p>
                     </caption>
                     <text>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for CodeLink Parkinson's Dataset.</p>
                     </text>
                     <graphic file="1471-2105-8-347-8"/>
                  </fig>
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Affymetrix using MAS 05</p>
               </st>
               <sec>
                  <st>
                     <p>Gene selection and statistical validation</p>
                  </st>
                  <p>Irrelevant genes are first filtered out using the filtering process of Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, leaving 2433 genes for further analysis. The experimental design used for the CodeLink data is followed for the MAS 05 data. The FDR analysis in Table <tblr tid="T10">10</tblr> shows that the unified framework improves on most of the other gene selection methods for the Parkinson's MAS 05 dataset (16.71% false positives at <it>&#945; </it>= 0.001, a value matched only by two-way clustering). The DEGs obtained using the unified framework at <it>&#945; </it>= 0.001 are provided in additional file <supplr sid="S8">8</supplr>.</p>
                  <suppl id="S8">
                     <title>
                        <p>Additional file 8</p>
                     </title>
                     <text>
                        <p>Differentially expressed genes for MAS05 data. Genes selected by the unified framework for the MAS05 data <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
                     </text>
                     <file name="1471-2105-8-347-S8.pdf">
                        <p>Click here for file</p>
                     </file>
                  </suppl>
                  <tbl id="T10">
                     <title>
                        <p>Table 10</p>
                     </title>
                     <caption>
                        <p>FDR Analysis of Parkinson's Dataset using Affymetrix MAS05</p>
                     </caption>
                     <tblbdy cols="13">
                        <r>
                           <c ca="left">
                              <p>AlphaValue</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>t-statistics</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>SAM</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Combined Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Two-Way</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Unified</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c>
                              <p/>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.01</p>
                           </c>
                           <c ca="left">
                              <p>46</p>
                           </c>
                           <c ca="left">
                              <p>50.87</p>
                           </c>
                           <c ca="left">
                              <p>46</p>
                           </c>
                           <c ca="left">
                              <p>50.87</p>
                           </c>
                           <c ca="left">
                              <p>48</p>
                           </c>
                           <c ca="left">
                              <p>48.75</p>
                           </c>
                           <c ca="left">
                              <p>48</p>
                           </c>
                           <c ca="left">
                              <p>48.75</p>
                           </c>
                           <c ca="left">
                              <p>49</p>
                           </c>
                           <c ca="left">
                              <p>47.76</p>
                           </c>
                           <c ca="left">
                              <p>51</p>
                           </c>
                           <c ca="left">
                              <p>45.88</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.005</p>
                           </c>
                           <c ca="left">
                              <p>25</p>
                           </c>
                           <c ca="left">
                              <p>46.8</p>
                           </c>
                           <c ca="left">
                              <p>27</p>
                           </c>
                           <c ca="left">
                              <p>43.34</p>
                           </c>
                           <c ca="left">
                              <p>25</p>
                           </c>
                           <c ca="left">
                              <p>46.8</p>
                           </c>
                           <c ca="left">
                              <p>28</p>
                           </c>
                           <c ca="left">
                              <p>41.8</p>
                           </c>
                           <c ca="left">
                              <p>28</p>
                           </c>
                           <c ca="left">
                              <p>41.8</p>
                           </c>
                           <c ca="left">
                              <p>31</p>
                           </c>
                           <c ca="left">
                              <p>37.75</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.001</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                           <c ca="left">
                              <p>19.5</p>
                           </c>
                           <c ca="left">
                              <p>13</p>
                           </c>
                           <c ca="left">
                              <p>18</p>
                           </c>
                           <c ca="left">
                              <p>11</p>
                           </c>
                           <c ca="left">
                              <p>21.27</p>
                           </c>
                           <c ca="left">
                              <p>12</p>
                           </c>
                           <c ca="left">
                              <p>19.5</p>
                           </c>
                           <c ca="left">
                              <p>14</p>
                           </c>
                           <c ca="left">
                              <p>16.71</p>
                           </c>
                           <c ca="left">
                              <p>14</p>
                           </c>
                           <c ca="left">
                              <p>16.71</p>
                           </c>
                        </r>
                     </tblbdy>
                     <tblfn>
                        <p>GS, Genes Selected; %FP, Percentage of False Positives.</p>
                     </tblfn>
                  </tbl>
               </sec>
               <sec>
                  <st>
                     <p>Clustering-based validation of MAS 05 data</p>
                  </st>
                  <p>Row 3 of Table <tblr tid="T9">9</tblr> shows the number of samples correctly identified by each gene selection method for the MAS 05 data. The unified framework identified all 12 samples correctly (100% accuracy), compared with 10 or 11 for the other methods.</p>
               </sec>
               <sec>
                  <st>
                     <p>Visualization-based validation</p>
                  </st>
                  <p>Fig. <figr fid="F9">9</figr> shows the projection of the MC, MME and MML samples for the MAS05 data, using the DEGs obtained by the different gene selection methods as features. The unified framework offers a clear separation between the data points in the projected space, which supports the validity of the proposed approach.</p>
                  <fig id="F9">
                     <title>
                        <p>Figure 9</p>
                     </title>
                     <caption>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for MAS 05 Parkinson's Dataset</p>
                     </caption>
                     <text>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for MAS 05 Parkinson's Dataset.</p>
                     </text>
                     <graphic file="1471-2105-8-347-9"/>
                  </fig>
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Affymetrix using dChip</p>
               </st>
               <sec>
                  <st>
                     <p>Gene selection and statistical validation</p>
                  </st>
                  <p>The dChip data are first filtered to remove irrelevant genes using the method of Golub et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, leaving 2179 genes for further analysis. The experimental design employed for the CodeLink data is followed for the dChip data. The unified framework, as shown in Table <tblr tid="T11">11</tblr>, offered better performance in the selection of DEGs than most of the other gene selection methods (12.8% false positives). The DEGs obtained using the unified framework at <it>&#945; </it>= 0.001 are provided in additional file <supplr sid="S9">9</supplr>.</p>
                  <suppl id="S9">
                     <title>
                        <p>Additional file 9</p>
                     </title>
                     <text>
                        <p>Differentially expressed genes for dChip data. Genes selected by the unified framework for the dChip data <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
                     </text>
                     <file name="1471-2105-8-347-S9.pdf">
                        <p>Click here for file</p>
                     </file>
                  </suppl>
                  <tbl id="T11">
                     <title>
                        <p>Table 11</p>
                     </title>
                     <caption>
                        <p>FDR Analysis of Parkinson's Dataset using Affymetrix dChip</p>
                     </caption>
                     <tblbdy cols="13">
                        <r>
                           <c ca="left">
                              <p>AlphaValue</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>t-statistics</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>SAM</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Combined Adaptive</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Two-Way</p>
                           </c>
                           <c cspan="2" ca="left">
                              <p>Unified</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c>
                              <p/>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                           <c ca="left">
                              <p>GS</p>
                           </c>
                           <c ca="left">
                              <p>%FP</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="13">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.01</p>
                           </c>
                           <c ca="left">
                              <p>49</p>
                           </c>
                           <c ca="left">
                              <p>44.47</p>
                           </c>
                           <c ca="left">
                              <p>50</p>
                           </c>
                           <c ca="left">
                              <p>43.58</p>
                           </c>
                           <c ca="left">
                              <p>52</p>
                           </c>
                           <c ca="left">
                              <p>41.9</p>
                           </c>
                           <c ca="left">
                              <p>52</p>
                           </c>
                           <c ca="left">
                              <p>41.9</p>
                           </c>
                           <c ca="left">
                              <p>49</p>
                           </c>
                           <c ca="left">
                              <p>44.47</p>
                           </c>
                           <c ca="left">
                              <p>54</p>
                           </c>
                           <c ca="left">
                              <p>40.35</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.005</p>
                           </c>
                           <c ca="left">
                              <p>33</p>
                           </c>
                           <c ca="left">
                              <p>33</p>
                           </c>
                           <c ca="left">
                              <p>35</p>
                           </c>
                           <c ca="left">
                              <p>31.1</p>
                           </c>
                           <c ca="left">
                              <p>35</p>
                           </c>
                           <c ca="left">
                              <p>31.1</p>
                           </c>
                           <c ca="left">
                              <p>35</p>
                           </c>
                           <c ca="left">
                              <p>31.1</p>
                           </c>
                           <c ca="left">
                              <p>33</p>
                           </c>
                           <c ca="left">
                              <p>33</p>
                           </c>
                           <c ca="left">
                              <p>37</p>
                           </c>
                           <c ca="left">
                              <p>29.43</p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>0.001</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>14.53</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>14.53</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>14.53</p>
                           </c>
                           <c ca="left">
                              <p>16</p>
                           </c>
                           <c ca="left">
                              <p>13.62</p>
                           </c>
                           <c ca="left">
                              <p>14</p>
                           </c>
                           <c ca="left">
                              <p>15.56</p>
                           </c>
                           <c ca="left">
                              <p>15</p>
                           </c>
                           <c ca="left">
                              <p>13.22</p>
                           </c>
                        </r>
                     </tblbdy>
                     <tblfn>
                        <p>GS, Genes Selected; %FP, Percentage of False Positives.</p>
                     </tblfn>
                  </tbl>
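                  <p>Several entries of Table 11 are consistent with reading %FP as the expected number of false positives (<it>&#945;</it> times the number of genes tested) divided by the number of genes selected. This reading is an assumption made for illustration, not a formula stated in the text, and the function name below is hypothetical. A minimal sketch of the arithmetic:</p>

```python
def percent_false_positives(alpha, genes_tested, genes_selected):
    """Expected share of false positives among the selected genes, in percent.

    Assumes expected false positives = alpha * genes_tested (an illustrative
    reading of the %FP column, not a formula given in the paper).
    """
    if genes_selected == 0:
        raise ValueError("no genes selected")
    expected_false_positives = alpha * genes_tested
    return 100.0 * expected_false_positives / genes_selected


# dChip example: 2179 genes after filtering, 49 genes selected at alpha = 0.01
print(round(percent_false_positives(0.01, 2179, 49), 2))  # prints 44.47
```

                  <p>Under this reading, selecting more genes at a fixed <it>&#945;</it> lowers the percentage, which is why a method that selects 54 genes at <it>&#945; </it>= 0.01 scores lower than one that selects 49.</p>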
               </sec>
               <sec>
                  <st>
                     <p>Clustering-based validation of dChip data</p>
                  </st>
                  <p>The samples are clustered using the DEGs obtained by the various gene selection methods (at <it>&#945; </it>= 0.001). The resulting sample clusters are compared with the class labels of the samples (MC, MME and MML). Row 4 of Table <tblr tid="T9">9</tblr> shows that all 12 samples (100%) are correctly identified using the proposed unified framework.</p>
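                  <p>The text does not state how cluster assignments are scored against the class labels. One plausible scheme, shown here purely as an assumption, is to try every one-to-one mapping from cluster identifiers to class labels and report the largest number of matching samples. The function name, the class sizes and the cluster assignments below are hypothetical.</p>

```python
from itertools import permutations


def count_correct(cluster_ids, class_labels):
    """Largest number of samples matched under any one-to-one cluster-to-class mapping."""
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best = 0
    # Try every way of assigning a distinct class label to each cluster.
    for assignment in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(
            1 for c, y in zip(cluster_ids, class_labels) if mapping[c] == y
        )
        best = max(best, correct)
    return best


# Hypothetical example: 12 samples, with one MME sample placed in the MML cluster.
labels = ["MC"] * 4 + ["MME"] * 4 + ["MML"] * 4
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
print(count_correct(clusters, labels))  # prints 11
```

                  <p>Exhaustive search over mappings is practical here because only three classes are involved.</p>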
               </sec>
               <sec>
                  <st>
                     <p>Visualization-based validation</p>
                  </st>
                  <p>Fig. <figr fid="F10">10</figr> shows the projection of the MC, MME and MML samples using the DEGs at <it>&#945; </it>= 0.001. The DEGs obtained using the unified framework yield good separation between the samples of the different classes, which suggests better gene selection by the proposed method.</p>
                  <fig id="F10">
                     <title>
                        <p>Figure 10</p>
                     </title>
                     <caption>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for dChip Parkinson's Dataset</p>
                     </caption>
                     <text>
                        <p>The 3DSCP Projection of MC, MME and MML Samples using DEGs as Features using Unified Framework for dChip Parkinson's Dataset.</p>
                     </text>
                     <graphic file="1471-2105-8-347-10"/>
                  </fig>
               </sec>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>This paper presents a unified framework for gene selection and validation. Two gene selection algorithms, two-way clustering and combined adaptive ranking, are fused to rank the genes. The two-way framework finds the differential expression of co-expressed genes. The progressive framework, using the ASI algorithm, clusters the gene dimension and yields gene clusters at different resolutions, each of which may be tested for differential expression. The number of resolutions for the progressive framework is determined using the Davies-Bouldin index.</p>
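         <p>For readers unfamiliar with the Davies-Bouldin index, the sketch below computes it in its standard form for a labelled set of points. It is not the authors' implementation: the function name and the toy data are hypothetical, Euclidean distance is assumed, and the usual convention that lower values indicate more compact, better separated clusters is used to compare candidate partitions.</p>

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition (lower is better)."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(tuple(point))
    if not len(groups) > 1:
        raise ValueError("at least two clusters are required")
    centroids = {}
    scatter = {}
    for label, members in groups.items():
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        centroids[label] = centroid
        # Mean distance of the cluster members to their centroid.
        scatter[label] = sum(dist(m, centroid) for m in members) / len(members)
    total = 0.0
    for i in groups:
        # Worst-case similarity between cluster i and any other cluster.
        # Note: coincident centroids would cause a division by zero.
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
    return total / len(groups)


# Toy data: two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # partition that follows the pairs
print(davies_bouldin(points, [0, 1, 0, 1]))  # partition that splits the pairs
```

         <p>The partition with the lower index would be preferred. Applied to clusterings at several resolutions, the same comparison gives one way of choosing how many resolutions to retain.</p>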
         <p>Most of the ranking functions employed in this study for gene selection provide the user with only the relative ranking of the genes. These ranks enable sorting the genes by differential expression, but they do not indicate the significance of the genes. The R-test provides a means of converting ranks into a measure of significance (p-values). The gene rankings from module 1 are converted into p-values using the R-test and fused using Fisher's omnibus criterion. FDR analysis is then applied to the fused p-values. The FDR analysis enables judicious selection of DEGs by balancing the number of genes selected against the expected percentage of false positives. For example, at <it>&#945; </it>= 0.001 the percentage of false positives for the gastric cancer dataset using the unified framework is 2.48%. This indicates that, of the 210 genes selected, only about 5 (210*2.48% = 5.2) are expected to have been selected by chance.</p>
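         <p>The fusion step can be illustrated with the standard form of Fisher's method, in which the statistic -2 times the sum of the log p-values is referred to a chi-square distribution with 2k degrees of freedom for k p-values. The sketch below assumes independent p-values, covers only the fusion step (the R-test is not reproduced), and uses a hypothetical function name. It relies on the closed form of the chi-square survival function for even degrees of freedom, so no external library is needed.</p>

```python
from math import exp, log


def fisher_combine(p_values):
    """Fuse independent p-values with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (p > 0.0 and 1.0 >= p) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # half_statistic is X/2, where X = -2 * sum(ln p) is Fisher's statistic.
    half_statistic = -sum(log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum over j = 0..k-1 of (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_statistic / j
        total += term
    return min(1.0, exp(-half_statistic) * total)


# Two hypothetical p-values for the same gene from two ranking methods.
print(fisher_combine([0.05, 0.05]))
```

         <p>For a single p-value the function returns that p-value unchanged, and for two p-values the fused value reduces to q(1 - ln q), where q is their product.</p>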
         <p>The real datasets are divided into two categories: i) two-sample experiments with a large number of samples and ii) multi-sample experiments with a small number of samples. For the first category, the emphasis is on the validation techniques. The data are divided into training and testing sets. The DEGs are obtained from the training set and three-fold validation is performed. The improvement in statistical power for the selection of DEGs is first shown with the aid of FDR analysis. Clustering-based cross validation of the DEGs is performed next by clustering the training and test samples and evaluating the performance. Finally, a visualization-based cross validation is performed to show the separability of the samples in the projected space. The aim of the second category of real datasets is to show the extensibility of the proposed approach to multi-sample experiments. Because a large number of samples is not available, the validation is performed on the training set only, using the clustering and visualization-based algorithms. The clustering-based validation showed that the unified framework performed better than the other algorithms. Further, the visualization-based validation demonstrated that the DEGs obtained using the unified framework offered a much clearer separation between the samples of the different classes than the DEGs obtained using the other methods.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>A unified framework for finding DEGs from microarray data is developed and empirically evaluated. The framework is built from a judicious combination of three different modules. Its performance is compared with that of other well-known gene selection algorithms. The performance analysis curves obtained from 50 artificial microarray datasets, each following two different distributions, indicate the superiority of the unified framework over the other reported algorithms. Further analyses on 6 real cancer datasets show a similar improvement in performance. A comprehensive validation of the DEGs is presented using the first three real datasets. The robustness of the gene selection is first presented using FDR analysis for the various methods used in the study. Clustering-based validation is presented next by analyzing the clustering of training and test samples using the ASI algorithm. Finally, a visualization-based validation is performed. The scalability of the proposed unified approach to multi-sample experiments is demonstrated using the Parkinson's datasets. Empirical analyses on artificial and real microarray datasets illustrate the efficacy of the proposed unified framework in finding DEGs.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JSS carried out the following tasks: i) preparation of the data, ii) development and implementation of all algorithms used in this work and iii) writing of the manuscript.</p>
         <p>MY carried out the following tasks: i) supervision of the whole work from inception to completion and ii) revision of the manuscript and provision of critical feedback to improve the intellectual merit of the contribution.</p>
         <p>Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors acknowledge the Herff Fellowship, financial assistance from the Bioinformatics program and faculty start-up grants from the University of Memphis for partially funding this research. The authors thank Drs. Ebenezer George and Ramin Homayouni for their helpful tips in preparing the manuscript. The authors would also like to thank the anonymous reviewers for their helpful suggestions and comments in improving the quality of the paper.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>An Introduction to Variable and Feature Selection</p>
            </title>
            <aug>
               <au>
                  <snm>Guyon</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Elisseeff</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <issue>7-8</issue>
            <fpage>1157</fpage>
            <lpage>1182</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Coupled two-way clustering of gene microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Getz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Domany</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences, USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <issue>22</issue>
            <fpage>12079</fpage>
            <lpage>12084</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Data-adaptive Test Statistics for Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Laan</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>108</fpage>
            <lpage>114</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Adaptive Ranking and Selection of Differentially Expressed Genes from Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Shaik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeasin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>WSEAS Transactions on Biology and Biomedicine</source>
            <pubdate>2006</pubdate>
            <volume>3</volume>
            <issue>2</issue>
            <fpage>125</fpage>
            <lpage>133</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Advanced Data Mining Technologies in Bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Hsu</snm>
                  <fnm>HH</fnm>
               </au>
            </aug>
            <publisher>Idea Group Publishing</publisher>
            <pubdate>2006</pubdate>
            <fpage>329</fpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Interrelated Two-way Clustering: an unsupervised approach for gene expression data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Tang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering</source>
            <pubdate>2001</pubdate>
            <volume>14</volume>
            <issue>4</issue>
            <fpage>41</fpage>
            <lpage>48</lpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A mixture model-based approach to the clustering of microarray expression data</p>
            </title>
            <aug>
               <au>
                  <snm>McLachlan</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Bean</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Peel</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>3</issue>
            <fpage>413</fpage>
            <lpage>422</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11934740</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Alon</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Barkai</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Notterman</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ybarra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mack</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>6745</fpage>
            <lpage>6750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">21986</pubid>
                  <pubid idtype="pmpid" link="fulltext">10359783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Interpreting Patterns of Gene Expression with Self-Organizing Maps: Methods and Applications to Hematopoietic Differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kitareewan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dmitrovsky</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>2907</fpage>
            <lpage>2912</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077610</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Cluster Analysis and Display of Genome Wide Expression Patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Genetics</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>281</fpage>
            <lpage>285</lpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Analysis of Variance for Random Models: Theory, Methods, Applications and Data Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Sahai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ojeda</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <publisher>Birkhauser</publisher>
            <pubdate>2004</pubdate>
            <fpage>484</fpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Statistical Inference</p>
            </title>
            <aug>
               <au>
                  <snm>Casella</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Berger</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>Duxbury Advanced Series</source>
            <publisher>Duxbury Press</publisher>
            <edition>2</edition>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Significance Analysis of Microarrays Applied to The Ionizing Radiation Response</p>
            </title>
            <aug>
               <au>
                  <snm>Tusher</snm>
                  <fnm>VG</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>9</issue>
            <fpage>5116</fpage>
            <lpage>5121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33173</pubid>
                  <pubid idtype="pmpid" link="fulltext">11309499</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Comparison and Evaluation of Methods for Generating Differentially Expressed Gene lists from MicroArray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Jeffery</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Culhane</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>359</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1544358</pubid>
                  <pubid idtype="pmpid" link="fulltext">16872483</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Journal of the Royal Statistical Society, Series B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <issue>1</issue>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The Control of The False Discovery Rate in Multiple Testing Under Dependency</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>The Annals of Statistics</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>4</issue>
            <fpage>1165</fpage>
            <lpage>1188</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Statistical Significance for Genome Wide Studies</p>
            </title>
            <aug>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>16</issue>
            <fpage>9440</fpage>
            <lpage>9445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">170937</pubid>
                  <pubid idtype="pmpid" link="fulltext">12883005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Controlling the Proportion of False Positives in Multiple Dependent Tests</p>
            </title>
            <aug>
               <au>
                  <snm>Fernando</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Nettleton</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Southey</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Dekkers</snm>
                  <fnm>JCM</fnm>
               </au>
               <au>
                  <snm>Rothschild</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Soller</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>166</volume>
            <issue>1</issue>
            <fpage>611</fpage>
            <lpage>619</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1470704</pubid>
                  <pubid idtype="pmpid" link="fulltext">15020448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Ranking Function Based on Higher Order Statistics (RF-HOS) for Two-Sample Microarray Experiments: May; Atlanta, GA.</p>
            </title>
            <aug>
               <au>
                  <snm>Shaik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeasin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <publisher>Springer Verlag</publisher>
            <editor>Mandoiu I, Zelikovsky A</editor>
            <pubdate>2007</pubdate>
            <volume>LNBI 4463</volume>
            <fpage>97</fpage>
            <lpage>108</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Significance of Gene Ranking for Classification of Microarray Samples</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <fpage>312</fpage>
            <lpage>320</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A Progressive Framework for Two-Way Clustering Using Adaptive Subspace Iteration for Functionally Classifying Genes</p>
            </title>
            <aug>
               <au>
                  <snm>Shaik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeasin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proceedings of IEEE IJCNN'06, Vancouver, Canada</source>
            <pubdate>2006</pubdate>
            <fpage>5287</fpage>
            <lpage>5292</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Visualization of High Dimensional Data using an Automated 3D Star Co-ordinate System</p>
            </title>
            <aug>
               <au>
                  <snm>Shaik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeasin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proceedings of IEEE IJCNN'06, Vancouver, Canada</source>
            <pubdate>2006</pubdate>
            <fpage>2318</fpage>
            <lpage>2325</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Functionally Classifying Genes from Microarray Data Using Linear and Non-linear Data Projection</p>
            </title>
            <aug>
               <au>
                  <snm>Shaik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeasin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>IEEE International Conference on Computer Systems and Applications</source>
            <pubdate>2006</pubdate>
            <fpage>608</fpage>
            <lpage>615</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Microarray Bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Stekel</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <publisher>Cambridge, Cambridge University Press</publisher>
            <edition>1</edition>
            <pubdate>2003</pubdate>
            <fpage>263</fpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A Cluster Separation Measure</p>
            </title>
            <aug>
               <au>
                  <snm>Davies</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Bouldin</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
            <pubdate>1979</pubdate>
            <volume>1</volume>
            <fpage>224</fpage>
            <lpage>227</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Document Clustering via Adaptive Subspace Iteration</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ogihara</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Special Interest Group on Information Retrieval 2004</source>
            <pubdate>2004</pubdate>
            <fpage>218</fpage>
            <lpage>225</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Replicated Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Lonnstedt</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Statistica Sinica</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>31</fpage>
            <lpage>46</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Error Distribution for Gene Expression Data</p>
            </title>
            <aug>
               <au>
                  <snm>Purdom</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Statistical Applications in Genetics and Molecular Biology</source>
            <publisher>California, Stanford University</publisher>
            <pubdate>2005</pubdate>
            <volume>4</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>5</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Variation in Gene Expression Patterns in Human Gastric Cancers</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Leung</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Yuen</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Ji</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>ASY</fnm>
               </au>
               <au>
                  <snm>Law</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>So</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2003</pubdate>
            <volume>14</volume>
            <fpage>3208</fpage>
            <lpage>3215</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p>
            </title>
            <aug>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caligiuri</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Bloomfield</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10521349</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Dysregulation of Gene Expression in the 1-Methyl-4-Phenyl-1,2,3,6-Tetrahydropyridine-Lesioned Mouse Substantia Nigra</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Callahan</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Casaceli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kiser</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Chui</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kaysser-Kranich</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Sendera</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Palaniappan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Federoff</snm>
                  <fnm>HJ</fnm>
               </au>
            </aug>
            <source>Journal of Neuroscience</source>
            <pubdate>2004</pubdate>
            <volume>24</volume>
            <issue>34</issue>
            <fpage>7445</fpage>
            <lpage>7454</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15329391</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Pattern Classification</p>
            </title>
            <aug>
               <au>
                  <snm>Duda</snm>
                  <fnm>RO</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Stork</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <publisher>John Wiley and Sons Inc</publisher>
            <edition>2</edition>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B33">
            <title>
               <p>RankGene: Identification of Diagnostic Genes Based on Expression Data</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Murali</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Pavlovic</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source/>
            <pubdate>2002</pubdate>
            <url>http://genomics10.bu.edu/yangsu/rankgene/</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The use of multiple measurements in taxonomic problems</p>
            </title>
            <aug>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Annals of Eugenics</source>
            <pubdate>1936</pubdate>
            <volume>7</volume>
            <issue>2</issue>
            <fpage>179</fpage>
            <lpage>188</lpage>
         </bibl>
      </refgrp>
   </bm>
</art>
