<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-29</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Measuring similarities between gene expression profiles through new data transformations</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Kim</snm>
               <fnm>Kyungpil</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>kpkim@stat.berkeley.edu</email>
            </au>
            <au id="A2">
               <snm>Zhang</snm>
               <fnm>Shibo</fnm>
               <insr iid="I3"/>
               <email>shibo_z@yahoo.com</email>
            </au>
            <au id="A3">
               <snm>Jiang</snm>
               <fnm>Keni</fnm>
               <insr iid="I3"/>
               <email>kenij@nature.berkeley.edu</email>
            </au>
            <au id="A4">
               <snm>Cai</snm>
               <fnm>Li</fnm>
               <insr iid="I4"/>
               <email>lcai@rutgers.edu</email>
            </au>
            <au id="A5">
               <snm>Lee</snm>
               <fnm>In-Beum</fnm>
               <insr iid="I2"/>
               <email>iblee@postech.ac.kr</email>
            </au>
            <au id="A6">
               <snm>Feldman</snm>
               <mi>J</mi>
               <fnm>Lewis</fnm>
               <insr iid="I3"/>
               <email>feldman@nature.berkeley.edu</email>
            </au>
            <au id="A7" ca="yes">
               <snm>Huang</snm>
               <fnm>Haiyan</fnm>
               <insr iid="I1"/>
               <email>hhuang@stat.berkeley.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Statistics, University of California, Berkeley, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Korea</p>
            </ins>
            <ins id="I3">
               <p>Department of Plant and Microbial Biology, University of California, Berkeley, USA</p>
            </ins>
            <ins id="I4">
               <p>Department of Biomedical Engineering, Rutgers University, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>29</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/29</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17257435</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-29</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>1</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>1</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Kim et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Clustering methods are widely used on gene expression data to categorize genes with similar expression profiles. Finding an appropriate (dis)similarity measure is critical to the analysis. In our study, we developed a new measure for clustering the genes when the key factor is the shape of the profile, and when the expression magnitude should also be accounted for in determining the gene relationship. This is achieved by modeling the shape and magnitude parameters separately in a gene expression profile, and then using the estimated shape and magnitude parameters to define a measure in a new feature space.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We explored several transformation schemes for constructing the feature space: a space whose features are determined by the mutual differences of the original expression components, a space derived from a parametric covariance matrix, and the principal component space of traditional PCA. The first two are newly proposed; the third is explored for comparison purposes. The new measures defined in these feature spaces were employed in a <it>K</it>-means clustering procedure. Applying these algorithms to a simulation dataset, a developing mouse retina SAGE dataset, a small yeast sporulation cDNA dataset, and a maize root Affymetrix microarray dataset, we found that the algorithm associated with the first feature space, named <it>TransChisq</it>, showed clear advantages over the other methods.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The proposed <it>TransChisq </it>is very promising in capturing meaningful gene expression clusters. This study also demonstrates the importance of data transformations in defining an efficient distance measure. Our method should provide new insights in analyzing gene expression data. The clustering algorithms are available upon request.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>With the explosion of various 'omic' data, a general question facing biologists and statisticians is how to summarize and organize the observed data into meaningful structures. Clustering is one of the methods that have been widely explored for this purpose <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. In particular, clustering is widely applied to gene expression data to group genes with similar expression profiles into discrete functional clusters. Many clustering methods are available, including hierarchical clustering <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, <it>K</it>-means clustering <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, self-organizing maps <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and various model-based methods <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>Recent research in clustering analysis has been focused largely on two areas: estimating the number of clusters in data <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> and the optimization of the clustering algorithms <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. In this paper we studied a different yet fundamental issue in clustering analysis: to define an appropriate measure of similarity for gene expression patterns.</p>
         <p>The most commonly used distance or similarity measures for analyzing gene expression data are the <it>Pearson correlation coefficient </it>and the <it>Euclidean distance</it>. In some situations, however, neither is suitable for exploring the true gene relationship. The <it>Pearson correlation coefficient </it>is overly sensitive to the shape of an expression curve, and the <it>Euclidean distance </it>mainly considers the magnitude of the changes in gene expression. The success of model-based methods <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B15">15</abbr></abbrgrp> depends heavily on how well the assumed probability model fits the data and the clustering purpose.</p>
         <p>In recent literature, several advanced measures with emphasis on the expression profile shape have been developed in different contexts <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. In particular, based on the <it>Spearman Rank Correlation</it>, <it>CLARITY </it>was defined for detecting local similarity or time-shifted patterns in expression profiles <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. However, rank-based methods can misinterpret a pattern because ranking discards information. As an example, consider a profile <b><it>Y </it></b>= (104, 95, 88, 92, 88) with all components generated from the same Poisson distribution of mean 100. Clearly, the differences among the components of <b><it>Y </it></b>are due to the variance of the distribution, and ranking them is meaningless in this case. In short, <it>Spearman Rank Correlation </it>cannot distinguish real differences from random errors in some situations and thus may provide a wrong estimate of the pattern.</p>
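         <p>The point of the example can be checked numerically. The following minimal Python sketch is an illustration added here and is not part of the MATLAB code of this study; the function names and the use of a Pearson dispersion statistic are assumptions made for the illustration. It compares the profile with a flat Poisson profile and lists the ranks that a rank-based measure would use.</p>

```python
# Illustrative sketch only; the names below are hypothetical, not the authors' code.

Y = [104, 95, 88, 92, 88]  # example profile from the text


def poisson_dispersion_stat(counts):
    """Pearson chi-square statistic against a single shared Poisson mean."""
    mean = sum(counts) / len(counts)
    return sum((y - mean) ** 2 / mean for y in counts)


def average_ranks(values):
    """Ranks with ties averaged (1 = smallest), as in Spearman rank correlation."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


# Upper 5% point of the chi-square distribution with 4 degrees of freedom.
CHI2_CRIT_DF4 = 9.488

stat = poisson_dispersion_stat(Y)
# True when the statistic stays below the critical value, i.e. the data give
# no evidence that the five components have different means.
flat_profile_plausible = CHI2_CRIT_DF4 > stat

print(round(stat, 3))    # dispersion statistic on 4 degrees of freedom
print(average_ranks(Y))  # ranks still impose an ordering on pure noise
```

         <p>Under these assumptions the statistic is about 1.88, far below the 5% critical value, so the profile is compatible with a constant mean, whereas its ranks still suggest a definite ordering of the components.</p>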
         <p>By separately modeling the shape and the magnitude parameters in a gene expression profile, we developed a new measure for clustering the genes when the profile shape is a key factor, and when the expression magnitude should also be accounted for in determining the gene relationship. The approach is to use the estimated shape and magnitude parameters to define a Chi-square-statistic based distance measure in a new feature space. An appropriate feature space helps summarize the data more effectively, hence improving the identification of gene relationships. We explored different transformation schemes to construct the feature spaces, which include a space with features determined by the mutual differences of the original expression components, a space derived from a parametric covariance matrix, and the principal component space of PCA <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The first two are newly proposed; the third is explored for comparison purposes.</p>
         <p>The new measures associated with different feature spaces were employed in a <it>K</it>-means clustering procedure to perform clustering analyses. We designated the algorithm using the measure from the first transformed space as <it>TransChisq</it>, and the one associated with the principal component space as <it>PCAChisq</it>. The space derived from a parametric covariance matrix is not included in the comparison for computational reasons (see Methods). For evaluation purposes we also implemented a set of widely used measures in the <it>K</it>-means clustering procedure, including the Pearson correlation coefficient (<it>PearsonC</it>), Euclidean distance (<it>Eucli</it>), Spearman Rank Correlation (<it>SRC</it>), and a chi-square based measure for Poisson distributed data (<it>PoissonC</it>) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. All the measures were applied to a simulation dataset, a developing mouse retina SAGE dataset of 153 tags <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, a small yeast sporulation cDNA dataset <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, and a maize root Affymetrix microarray dataset <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The results showed that <it>TransChisq </it>outperforms the other methods. Using the gap statistic <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, <it>TransChisq </it>was also found to be advantageous in estimating the number of clusters. The underlying probability model of our method was adopted from the model of <it>PoissonC</it>, a method for analyzing the expression patterns in Serial Analysis of Gene Expression (SAGE) data <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The MATLAB source code for all these algorithms is available upon request.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>First, we illustrate the properties of the proposed new transformations by applying them to a maize gene expression dataset. Then we present the applications of <it>TransChisq</it>, <it>PCAChisq </it>and other methods to a simulation dataset, a yeast sporulation microarray dataset, and a mouse retinal SAGE dataset.</p>
         <sec>
            <st>
               <p>Experimental maize gene expression data</p>
            </st>
            <p>The maize dataset, consisting of nine Affymetrix microarrays, was generated to investigate the gene transcription activity in three maize root tissues with three replicates for each tissue: the proximal meristem (PM), the quiescent center (QC) and the root cap (RC) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. A total of 2092 significantly differentially expressed genes have been identified and categorized into 6 classes of expression patterns <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Here we used these genes to illustrate the properties of the proposed transformations in comparison with traditional PCA.</p>
            <p>Firstly, we applied the transformation employed in <it>TransChisq </it>to the data. Figures <figr fid="F1">1(a)&#8211;(c)</figr> plot the expression profiles of the genes in this new space. The blue and red genes are from the two dominant classes (RC up- or down-regulated genes account for 94% of all genes) and the other four colors (orange, green, pink, light blue) correspond to the other four small classes (up- or down-regulated genes in QC or PM account for 6% of all genes). The three plots show that the six classes can be recognized explicitly in any of the three subspaces of dimension 2.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Plots of 2092 maize genes onto the three different feature spaces</p>
               </caption>
               <text>
                  <p><b>Plots of 2092 maize genes onto the three different feature spaces</b>. From top to bottom, the genes are plotted onto the two-dimensional subspaces of each new space. Figures 1(a-c) correspond to the space used in <it>TransChisq</it>, Figures 1(d-f) correspond to the space determined by the parametric covariance matrix and Figures 1(g-i) correspond to the principal component space associated with <it>PCAChisq</it>. PC1, PC2 and PC3 specify the subspaces. Blue/red dots represent RC up-/down-regulated genes, cyan/pink dots represent PM up-/down-regulated genes, and green/orange dots represent QC up-/down-regulated genes. The dotted lines in (e) are the centers of the six class-separating regions determined by the second and third components from the parametric covariance matrix.</p>
               </text>
               <graphic file="1471-2105-8-29-1"/>
            </fig>
            <p>We then applied the transformation suggested by a parametric covariance matrix to the same data (see Methods). Figures <figr fid="F1">1(d)&#8211;(f)</figr> plot the expression profiles of the genes in this new space. The second and third components correctly separate all six classes in Figure <figr fid="F1">1(e)</figr>. The six class-separating regions, whose centers are the dotted lines in Figure <figr fid="F1">1(e)</figr>, are described in Table <tblr tid="T1">1</tblr> (e.g., the genes around the line PC2 = <m:math name="1471-2105-8-29-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>3</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiodaZaWcbeaaaaa@2DBB@</m:annotation></m:semantics></m:math>&#183;PC3 &lt; 0 are expected to be PM up-regulated). A convenient property shared by this transformation and the one in <it>TransChisq </it>is that the information carried by each component is explicit, so the region of the new space corresponding to each class can be clearly determined.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The six expression patterns and their separating regions described by PC2 and PC3</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="center">
                        <p>Class index</p>
                     </c>
                     <c ca="left">
                        <p>Expression patterns</p>
                     </c>
                     <c ca="left">
                        <p>Center of separating regions described by PC2 and PC3</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>PM > (QC &#8776; RC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = <m:math name="1471-2105-8-29-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>3</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiodaZaWcbeaaaaa@2DBB@</m:annotation></m:semantics></m:math>&#183;PC3 &lt; 0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>PM &lt; (QC &#8776; RC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = <m:math name="1471-2105-8-29-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>3</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiodaZaWcbeaaaaa@2DBB@</m:annotation></m:semantics></m:math>&#183;PC3 > 0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>QC > (PM &#8776; RC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = -<m:math name="1471-2105-8-29-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>3</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiodaZaWcbeaaaaa@2DBB@</m:annotation></m:semantics></m:math>&#183;PC3 > 0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>QC &lt; (PM &#8776; RC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = -<m:math name="1471-2105-8-29-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>3</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiodaZaWcbeaaaaa@2DBB@</m:annotation></m:semantics></m:math>&#183;PC3 &lt; 0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>RC > (PM &#8776; QC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = 0; PC3 > 0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>RC &lt; (PM &#8776; QC)</p>
                     </c>
                     <c ca="left">
                        <p>PC2 = 0; PC3 &lt; 0</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
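            <p>To make the regions in Table <tblr tid="T1">1</tblr> concrete, the following minimal Python sketch assigns a point in the (PC2, PC3) plane to the class whose center line is closest in angle. It is an illustration only: the nearest-center rule, the function name and the reading of each table entry as a ray from the origin are assumptions made here, not the classification procedure used in this study.</p>

```python
import math

SQRT3 = math.sqrt(3.0)

# Unit direction (PC2, PC3) of each class center, read off Table 1.
# Each entry is interpreted as a ray from the origin (an assumption).
CLASS_CENTERS = {
    1: (-SQRT3 / 2, -0.5),  # PM up:   PC2 = sqrt(3)*PC3, both negative
    2: (SQRT3 / 2, 0.5),    # PM down: PC2 = sqrt(3)*PC3, both positive
    3: (SQRT3 / 2, -0.5),   # QC up:   PC2 = -sqrt(3)*PC3, PC2 positive
    4: (-SQRT3 / 2, 0.5),   # QC down: PC2 = -sqrt(3)*PC3, PC2 negative
    5: (0.0, 1.0),          # RC up:   PC2 = 0, PC3 positive
    6: (0.0, -1.0),         # RC down: PC2 = 0, PC3 negative
}


def nearest_class(pc2, pc3):
    """Return the class index whose center ray makes the smallest angle with (pc2, pc3)."""
    def projection(k):
        # The largest projection onto a unit direction means the smallest angle.
        return pc2 * CLASS_CENTERS[k][0] + pc3 * CLASS_CENTERS[k][1]
    return max(CLASS_CENTERS, key=projection)


print(nearest_class(-1.7, -1.0))  # a point near the class 1 center line
print(nearest_class(0.1, 2.0))    # a point near the class 5 center line
```

            <p>Because the six center rays are 60 degrees apart, each class occupies a sector of 60 degrees around its ray under this rule.</p>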
            <p>For comparison, we applied traditional PCA to the same data. Figures <figr fid="F1">1(g)&#8211;(i)</figr> plot the expression profiles of the genes in the principal component space. The direct application of PCA separates the two dominant expression patterns, but it fails to recognize the other patterns, even when all principal components are used. The poor performance of PCA can be attributed to the use of the empirical sample covariance matrix in determining the principal components. In the maize dataset, about 94% of the genes are RC up- or down-regulated, and these genes account for most of the variance. The principal components determined by this sample covariance matrix thus largely capture the two dominant clusters, yet miss the meaningful class information for the other four small groups.</p>
            <p>This example demonstrates the advantage of the proposed new data transformations over the traditional PCA in keeping class information intact.</p>
         </sec>
         <sec>
            <st>
               <p>Simulation study</p>
            </st>
            <p>We applied <it>TransChisq </it>to a simulation dataset to evaluate its performance. For comparison purposes, other modified <it>K</it>-means algorithms, i.e. <it>PCAChisq</it>, <it>PoissonC</it>, <it>PearsonC</it>, and <it>Eucli </it>were also applied to the same dataset.</p>
            <p>The simulation dataset consists of 46 vectors of dimension 5 whose components are independently generated from different Normal distributions. The mean (<it>&#956;</it>) and variance (<it>&#963;</it><sup>2</sup>) of each Normal distribution are constrained by <it>&#963;</it><sup>2 </sup>= 3<it>&#956;</it>, and the means are listed in Table <tblr tid="T2">2</tblr>. According to the Normal distributions from which they are generated, the 46 vectors fall into six groups, A, B, C, D, E, and F, of sizes 3, 6, 6, 9, 7, and 15 respectively. The motivation and guidelines for choosing the parameters of this simulation dataset are presented in Additional file <supplr sid="S1">1</supplr>. Genes with a similar expression shape are considered to be in the same group. Although the expression magnitude in the dataset is not a critical factor for determining the gene clusters, its information is useful and should be taken into account when comparing the profile shapes.</p>
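            <p>The sampling scheme described above can be sketched as follows. This is a minimal Python illustration, not the code used to generate the dataset of this study: the function names and the random seed are hypothetical, and only the mean vectors for rows a1 ~ a3, c1 ~ c4 and c5 ~ c6 of Table <tblr tid="T2">2</tblr> are included.</p>

```python
import math
import random

# (mean vector, number of profiles) for a few rows copied from Table 2.
GROUP_MEANS = {
    "A": [((1, 1, 1, 15, 150), 3)],
    "C": [((10, 30, 30, 60, 10), 4), ((100, 300, 300, 600, 100), 2)],
}


def simulate_profile(means, rng):
    """Draw one profile: independent Normal components with variance 3 * mean."""
    return [rng.gauss(mu, math.sqrt(3 * mu)) for mu in means]


def simulate_groups(group_means, seed=0):
    """Return {group: list of simulated profiles}; the seed is arbitrary."""
    rng = random.Random(seed)
    data = {}
    for group, rows in group_means.items():
        data[group] = [
            simulate_profile(means, rng) for means, n in rows for _ in range(n)
        ]
    return data


def shape(means):
    """Mean vector rescaled to sum to 1, so that only the profile shape remains."""
    total = sum(means)
    return [m / total for m in means]


data = simulate_groups(GROUP_MEANS)
print(len(data["A"]), len(data["C"]))  # number of profiles per group
print(shape((10, 30, 30, 60, 10)))     # same shape as (100, 300, 300, 600, 100)
```

            <p>The two rows of group C differ tenfold in magnitude but share the same rescaled mean vector, which is why they are placed in one group.</p>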
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>One set of orthonormal eigenvectors</b>. This PDF file contains one set of orthonormal eigenvectors referred to in the Methods section.</p>
               </text>
               <file name="1471-2105-8-29-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Five dimensional simulation dataset with Normal distributions (<it>&#963;</it><sup>2 </sup>= 3<it>&#956;</it>).</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Group ID</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="left">
                        <p>Mean parameters of the Normal distributions (<it>&#956;</it>)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group A</p>
                     </c>
                     <c ca="left">
                        <p>a1 ~ a3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>150</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group B</p>
                     </c>
                     <c ca="left">
                        <p>b1 ~ b6</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>150</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group C</p>
                     </c>
                     <c ca="left">
                        <p>c1 ~ c4</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>60</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>c5 ~ c6</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>300</p>
                     </c>
                     <c ca="left">
                        <p>300</p>
                     </c>
                     <c ca="left">
                        <p>600</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group D</p>
                     </c>
                     <c ca="left">
                        <p>d1 ~ d7</p>
                     </c>
                     <c ca="left">
                        <p>200</p>
                     </c>
                     <c ca="left">
                        <p>70</p>
                     </c>
                     <c ca="left">
                        <p>70</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>d8 ~ d9</p>
                     </c>
                     <c ca="left">
                        <p>2000</p>
                     </c>
                     <c ca="left">
                        <p>700</p>
                     </c>
                     <c ca="left">
                        <p>700</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group E</p>
                     </c>
                     <c ca="left">
                        <p>e1 ~ e5</p>
                     </c>
                     <c ca="left">
                        <p>210</p>
                     </c>
                     <c ca="left">
                        <p>120</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>e6 ~ e7</p>
                     </c>
                     <c ca="left">
                        <p>2100</p>
                     </c>
                     <c ca="left">
                        <p>1200</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group F</p>
                     </c>
                     <c ca="left">
                        <p>f1 ~ f3</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>f4 ~ f6</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>75</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>f7 ~ f9</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>f10 ~ f11</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>500</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>f12 ~ f13</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>750</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>f14 ~ f15</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>1000</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The clustering results from the different methods are shown in Figure <figr fid="F2">2</figr>. The horizontal axis represents the index of the 46 genes that belong to six groups (designated A, B, C, D, E and F, and marked at the top of the figure). The vertical axis represents the index of the cluster to which each gene was assigned by a particular algorithm. Only <it>TransChisq </it>correctly categorized the genes into six groups. <it>PCAChisq</it>, <it>PoissonC</it>, and <it>PearsonC </it>mixed up group A and group B. <it>Eucli </it>clustered genes mainly by the magnitude of the expression values rather than by changes in profile shape. To reduce the effect of magnitude, we also applied <it>Eucli </it>to rescaled data, in which every vector was rescaled so that its components sum to the same value. The clustering result of <it>Eucli </it>on the rescaled data (Figure <figr fid="F2">2(f)</figr>) is better, but still not perfect.</p>
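            <p>As a minimal sketch of the rescaling step (not the authors' code), the Python fragment below divides each profile by its own sum so that all profiles share the same total. The function name <it>rescale_to_common_sum</it>, the <it>target_sum </it>parameter and the choice of 1.0 as the common total are assumptions made for illustration; the two example vectors are copied from the Group E rows of the table above.</p>

```python
def rescale_to_common_sum(profile, target_sum=1.0):
    """Rescale one expression profile so that its components add up to target_sum."""
    total = sum(profile)
    if total == 0:
        # An all-zero profile has no shape to preserve; return it unchanged.
        return list(profile)
    return [target_sum * value / total for value in profile]


# Two Group E rows from the table: same shape, tenfold difference in magnitude.
small = rescale_to_common_sum([210, 120, 10, 10, 10])
large = rescale_to_common_sum([2100, 1200, 100, 100, 100])

# After rescaling the two profiles coincide, so Euclidean distance no longer
# separates them by magnitude alone.
print(small)
print(large)
```

            <p>Because the rescaled profiles carry only shape information, any contribution of magnitude to the grouping is discarded by this step.</p>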
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Graphs of clustering results for the simulation data</p>
               </caption>
               <text>
                  <p><b>Graphs of clustering results for the simulation data</b>. Horizontal axis represents the index of the 46 genes which belong to six groups (designated A, B, C, D, E and F, and marked at the top of the figure); vertical axis represents the index of the cluster that each gene has been assigned to by each algorithm.</p>
               </text>
               <graphic file="1471-2105-8-29-2"/>
            </fig>
            <p>We performed an additional 100 replications of the above simulation. <it>TransChisq</it>, <it>PCAChisq </it>and <it>PoissonC </it>correctly clustered 75, 37 and 43 of the 100 replicate simulation datasets, respectively, whereas <it>PearsonC</it>, <it>Eucli </it>and <it>Eucli </it>on rescaled data failed to generate the correct clusters. We also tried <it>PCAChisq </it>with different combinations of principal components to optimize the clustering results. None of these combinations, however, helped to identify all six groups.</p>
            <p>This study evaluates the performance of <it>TransChisq </it>on normally distributed data with a Poisson-like property: the variance increases with the mean. The success of this application suggests that <it>TransChisq </it>can be applied to microarray data in addition to SAGE data.</p>
         </sec>
         <sec>
            <st>
               <p>Experimental mouse retinal SAGE data</p>
            </st>
            <p>For further validation, we applied <it>TransChisq</it>, <it>PCAChisq</it>, <it>PoissonC</it>, <it>PearsonC</it>, <it>Eucli </it>and <it>SRC </it>(the <it>K</it>-means algorithm using Spearman rank correlation as the similarity measure) to a set of mouse retinal SAGE libraries. The raw mouse retinal data consist of 10 SAGE libraries (38818 unique tags with tag counts &#8805; 2) from developing retina taken at 2-day intervals. The samples range from embryonic, to postnatal, to adult <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Among the 38818 tags, the 1467 tags that have counts greater than or equal to 20 in at least one of the 10 libraries were selected. The purpose of this selection is to exclude genes with uniformly low expression. To compare the clustering algorithms more effectively, a subset of 153 SAGE tags with known biological functions was selected. These 153 tags fall into 5 functional groups: 125 are developmental genes that can be further categorized into four classes by their activities at different developmental stages; the other 28 are not relevant to mouse retinal development (see Table <tblr tid="T3">3</tblr>). The average expression profile for each of the five clusters is shown in Figure <figr fid="F3">3</figr>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Functional categorization of the 153 mouse retinal tags (125 developmental genes; 28 non-developmental genes).</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="center">
                        <p>Function Groups</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Early I</p>
                     </c>
                     <c ca="center">
                        <p>Early II</p>
                     </c>
                     <c ca="center">
                        <p>Late I</p>
                     </c>
                     <c ca="center">
                        <p>Late II</p>
                     </c>
                     <c ca="center">
                        <p>Non-dev.</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of tags</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>153</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Average expression profiles for the 153 SAGE tags</p>
               </caption>
               <text>
                  <p><b>Average expression profiles for the 153 SAGE tags</b>. These 153 tags fall into 5 clusters: 125 of these genes are developmental genes and can be further categorized into four groups (Early I, Early II, Late I and Late II) by their expressions at different developmental stages; the other 28 genes are not relevant to the mouse retina development.</p>
               </text>
               <graphic file="1471-2105-8-29-3"/>
            </fig>
            <p><it>TransChisq</it>, <it>PCAChisq</it>, <it>PoissonC</it>, <it>PearsonC</it>, <it>Eucli </it>and <it>SRC </it>were applied to group these 153 SAGE tags into five clusters. Here we assumed that the number of clusters, <it>K</it>, is known. A study evaluating the performance of the different measures in determining <it>K </it>when it is unknown can be found in a later section of this paper. The clustering results show that <it>TransChisq </it>and <it>PCAChisq </it>outperform the others (Table <tblr tid="T4">4</tblr>): 12, 12, 22, 26 and 38 of the 153 tags are incorrectly clustered by <it>TransChisq</it>, <it>PCAChisq</it>, <it>PoissonC</it>, <it>PearsonC </it>and <it>Eucli </it>on rescaled data, respectively. For the results from <it>Eucli </it>on the original data, the correspondence between the predicted clusters and the true clusters is unclear, so we cannot report the number of incorrectly clustered tags. We also evaluated the quality of the clustering results against an external criterion, the adjusted Rand Index <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. The adjusted Rand Index assesses the degree of agreement between two partitions of the same set of objects. We compared the clustering result from each algorithm with the true categorization and calculated the adjusted Rand Index accordingly. The adjusted Rand Index equals 1 when the two partitions are identical and has an expected value of 0 when the partitions are random. A higher adjusted Rand Index indicates closer correspondence between the two partitions. Table <tblr tid="T4">4</tblr> shows that the adjusted Rand Index results confirm that <it>TransChisq </it>and <it>PCAChisq </it>perform similarly and have clear advantages over the other methods.</p>
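            <p>To make the external criterion concrete, the sketch below computes the adjusted Rand Index in its standard pair-counting form from two label vectors. Whether the authors used exactly this implementation is an assumption; the function name <it>adjusted_rand_index </it>and the toy label vectors are illustrative only and are not taken from the retinal data.</p>

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand Index between two partitions of the same objects."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label vectors must describe the same objects")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # convention for fewer than two objects

    # Pairs of objects placed together in both partitions, in the first, in the second.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Cluster names do not matter: a pure relabelling of the same partition scores 1.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true pair scores below 0.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, crossed_partition)
```

            <p>The index depends only on which objects are grouped together, so it can be reported even when predicted clusters cannot be matched one-to-one with the true groups.</p>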
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Comparison of the algorithms on the 153 SAGE tags</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Algorithm</p>
                     </c>
                     <c ca="center">
                        <p>Number of tags in incorrect clusters</p>
                     </c>
                     <c ca="center">
                        <p>% of tags in incorrect clusters</p>
                     </c>
                     <c ca="center">
                        <p>Adjusted Rand Index</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>TransChisq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>7.8</p>
                     </c>
                     <c ca="center">
                        <p>0.822</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PCAChisq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>7.8</p>
                     </c>
                     <c ca="center">
                        <p>0.825</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PoissonC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>14.4</p>
                     </c>
                     <c ca="center">
                        <p>0.725</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PearsonC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>17.0</p>
                     </c>
                     <c ca="center">
                        <p>0.664</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Eucli</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p><it>Eucli </it>on rescaled data</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>24.8</p>
                     </c>
                     <c ca="center">
                        <p>0.675</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>SRC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>0.347</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Microarray yeast sporulation gene expression data</p>
            </st>
            <p>To further demonstrate how effective <it>TransChisq </it>is in clustering genes with characterized patterns in a microarray analysis, we applied <it>TransChisq </it>to a microarray yeast sporulation dataset <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Chu et al. measured gene expression in the budding yeast <it>Saccharomyces cerevisiae </it>at seven time points during sporulation using spotted microarrays, and identified seven distinct temporal patterns of induction <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Thirty-nine representative genes were used to define the model expression profile for each pattern. Based on their properties, the seven patterns are designated Metabolic, Early I, Early II, Early-Mid, Middle, Mid-Late and Late. The average expression profiles for these seven patterns are presented in Figure <figr fid="F4">4</figr>. The genes in Early I, Early II, Middle, Mid-Late and Late initiate induction of expression at 0.5 h, 2 h, 5 h, 7 h and 9 h, respectively, and sustain expression through the rest of the time course. The expression of Metabolic genes is also induced at 0.5 h, as in Early I, but decays afterwards. The expression of genes in Early-Mid is induced not only at 0.5 h and 2 h, as in the Early genes, but also at 5 h and 7 h, as in the Middle and Mid-Late genes. This data structure makes it difficult to separate the Early-Mid genes from the others. Direct clustering analyses using <it>PearsonC </it>or <it>Eucli </it>were not successful.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Expression patterns of the 39 representative genes in the yeast sporulation data</p>
               </caption>
               <text>
                  <p><b>Expression patterns of the 39 representative genes in the yeast sporulation data</b>. These 39 representative genes represent seven expression patterns in the yeast sporulation data. The figure shows the average expression profile for each pattern.</p>
               </text>
               <graphic file="1471-2105-8-29-4"/>
            </fig>
            <p>Prior to analyzing the data, we replaced the expression ratios that were below zero with zeros, as in Figure <figr fid="F5">5(a)</figr>. This truncation of negative values simplifies the expression patterns of the 39 representative genes while keeping the key properties of each pattern intact. The clustering results are summarized in Table <tblr tid="T5">5</tblr>. <it>TransChisq </it>outperforms the other methods: 3, 7, 8, 13, 14 and 17 of the 39 genes are incorrectly clustered by <it>TransChisq</it>, <it>PoissonC</it>, <it>Eucli</it>, <it>PearsonC</it>, <it>PCAChisq </it>and <it>Eucli </it>on rescaled data, respectively. <it>TransChisq </it>also shows the best adjusted Rand Index. Interestingly, <it>Eucli </it>performs worse on the rescaled data than on the original data. This suggests that the magnitude information is critical and cannot be ignored in determining the seven classes. As discussed above, all methods fail to discern the genes in Early-Mid from the genes in Early I, Early II, Middle, Mid-Late and Late (Figure <figr fid="F5">5(b)&#8211;(f)</figr>). Furthermore, <it>PCAChisq </it>and <it>PoissonC </it>mixed up the two different patterns of Metabolic and Early I because of their similar induction time at 0.5 h (Figure <figr fid="F5">5(c)</figr> and <figr fid="F5">5(d)</figr>). <it>PearsonC </it>even splits the Metabolic group further into two separate clusters (Figure <figr fid="F5">5(e)</figr>).</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Comparison of the algorithms on the 39 yeast sporulation genes</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Algorithm</p>
                     </c>
                     <c ca="center">
                        <p>Number of genes in incorrect clusters</p>
                     </c>
                     <c ca="center">
                        <p>% of genes in incorrect clusters</p>
                     </c>
                     <c ca="center">
                        <p>Adjusted Rand Index</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>TransChisq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>7.7</p>
                     </c>
                     <c ca="center">
                        <p>0.830</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PCAChisq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>35.9</p>
                     </c>
                     <c ca="center">
                        <p>0.527</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PoissonC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>18.0</p>
                     </c>
                     <c ca="center">
                        <p>0.675</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PearsonC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>33.3</p>
                     </c>
                     <c ca="center">
                        <p>0.483</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Eucli</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>20.5</p>
                     </c>
                     <c ca="center">
                        <p>0.600</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p><it>Eucli </it>on rescaled data</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>43.6</p>
                     </c>
                     <c ca="center">
                        <p>0.483</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>SRC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>0.325</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Clustering results for the yeast sporulation data</p>
               </caption>
               <text>
                  <p><b>Clustering results for the yeast sporulation data</b>. (a) Original expression profiles of the 39 representative genes from 7 functional groups; (b)-(f) expression profiles of the 7 clusters produced by the different clustering algorithms. The x-axis represents the time points 0 h, 0.5 h, 2 h, 5 h, 7 h, 9 h and 11.5 h. The y-axis represents the normalized log-ratio expression levels.</p>
               </text>
               <graphic file="1471-2105-8-29-5"/>
            </fig>
            <p>For <it>PCAChisq</it>, we tried different combinations of principal components (PCs) to optimize the clustering results. The best result was reached when the first 5 PCs were used: 3 of the 39 genes were incorrectly grouped. This optimal result is the same as the one from <it>TransChisq</it>. In practice, however, it is not feasible to exhaust all possible combinations of PCs in search of the optimal clustering result.</p>
         </sec>
         <sec>
            <st>
               <p>Estimating the number of clusters using Gap Statistics</p>
            </st>
            <p>An unsolved issue in <it>K</it>-means clustering analysis is how to estimate <it>K</it>, the number of clusters. In the recent literature the Gap statistic was found useful <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. The technique of the Gap statistic uses the output of any clustering algorithm to compare the 'between-to-total variance (<it>R</it><sup>2</sup>)' with that expected under an appropriate reference null distribution. A high <it>R</it><sup>2 </sup>value represents high variability between clusters and high coherence within clusters. Below we sketch how to calculate the Gap statistic: Let <it>D</it><sub><it>k </it></sub>be the <it>R</it><sup>2 </sup>measure for the clustering output when the number of clusters is <it>k</it>. To derive the reference expected value of <it>D</it><sub><it>k</it></sub>, the elements within each row of original data are permuted to produce the new matrices with random profile patterns. Assume <it>B </it>such matrices are obtained. Then for each matrix, a new <it>R</it><sup>2 </sup>is calculated based on the original clustering output and the pre-selected similarity measure. The average of these <it>R</it><sup>2</sup>'s, denoted by <m:math name="1471-2105-8-29-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mrow><m:mover accent="true"><m:mi>D</m:mi><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:mi>k</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdseaebaadaWgaaWcbaGaem4AaSgabeaaaaa@2F59@</m:annotation></m:semantics></m:math>, serves as the expectation of <it>D</it><sub><it>k</it></sub>. With <it>D</it><sub><it>k </it></sub>and <m:math name="1471-2105-8-29-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mrow><m:mover accent="true"><m:mi>D</m:mi><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:mi>k</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdseaebaadaWgaaWcbaGaem4AaSgabeaaaaa@2F59@</m:annotation></m:semantics></m:math>, the Gap function is defined by</p>
            <p>Gap(<it>k</it>)= <it>D</it><sub><it>k </it></sub>- <m:math name="1471-2105-8-29-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mrow><m:mover accent="true"><m:mi>D</m:mi><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:mi>k</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdseaebaadaWgaaWcbaGaem4AaSgabeaaaaa@2F59@</m:annotation></m:semantics></m:math>.</p>
            <p>The value of <it>k </it>with the largest Gap value is selected as the optimal number of clusters, because at this <it>k </it>the observed between-to-total variance <it>R</it><sup>2 </sup>exceeds its expected value by the largest margin.</p>
            <p>For comparison, we used different measures, including <it>TransChisq</it>, <it>PCAChisq</it>, <it>PoissonC</it>, <it>PearsonC</it>, <it>Eucli</it>, and <it>SRC</it>, to calculate the Gap statistics for each of the two experimental datasets: the microarray yeast sporulation data and the mouse retinal SAGE data. For the microarray yeast sporulation data, the Gap values from the different measures over different numbers of clusters are shown in Figure <figr fid="F6">6</figr>. <it>TransChisq </it>shows the maximum Gap value at <it>k </it>= 7. In other words, <it>TransChisq </it>finds an optimal number of 7 clusters, which agrees with the known functional categorization of the genes. The other measures all produce incorrect estimates of the number of clusters on the same dataset. In a similar analysis of the SAGE data, <it>TransChisq</it>, <it>PCAChisq </it>and <it>PoissonC </it>provide the correct estimate of the number of clusters, 5. <it>PearsonC</it>, <it>Eucli </it>and <it>SRC </it>give incorrect estimates of 3, 14 and 2, respectively (the gap function curves are not shown here). This study shows that when the number of clusters, <it>K</it>, is unknown, the Gap statistic can be used to estimate <it>K</it>, and that <it>TransChisq </it>compares favorably with the other measures in estimating the true number of clusters in both experimental datasets.</p>
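            <p>The procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: squared Euclidean distance stands in for the pre-selected similarity measure, and the names <it>r_squared</it>, <it>gap_value</it>, <it>n_permutations </it>and <it>seed</it>, as well as the toy data, are hypothetical. As in the text, the reference value is obtained by permuting the elements within each row and re-evaluating <it>R</it><sup>2 </sup>with the original cluster assignment.</p>

```python
import random


def r_squared(data, labels):
    """Between-to-total variance for a fixed cluster assignment (squared Euclidean)."""
    n_cols = len(data[0])
    grand_mean = [sum(row[j] for row in data) / len(data) for j in range(n_cols)]
    total_ss = sum((row[j] - grand_mean[j]) ** 2 for row in data for j in range(n_cols))
    if total_ss == 0:
        return 0.0
    within_ss = 0.0
    for cluster in set(labels):
        members = [row for row, label in zip(data, labels) if label == cluster]
        centre = [sum(row[j] for row in members) / len(members) for j in range(n_cols)]
        within_ss += sum((row[j] - centre[j]) ** 2 for row in members for j in range(n_cols))
    return (total_ss - within_ss) / total_ss


def gap_value(data, labels, n_permutations=100, seed=0):
    """Gap(k) = D_k minus the mean R^2 over row-permuted copies of the data."""
    rng = random.Random(seed)
    observed = r_squared(data, labels)
    reference = []
    for _ in range(n_permutations):
        # Permute the elements within each row to destroy the profile shapes.
        permuted = [rng.sample(row, len(row)) for row in data]
        reference.append(r_squared(permuted, labels))
    return observed - sum(reference) / len(reference)


# Toy data: two profiles peaking early, two peaking late, with k = 2 labels.
toy_data = [[10, 1, 1], [9, 1, 1], [1, 1, 10], [1, 1, 9]]
toy_labels = [0, 0, 1, 1]
toy_gap = gap_value(toy_data, toy_labels, n_permutations=200, seed=1)
print(toy_gap)
```

            <p>In a full analysis, <it>gap_value </it>would be evaluated once for each candidate <it>k</it>, using the labels produced by the clustering run at that <it>k</it>, and the <it>k </it>with the largest value would be kept.</p>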
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Gap statistic results on the 39 yeast sporulation genes</p>
               </caption>
               <text>
                  <p><b>Gap statistic results on the 39 yeast sporulation genes</b>. The x-axis represents the number of clusters and the y-axis represents the gap statistic for each number of clusters. In each sub-figure, the x-axis value associated with the largest gap statistic is the optimal number of clusters under the similarity measure used. Among the gap curves shown, only <it>TransChisq </it>provides a correct estimate of the true number of clusters, 7.</p>
               </text>
               <graphic file="1471-2105-8-29-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussions and conclusions</p>
         </st>
         <p>In this study, we proposed a method, <it>TransChisq</it>, to group genes with similar expression shapes. The expression magnitude is taken into account when measuring shape similarity. Results from applications to a variety of datasets demonstrated <it>TransChisq</it>'s clear advantages over other methods. Furthermore, combined with the Gap statistic, <it>TransChisq </it>was also found to be effective in estimating the number of clusters. In terms of computational efficiency, <it>TransChisq</it>, <it>PCAChisq </it>and <it>PoissonC </it>have similar costs but usually run 2 to 5 times slower than <it>PearsonC </it>and <it>Eucli</it>.</p>
         <p>We have embedded different measures in the <it>K</it>-means clustering procedure to reveal important gene expression patterns. In addition to <it>K</it>-means, our new measure can also be used in other clustering methods, e.g., hierarchical clustering <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. In a hierarchical clustering procedure, the distance between any two gene expression profiles can be defined using measure (4) by assuming that the two genes form a cluster. A study of the performance of different measures in a hierarchical clustering procedure is presented in Additional file <supplr sid="S2">2</supplr>. Our new method also outperforms the others when implemented in the hierarchical clustering algorithm.</p>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <text>
               <p><b>Proof of the properties of the estimators under the restricted normal model</b>. This PDF file shows that the <m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>in formula (2) is an unbiased estimator of <it>&#952;</it><sub><it>i </it></sub>and <m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math>(<it>t</it>) in formula (2) is a consistent estimator of <it>&#955;</it>(<it>t</it>) under the proposed restricted normal model.</p>
            </text>
            <file name="1471-2105-8-29-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>We view different measures as complementary rather than competing, in that each has its advantages. In general, <it>TransChisq </it>is effective when the magnitude information needs to be considered in measuring shape similarity. In clustering analyses of SAGE and microarray data, the magnitude information very often has to be taken into account, even though the shape may be the more critical factor in determining gene relationships.</p>
         <p>Although the proposed method is very promising, further study is required on possible data transformation schemes for cases where the original data show a more complex structure, or where the clustering purpose is different. We believe our method could provide new insights into the application of different data transformations in clustering analysis of gene expression data.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>The underlying probability model of our new measures was adopted from the work of Cai et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, where two Poisson-based measures were proposed for clustering analysis of SAGE data or, more generally, Poisson distributed data. A brief review of this work is presented below, followed by a detailed description of the newly proposed measures.</p>
         <sec>
            <st>
               <p>PoissonC and PoissonL for clustering analysis of SAGE data</p>
            </st>
            <p>SAGE is an effective technique for comprehensive gene expression profiling. The result of a SAGE experiment, called a SAGE library, is a list of counts of sequenced tags isolated from mRNAs that are randomly sampled from a cell or tissue. As discussed in Man et al. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, the sampling process for tag extraction is approximately equivalent to randomly taking a bag of colored balls from a big box. This randomness leads to an approximate multinomial distribution for the number of transcripts of different types. Moreover, because a cell or tissue contains a vast number of transcript types, the selection probability of a particular type of transcript at each draw should be very small. This suggests that the tag counts of sampled transcripts of each type are approximately Poisson distributed. <it>PoissonC </it>and <it>PoissonL </it>were developed in this context <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The method is summarized below.</p>
            <p>Let <it>Y</it><sub><it>i</it></sub>(<it>t</it>) be the count of tag <it>i </it>in library <it>t</it>, and <b><it>Y</it></b><sub><it>i </it></sub>= (<it>Y</it><sub><it>i</it></sub>(1),..., <it>Y</it><sub><it>i</it></sub>(<it>T</it>)) be the vector of counts of tag <it>i </it>over a total of <it>T </it>libraries. <it>Y</it><sub><it>i</it></sub>(<it>t</it>) is assumed to be Poisson distributed with mean <it>&#947;</it><sub><it>it</it></sub>. To model the magnitude and shape of the expression profile separately, Cai et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> further parameterized the Poisson rate as <it>&#947;</it><sub><it>it </it></sub>= <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>)<it>&#952;</it><sub><it>i</it></sub>, where <it>&#952;</it><sub><it>i </it></sub>is the expected sum of counts of tag <it>i </it>over all libraries, and <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>) is the contribution of tag <it>i </it>in library <it>t </it>to the sum <it>&#952;</it><sub><it>i</it></sub>, expressed as a proportion. The sum of <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>) over all libraries equals 1. Thus <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>)<it>&#952;</it><sub><it>i </it></sub>redistributes the tag counts according to the expression shape parameters (the <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>)'s) but keeps the sum of counts over libraries constant. Genes with similar <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>)'s over <it>t </it>are considered to belong to the same cluster.</p>
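            <p>To illustrate the separation of magnitude and shape, the following Python sketch splits a count profile into a total and a vector of proportions. It is a simplified illustration, not the published implementation: the function names are hypothetical, the example counts are made up, and the observed total is used as a plug-in value for the expected sum <it>&#952;</it><sub><it>i</it></sub>.</p>

```python
def split_magnitude_and_shape(counts):
    """Split one tag's counts over T libraries into (theta, lam).

    theta is the total count (magnitude); lam holds the per-library
    proportions (shape), which sum to 1.
    """
    theta = sum(counts)
    if theta == 0:
        raise ValueError("tag has no counts; its shape is undefined")
    lam = [y / theta for y in counts]
    return theta, lam


def poisson_rates(theta, lam):
    """Rebuild the per-library Poisson means gamma_it = lam(t) * theta."""
    return [proportion * theta for proportion in lam]


# Two made-up tags with a tenfold difference in magnitude but the same shape.
counts_a = [2, 4, 10, 4]
counts_b = [20, 40, 100, 40]
theta_a, lam_a = split_magnitude_and_shape(counts_a)
theta_b, lam_b = split_magnitude_and_shape(counts_b)
print(theta_a, lam_a)
print(theta_b, lam_b)
```

            <p>The two example tags differ only in their totals, so their proportion vectors coincide; under the model above they would be candidates for the same cluster.</p>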
            <p>For a cluster consisting of tags 1,2,..., <it>m </it>with the common shape parameter <it>&#955; </it>= (<it>&#955;</it>(1),..., <it>&#955;</it>(<it>T</it>)), the joint likelihood function for <b><it>Y</it></b><sub>1</sub>, <b><it>Y</it></b><sub>2</sub>,...,<b><it>Y</it></b><sub><it>m </it></sub>is</p>
            <p>
               <m:math name="1471-2105-8-29-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>L</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>&#955;</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>&#952;</m:mi>
                        <m:mo>|</m:mo>
                        <m:mi>Y</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>&#8733;</m:mo>
                        <m:mi>f</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>Y</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:mn>...</m:mn>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>Y</m:mi>
                           <m:mi>m</m:mi>
                        </m:msub>
                        <m:mo>|</m:mo>
                        <m:mi>&#955;</m:mi>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>&#952;</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:mn>...</m:mn>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>&#952;</m:mi>
                           <m:mi>m</m:mi>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8719;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>m</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8719;</m:mo>
                                    <m:mrow>
                                       <m:mi>t</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>T</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>exp</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#955;</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>t</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:msup>
                                             <m:mrow>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>&#955;</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:msub>
                                                   <m:mi>&#952;</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>Y</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>Y</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>t</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>!</m:mo>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGmbatcqGGOaakiiWacqWF7oaBcqGGSaalcqWF4oqCcqGG8baFieWacqGFzbqwcqGGPaqkcqGHDisTcqWGMbGzcqGGOaakcqGFzbqwdaWgaaWcbaacbeGae0xmaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab+LfaznaaBaaaleaacqGFTbqBaeqaaOGaeiiFaWNae83UdWMaeiilaWccciGaeWhUde3aaSbaaSqaaiabigdaXaqabaGccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqaF4oqCdaWgaaWcbaGaemyBa0gabeaakiabcMcaPiabg2da9maarahabaWaaebCaeaadaWcaaqaaiGbcwgaLjabcIha4jabcchaWjabcIcaOiabgkHiTiab8T7aSjabcIcaOiabdsha0jabcMcaPiab8H7aXnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaeiikaGIaeW3UdWMaeiikaGIaemiDaqNaeiykaKIaeWhUde3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkdaahaaWcbeqaaiabdMfaznaaBaaameaacqWGPbqAaeqaaSGaeiikaGIaemiDaqNaeiykaKcaaaGcbaGaemywaK1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqWG0baDcqGGPaqkcqGGHaqiaaaaleaacqWG0baDcqGH9aqpcqaIXaqmaeaacqWGubava0Gaey4dIunaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHpis1aOGaaCzcaiaaxMaadaqadaqaaiabigdaXaGaayjkaiaawMcaaaaa@8B55@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The maximum likelihood estimates of <it>&#955; </it>and <it>&#952;</it><sub>1</sub>,..., <it>&#952;</it><sub><it>m </it></sub>are</p>
            <p>
               <m:math name="1471-2105-8-29-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mrow>
                              <m:mover accent="true">
                                 <m:mi>&#952;</m:mi>
                                 <m:mo stretchy="true">^</m:mo>
                              </m:mover>
                           </m:mrow>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munder>
                              <m:mo>&#8721;</m:mo>
                              <m:mi>t</m:mi>
                           </m:munder>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>Y</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>,</m:mo>
                        <m:mtext>&#160;and&#160;</m:mtext>
                        <m:mrow>
                           <m:mrow>
                              <m:mover accent="true">
                                 <m:mi>&#955;</m:mi>
                                 <m:mo>^</m:mo>
                              </m:mover>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>m</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>Y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mo>/</m:mo>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>m</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mrow>
                                          <m:mover accent="true">
                                             <m:mi>&#952;</m:mi>
                                             <m:mo stretchy="true">^</m:mo>
                                          </m:mover>
                                       </m:mrow>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mrow>
                        <m:mo>=</m:mo>
                        <m:mrow>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>m</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>Y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mo>/</m:mo>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>m</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:munder>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mi>t</m:mi>
                                       </m:munder>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>Y</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>t</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mrow>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaWaaSbaaSqaaiabdMgaPbqabaGccqGH9aqpdaaeqbqaaiabdMfaznaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiDaqNaeiykaKcaleaacqWGPbqAaeqaniabggHiLdGccqGGSaalcqqGGaaicqqGHbqycqqGUbGBcqqGKbazcqqGGaaidaWcgaqaaiqb=T7aSzaajaGaeiikaGIaemiDaqNaeiykaKIaeyypa0ZaaabCaeaacqWGzbqwdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdsha0jabcMcaPaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemyBa0ganiabggHiLdaakeaadaaeWbqaamaaHaaabaGae8hUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGTbqBa0GaeyyeIuoaaaGccqGH9aqpdaWcgaqaamaaqahabaGaemywaK1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqWG0baDcqGGPaqkaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHris5aaGcbaWaaabCaeaadaaeqbqaaiabdMfaznaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiDaqNaeiykaKcaleaacqWG0baDaeqaniabggHiLdaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGTbqBa0GaeyyeIuoaaaGccqGGUaGlcaWLjaGaaCzcamaabmaabaGaeGOmaidacaGLOaGaayzkaaaaaa@82AF@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
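            <p>The estimates in formula (2) can be computed directly from the count matrix of a cluster. The sketch below is a minimal illustration in Python; the function name <it>cluster_mle </it>and the example counts are hypothetical, and each estimated magnitude is taken as the sum of a tag's counts over the libraries.</p>

```python
def cluster_mle(cluster):
    """Return (theta_hat, lam_hat) for a cluster of count profiles.

    cluster is a list of m profiles, each a list of T non-negative counts.
    theta_hat[i] is the total count of tag i over the T libraries;
    lam_hat[t] is the share of all counts in the cluster that fall in
    library t, so the entries of lam_hat sum to 1.
    """
    theta_hat = [sum(profile) for profile in cluster]
    total = sum(theta_hat)
    if total == 0:
        raise ValueError("cluster has no counts")
    num_libraries = len(cluster[0])
    lam_hat = [sum(profile[t] for profile in cluster) / total
               for t in range(num_libraries)]
    return theta_hat, lam_hat


# Made-up cluster of m = 3 tags over T = 3 libraries.
cluster = [[1, 3, 6], [2, 5, 13], [0, 4, 6]]
theta_hat, lam_hat = cluster_mle(cluster)
print(theta_hat, lam_hat)
```

            <p>Note that the shape estimate pools the counts of all tags in the cluster, so tags with larger totals carry more weight in the common shape.</p>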
            <p>Formula (2) forms the basis of the following two measures for evaluating how well a particular tag fits in a cluster. One natural measure is to use the log-likelihood function: log <it>f</it>(<b><it>Y</it></b><sub><it>i</it></sub>|<b><it>&#955;</it></b>, <it>&#952;</it><sub><it>i</it></sub>). The larger the log-likelihood is, the more likely the observed counts are generated from the expected Poisson distributions. So for a cluster consisting of tags 1,2,..., <it>m</it>, a likelihood based measure is defined as</p>
            <p>
               <m:math name="1471-2105-8-29-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>L</m:mi>
                        <m:mo>=</m:mo>
                        <m:mo>&#8722;</m:mo>
                        <m:mi>log</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mi>f</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>Y</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:mn>...</m:mn>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>Y</m:mi>
                           <m:mi>m</m:mi>
                        </m:msub>
                        <m:mo>|</m:mo>
                        <m:mover accent="true">
                           <m:mi>&#955;</m:mi>
                           <m:mo stretchy="true">^</m:mo>
                        </m:mover>
                        <m:mo>,</m:mo>
                        <m:mover accent="true">
                           <m:mi>&#952;</m:mi>
                           <m:mo stretchy="true">^</m:mo>
                        </m:mover>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:msubsup>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>m</m:mi>
                           </m:msubsup>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:msubsup>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>t</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>T</m:mi>
                                 </m:msubsup>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mover accent="true">
                                       <m:mi>&#955;</m:mi>
                                       <m:mo stretchy="true">^</m:mo>
                                    </m:mover>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:msub>
                                       <m:mrow>
                                          <m:mover accent="true">
                                             <m:mi>&#952;</m:mi>
                                             <m:mo stretchy="true">^</m:mo>
                                          </m:mover>
                                       </m:mrow>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo>&#8722;</m:mo>
                                    <m:msub>
                                       <m:mi>Y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mi>log</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mover accent="true">
                                       <m:mi>&#955;</m:mi>
                                       <m:mo stretchy="true">^</m:mo>
                                    </m:mover>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:msub>
                                       <m:mrow>
                                          <m:mover accent="true">
                                             <m:mi>&#952;</m:mi>
                                             <m:mo stretchy="true">^</m:mo>
                                          </m:mover>
                                       </m:mrow>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>+</m:mo>
                                    <m:mi>log</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>Y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>!</m:mo>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>3</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGmbatcqGH9aqpcqGHsislcyGGSbaBcqGGVbWBcqGGNbWzcqWGMbGzcqGGOaakieWacqWFzbqwdaWgaaWcbaacbeGae4xmaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=LfaznaaBaaaleaacqWFTbqBaeqaaOGaeiiFaW3aaecaaeaaiiWacqqF7oaBaiaawkWaaiabcYcaSmaaHaaabaGae0hUdehacaGLcmaacqGGPaqkcqGH9aqpdaaeWaqaamaaqadabaGaeiikaGYaaecaaeaaiiGacqaF7oaBaiaawkWaaiabcIcaOiabdsha0jabcMcaPmaaHaaabaGaeWhUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaakiabgkHiTiabdMfaznaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiDaqNaeiykaKIagiiBaWMaei4Ba8Maei4zaCMaeiikaGYaaecaaeaacqaF7oaBaiaawkWaaiabcIcaOiabdsha0jabcMcaPmaaHaaabaGaeWhUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaakiabcMcaPiabgUcaRiGbcYgaSjabc+gaVjabcEgaNjabcIcaOiabdMfaznaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiDaqNaeiykaKIaeiyiaeIaeiykaKIaeiykaKcaleaacqWG0baDcqGH9aqpcqaIXaqmaeaacqWGubava0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHris5aOGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabiodaZaGaayjkaiaawMcaaaaa@89D1@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The other measure is based on the Chi-square statistic, a well-known statistic for evaluating the deviation of observations from their expected values. It is defined as</p>
            <p>
               <m:math name="1471-2105-8-29-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>D</m:mi>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:msubsup>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>m</m:mi>
                           </m:msubsup>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:msubsup>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>t</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>T</m:mi>
                                 </m:msubsup>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:msup>
                                             <m:mrow>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:msub>
                                                   <m:mi>Y</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mover accent="true">
                                                   <m:mi>&#955;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#952;</m:mi>
                                                         <m:mo stretchy="true">^</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mn>2</m:mn>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mover accent="true">
                                             <m:mi>&#955;</m:mi>
                                             <m:mo stretchy="true">^</m:mo>
                                          </m:mover>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>t</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:msub>
                                             <m:mrow>
                                                <m:mover accent="true">
                                                   <m:mi>&#952;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                             </m:mrow>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>4</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGebarcqGH9aqpdaaeWaqaamaaqadabaWaaSGbaeaacqGGOaakcqWGzbqwdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdsha0jabcMcaPiabgkHiTmaaHaaabaacciGae83UdWgacaGLcmaacqGGOaakcqWG0baDcqGGPaqkdaqiaaqaaiab=H7aXbGaayPadaWaaSbaaSqaaiabdMgaPbqabaGccqGGPaqkdaahaaWcbeqaaiabikdaYaaaaOqaaiabcIcaOmaaHaaabaGae83UdWgacaGLcmaacqGGOaakcqWG0baDcqGGPaqkdaqiaaqaaiab=H7aXbGaayPadaWaaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaaaaleaacqWG0baDcqGH9aqpcqaIXaqmaeaacqWGubava0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHris5aOGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabisda0aGaayjkaiaawMcaaaaa@5F7F@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>When the Chi-square statistic is used as a similarity measure, a deviation from a large expected count is penalized less than the same deviation from a small expected count. This is consistent with the above likelihood-based measure, because the variance of a Poisson variable equals its mean. In general, the smaller the value of <it>L </it>or <it>D</it>, the more likely the tags belong to the same cluster. Note also that the statistics in measures (3) and (4) consider both shape and magnitude information when measuring the cluster dispersion: the cluster is specified by the shape parameter <b><it>&#955;</it></b>, but the relationship of a tag to a certain cluster is determined by the deviation of the observed counts (<m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub><m:math name="1471-2105-8-29-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGadiab=T7aSbGaayPadaaaaa@2F2A@</m:annotation></m:semantics></m:math><sub><it>i</it></sub>) from the expected values (<m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub><it>&#955;</it>). Here <m:math name="1471-2105-8-29-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGadiab=T7aSbGaayPadaaaaa@2F2A@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>is the estimated profile shape of tag <it>i </it>(<m:math name="1471-2105-8-29-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGadiab=T7aSbGaayPadaaaaa@2F2A@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>= (<m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>(1),...,<m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>(<it>T</it>)) and <m:math name="1471-2105-8-29-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mrow><m:mrow><m:msub><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:msub><m:mi>Y</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>/</m:mo><m:mrow><m:mstyle displaystyle="true"><m:msub><m:mo>&#8721;</m:mo><m:mi>t</m:mi></m:msub><m:mrow><m:msub><m:mi>Y</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mstyle></m:mrow></m:mrow><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mrow><m:mrow><m:msub><m:mi>Y</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>/</m:mo><m:mrow><m:msub><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:mi>i</m:mi></m:msub></m:mrow></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcgaqaamaaHaaabaacciGae83UdWgacaGLcmaadaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdsha0jabcMcaPiabg2da9iabdMfaznaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiDaqNaeiykaKcabaWaaabeaeaacqWGzbqwdaWgaaWcbaGaemyAaKgabeaaaeaacqWG0baDaeqaniabggHiLdaaaOGaeiikaGIaemiDaqNaeiykaKIaeyypa0ZaaSGbaeaacqWGzbqwdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdsha0jabcMcaPaqaamaaHaaabaGae8hUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaaaaaaaa@4F25@</m:annotation></m:semantics></m:math>). A measure that ignores magnitude would take the difference between <m:math name="1471-2105-8-29-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGadiab=T7aSbGaayPadaaaaa@2F2A@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>and <it><b>&#955;</b></it> directly.</p>
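            <p>As an illustration, measures (3) and (4) can be evaluated for a single cluster as in the following minimal sketch. It takes the magnitude estimate of each tag to be its total count, as in the definition of the estimated profile shape above. The pooled estimate of the cluster shape (column sums divided by the grand total) and the function names <it>fit_cluster</it>, <it>dispersion_L </it>and <it>dispersion_D </it>are assumptions made for this example; they are not taken from the original implementation.</p>

```python
import math


def fit_cluster(counts):
    """Estimate magnitudes and a common shape for one cluster.

    counts: list of tags, each a list of T non-negative counts.
    Assumes every tag has a positive total and every time point has a
    positive pooled count, so that all expected values are positive.
    """
    theta = [sum(row) for row in counts]  # theta_hat_i = sum_t Y_i(t)
    grand_total = sum(theta)
    n_points = len(counts[0])
    # Assumed pooled shape estimate: column sum divided by grand total.
    lam = [sum(row[t] for row in counts) / grand_total for t in range(n_points)]
    return theta, lam


def dispersion_L(counts, theta, lam):
    """Negative Poisson log-likelihood of the cluster, as in measure (3)."""
    total = 0.0
    for row, th in zip(counts, theta):
        for y, l in zip(row, lam):
            mu = l * th  # expected count lambda_hat(t) * theta_hat_i
            total += mu - y * math.log(mu) + math.lgamma(y + 1)  # lgamma(y + 1) = log(y!)
    return total


def dispersion_D(counts, theta, lam):
    """Chi-square dispersion of the cluster, as in measure (4)."""
    total = 0.0
    for row, th in zip(counts, theta):
        for y, l in zip(row, lam):
            mu = l * th
            total += (y - mu) ** 2 / mu
    return total
```

            <p>For two tags with the same shape but different magnitudes, such as (15, 30, 15) and (5, 10, 5), the fitted expected counts coincide with the observed counts and <it>D </it>is zero; tags with different shapes give a positive <it>D</it>.</p>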
            <p>Cai et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> incorporated the above measures into a <it>K</it>-means clustering algorithm to perform clustering analysis. The <it>K</it>-means clustering procedure <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> generates clusters by assigning each object to one of <it>K </it>clusters so as to minimize a measure of dispersion within the clusters. The algorithm is outlined below:</p>
            <p>1. All SAGE tags are assigned randomly to <it>K </it>sets. Estimate initial parameters <m:math name="1471-2105-8-29-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#952;</m:mi><m:mi>i</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mn>0</m:mn><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF4oqCdaqhaaWcbaGaemyAaKgabaGaeiikaGIaeGimaaJaeiykaKcaaaaa@3291@</m:annotation></m:semantics></m:math> and <m:math name="1471-2105-8-29-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mn>0</m:mn><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo>=</m:mo><m:mo stretchy="false">(</m:mo><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mn>0</m:mn><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo stretchy="false">(</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mn>0</m:mn></m:msubsup><m:mo stretchy="false">(</m:mo><m:mi>T</m:mi><m:mo stretchy="false">)</m:mo><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiWacqWF7oaBdaqhaaWcbaGaem4AaSgabaGaeiikaGIaeGimaaJaeiykaKcaaOGaeyypa0JaeiikaGccciGae43UdW2aa0baaSqaaiabdUgaRbqaaiabcIcaOiabicdaWiabcMcaPaaakiabcIcaOiabigdaXiabcMcaPiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab+T7aSnaaDaaaleaacqWGRbWAaeaacqaIWaamaaGccqGGOaakcqWGubavcqGGPaqkcqGGPaqkaaa@4969@</m:annotation></m:semantics></m:math> for each tag and each cluster by formula (2).</p>
            <p>2. In the (b+1)th iteration, assign each tag <it>i </it>to the cluster with minimum deviation from the expected model. The deviation is measured by either <m:math name="1471-2105-8-29-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>L</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>k</m:mi></m:mrow><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo>=</m:mo><m:mo>&#8722;</m:mo><m:mi>log</m:mi><m:mo>&#8289;</m:mo><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>Y</m:mi><m:mi>i</m:mi></m:msub><m:mo>|</m:mo><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>&#952;</m:mi><m:mi>i</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGmbatdaqhaaWcbaGaemyAaKMaeiilaWIaem4AaSgabaGaeiikaGIaemOyaiMaeiykaKcaaOGaeyypa0JaeyOeI0IagiiBaWMaei4Ba8Maei4zaCMaemOzayMaeiikaGccbmGae8xwaK1aaSbaaSqaaiabdMgaPbqabaGccqGG8baFiiWacqGF7oaBdaqhaaWcbaGaem4AaSgabaGaeiikaGIaemOyaiMaeiykaKcaaOGaeiilaWccciGae0hUde3aa0baaSqaaiabdMgaPbqaaiabcIcaOiabdkgaIjabcMcaPaaakiabcMcaPaaa@4F85@</m:annotation></m:semantics></m:math> or <m:math name="1471-2105-8-29-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>D</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>k</m:mi></m:mrow><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msub><m:mo>&#8721;</m:mo><m:mi>t</m:mi></m:msub><m:mrow><m:mrow><m:mrow><m:msup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:msub><m:mi>Y</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:mo>&#8722;</m:mo><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:msubsup><m:mi>&#952;</m:mi><m:mi>i</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mn>2</m:mn></m:msup></m:mrow><m:mo>/</m:mo><m:mrow><m:mo stretchy="false">(</m:mo><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:msubsup><m:mi>&#952;</m:mi><m:mi>i</m:mi><m:mrow><m:mo 
stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGebardaqhaaWcbaGaemyAaKMaeiilaWIaem4AaSgabaGaeiikaGIaemOyaiMaeiykaKcaaOGaeyypa0ZaaabeaeaadaWcgaqaamaabmaabaGaemywaK1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqWG0baDcqGGPaqkcqGHsisliiGacqWF7oaBdaqhaaWcbaGaem4AaSgabaGaeiikaGIaemOyaiMaeiykaKcaaOGaeiikaGIaemiDaqNaeiykaKIae8hUde3aa0baaSqaaiabdMgaPbqaaiabcIcaOiabdkgaIjabcMcaPaaaaOGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaaGcbaGaeiikaGIae83UdW2aa0baaSqaaiabdUgaRbqaaiabcIcaOiabdkgaIjabcMcaPaaakiabcIcaOiabdsha0jabcMcaPiab=H7aXnaaDaaaleaacqWGPbqAaeaacqGGOaakcqWGIbGycqGGPaqkaaGccqGGPaqkaaaaleaacqWG0baDaeqaniabggHiLdaaaa@639B@</m:annotation></m:semantics></m:math>.</p>
            <p>3. Set new cluster centers <m:math name="1471-2105-8-29-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#955;</m:mi><m:mi>k</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>b</m:mi><m:mo>+</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiWacqWF7oaBdaqhaaWcbaGaem4AaSgabaGaeiikaGIaemOyaiMaey4kaSIaeGymaeJaeiykaKcaaaaa@34C5@</m:annotation></m:semantics></m:math> by formula (2).</p>
            <p>4. Repeat steps 2 and 3 until convergence.</p>
            <p>Let <it>c</it>(<it>i</it>) denote the index of the cluster that tag <it>i </it>is assigned to. The above algorithm aims to minimize the within-cluster dispersion &#8721;<sub><it>i</it></sub><it>L</it><sub><it>i,c(i) </it></sub>or &#8721;<sub><it>i</it></sub><it>D</it><sub><it>i,c(i)</it></sub>. The algorithm using measure <it>L </it>is called <it>PoissonL</it>, and the algorithm using measure <it>D </it>is called <it>PoissonC</it>. <it>PoissonL </it>and <it>PoissonC </it>perform similarly in applications, but <it>PoissonC </it>is more practical in terms of running time, so we use <it>PoissonC </it>for comparison in this paper.</p>
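            <p>The steps above can be sketched as follows for the Chi-square measure. This is a minimal illustration, not the published <it>PoissonC </it>software: the pooled shape estimate, the re-seeding of empty clusters from a randomly chosen tag, the stopping rule (no tag changes cluster) and all function and parameter names are assumptions made for this example. Every tag is assumed to have a positive total count.</p>

```python
import math
import random


def chisq(y, theta, shape):
    """Chi-square deviation of one tag from a cluster shape (per-tag term of measure (4))."""
    total = 0.0
    for obs, lam in zip(y, shape):
        mu = lam * theta  # expected count under the cluster shape
        if mu == 0:
            if obs != 0:
                return math.inf  # a positive count where none is expected
            continue
        total += (obs - mu) ** 2 / mu
    return total


def poisson_c(counts, k, n_iter=100, seed=0):
    """K-means style clustering of count profiles with the Chi-square measure."""
    rng = random.Random(seed)
    m, n_points = len(counts), len(counts[0])
    theta = [sum(row) for row in counts]  # magnitude of each tag
    labels = [rng.randrange(k) for _ in range(m)]  # step 1: random assignment
    for _ in range(n_iter):
        # Steps 1 and 3: estimate one shape per cluster from its current members.
        shapes = []
        for c in range(k):
            members = [i for i in range(m) if labels[i] == c]
            if not members:
                members = [rng.randrange(m)]  # assumption: re-seed an empty cluster
            pooled = sum(theta[i] for i in members)
            shapes.append(
                [sum(counts[i][t] for i in members) / pooled for t in range(n_points)]
            )
        # Step 2: move each tag to the cluster with minimum deviation.
        new_labels = [
            min(range(k), key=lambda c: chisq(counts[i], theta[i], shapes[c]))
            for i in range(m)
        ]
        if new_labels == labels:  # step 4: stop once assignments are stable
            break
        labels = new_labels
    return labels
```

            <p>For example, <it>poisson_c</it>([[50, 5, 5], [100, 10, 10], [5, 50, 5], [10, 100, 10]], 2) places the first two tags in one cluster and the last two in the other, although the magnitudes within each pair differ by a factor of two.</p>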
            <p><it>PoissonC </it>is designed to group the objects by their departure from the expected Poisson distributions. The success of <it>PoissonC </it>has been shown in applications <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. However, if the clustering purpose is slightly different, some modification of <it>PoissonC </it>may be necessary. For instance, if the shape difference is to be emphasized more in determining the relationship, the <it>direction of departure </it>of the observed from the expected values should also be considered. As an example, we consider an expression vector <b><it>Y </it></b>= (15, 30, 15) and its relationship with two clusters whose shapes are specified by <it>&#955;</it><sub>1 </sub>= (1/12, 5/6, 1/12) and <it>&#955;</it><sub>2 </sub>= (5/12, 1/6, 5/12), respectively. The expectation of <b><it>Y </it></b>in cluster 1 is <m:math name="1471-2105-8-29-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGymaedaaaaa@301F@</m:annotation></m:semantics></m:math> = (5, 50, 5), and in cluster 2, it is <m:math name="1471-2105-8-29-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGOmaidaaaaa@3021@</m:annotation></m:semantics></m:math> = (25, 10, 25). If more emphasis should be put on the shape change in determining the relationship, <b><it>Y </it></b>would be expected to be closer to the first cluster because of the large value observed on the middle component in both <b><it>Y </it></b>and <m:math name="1471-2105-8-29-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGymaedaaaaa@301F@</m:annotation></m:semantics></m:math>. <it>PoissonC</it>, however, determines that <b><it>Y </it></b>has the same distance to <m:math name="1471-2105-8-29-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGymaedaaaaa@301F@</m:annotation></m:semantics></m:math> and <m:math name="1471-2105-8-29-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGOmaidaaaaa@3021@</m:annotation></m:semantics></m:math> (by the measure (4), the distance between <b><it>Y </it></b>and <m:math name="1471-2105-8-29-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGymaedaaaaa@301F@</m:annotation></m:semantics></m:math> is 48, so is the distance between <b><it>Y </it></b>and <m:math name="1471-2105-8-29-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGOmaidaaaaa@3021@</m:annotation></m:semantics></m:math>). <it>PoissonC </it>ignores the <it>direction of departure</it>. To address this omission we propose to emphasize the profile shape through suitable data transformations, and to define a distance measure in the transformed space. The construction of a proper feature space under a certain clustering purpose is essential to define an effective distance or similarity measure.</p>
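            <p>The arithmetic of this example can be checked with the short sketch below, which uses exact fractions to avoid rounding. The function name <it>chisq_distance </it>and the variable names are chosen for this illustration only.</p>

```python
from fractions import Fraction as F


def chisq_distance(y, shape):
    """Per-tag Chi-square distance of measure (4), with theta taken as the total count."""
    theta = sum(y)
    return sum((obs - lam * theta) ** 2 / (lam * theta) for obs, lam in zip(y, shape))


y = [15, 30, 15]
lambda_1 = [F(1, 12), F(5, 6), F(1, 12)]   # expected counts (5, 50, 5)
lambda_2 = [F(5, 12), F(1, 6), F(5, 12)]   # expected counts (25, 10, 25)

d1 = chisq_distance(y, lambda_1)   # 20 + 8 + 20 = 48
d2 = chisq_distance(y, lambda_2)   # 4 + 40 + 4 = 48

# Observed minus expected: the two clusters are departed from in opposite directions.
theta = sum(y)
departures_1 = [obs - lam * theta for obs, lam in zip(y, lambda_1)]   # (10, -20, 10)
departures_2 = [obs - lam * theta for obs, lam in zip(y, lambda_2)]   # (-10, 20, -10)
```

            <p>Both distances equal 48, while the signs of the departures are reversed between the two clusters; this sign information is what the Chi-square distance discards.</p>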
         </sec>
         <sec>
            <st>
               <p>Proposed distance measures (I): TransChisq</p>
            </st>
            <p>A simple yet natural data transformation that emphasizes the expression shape is to take the pairwise differences of the original vector components. Given a gene with expression profile <b><it>Y</it></b><sub><it>i </it></sub>= (<it>Y</it><sub><it>i</it></sub>(1),..., <it>Y</it><sub><it>i</it></sub>(<it>T</it>)), the transformed vector <b><it>Z</it></b><sub><it>i </it></sub>is of dimension <it>T</it>(<it>T</it>-1)/2, with components of the form <it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>1</sub>)-<it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>2</sub>) for <it>t</it><sub>1 </sub>= 1,..., <it>T</it>-1 and <it>t</it><sub>2 </sub>= (<it>t</it><sub>1 </sub>+ 1),..., <it>T</it>.</p>
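            <p>A minimal sketch of this transformation is given below. The function name <it>pairwise_differences </it>is an assumption made for this illustration, and time points are indexed from 0 as is usual in Python.</p>

```python
from itertools import combinations


def pairwise_differences(profile):
    """Return Y(t1) - Y(t2) for every pair of time points with t1 before t2.

    The pairs are produced in the order (t1, t2) = (1, 2), (1, 3), ..., (T-1, T),
    so a profile of length T gives T(T-1)/2 components.
    """
    return [profile[a] - profile[b] for a, b in combinations(range(len(profile)), 2)]


z = pairwise_differences([15, 30, 15])   # [-15, 0, 15]
```

            <p>For the profile (15, 30, 15) with <it>T </it>= 3, the transformed vector has 3 components, namely (-15, 0, 15).</p>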
            <p>According to the Poisson model in the previous section, <it>E</it>(<it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>1</sub>)-<it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>2</sub>)) = (<it>&#955;</it><sub><it>i</it></sub>(<it>t</it><sub>1</sub>)-<it>&#955;</it><sub><it>i</it></sub>(<it>t</it><sub>2</sub>))<it>&#952;</it><sub><it>i </it></sub>and <it>Var</it>(<it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>1</sub>)-<it>Y</it><sub><it>i</it></sub>(<it>t</it><sub>2</sub>)) = (<it>&#955;</it><sub><it>i</it></sub>(<it>t</it><sub>1</sub>)+<it>&#955;</it><sub><it>i</it></sub>(<it>t</it><sub>2</sub>))<it>&#952;</it><sub><it>i</it></sub>. For a cluster consisting of tags 1, 2,..., <it>m</it>, we can define the following statistic to measure the cluster dispersion:</p>
            <p>
               <m:math name="1471-2105-8-29-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>S</m:mi>
                                       <m:mrow>
                                          <m:mi>t</m:mi>
                                          <m:mi>r</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mi>s</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mstyle displaystyle="true">
                                             <m:munder>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mi>i</m:mi>
                                             </m:munder>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munder>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>t</m:mi>
                                                            <m:mn>1</m:mn>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:msub>
                                                            <m:mi>t</m:mi>
                                                            <m:mn>2</m:mn>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:munder>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mrow>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>1</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>2</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                                  <m:mo>&#8722;</m:mo>
                                                                  <m:mi>E</m:mi>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>1</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>2</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mn>2</m:mn>
                                                      </m:msup>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mi>V</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>r</m:mi>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>Y</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>t</m:mi>
                                                         <m:mn>1</m:mn>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                                <m:mo>&#8722;</m:mo>
                                                <m:msub>
                                                   <m:mi>Y</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>t</m:mi>
                                                         <m:mn>2</m:mn>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>=</m:mo>
                                          <m:mstyle displaystyle="true">
                                             <m:munder>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mi>i</m:mi>
                                             </m:munder>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munder>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>t</m:mi>
                                                            <m:mn>1</m:mn>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:msub>
                                                            <m:mi>t</m:mi>
                                                            <m:mn>2</m:mn>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:munder>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mrow>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>1</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:msub>
                                                                           <m:mi>Y</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>2</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                                  <m:mo>&#8722;</m:mo>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:mover accent="true">
                                                                           <m:mi>&#955;</m:mi>
                                                                           <m:mo stretchy="true">^</m:mo>
                                                                        </m:mover>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>1</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:msub>
                                                                           <m:mrow>
                                                                              <m:mover accent="true">
                                                                                 <m:mi>&#952;</m:mi>
                                                                                 <m:mo stretchy="true">^</m:mo>
                                                                              </m:mover>
                                                                           </m:mrow>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mover accent="true">
                                                                           <m:mi>&#955;</m:mi>
                                                                           <m:mo stretchy="true">^</m:mo>
                                                                        </m:mover>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:msub>
                                                                                 <m:mi>t</m:mi>
                                                                                 <m:mn>2</m:mn>
                                                                              </m:msub>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:msub>
                                                                           <m:mrow>
                                                                              <m:mover accent="true">
                                                                                 <m:mi>&#952;</m:mi>
                                                                                 <m:mo stretchy="true">^</m:mo>
                                                                              </m:mover>
                                                                           </m:mrow>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mn>2</m:mn>
                                                      </m:msup>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mover accent="true">
                                                   <m:mi>&#955;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>t</m:mi>
                                                         <m:mn>1</m:mn>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#952;</m:mi>
                                                         <m:mo stretchy="true">^</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo>+</m:mo>
                                                <m:mover accent="true">
                                                   <m:mi>&#955;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>t</m:mi>
                                                         <m:mn>2</m:mn>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#952;</m:mi>
                                                         <m:mo stretchy="true">^</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>5</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeGabaaabaGaem4uam1aaSbaaSqaaiabdsha0jabdkhaYjabdggaHjabd6gaUjabdohaZbqabaGccqGH9aqpdaWcgaqaamaaqafabaWaaabuaeaadaqadaqaamaabmaabaGaemywaK1aaSbaaSqaaiabdMgaPbqabaGcdaqadaqaaiabdsha0naaBaaaleaacqaIXaqmaeqaaaGccaGLOaGaayzkaaGaeyOeI0IaemywaK1aaSbaaSqaaiabdMgaPbqabaGcdaqadaqaaiabdsha0naaBaaaleaacqaIYaGmaeqaaaGccaGLOaGaayzkaaaacaGLOaGaayzkaaGaeyOeI0Iaemyrau0aaeWaaeaacqWGzbqwdaWgaaWcbaGaemyAaKgabeaakmaabmaabaGaemiDaq3aaSbaaSqaaiabigdaXaqabaaakiaawIcacaGLPaaacqGHsislcqWGzbqwdaWgaaWcbaGaemyAaKgabeaakmaabmaabaGaemiDaq3aaSbaaSqaaiabikdaYaqabaaakiaawIcacaGLPaaaaiaawIcacaGLPaaaaiaawIcacaGLPaaadaahaaWcbeqaaiabikdaYaaaaeaacqWG0baDdaWgaaadbaGaeGymaedabeaaliabcYcaSiabdsha0naaBaaameaacqaIYaGmaeqaaaWcbeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aaGcbaGaemOvayLaemyyaeMaemOCai3aaeWaaeaacqWGzbqwdaWgaaWcbaGaemyAaKgabeaakmaabmaabaGaemiDaq3aaSbaaSqaaiabigdaXaqabaaakiaawIcacaGLPaaacqGHsislcqWGzbqwdaWgaaWcbaGaemyAaKgabeaakmaabmaabaGaemiDaq3aaSbaaSqaaiabikdaYaqabaaakiaawIcacaGLPaaaaiaawIcacaGLPaaaaaaabaWaaSGbaeaacqGH9aqpdaaeqbqaamaaqafabaWaaeWaaeaadaqadaqaaiabdMfaznaaBaaaleaacqWGPbqAaeqaaOWaaeWaaeaacqWG0baDdaWgaaWcbaGaeGymaedabeaaaOGaayjkaiaawMcaaiabgkHiTiabdMfaznaaBaaaleaacqWGPbqAaeqaaOWaaeWaaeaacqWG0baDdaWgaaWcbaGaeGOmaidabeaaaOGaayjkaiaawMcaaaGaayjkaiaawMcaaiabgkHiTmaabmaabaWaaecaaeaaiiGacqWF7oaBaiaawkWaamaabmaabaGaemiDaq3aaSbaaSqaaiabigdaXaqabaaakiaawIcacaGLPaaadaqiaaqaaiab=H7aXbGaayPadaWaaSbaaSqaaiabdMgaPbqabaGccqGHsisldaqiaaqaaiab=T7aSbGaayPadaWaaeWaaeaacqWG0baDdaWgaaWcbaGaeGOmaidabeaaaOGaayjkaiaawMcaamaaHaaabaGae8hUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaaaOGaayjkaiaawMcaaaGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaaqaaiabdsha0naaBaaameaacqaIXaqmaeqaaSGaeiilaWIaemiDaq3aaSbaaWqaaiabikdaYaqabaaaleqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdaakeaadaqadaqaamaaHaaabaG
ae83UdWgacaGLcmaadaqadaqaaiabdsha0naaBaaaleaacqaIXaqmaeqaaaGccaGLOaGaayzkaaWaaecaaeaacqWF4oqCaiaawkWaamaaBaaaleaacqWGPbqAaeqaaOGaey4kaSYaaecaaeaacqWF7oaBaiaawkWaamaabmaabaGaemiDaq3aaSbaaSqaaiabikdaYaqabaaakiaawIcacaGLPaaadaqiaaqaaiab=H7aXbGaayPadaWaaSbaaSqaaiabdMgaPbqabaaakiaawIcacaGLPaaaaaGaeiilaWcaaiaaxMaacaWLjaWaaeWaaeaacqaI1aqnaiaawIcacaGLPaaaaaa@D011@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math>(<it>t</it>) and <m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>can be estimated by formula (2). We call the modified <it>K</it>-means algorithm with this measure <it>TransChisq</it>. Applying it to the toy example in the previous section, <it>TransChisq </it>determines that <b><it>Y </it></b>is closer to <m:math name="1471-2105-8-29-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>Y</m:mi><m:mi>E</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFzbqwdaqhaaWcbaGaemyraueabaGaeGymaedaaaaa@301F@</m:annotation></m:semantics></m:math> as we expected.</p>
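            <p>The dispersion statistic in (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: formula (2) is not reproduced in this section, so the estimates used here (the total count of a tag for its scaling factor, and the cluster-wide share of counts at each time point for the cluster profile) are an assumption, as are the function names. The sketch also reads (5) as dividing each squared residual by its own variance inside the double sum, and skips terms whose estimated variance is zero.</p>

```python
from itertools import combinations


def estimate_parameters(cluster):
    """Assumed estimates (formula (2) is not shown in this section).

    theta_hat[i] is the total count of tag i; lambda_hat[t] is the
    share of all counts in the cluster that fall at time point t.
    """
    theta_hat = [sum(row) for row in cluster]
    total = sum(theta_hat)
    if total == 0:
        raise ValueError("cluster has no counts")
    n_times = len(cluster[0])
    lambda_hat = [sum(row[t] for row in cluster) / total
                  for t in range(n_times)]
    return lambda_hat, theta_hat


def trans_chisq_dispersion(cluster):
    """Chi-square-like dispersion of a cluster on pairwise differences."""
    lambda_hat, theta_hat = estimate_parameters(cluster)
    stat = 0.0
    for row, theta in zip(cluster, theta_hat):
        for t1, t2 in combinations(range(len(lambda_hat)), 2):
            variance = (lambda_hat[t1] + lambda_hat[t2]) * theta
            if variance == 0:
                continue  # no information in this pair of time points
            expected = (lambda_hat[t1] - lambda_hat[t2]) * theta
            residual = (row[t1] - row[t2]) - expected
            stat += residual * residual / variance
    return stat


if __name__ == "__main__":
    # Proportional profiles share one shape, so the dispersion is near zero.
    print(trans_chisq_dispersion([[1, 2, 3], [2, 4, 6]]))
    # Opposite shapes give a clearly positive dispersion.
    print(trans_chisq_dispersion([[1, 2, 3], [3, 2, 1]]))
```

            <p>In this sketch, a cluster is a list of tag profiles of equal length. Profiles that differ only by a multiplicative scaling contribute almost nothing to the statistic, whereas profiles with different shapes increase it.</p>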
            <p>To better understand the effects of the proposed data transformation, we performed a simple simulation study; the results are presented in Additional file <supplr sid="S3">3</supplr>.</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p><b>The performance of new measures in a hierarchical clustering algorithm</b>. This PDF file presents the application results of the hierarchical clustering algorithms with different measures implemented.</p>
               </text>
               <file name="1471-2105-8-29-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Proposed distance measures (II): a parametric-covariance-matrix-based measure</p>
            </st>
            <p>Now we consider a data transformation determined by a parametric covariance matrix:</p>
            <p><b>R </b>= cov(<b>X</b>) = (<it>&#947;</it><sub><it>ij</it></sub>)<sub><it>i,j </it>= 1,..., <it>T</it></sub>, with <it>&#947;</it><sub><it>ij </it></sub>= <it>&#945; </it>> 0 if <it>i </it>= <it>j </it>and <it>&#947;</it><sub><it>ij </it></sub>= <it>&#946; </it>if <it>i </it>&#8800; <it>j</it>,</p>
            <p>where <b>X </b>is the data matrix with <it>n </it>observations in the rows and <it>T </it>variables in the columns, and <b>R </b>is the covariance matrix of the <it>T </it>variables. A matrix <b>R </b>of this form implies that all variables have the same variance and that all pairs of variables have the same covariance. These properties are biologically reasonable: normalized arrays have identical distributions and hence equal variances, and all pairs of variables would exhibit equal covariance (or be uncorrelated when <it>&#946; </it>= 0) if each component were equally important (or independent) in determining a class.</p>
            <p>A data transformation can be defined through the eigenspace of <b>R</b>. One set of orthonormal column eigenvectors, denoted by <b>e</b><sub>1</sub>,<b>e</b><sub>2</sub>,...,<b>e</b><sub><it>T</it></sub>, is presented in Additional file <supplr sid="S4">4</supplr>. Given a gene expression profile <b><it>Y</it></b><sub><it>i </it></sub>= (<it>Y</it><sub><it>i</it></sub>(1),..., <it>Y</it><sub><it>i</it></sub>(<it>T</it>)), a transformation based on <b>R </b>is</p>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p><b>The effects of the <it>TransChisq </it>data transformation in measuring pattern similarity</b>. This PDF file presents a simple simulation study for the effects of the data transformation in <it>TransChisq </it>with a comparison to <it>PoissonC</it>.</p>
               </text>
               <file name="1471-2105-8-29-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p><b><it>Z</it></b><sub><it>i </it></sub>= (<it>Z</it><sub><it>i</it>1</sub>,..., <it>Z</it><sub><it>i</it>T</sub>) = <b><it>Y</it></b><sub><it>i </it></sub>(<b>e</b><sub>1 </sub><b>e</b><sub>2</sub>...<b>e</b><sub><it>T</it></sub>).</p>
            <p>A convenient property of this transformation is that each component has a clear meaning: with <b>e</b><sub>1 </sub>= [1/<m:math name="1471-2105-8-29-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mi>T</m:mi></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabdsfaubWcbeaaaaa@2DF8@</m:annotation></m:semantics></m:math>,...,1/<m:math name="1471-2105-8-29-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mi>T</m:mi></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabdsfaubWcbeaaaaa@2DF8@</m:annotation></m:semantics></m:math>]<sup>T</sup>, <b>e</b><sub>2 </sub>= [1/<m:math name="1471-2105-8-29-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>2</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabikdaYaWcbeaaaaa@2DB9@</m:annotation></m:semantics></m:math>, -1/<m:math name="1471-2105-8-29-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>2</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabikdaYaWcbeaaaaa@2DB9@</m:annotation></m:semantics></m:math>,0,...,0]<sup>T </sup>and <b>e</b><sub>3 </sub>= [1/<m:math name="1471-2105-8-29-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>6</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiAda2aWcbeaaaaa@2DC1@</m:annotation></m:semantics></m:math>,1/<m:math name="1471-2105-8-29-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>6</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiAda2aWcbeaaaaa@2DC1@</m:annotation></m:semantics></m:math>,-2/<m:math name="1471-2105-8-29-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>6</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiAda2aWcbeaaaaa@2DC1@</m:annotation></m:semantics></m:math>,0,...,0]<sup>T</sup>, for a profile <b><it>Y </it></b>= (<it>Y</it><sub>1</sub>,..., <it>Y</it><sub>T</sub>), the component associated with <b>e</b><sub>1 </sub>is <b><it>Y</it>e</b><sub>1 </sub>= (<it>Y</it><sub>1 </sub>+ <it>Y</it><sub>2</sub>+...+<it>Y</it><sub>T</sub>)/<m:math name="1471-2105-8-29-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mi>T</m:mi></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabdsfaubWcbeaaaaa@2DF8@</m:annotation></m:semantics></m:math>, which reflects the general expression level; the component associated with <b>e</b><sub>2 </sub>is <b><it>Y</it>e</b><sub>2 </sub>= (<it>Y</it><sub>1</sub>-<it>Y</it><sub>2</sub>)/<m:math name="1471-2105-8-29-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>2</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabikdaYaWcbeaaaaa@2DB9@</m:annotation></m:semantics></m:math>, which reflects the difference between <it>Y</it><sub>1 </sub>and <it>Y</it><sub>2</sub>; the component associated with <b>e</b><sub>3 </sub>is <b><it>Y</it>e</b><sub>3 </sub>= (<it>Y</it><sub>1</sub>+<it>Y</it><sub>2</sub>-2<it>Y</it><sub>3</sub>)/<m:math name="1471-2105-8-29-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>6</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabiAda2aWcbeaaaaa@2DC1@</m:annotation></m:semantics></m:math>, which reflects the relationship among <it>Y</it><sub>1</sub>, <it>Y</it><sub>2 </sub>and <it>Y</it><sub>3</sub>.</p>
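            <p>As an illustration only, the following Python sketch builds one orthonormal eigenbasis for a matrix of the form <b>R </b>and applies the transformation to a single profile. Only <b>e</b><sub>1</sub>, <b>e</b><sub>2 </sub>and <b>e</b><sub>3 </sub>are given in the text above; the general pattern used for the remaining eigenvectors (a Helmert-type contrast), the function names helmert_eigenvectors and transform, and the values of T, alpha, beta and Y are assumptions made for this example.</p>

```python
import numpy as np


def helmert_eigenvectors(T):
    """Return a T x T matrix whose columns are e_1, ..., e_T.

    Assumption: e_k (k >= 2) continues the pattern of e_2 and e_3 in the text,
    i.e. k-1 equal positive entries, one negative entry, then zeros.
    """
    E = np.zeros((T, T))
    E[:, 0] = 1.0 / np.sqrt(T)            # e_1: overall expression level
    for k in range(2, T + 1):
        norm = np.sqrt(k * (k - 1))
        E[: k - 1, k - 1] = 1.0 / norm    # first k-1 entries
        E[k - 1, k - 1] = -(k - 1) / norm  # k-th entry
    return E


def transform(Y, E):
    """Z = Y (e_1 e_2 ... e_T) for a profile Y given as a row vector."""
    return np.asarray(Y, dtype=float) @ E


# Hypothetical values chosen only for illustration.
T = 4
alpha, beta = 2.0, 0.5
R = (alpha - beta) * np.eye(T) + beta * np.ones((T, T))

E = helmert_eigenvectors(T)
Y = np.array([3.0, 1.0, 2.0, 6.0])
Z = transform(Y, E)

print("Z[0] (sum of Y over sqrt(T)):     ", Z[0])
print("Z[1] ((Y1 - Y2) over sqrt(2)):    ", Z[1])
print("Z[2] ((Y1 + Y2 - 2 Y3) / sqrt(6)):", Z[2])
```

            <p>Under these assumptions, the first column has eigenvalue <it>&#945; </it>+ (<it>T </it>- 1)<it>&#946; </it>and every other column has eigenvalue <it>&#945; </it>- <it>&#946;</it>, so any orthonormal basis of the space orthogonal to <b>e</b><sub>1 </sub>would serve equally well as the remaining eigenvectors.</p>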
            <p>According to the Poisson model, <it>E</it>(<it>Z</it><sub><it>it</it></sub>) = <it>E</it>(<b><it>Y</it></b><sub><it>i</it></sub>)<b>e</b><sub><it>t </it></sub>= (<it>&#955;</it><sub><it>i</it></sub>(1)<it>&#952;</it><sub><it>i</it></sub>,..., <it>&#955;</it><sub><it>i</it></sub>(<it>T</it>)<it>&#952;</it><sub><it>i</it></sub>)<b>e</b><sub><it>t</it></sub>, <it>Var</it>(<it>Z</it><sub><it>i</it>t</sub>) = (<it>&#955;</it><sub><it>i</it></sub>(1)<it>&#952;</it><sub><it>i</it></sub>,..., <it>&#955;</it><sub><it>i</it></sub>(<it>T</it>)<it>&#952;</it><sub><it>i</it></sub>)<m:math name="1471-2105-8-29-i22" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>e</m:mi><m:mi>t</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFLbqzdaqhaaWcbaGaemiDaqhabaGaeGOmaidaaaaa@3095@</m:annotation></m:semantics></m:math> and <it>Cov</it>(<it>Z</it><sub><it>it</it></sub>, <it>Z</it><sub><it>ik</it></sub>) = 0 when <it>t </it>&#8800; <it>k</it>. Then for a cluster consisting of tags 1, 2,..., <it>m</it>, we can measure the cluster dispersion by:</p>
            <p>
               <m:math name="1471-2105-8-29-i23" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>S</m:mi>
                                       <m:mrow>
                                          <m:mi>t</m:mi>
                                          <m:mi>r</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mi>s</m:mi>
                                          <m:mo>_</m:mo>
                                          <m:mi>N</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mstyle displaystyle="true">
                                             <m:munder>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mi>i</m:mi>
                                             </m:munder>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munder>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                         <m:mo>,</m:mo>
                                                         <m:mn>..</m:mn>
                                                         <m:mo>,</m:mo>
                                                         <m:mi>T</m:mi>
                                                      </m:mrow>
                                                   </m:munder>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mrow>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:msub>
                                                                     <m:mi>Z</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>i</m:mi>
                                                                        <m:mi>t</m:mi>
                                                                     </m:mrow>
                                                                  </m:msub>
                                                                  <m:mo>&#8722;</m:mo>
                                                                  <m:mi>E</m:mi>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:msub>
                                                                           <m:mi>Z</m:mi>
                                                                           <m:mrow>
                                                                              <m:mi>i</m:mi>
                                                                              <m:mi>t</m:mi>
                                                                           </m:mrow>
                                                                        </m:msub>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mn>2</m:mn>
                                                      </m:msup>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mi>V</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>r</m:mi>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>Z</m:mi>
                                                   <m:mrow>
                                                      <m:mi>i</m:mi>
                                                      <m:mi>t</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>=</m:mo>
                                          <m:mstyle displaystyle="true">
                                             <m:munder>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mi>i</m:mi>
                                             </m:munder>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munder>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>2</m:mn>
                                                         <m:mo>,</m:mo>
                                                         <m:mn>..</m:mn>
                                                         <m:mo>,</m:mo>
                                                         <m:mi>T</m:mi>
                                                      </m:mrow>
                                                   </m:munder>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mrow>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:msub>
                                                                     <m:mi>Z</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>i</m:mi>
                                                                        <m:mi>t</m:mi>
                                                                     </m:mrow>
                                                                  </m:msub>
                                                                  <m:mo>&#8722;</m:mo>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:mover accent="true">
                                                                           <m:mi>&#955;</m:mi>
                                                                           <m:mo stretchy="true">^</m:mo>
                                                                        </m:mover>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mn>1</m:mn>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:msub>
                                                                           <m:mrow>
                                                                              <m:mover accent="true">
                                                                                 <m:mi>&#952;</m:mi>
                                                                                 <m:mo stretchy="true">^</m:mo>
                                                                              </m:mover>
                                                                           </m:mrow>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                        <m:mo>,</m:mo>
                                                                        <m:mn>...</m:mn>
                                                                        <m:mo>,</m:mo>
                                                                        <m:mover accent="true">
                                                                           <m:mi>&#955;</m:mi>
                                                                           <m:mo stretchy="true">^</m:mo>
                                                                        </m:mover>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mi>T</m:mi>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                        <m:msub>
                                                                           <m:mrow>
                                                                              <m:mover accent="true">
                                                                                 <m:mi>&#952;</m:mi>
                                                                                 <m:mo stretchy="true">^</m:mo>
                                                                              </m:mover>
                                                                           </m:mrow>
                                                                           <m:mi>i</m:mi>
                                                                        </m:msub>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                                  <m:msub>
                                                                     <m:mi>e</m:mi>
                                                                     <m:mi>t</m:mi>
                                                                  </m:msub>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mn>2</m:mn>
                                                      </m:msup>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mover accent="true">
                                                   <m:mi>&#955;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#952;</m:mi>
                                                         <m:mo stretchy="true">^</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo>,</m:mo>
                                                <m:mn>...</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mover accent="true">
                                                   <m:mi>&#955;</m:mi>
                                                   <m:mo stretchy="true">^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mi>T</m:mi>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#952;</m:mi>
                                                         <m:mo stretchy="true">^</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                          <m:msubsup>
                                             <m:mi>e</m:mi>
                                             <m:mi>t</m:mi>
                                             <m:mn>2</m:mn>
                                          </m:msubsup>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mo>.</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>6</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeGabaaabaGaem4uam1aaSbaaSqaaiabdsha0jabdkhaYjabdggaHjabd6gaUjabdohaZjabc+faFjabd6eaobqabaGccqGH9aqpdaWcgaqaamaaqafabaWaaabuaeaadaqadaqaaiabdQfaAnaaBaaaleaacqWGPbqAcqWG0baDaeqaaOGaeyOeI0Iaemyrau0aaeWaaeaacqWGAbGwdaWgaaWcbaGaemyAaKMaemiDaqhabeaaaOGaayjkaiaawMcaaaGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaaqaaiabdsha0jabg2da9iabigdaXiabcYcaSiabc6caUiabc6caUiabcYcaSiabdsfaubqab0GaeyyeIuoaaSqaaiabdMgaPbqab0GaeyyeIuoaaOqaaiabdAfawjabdggaHjabdkhaYnaabmaabaGaemOwaO1aaSbaaSqaaiabdMgaPjabdsha0bqabaaakiaawIcacaGLPaaaaaaabaWaaSGbaeaacqGH9aqpdaaeqbqaamaaqafabaWaaeWaaeaacqWGAbGwdaWgaaWcbaGaemyAaKMaemiDaqhabeaakiabgkHiTmaabmaabaWaaecaaeaaiiGacqWF7oaBaiaawkWaamaabmaabaGaeGymaedacaGLOaGaayzkaaWaaecaaeaacqWF4oqCaiaawkWaamaaBaaaleaacqWGPbqAaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWYaaecaaeaacqWF7oaBaiaawkWaamaabmaabaGaemivaqfacaGLOaGaayzkaaWaaecaaeaacqWF4oqCaiaawkWaamaaBaaaleaacqWGPbqAaeqaaaGccaGLOaGaayzkaaacbeGae4xzau2aaSbaaSqaaiabdsha0bqabaaakiaawIcacaGLPaaadaahaaWcbeqaaiabikdaYaaaaeaacqWG0baDcqGH9aqpcqaIYaGmcqGGSaalcqGGUaGlcqGGUaGlcqGGSaalcqWGubavaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdaakeaadaqadaqaamaaHaaabaGae83UdWgacaGLcmaadaqadaqaaiabigdaXaGaayjkaiaawMcaamaaHaaabaGae8hUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSmaaHaaabaGae83UdWgacaGLcmaadaqadaqaaiabdsfaubGaayjkaiaawMcaamaaHaaabaGae8hUdehacaGLcmaadaWgaaWcbaGaemyAaKgabeaaaOGaayjkaiaawMcaaiab+vgaLnaaDaaaleaacqWG0baDaeaacqaIYaGmaaaaaOGaeiOla4caaiaaxMaacaWLjaWaaeWaaeaacqaI2aGnaiaawIcacaGLPaaaaaa@B13C@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
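            <p>Measure (6) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study. It assumes that the estimates entering (6) are the total count of each tag for <it>&#952;</it><sub><it>i </it></sub>and the cluster-wide proportion of counts at each position for <it>&#955;</it>(<it>t</it>); formula (2) should be consulted for the actual estimators. The Helmert-type eigenvectors, the function names and the toy counts are likewise assumptions.</p>

```python
import numpy as np


def helmert_eigenvectors(T):
    """Columns e_1, ..., e_T (assumed Helmert-type pattern extending e_2, e_3)."""
    E = np.zeros((T, T))
    E[:, 0] = 1.0 / np.sqrt(T)
    for k in range(2, T + 1):
        norm = np.sqrt(k * (k - 1))
        E[: k - 1, k - 1] = 1.0 / norm
        E[k - 1, k - 1] = -(k - 1) / norm
    return E


def s_trans_n(Y, E):
    """Dispersion of one cluster; Y is an (m, T) array of tag counts."""
    Y = np.asarray(Y, dtype=float)
    # Assumed estimators: theta_i = row total, lambda(t) = column share.
    theta_hat = Y.sum(axis=1, keepdims=True)      # shape (m, 1)
    lambda_hat = Y.sum(axis=0) / Y.sum()          # shape (T,), sums to 1
    mu_hat = theta_hat * lambda_hat               # estimated E(Y_i), shape (m, T)

    Z = Y @ E                                     # transformed profiles
    mean_Z = mu_hat @ E                           # estimated E(Z_it)
    var_Z = mu_hat @ (E ** 2)                     # estimated Var(Z_it)

    # Sum over t = 2, ..., T as in the second line of (6). A zero variance
    # (all relevant counts zero) would need separate handling.
    resid2 = (Z - mean_Z) ** 2
    return float(np.sum(resid2[:, 1:] / var_Z[:, 1:]))


E = helmert_eigenvectors(3)
same_shape = np.array([[1, 2, 3], [2, 4, 6]])    # proportional profiles
mixed_shape = np.array([[1, 2, 3], [6, 4, 2]])   # opposite trends

d_same = s_trans_n(same_shape, E)
d_mixed = s_trans_n(mixed_shape, E)
print("dispersion, proportional profiles:", d_same)
print("dispersion, opposite trends:      ", d_mixed)
```

            <p>With these estimators the component associated with <b>e</b><sub>1 </sub>has zero residual for every tag, which is consistent with the second line of (6) summing from <it>t </it>= 2. A cluster of proportional profiles gives a dispersion of zero, whereas profiles with different shapes give a positive value.</p>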
            <p>We should note the connection between this measure and the <it>S</it><sub><it>trans </it></sub>in formula (5). As we discussed above, the component associated with <b>e</b><sub>2 </sub>is (<it>Y</it><sub>1</sub>-<it>Y</it><sub>2</sub>)/<m:math name="1471-2105-8-29-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mn>2</m:mn></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabikdaYaWcbeaaaaa@2DB9@</m:annotation></m:semantics></m:math>. Thus the new space associated with <it>S</it><sub><it>trans </it></sub>is equivalent to the space determined by <b>e</b><sub>2 </sub>and all its row-switching transformations. A measure can be defined similarly through <b>e</b><sub>3 </sub>or other eigenvectors. <it>S</it><sub><it>trans </it></sub>could therefore lose the information carried by <b>e</b><sub>3 </sub>and the other eigenvectors. However, applications of <it>TransChisq </it>to a variety of datasets suggest that this potential information loss is minor and can usually be ignored in practice. In fact, the row-switching transformations of <b>e</b><sub>2 </sub>capture most of the information contained in <b>e</b><sub>3 </sub>and the other eigenvectors.</p>
            <p>A potential shortcoming of <it>S</it><sub><it>trans_N </it></sub>is that it is defined on only one set of eigenvectors. The orthonormal eigenspace of a covariance matrix is not unique (e.g., the row-switching operation can produce a different set of eigenvectors), and different eigenspaces may give different values of <it>S</it><sub><it>trans_N</it></sub>. Although one could in principle consider all possible eigenspaces to overcome this limitation, doing so is not computationally feasible.</p>
            <p>Applying <it>S</it><sub><it>trans_N </it></sub>to several different datasets, we observed that i) using the eigenvectors <b>e</b><sub>1</sub>,<b>e</b><sub>2</sub>,...,<b>e</b><sub><it>T </it></sub>in Additional file <supplr sid="S4">4</supplr>, <it>S</it><sub><it>trans_N </it></sub>performs very similarly to <it>S</it><sub><it>trans </it></sub>and ii) when a different set of eigenvectors is used, the clustering results can differ, though the difference is not pronounced. These results are not presented in this paper.</p>
         </sec>
         <sec>
            <st>
               <p>Proposed distance measures (III): PCAChisq</p>
            </st>
            <p>For comparison purposes, we applied PCA to transform the data <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. PCA is useful for simplifying the analysis of a high-dimensional dataset. Recently, PCA has been explored as a method for clustering gene expression data <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. However, a blind application of PCA in clustering analysis is risky because PCA chooses principal component axes based on the empirical covariance matrix rather than the class information, and thus it does not necessarily give good clustering results <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>.</p>
            <p>Some theoretical <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and empirical <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> studies have observed that the first few principal components (PCs) in PCA are not always helpful for extracting meaningful signals from data. Thus, we considered all PCs in this study. By replacing <b>e</b><sub>1</sub>, <b>e</b><sub>2</sub>, ..., <b>e</b><sub><it>T </it></sub>in measure (6) with the eigenvectors of the sample covariance matrix, we defined a new measure and implemented it in <it>PCAChisq</it>. The Results section gives examples showing the positive and negative effects of the PCA transformation. In general, <it>PCAChisq </it>is difficult to use. First, it is unclear which types of variance the principal components capture (if they capture the within-cluster variance, the principal components would lead to wrong clustering results). Second, it is unclear how many principal components should be used: the optimal number of PCs cannot be determined before the results are compared with the ground truth. In short, <it>PCAChisq </it>is effective only when the principal components happen to match the key features that determine a cluster.</p>
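            <p>The eigenvector substitution described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names <it>pca_eigenvectors </it>and <it>transform_profiles</it>, the toy data, and the choice to project the raw (uncentered) profiles are assumptions made for illustration. Measure (6) itself is defined earlier in the paper and is not reproduced here; the sketch only shows how a full set of <it>T </it>orthonormal eigenvectors might be obtained from the sample covariance matrix and applied to the data.</p>

```python
import numpy as np


def pca_eigenvectors(X):
    """Eigen-decompose the sample covariance matrix of X.

    X is an (n_genes, T) array whose columns are experiments.
    Returns the eigenvalues in decreasing order and the matching
    orthonormal eigenvectors as the columns of a (T, T) matrix.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)            # (T, T) covariance across experiments
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]        # reorder from largest to smallest
    return eigvals[order], eigvecs[:, order]


def transform_profiles(X, eigvecs):
    """Project every expression profile onto all T eigenvectors."""
    return np.asarray(X, dtype=float) @ eigvecs


# Toy data (hypothetical): 50 genes measured in 5 experiments.
rng = np.random.default_rng(0)
X = rng.poisson(lam=20.0, size=(50, 5)).astype(float)

eigvals, E = pca_eigenvectors(X)
Z = transform_profiles(X, E)
```

            <p>Because all <it>T </it>components are kept, the projection in this sketch is an orthogonal change of basis, so it does not by itself discard information; any gain or loss in clustering quality would come from how the distance measure weights the transformed coordinates.</p>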
         </sec>
         <sec>
            <st>
               <p>Clustering analysis of microarray data</p>
            </st>
            <p>We explored the potential application of the proposed measures to a clustering analysis of microarray data. We proposed the following restricted normal model for this purpose. The parameter notations in the Poisson model were adopted. Given a microarray dataset of expressions of <it>n </it>genes in <it>T </it>experiments, the expression of gene <it>i </it>in experiment <it>t</it>, <it>X</it><sub><it>i</it></sub>(<it>t</it>), is assumed to be normally distributed with mean <it>&#956;</it><sub><it>i</it></sub>(<it>t</it>) = <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>)<it>&#952;</it><sub><it>i </it></sub>and variance <m:math name="1471-2105-8-29-i24" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#963;</m:mi><m:mi>i</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFdpWCdaqhaaWcbaGaemyAaKgabaGaeGOmaidaaaaa@30F0@</m:annotation></m:semantics></m:math>(<it>t</it>) = <it>k&#955;</it><sub><it>i</it></sub>(<it>t</it>)<it>&#952;</it><sub><it>i</it></sub>, where <it>k </it>is an unknown constant. The derivation of the maximum likelihood estimates (MLEs) of <it>&#955;</it><sub><it>i</it></sub>(<it>t</it>) and <it>&#952;</it><sub><it>i </it></sub>under the normal model is rather involved. So we borrowed the estimators in formula (2). It can be shown that <m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>in formula (2) is unbiased and <m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math><sub><it>t </it></sub>in formula (2) is consistent under the restricted normal model [see Additional file <supplr sid="S5">5</supplr>]. With <m:math name="1471-2105-8-29-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=H7aXbGaayPadaaaaa@2F2B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>and <m:math name="1471-2105-8-29-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#955;</m:mi><m:mo stretchy="true">^</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqiaaqaaGGaciab=T7aSbGaayPadaaaaa@2F29@</m:annotation></m:semantics></m:math><sub><it>t </it></sub>available under the normal model, <it>TransChisq</it>, <it>PCAChisq </it>and <it>PoissonC </it>can be applied.</p>
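            <p>The variance structure of the restricted normal model can be checked with a small simulation. The Python sketch below is illustrative only: the function name <it>simulate_restricted_normal</it>, the parameter values, and the simplification that all simulated genes share one profile shape and one expression level are assumptions, not taken from the paper. The estimators in formula (2) are not reproduced here.</p>

```python
import numpy as np


def simulate_restricted_normal(lam, theta, k, rng):
    """Draw X_i(t) from Normal(mean = lam[t] * theta[i], variance = k * mean)."""
    mean = np.outer(theta, lam)              # (n_genes, T) matrix of means
    return rng.normal(loc=mean, scale=np.sqrt(k * mean))


rng = np.random.default_rng(1)
lam = np.array([0.1, 0.2, 0.3, 0.4])         # hypothetical profile shape over T = 4
theta = np.full(20000, 500.0)                # hypothetical common expression level
k = 2.0                                      # hypothetical variance-to-mean constant

X = simulate_restricted_normal(lam, theta, k, rng)

sample_mean = X.mean(axis=0)
sample_var = X.var(axis=0, ddof=1)
ratio = sample_var / sample_mean             # expected to be near k in each experiment
```

            <p>Under these assumptions the variance-to-mean ratio is roughly constant across experiments, which is the property that distinguishes the restricted normal model from a normal model with unrelated mean and variance.</p>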
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p><b>Guidelines for the various parameters in the simulation dataset in </b>Table <tblr tid="T2">2</tblr>. This PDF file presents the motivation and guidelines for choosing the various parameters in the simulation dataset in Table <tblr tid="T2">2</tblr>.</p>
               </text>
               <file name="1471-2105-8-29-S5.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>For both oligonucleotide and cDNA microarray data, it is widely observed that the variance depends strongly on the mean: variance increases with mean <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. It is therefore reasonable to expect that our restricted normal model is applicable to many microarray datasets. An application to the yeast sporulation dataset is presented to demonstrate the power of <it>TransChisq </it>in analyzing microarray data (see the Results section). We should also note that <it>TransChisq </it>would be expected to perform less well if the assumed relationship between the variance and the mean is seriously violated.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>KK participated in the design of the study, performed the analysis and drafted the Results section of the manuscript. SZ, KJ and LJF provided the maize root microarray data, which helped in motivating this research. SZ, KJ and LJF were responsible for the biological explanations of the results related to the maize data. LC provided the developing mouse retina SAGE data and was responsible for the biological explanations of the clustering results related to the SAGE data. IBL helped in formulating the PCA-related studies. HH conceived of this study, proposed the method, coordinated the collaborations and wrote the paper. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The work of K. Kim was supported by Pohang University of Science and Technology (POSTECH), Korea and NIH R01GM075312. The work of H. Huang was supported by NIH R01GM075312.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Gene expression data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vilo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2000</pubdate>
            <volume>480</volume>
            <fpage>17</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0014-5793(00)01772-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Computational analysis of microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>418</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35076576</pubid>
                  <pubid idtype="pmpid" link="fulltext">11389458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24541</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Hierarchical Clustering Schemes</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Psychometrika</source>
            <pubdate>1967</pubdate>
            <volume>32</volume>
            <fpage>241</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF02289588</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <aug>
               <au>
                  <snm>Hartigan</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Clustering algorithms</source>
            <publisher>New York: John Wiley &amp; Sons, Inc</publisher>
            <pubdate>1975</pubdate>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kitareewan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dmitrovsky</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>2907</fpage>
            <lpage>2912</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077610</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.6.2907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <aug>
               <au>
                  <snm>McLachlan</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Basford</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Mixture models: inference and applications to clustering</source>
            <publisher>New York: Dekker</publisher>
            <pubdate>1988</pubdate>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Model-based Gaussian and non-Gaussian clustering</p>
            </title>
            <aug>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Raftery</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1993</pubdate>
            <volume>49</volume>
            <fpage>803</fpage>
            <lpage>821</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2532201</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Model-based clustering, discriminant analysis and density estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Fraley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Raftery</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>2002</pubdate>
            <volume>97</volume>
            <fpage>611</fpage>
            <lpage>631</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1198/016214502760047131</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Estimating the number of clusters in a data set via the gap statistic</p>
            </title>
            <aug>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Walther</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J R Statist Soc B</source>
            <pubdate>2001</pubdate>
            <volume>63</volume>
            <fpage>411</fpage>
            <lpage>423</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/1467-9868.00293</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Fuzzy clustering as a means of selecting representative conformers and molecular alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Feher</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>2003</pubdate>
            <volume>43</volume>
            <fpage>810</fpage>
            <lpage>818</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci0200671</pubid>
                  <pubid idtype="pmpid" link="fulltext">12767138</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Knowledge-assisted recognition of cluster boundaries in gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Okada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sahara</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mitsubayashi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ohgiya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nagashima</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Artif Intell Med</source>
            <pubdate>2005</pubdate>
            <volume>35</volume>
            <fpage>171</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.artmed.2005.02.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">16054350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Self organizing hierarchical multicast trees and their optimization</p>
            </title>
            <aug>
               <au>
                  <snm>Baccelli</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kofman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rougier</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Proceedings of IEEE Infocom'99</source>
            <pubdate>1999</pubdate>
            <volume>3</volume>
            <fpage>1081</fpage>
            <lpage>1089</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Optimization based clustering algorithms in multicast group hierarchies</p>
            </title>
            <aug>
               <au>
                  <snm>Jia</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bagirov</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Ouveysi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rubinov</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC)</source>
            <publisher>Melbourne Australia</publisher>
            <pubdate>2003</pubdate>
            <note>(published on CD, ISBN 0-646-42229-4).</note>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Using Bayesian networks to analyze expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Linial</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nachman</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pe'er</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>601</fpage>
            <lpage>620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652700750050961</pubid>
                  <pubid idtype="pmpid" link="fulltext">11108481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Large-scale temporal gene expression mapping of central nervous system development</p>
            </title>
            <aug>
               <au>
                  <snm>Wen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Michaels</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Carr</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>334</fpage>
            <lpage>339</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18216</pubid>
                  <pubid idtype="pmpid" link="fulltext">9419376</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.1.334</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Analysis techniques for microarray time-series data</p>
            </title>
            <aug>
               <au>
                  <snm>Filkov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Skiena</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhi</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>317</fpage>
            <lpage>330</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270252935485</pubid>
                  <pubid idtype="pmpid" link="fulltext">12015884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Clustering of gene expression data using a local shape-based similarity measure</p>
            </title>
            <aug>
               <au>
                  <snm>Balasubramaniyan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hullermeier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Weskamp</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kamper</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1069</fpage>
            <lpage>1077</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti095</pubid>
                  <pubid idtype="pmpid" link="fulltext">15513997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <aug>
               <au>
                  <snm>Jolliffe</snm>
                  <fnm>IT</fnm>
               </au>
            </aug>
            <source>Principal Component Analysis</source>
            <publisher>New York: Springer-Verlag</publisher>
            <pubdate>1986</pubdate>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Cluster analysis of SAGE data using a Poisson approach</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Blackshaw</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Cepko</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R51</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">463327</pubid>
                  <pubid idtype="pmpid" link="fulltext">15239836</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-7-r51</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Genomic analysis of mouse retinal development</p>
            </title>
            <aug>
               <au>
                  <snm>Blackshaw</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harpavat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Trimarchi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fraioli</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>S-H</fnm>
               </au>
               <au>
                  <snm>Yung</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Asch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ohno-Machado</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Cepko</snm>
                  <fnm>CL</fnm>
               </au>
            </aug>
            <source>PLoS Biology</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>e247</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">439783</pubid>
                  <pubid idtype="pmpid" link="fulltext">15226823</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The transcriptional program of sporulation in budding yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Chu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>DeRisi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mulholland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Herskowitz</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>699</fpage>
            <lpage>705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5389.699</pubid>
                  <pubid idtype="pmpid" link="fulltext">9784122</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Transcription profile analyses identify genes and pathways central to root cap functions in maize</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tsai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Chilcott</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>LJ</fnm>
               </au>
            </aug>
            <source>Plant Molecular Biology</source>
            <pubdate>2006</pubdate>
            <volume>60</volume>
            <fpage>343</fpage>
            <lpage>363</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s11103-005-4209-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">16514559</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Alizadeh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Staudt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2000</pubdate>
            <volume>1</volume>
            <issue>2</issue>
            <fpage>research0003</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15015</pubid>
                  <pubid idtype="pmpid" link="fulltext">11178228</pubid>
                  <pubid idtype="doi">10.1186/gb-2000-1-2-research0003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Estimating the number of clusters in a data set via the gap statistic</p>
            </title>
            <aug>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Walther</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J R Statist Soc B</source>
            <pubdate>2001</pubdate>
            <volume>63</volume>
            <fpage>411</fpage>
            <lpage>423</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/1467-9868.00293</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Comparing partitions</p>
            </title>
            <aug>
               <au>
                  <snm>Hubert</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Arabie</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Classif</source>
            <pubdate>1985</pubdate>
            <fpage>193</fpage>
            <lpage>218</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>POWER_SAGE: comparing statistical tests for SAGE experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Man</snm>
                  <fnm>MZ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>953</fpage>
            <lpage>959</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.11.953</pubid>
                  <pubid idtype="pmpid" link="fulltext">11159306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Principal components analysis to summarize microarray experiments: application to sporulation time series</p>
            </title>
            <aug>
               <au>
                  <snm>Raychaudhuri</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>452</fpage>
            <lpage>463</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Principal component analysis for clustering gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Yeung</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>763</fpage>
            <lpage>774</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.9.763</pubid>
                  <pubid idtype="pmpid" link="fulltext">11590094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Singular value decomposition for genome-wide expression data processing and modeling</p>
            </title>
            <aug>
               <au>
                  <snm>Alter</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>10101</fpage>
            <lpage>10106</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">27718</pubid>
                  <pubid idtype="pmpid" link="fulltext">10963673</pubid>
                  <pubid idtype="doi">10.1073/pnas.97.18.10101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Fundamental patterns underlying gene expression profiles: simplicity from complexity</p>
            </title>
            <aug>
               <au>
                  <snm>Holter</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Mitra</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maritan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cieplak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Banavar</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Fedoroff</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>8409</fpage>
            <lpage>8414</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">26961</pubid>
                  <pubid idtype="pmpid" link="fulltext">10890920</pubid>
                  <pubid idtype="doi">10.1073/pnas.150242097</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>PCA disjoint models for multiclass cancer analysis using gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Bicciato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Luchini</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Di Bello</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>571</fpage>
            <lpage>578</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg051</pubid>
                  <pubid idtype="pmpid" link="fulltext">12651714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Interactive exploration of microarray gene expression patterns in a reduced dimensional space</p>
            </title>
            <aug>
               <au>
                  <snm>Misra</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schmitt</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hwang</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>L-L</fnm>
               </au>
               <au>
                  <snm>Gullans</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stephanopoulos</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1112</fpage>
            <lpage>1120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186614</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097349</pubid>
                  <pubid idtype="doi">10.1101/gr.225302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Multidimensional support vector machines for visualization of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Komura</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tsutsumi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aburatani</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ihara</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>439</fpage>
            <lpage>444</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti188</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608050</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>On using principal components before separating a mixture of two multivariate normal distributions</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>W-C</fnm>
               </au>
            </aug>
            <source>Appl Statist</source>
            <pubdate>1983</pubdate>
            <volume>32</volume>
            <fpage>267</fpage>
            <lpage>275</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2347949</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A variance-stabilizing transformation for gene-expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Durbin</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Hardin</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Hawkins</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>S105</fpage>
            <lpage>S110</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169537</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Heterogeneity of variance in gene expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <publisher>University of California at Davis, Department of Applied Science and Division of Biostatistics</publisher>
            <pubdate>2003</pubdate>
            <url>http://www.cipic.ucdavis.edu/~dmrocke/papers/empbayes2.pdf</url>
         </bibl>
      </refgrp>
   </bm>
</art>
