<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-427</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Extracting gene expression patterns and identifying co-expressed genes from microarray data reveals biologically responsive processes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Chou</snm>
               <mi>W</mi>
               <fnm>Jeff</fnm>
               <insr iid="I1"/>
               <email>chou@niehs.nih.gov</email>
            </au>
            <au id="A2">
               <snm>Zhou</snm>
               <fnm>Tong</fnm>
               <insr iid="I2"/>
               <email>tzhou@email.unc.edu</email>
            </au>
            <au id="A3">
               <snm>Kaufmann</snm>
               <mi>K</mi>
               <fnm>William</fnm>
               <insr iid="I2"/>
               <email>bill_kaufmann@med.unc.edu</email>
            </au>
            <au id="A4">
               <snm>Paules</snm>
               <mi>S</mi>
               <fnm>Richard</fnm>
               <insr iid="I1"/>
               <email>paules@niehs.nih.gov</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Bushel</snm>
               <mi>R</mi>
               <fnm>Pierre</fnm>
               <insr iid="I1"/>
               <email>bushel@niehs.nih.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Microarray Group, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Pathology and Laboratory Medicine, Center for Environmental Health and Susceptibility, and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>427</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/427</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17980031</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-427</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>27</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>02</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>02</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Chou et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A common observation in the analysis of gene expression data is that many genes display similarity in their expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the acquisition of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated as EPIG. The approach utilizes the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Through evaluation of the correlations among profiles, the magnitude of variation in gene expression profiles, and profile signal-to-noise ratios, EPIG extracts a set of patterns representing co-expressed genes. The method is shown to work well with a simulated data set and with microarray data obtained from time-series studies of dauer recovery and L1 starvation in <it>C. elegans</it> and after ultraviolet (UV)- or ionizing radiation (IR)-induced DNA damage in diploid human fibroblasts. With the simulated data set, EPIG extracted the appropriate number of patterns, which were more stable and homogeneous than the sets of patterns determined using the CLICK or CAST clustering algorithms. However, CLICK performed better than EPIG and CAST with respect to the average correlation between clusters/patterns of the simulated data. With real biological data, EPIG extracted more dauer-specific patterns than CLICK. Furthermore, analysis of the IR/UV data revealed 18 unique patterns, and EPIG identified 2661 of approximately 17,000 genes as significantly expressed and categorized them to these patterns. The time-dependent patterns displayed both similar and dissimilar responses between IR and UV treatments. Gene Ontology analysis applied to each pattern-related subset of co-expressed genes revealed underlying biological processes affected by IR- and/or UV-induced DNA damage.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>EPIG was competitive with CLICK and performed better than CAST in extracting patterns from simulated data. EPIG extracted more biologically informative patterns and co-expressed genes from both the <it>C. elegans</it> data and the IR/UV-treated human fibroblast data. Gene Ontology analysis of the genes in the patterns extracted by EPIG revealed several key biological categories related to p53-dependent cell cycle control in the IR/UV data. Among them were mitotic cell cycle, DNA replication, DNA repair, cell cycle checkpoint, and G<sub>0</sub>-like status transition. EPIG can be applied to data sets from a variety of experimental designs.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A common observation in the analysis of gene expression is that many genes are co-regulated<abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. When genes are co-regulated under various biological conditions, the corresponding expression profiles may display relative similarity, or co-expression. To identify these co-expressed genes, various cluster and factor analysis methods have been applied to microarray datasets. Among the most popular unsupervised clustering methods used for the analysis of gene expression data are hierarchical clustering <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, <it>K</it>-means <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, self-organizing maps (SOM) <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, Clustering Affinity Search Technique (CAST) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, partitioning around medoids (PAM) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and CLICK <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Some of these methods, e.g. SOM and <it>K</it>-means, produce a set of cluster centroids around which genes with similar expression profiles are situated. However, it is well known that the results of these methods can differ considerably depending on the chosen number of clusters or the starting seed. In addition, these clustering methods are vulnerable to the presence of "scattered" genes <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and can lack robustness when there is little spatial separation between the clusters <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Numerous alternative methods have been developed to improve the utility of <it>K</it>-means and SOM for clustering gene expression data, such as clustering based on pre-defined sets of gene expression profiles <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. However, supervised clustering methods, such as those utilizing predefined patterns <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>, require a priori knowledge of the underlying pattern(s) in the data and may exclude unknown but biologically meaningful patterns.</p>
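         <p>The sensitivity to the starting seed can be illustrated with a minimal sketch of Lloyd's <it>K</it>-means algorithm in plain Python (a generic textbook version, not any particular package's implementation). The toy profiles and the helper names (<it>kmeans</it>, <it>sse</it>) are illustrative assumptions; the same data are clustered with several seeds and the within-cluster sum of squares is printed for each run.</p>

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, seed, n_iter=50):
    """Plain Lloyd's algorithm; initial centroids are k profiles drawn with `seed`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in profiles]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def sse(profiles, labels, centroids):
    """Within-cluster sum of squared distances (lower means a tighter fit)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(profiles, labels))

# Toy data: three loose groups of 4-point profiles (illustrative only).
data_rng = random.Random(0)
means = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 3, 1.5, 0]]
profiles = [[m + data_rng.gauss(0, 0.8) for m in mu] for mu in means for _ in range(10)]

for seed in range(5):
    labels, centroids = kmeans(profiles, k=3, seed=seed)
    print(seed, round(sse(profiles, labels, centroids), 2))
```

         <p>Runs that report different sums of squares have converged to different local optima; runs with equal values may still differ only in how the clusters are numbered.</p>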
         <p>CLICK, a clustering algorithm based on graph-theoretic connectivity, a probabilistic framework and fundamental statistics, does not rely on assumptions or prior knowledge about the clusters or their structure, yet identifies tight groups of highly similar elements (kernels) that are likely to belong to the same true cluster <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. CLICK evaluates the quality of the cluster structure with two criteria: overall homogeneity (the degree of similarity among elements in the same cluster) and separation between elements of different clusters. These measures are similar to the intra-cluster compactness and inter-cluster separation indices used to evaluate the validity of clustering solutions <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
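         <p>The two criteria can be made concrete with a short sketch. It assumes one common formulation, with Pearson correlation as the similarity measure: homogeneity is the average similarity between each profile and the centroid of its own cluster, and separation is the size-weighted average similarity between pairs of cluster centroids, so a good partition has high homogeneity and low separation. The function names are illustrative and are not part of the CLICK software.</p>

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centroid(members):
    """Point-wise mean profile of a cluster."""
    return [sum(col) / len(members) for col in zip(*members)]

def homogeneity(clusters):
    """Average similarity of every profile to its own cluster centroid."""
    scores = [pearson(p, centroid(c)) for c in clusters for p in c]
    return sum(scores) / len(scores)

def separation(clusters):
    """Size-weighted average similarity between pairs of cluster centroids."""
    cents = [centroid(c) for c in clusters]
    num = den = 0.0
    for i, j in combinations(range(len(clusters)), 2):
        w = len(clusters[i]) * len(clusters[j])
        num += w * pearson(cents[i], cents[j])
        den += w
    return num / den

clusters = [
    [[0, 1, 2, 3], [0.1, 1.2, 1.9, 3.1]],   # rising profiles
    [[3, 2, 1, 0], [2.9, 2.1, 0.8, 0.2]],   # falling profiles
]
print(round(homogeneity(clusters), 3), round(separation(clusters), 3))
```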
         <p>Among factor analysis methods are independent component analysis (ICA) <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp> and partial least squares (PLS) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. When applied to gene expression data, ICA and PLS generally provide biologically meaningful patterns among the top-ranked components, which consist of large numbers of co-expressed genes. However, if a perturbation to a biological system stimulates a relatively large number of co-expression patterns, or if a pattern contains only a small number of co-expressed genes, many of these patterns may not be revealed by these methods. We present a novel profile-based method for <ul>E</ul>xtracting microarray gene expression <ul>P</ul>atterns and <ul>I</ul>dentifying co-expressed <ul>G</ul>enes, designated as EPIG. Through analysis of correlations among profiles, the magnitude of variation in gene expression within profiles, and evaluation of the profile signal-to-noise ratios, EPIG extracts a set of patterns representing co-expressed genes. Using a simulated data set, we compared EPIG to CLICK and CAST to evaluate the generation of the appropriate number of patterns. In addition, we applied EPIG and CLICK to a <it>C. elegans</it> dauer recovery and L1 starvation time course gene expression data set <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> to compare the ability of the algorithms to extract patterns related to either dauer-specific recovery or L1 starvation. Finally, we applied EPIG to a combined time-series data set from UV- and IR-treated cells. Gene Ontology analysis of the co-expressed genes revealed enriched categories that provided insight into the underlying co-expression, the biological processes involved, and the molecular functions of the genes.</p>
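         <p>As a rough illustration of the first ingredient, correlation among profiles, the sketch below computes a Pearson correlation matrix and groups profiles whose correlation with a group's first member exceeds a threshold. This is not EPIG's actual procedure: the greedy grouping rule, the threshold of 0.8 and the toy profiles are assumptions chosen only to show how correlation separates shapes regardless of scale.</p>

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(profiles):
    return [[pearson(p, q) for q in profiles] for p in profiles]

def greedy_groups(profiles, threshold=0.8):
    """Hypothetical grouping: a profile joins the first group whose seed
    profile it correlates with above `threshold`, else it starts a new group."""
    corr = correlation_matrix(profiles)
    groups = []  # each group is a list of profile indices; index 0 is the seed
    for i in range(len(profiles)):
        for g in groups:
            if corr[i][g[0]] > threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

profiles = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
    [0.2, 0.9, 2.2, 2.8],
    [0.0, 3.0, 1.5, 0.0],
    [2.8, 2.1, 1.1, 0.3],
]
print(greedy_groups(profiles))  # → [[0, 2], [1, 4], [3]]
```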
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>To evaluate the ability of EPIG to extract patterns from gene expression data, we used both EPIG and CLICK to analyze a simulated data set, a publicly available dauer recovery and L1 starvation gene expression data set from <it>Caenorhabditis elegans</it> <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, and gene expression data from UV- and IR-treated fibroblast lines. CLICK was selected as the comparator because, like EPIG and unlike SOM or <it>K</it>-means, it does not rely on assumptions or prior knowledge about the clusters or their structure. The extraction of patterns and categorization of co-expressed genes by EPIG are analogous to clustering the genes (either all genes or only the differentially expressed ones) with CLICK. In other words, a pattern extracted by EPIG is equivalent to the centroid pattern of a cluster generated by CLICK.</p>
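         <p>Because an EPIG pattern is compared with the centroid of a CLICK cluster, matching the two outputs amounts to computing each cluster's centroid and pairing it with the most correlated pattern. The sketch below assumes both outputs are available as plain Python lists; the data and the function <it>match_patterns</it> are hypothetical. The second toy cluster deliberately mixes two peaked shapes, as CLICK did with distributions C and D.</p>

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centroid(members):
    """Point-wise mean profile of a cluster."""
    return [sum(col) / len(members) for col in zip(*members)]

def match_patterns(clusters, patterns):
    """For each cluster, return (index of best-matching pattern, correlation)."""
    matches = []
    for members in clusters:
        c = centroid(members)
        scores = [pearson(c, pat) for pat in patterns]
        best = max(range(len(patterns)), key=scores.__getitem__)
        matches.append((best, round(scores[best], 3)))
    return matches

patterns = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 1.5, 3, 0]]   # e.g. extracted patterns
clusters = [
    [[0.1, 0.8, 2.1, 3.2], [0.0, 1.1, 1.8, 2.9]],         # rising profiles
    [[0.0, 1.4, 2.9, 0.2], [0.0, 3.1, 1.4, 0.1]],         # two peaked shapes merged
]
print(match_patterns(clusters, patterns))
```

         <p>A merged cluster still matches one pattern, but with a noticeably lower correlation than a clean cluster, which can serve as a simple flag for clusters that combine distinct shapes.</p>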
         <sec>
            <st>
               <p>Comparison of CLICK and EPIG using simulated data</p>
            </st>
            <p>Table <tblr tid="T1">1</tblr> lists the means of the distributions used to simulate the data; the standard deviation was held constant at 0.4. Figure <figr fid="F1">1</figr> displays the mean values of the six probability distributions used to generate the data: the profiles in Figure <figr fid="F1">1(A)</figr> and <figr fid="F1">1(B)</figr> increase and decrease monotonically, respectively; those in Figure <figr fid="F1">1(C)</figr> and <figr fid="F1">1(D)</figr> start and end at zero but peak at the third and second data points, respectively; and those in Figure <figr fid="F1">1(E)</figr> and <figr fid="F1">1(F)</figr> are flat at 3 and 0, respectively. The mean values of zero in Figure <figr fid="F1">1(F)</figr> reflect a flat response, analogous to real gene expression data in which a number of genes may not respond to a given series of treatments. Normal deviates were drawn at random to generate 15 profiles for each of the six distributions. Figure <figr fid="F2">2</figr> shows a principal component analysis (PCA) of the simulated data, in which the profiles form six distinct clusters in 3-dimensional space. When the simulated data were analyzed by EPIG and CLICK for comparison, EPIG extracted 5 patterns from the data (see Figure <figr fid="F3">3A</figr>), corresponding to the profile probability distributions depicted in Figures <figr fid="F1">1(A)</figr> to <figr fid="F1">1(E)</figr>, and assigned all of these profiles (15 per pattern) to their proper patterns with 100% accuracy. No pattern was extracted for the distribution shown in Figure <figr fid="F1">1(F)</figr>, and its profiles, which were generated to represent a nonresponsive gene expression pattern, remained uncategorized.
In contrast, CLICK with the default homogeneity setting returned only 3 clusters (centroid patterns shown in Figure <figr fid="F3">3B</figr>), with no cluster corresponding to the distribution shown in Figure <figr fid="F1">1(E)</figr>, whose inter-group mean values were all 3. CLICK merged the profiles from distributions C and D (Figure <figr fid="F1">1(C)</figr> and <figr fid="F1">1(D)</figr>) into a single cluster, despite their peaks at different data points. In addition, CLICK assigned 32 profiles (two of them from distribution E in Table <tblr tid="T1">1</tblr> or Figure <figr fid="F1">1(E)</figr>) to its Cluster 1 and 16 profiles (one of them also from distribution E) to its Cluster 2. We also varied the homogeneity setting: at 0.83 or 0.84, CLICK achieved the highest overall average homogeneity within the patterns and produced essentially the same three clusters as with the default setting (data not shown).</p>
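            <p>The simulation design can be reproduced in outline. The sketch below draws 15 profiles per distribution by adding normal deviates with a standard deviation of 0.4 to the intra-group means of Table 1. The random seed and the use of Python's <it>random.gauss</it> are assumptions, so the draws will not match the authors' simulated data.</p>

```python
import random

# Intra-group means from Table 1 (inter-groups 1 to 4) for distributions A to F.
MEANS = {
    "A": [0, 1, 2, 3],
    "B": [3, 2, 1, 0],
    "C": [0, 1.5, 3, 0],
    "D": [0, 3, 1.5, 0],
    "E": [3, 3, 3, 3],
    "F": [0, 0, 0, 0],
}
SD = 0.4          # constant standard deviation for every data point
N_PER_DIST = 15   # profiles generated per distribution

def simulate(seed=1):
    """Return (label, profile) pairs: 6 distributions x 15 profiles = 90 rows."""
    rng = random.Random(seed)
    rows = []
    for label, mu in MEANS.items():
        for _ in range(N_PER_DIST):
            rows.append((label, [rng.gauss(m, SD) for m in mu]))
    return rows

data = simulate()
print(len(data))  # → 90
```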
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Six probability distribution profiles</p>
               </caption>
               <text>
                  <p><b>Six probability distribution profiles</b>. Plot of the six probability distribution profiles in terms of the mean values and standard deviations given in Table 1. Each panel, (A) to (F), shows four data points marked as crosses; from left to right, they correspond to inter-groups 1 to 4. The vertical axis labels indicate the mean values of the data points, and the vertical bars show the standard deviation of 0.4 about each mean value.</p>
               </text>
               <graphic file="1471-2105-8-427-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Plot of first three components of a PCA using 90 simulated profiles</p>
               </caption>
               <text>
                  <p><b>Plot of first three components of a PCA using 90 simulated profiles</b>. The six clusters, labelled A to F in different colors, correspond to distributions A to F in Table 1. Each cluster consists of 15 generated profiles. The first three principal components (PCs) capture 84.3% of the variability in the data. The x-axis is PC1, the y-axis PC2 and the z-axis PC3.</p>
               </text>
               <graphic file="1471-2105-8-427-2"/>
            </fig>
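            <p>The variance figure quoted for the PCA can be computed without external libraries: center the profiles, form the covariance matrix, and extract its leading eigenvalues by power iteration with deflation. The sketch below is a generic PCA computation, not the software used for Figure 2; its toy data are illustrative, so it will not reproduce the 84.3% reported for the original simulated profiles.</p>

```python
import random

def covariance(rows):
    """Sample covariance matrix of the columns of `rows` (profiles x time points)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvalues(sym, k, n_iter=500, seed=0):
    """Leading k eigenvalues of a symmetric PSD matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(sym)
    a = [row[:] for row in sym]
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:  # remaining matrix is zero: no variance left
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        values.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values

def explained_variance(rows, k=3):
    """Fraction of total variance captured by each of the first k components."""
    cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace equals total variance
    return [lam / total for lam in top_eigenvalues(cov, k)]

# Toy example: noisy profiles from three of the Table 1 mean vectors.
rng = random.Random(3)
rows = [[m + rng.gauss(0, 0.4) for m in mu]
        for mu in ([0, 1, 2, 3], [3, 2, 1, 0], [0, 3, 1.5, 0])
        for _ in range(15)]
print([round(f, 3) for f in explained_variance(rows)])
```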
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Patterns of the simulated data extracted by EPIG and CLICK</p>
               </caption>
               <text>
                  <p><b>Patterns of the simulated data extracted by EPIG and CLICK</b>. The four inter-groups (red, green, blue and black) from left to right in each pattern correspond to inter-groups 1 to 4 in Table 1. A) The patterns extracted by EPIG, labelled 1 to 5, correspond to distributions A to E, respectively. All profiles from these distributions were categorized to their respective patterns. B) The pattern of CLICK Cluster 1, to which 32 profiles were assigned, appears to have emerged from both distributions C and D in Table 1. The patterns of Clusters 2 and 3 correspond to distributions A and B in Table 1; 16 and 15 profiles, respectively, were assigned to these clusters.</p>
               </text>
               <graphic file="1471-2105-8-427-3"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Intra-group means of the simulated profiles (standard deviation fixed at 0.4).</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Intra-group means</p>
                     </c>
                     <c ca="center">
                        <p>Inter-group 1</p>
                     </c>
                     <c ca="center">
                        <p>Inter-group 2</p>
                     </c>
                     <c ca="center">
                        <p>Inter-group 3</p>
                     </c>
                     <c ca="center">
                        <p>Inter-group 4</p>
                     </c>
                     <c ca="center">
                        <p>Number of profiles generated</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution A</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution B</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution C</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>1.5</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution D</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1.5</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution E</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Distribution F</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Analysis of dauer recovery and L1 starvation gene expression from Caenorhabditis elegans</p>
            </st>
            <p>To compare the ability of EPIG and CLICK to extract patterns (clusters) from a publicly available microarray data set and to identify co-expressed genes, we obtained gene expression profiles from a <it>C. elegans</it> dauer recovery and L1 starvation time course study. The experimental design and other details of the study are described by Wang et al. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Applied to the full data set including all genes, EPIG extracted 18 patterns of gene expression and identified 1597 co-expressed genes (see the extracted patterns in Figure S1 and a heat map of the 1597 genes in Figure S2 in Additional file <supplr sid="S1">1</supplr>). The number of genes categorized to each pattern ranged from 15 to 263 (see Table S1 in Additional file <supplr sid="S1">1</supplr>). In contrast, CLICK generated only 6 clusters of genes from the data (1644 genes selected on the basis of SNR (>3) and magnitude of expression (>0.5), including all 1597 genes identified by EPIG), as shown by the centroid patterns in Figure S3 in Additional file <supplr sid="S1">1</supplr>. When CLICK was applied to the whole data set, as was done for EPIG, it produced over 60 clusters (data not shown).</p>
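            <p>The pre-selection for CLICK used cutoffs of SNR (>3) and magnitude of expression (>0.5), but the exact definitions are not restated here. The sketch below therefore assumes hypothetical definitions: for each condition, magnitude is the absolute mean log ratio of the replicates and SNR is that magnitude divided by the replicate standard deviation, and a gene is kept if any condition passes both cutoffs. The gene names and values are invented for illustration.</p>

```python
from statistics import mean, stdev

SNR_CUTOFF = 3.0        # cutoff quoted in the text: SNR greater than 3
MAGNITUDE_CUTOFF = 0.5  # cutoff quoted in the text: magnitude greater than 0.5

def passes_filter(replicates_by_condition):
    """Hypothetical rule: keep a gene if, in at least one condition, the mean
    log ratio of its replicates is both large (abs(mean) above MAGNITUDE_CUTOFF)
    and stable (abs(mean) / sd above SNR_CUTOFF). Needs two or more replicates."""
    for reps in replicates_by_condition:
        m, s = mean(reps), stdev(reps)
        magnitude = abs(m)
        snr = magnitude / s if s > 0 else float("inf")
        if snr > SNR_CUTOFF and magnitude > MAGNITUDE_CUTOFF:
            return True
    return False

genes = {
    "gene_a": [[0.9, 1.1, 1.0], [0.1, -0.1, 0.0]],   # strong, consistent change
    "gene_b": [[0.9, -0.5, 1.6], [0.2, 0.0, -0.2]],  # large but noisy change
    "gene_c": [[0.1, 0.2, 0.15], [0.0, 0.1, 0.05]],  # consistent but small
}
selected = [g for g, data in genes.items() if passes_filter(data)]
print(selected)  # → ['gene_a']
```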
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Supplemental materials</b>. Additional data 1.pdf is a pdf file to be opened and viewed with Adobe Acrobat.</p>
               </text>
               <file name="1471-2105-8-427-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Four groups of dauer recovery-specific genes have been described: transient, early, climbing and late <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Table <tblr tid="T2">2</tblr> lists the dauer recovery-specific gene expression patterns from both EPIG and CLICK that correspond to these four groups of co-expressed genes, along with the dauer enriched state. None of the clusters generated by CLICK contained genes with expression patterns related to either the early or the climbing state; however, four CLICK patterns corresponded to the transient, late and dauer enriched states. In contrast, the patterns extracted by EPIG corresponded to each of these dauer recovery response groups. For example, in four patterns (Patterns 2, 3, 10 and 11 in Figure S1 in Additional file <supplr sid="S1">1</supplr>), the expression levels of all of the genes decreased from early to late, corresponding to the dauer transient state. These four patterns nevertheless differ slightly from one another. Patterns 2 and 10 show no change at the start, after which expression gradually decreases until the genes are substantially down-regulated at the end; however, their corresponding L1 starvation expression levels were either down-regulated (Pattern 2) or unchanged (Pattern 10) across all time points. Conversely, Patterns 3 and 11 show no change at any time point during L1 starvation, but in response to dauer recovery the genes were either up-regulated at the start and then gradually decreased to no change at the end (Pattern 3), or showed no change at the start, were up-regulated early and then gradually decreased to no change at the end (Pattern 11). Similarly, EPIG's Patterns 6, 7 and 9 can be related to the dauer recovery early state and Pattern 14 to the climbing state.
None of the clusters generated by CLICK had centroid patterns corresponding to these two dauer-specific states (see Figure S3 in Additional file <supplr sid="S1">1</supplr>). Patterns from EPIG and CLICK showing similar responses between dauer recovery and L1 starvation are not listed in Table <tblr tid="T2">2</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Dauer recovery-specific gene expression profiles.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>EPIG</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>CLICK</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Pattern number</p>
                     </c>
                     <c ca="left">
                        <p>Dauer recovery</p>
                     </c>
                     <c ca="left">
                        <p>L1 starvation</p>
                     </c>
                     <c ca="left">
                        <p>Pattern number</p>
                     </c>
                     <c ca="left">
                        <p>Dauer recovery</p>
                     </c>
                     <c ca="left">
                        <p>L1 starvation</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transient</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>No change at start, gradually decreases to significantly down-regulated at end</p>
                     </c>
                     <c ca="left">
                        <p>Down-regulated at all time points</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>From up-regulated to no change</p>
                     </c>
                     <c ca="left">
                        <p>Down-regulated</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>Up-regulated at start, gradually decreases to no change at end</p>
                     </c>
                     <c ca="left">
                        <p>No change at all time points</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>No change at start, gradually decreases to significantly down-regulated at end</p>
                     </c>
                     <c ca="left">
                        <p>No change at start, then stays minimally down-regulated</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>No change at start, gradually decreases to significantly down-regulated at end</p>
                     </c>
                     <c ca="left">
                        <p>Down-regulated</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>No change at start, jumping to significantly up regulated shortly after, then gradually decreasing to no change at end</p>
                     </c>
                     <c ca="left">
                        <p>No change at all time points</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Early</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 0 h. from up regulated to minimally up regulated</p>
                     </c>
                     <c ca="left">
                        <p>No or minimal change</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 0 h, peak at 5 h, then to minimally up regulated</p>
                     </c>
                     <c ca="left">
                        <p>Minimally down regulated</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>From no change to down regulated</p>
                     </c>
                     <c ca="left">
                        <p>No change or minimally down regulated</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Climbing</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 0 h, climbing to peak at 3 to 5 h</p>
                     </c>
                     <c ca="left">
                        <p>No or minimal change</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Late</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at late time points</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at late time points</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at late time points</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at late time points</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dauer enriched</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated</p>
                     </c>
                     <c ca="left">
                        <p>From up regulated to no change</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated</p>
                     </c>
                     <c ca="left">
                        <p>From up regulated to no change</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Patterns not listed in this table show similar responses in dauer recovery and L1 starvation, except for the late responses.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Expression patterns and co-expressed genes in response to UV- and/or IR-induced DNA damage</p>
            </st>
            <p>We next applied EPIG to a microarray data set that combined gene expression profiles of ionizing radiation (IR)- and ultraviolet (UV)-treated human fibroblast cells, with two goals in mind: 1) to find similar and dissimilar responses between the treatments and 2) to reveal differences in gene regulation upon DNA damage caused by IR or UV. For each of the two treatments, the data consisted of four biological states: sham-treated, and 2 h, 6 h and 24 h post UV or IR treatment. A gene expression profile therefore consists of eight intra-groups, corresponding to the four states under each of the two treatments. Each intra-group contains six data points, from three biological replicates and two technical replicates (dye-swap pairs), for a given treatment at a given time point, so each gene expression profile consisted of 48 data points. EPIG analysis using the whole data set as input resulted in a total of 18 patterns (Figure <figr fid="F4">4</figr>), with a total of 2661 co-expressed genes identified, each assigned to a single pattern. Figure <figr fid="F5">5</figr> is a heat map of the 2661 genes arranged by pattern number from top to bottom. Table <tblr tid="T3">3</tblr> lists the number of genes in each pattern and their over-represented Gene Ontology categories<abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
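            <p>To make the profile layout concrete, the sketch below builds the column labels of one 48-point profile and groups them into the eight intra-groups (two treatments by four states), each holding six points (three cell lines by two dye-swap replicates). The column ordering used here (treatment, then cell line, then time point, then dye-swap replicate) is an assumption that loosely follows the left-to-right layout of Figure 4, and all names are hypothetical.</p>

```python
import numpy as np

# Hypothetical column layout of one 48-point profile (ordering is an assumption):
# treatment -> cell line -> time point -> dye-swap replicate.
TREATMENTS = ["UV", "IR"]
CELL_LINES = ["F1-hTERT", "F3-hTERT", "F10-hTERT"]
TIME_POINTS = ["sham", "2h", "6h", "24h"]
DYE_SWAPS = ["dye_a", "dye_b"]


def column_labels():
    """Return a (treatment, cell line, time point, dye) tuple for each column."""
    return [
        (t, c, tp, d)
        for t in TREATMENTS
        for c in CELL_LINES
        for tp in TIME_POINTS
        for d in DYE_SWAPS
    ]


def intra_group_indices():
    """Map each (treatment, time point) intra-group to its column indices.

    Each intra-group pools 3 cell lines x 2 dye-swap replicates = 6 columns.
    """
    groups = {}
    for i, (t, _cell, tp, _dye) in enumerate(column_labels()):
        groups.setdefault((t, tp), []).append(i)
    return groups


def intra_group_means(profile):
    """Average a 48-point log2-ratio profile within each intra-group."""
    profile = np.asarray(profile, dtype=float)
    return {key: profile[idx].mean() for key, idx in intra_group_indices().items()}


groups = intra_group_indices()
print(len(column_labels()), len(groups), {len(v) for v in groups.values()})
# prints: 48 8 {6}
```

            <p>With such an index, intra-group means or other replicate-level statistics reduce to ordinary array indexing.</p>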
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The patterns extracted by EPIG from the combined UV- and IR-treated data</p>
               </caption>
               <text>
                  <p><b>The patterns extracted by EPIG from the combined UV- and IR-treated data</b>. In each of Patterns 1 to 18, the first half (open circles) shows the UV-treated samples and the second half (solid circles) the IR-treated samples. For each treatment, the three cell lines F1-hTERT, F3-hTERT and F10-hTERT are positioned from left to right. Each cell line contributes eight data points covering four treatment conditions, i.e., sham treatment and 2, 6 and 24 h post-treatment, colored red, green, blue and magenta, respectively. The vertical axis, centered at zero, shows the change in gene expression (log2 intensity) relative to the sham-treated controls.</p>
               </text>
               <graphic file="1471-2105-8-427-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Heat map of the 2661 genes selected by EPIG</p>
               </caption>
               <text>
                  <p><b>Heat map of the 2661 genes selected by EPIG</b>. The 2661 genes selected by EPIG are listed from top to bottom in order of pattern, from Pattern 1 to 18. The left half is UV-treated and the right half is IR-treated. For each treatment, the three cell lines F1-hTERT, F3-hTERT and F10-hTERT are positioned from left to right. Each cell line comprises four treatment conditions, sham treatment and 2, 6 and 24 h post-treatment, from left to right. Red and green correspond to up- and down-regulation, respectively, with darker colors denoting less differential expression.</p>
               </text>
               <graphic file="1471-2105-8-427-5"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Pattern response trends and selected over-represented Gene Ontology categories.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="center">
                        <p>Pattern</p>
                     </c>
                     <c ca="center">
                        <p>No. of genes</p>
                     </c>
                     <c ca="center">
                        <p>UV Response trends</p>
                     </c>
                     <c ca="center">
                        <p>IR Response trends</p>
                     </c>
                     <c ca="left">
                        <p>Selected over-represented Gene Ontology categories*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 2 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Transcription factor complex, development, nucleoplasm, regulation of transcription from Pol II promoter, morphogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 2 h and 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Nucleus, nucleic acid binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>173</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 2 h, peaking at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>RNA metabolism, RNA processing, methyltransferase activity, nucleolus</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="left">
                        <p>Moderately down regulated at 2 h and up regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Metabolism; nucleobase, nucleoside, nucleotide and nucleic acid metabolism; regulation of transcription, DNA-dependent</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 2 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Regulation of transcription; transcription, DNA-dependent; nucleic acid binding; nucleus</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 2 h and 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Protein serine/threonine kinase activity, nucleus</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 2 h and moderately up regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Transcription regulator activity</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>616</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Purine nucleotide binding, protein modification, protein amino acid phosphorylation, ubiquitin cycle, kinase activity, cell growth and/or maintenance</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>172</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 2 h, peaking at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 2 h and down regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>Nucleolus, ribosome biogenesis and assembly, mitotic cell cycle, rRNA processing, DNA replication, S phase of mitotic cell cycle</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 2 h, peaking at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 2 h post IR, then decreased at 6 h and remained stable through 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>cell proliferation</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 2 h, peaking at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Progressively up regulated from 2 h to 24 h post IR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 2 h and moderately down regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 2 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>protein binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>385</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 24 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>Lysosome, lytic vacuole, complement activation</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>563</p>
                     </c>
                     <c ca="left">
                        <p>Moderately down regulated at 24 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>Mitotic cell cycle, DNA replication and chromosome cycle, M phase, nuclear division, cell growth and/or maintenance, RNA processing, RNA metabolism, response to DNA damage stimulus, DNA repair, cell cycle checkpoint, G1/S transition of mitotic cell cycle, G2/M transition of mitotic cell cycle</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="left">
                        <p>Insignificant response</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>cell adhesion</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>115</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>catalytic activity</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="left">
                        <p>Up regulated at 6 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Moderately down regulated at 2 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>DNA binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="left">
                        <p>Down regulated at 6 h and 24 h post UV</p>
                     </c>
                     <c ca="left">
                        <p>Moderately up regulated at 24 h post IR</p>
                     </c>
                     <c ca="left">
                        <p>plasma membrane, morphogenesis</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Includes biological process, molecular function and cellular component categories.</p>
               </tblfn>
            </tbl>
            <p>Each pattern shown in Figure <figr fid="F4">4</figr> includes both UV (the first half of the profile, from left to right) and IR (the second half) treatments. For each treatment there are three individual cell lines (F1-hTERT, F3-hTERT and F10-hTERT), each with four time-series points: sham-treated controls (red) and 2 h (green), 6 h (blue) and 24 h (magenta) post UV or IR treatment. Patterns 1 through 8 show UV-specific expression changes (either up- or down-regulation) with little or no change in IR-treated cells. UV-specific up-regulation (Patterns 1 to 4) occurred only at early time points, 2 and/or 6 h after UV irradiation, and fully recovered to baseline levels at 24 h. As shown in Table <tblr tid="T3">3</tblr>, the 324 genes in Patterns 1&#8211;4 were related to transcriptional regulation, RNA processing and nucleic acid binding. UV-specific down-regulation (Patterns 5 to 8) also occurred at early time points (2 h and/or 6 h post UV), and biological processes related to regulation of transcription were again over-represented among these genes. In addition, the 616 genes in Pattern 8, which were substantially repressed at 6 h post UV, were enriched for about 30 over-represented Gene Ontology categories, including purine nucleotide binding, protein modification, ubiquitin cycle, kinase activity and cell growth.</p>
            <p>Genes in Patterns 9 to 14 responded to both UV and IR treatment, but with different time dependencies; Pattern 15, discussed below, responded to IR only. In Pattern 9, expression was up-regulated at 2 and 6 h post UV treatment and then recovered to near-baseline levels at 24 h. Genes in this pattern were minimally up-regulated at 2 h post IR but substantially down-regulated at 6 h, and even more so at 24 h. Many genes that are maximally expressed in S phase fell into this pattern, including CDC6, FEN1, MSH6, ORC6L, PCNA, POLG, RBM14, CCNE2 and TOP3A. The expression changes of these genes coincided with changes in the S phase compartment of the cell cycle: S phase cells were increased over control at 2 and 6 h post UV (data not shown), but were moderately reduced relative to control at 6 h and markedly reduced at 24 h post IR <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>Patterns 10 and 11 contained 42 genes, including the p53 target genes GDF15, BTG2, PLK2, BAK1, PLK3 and CDKN1A (Pattern 10) and TP53INP1, SESN1, DDB2 and FDXR (Pattern 11). Genes in both patterns were up-regulated at 2 h after UV treatment, peaked at 6 h and returned to baseline at 24 h. However, they responded differently to IR: genes in Pattern 10 peaked at 2 h post IR and then decreased to a stable level at 6 h and 24 h, whereas genes in Pattern 11 showed continuous increases in expression until 24 h after IR.</p>
            <p>Patterns 13 to 15 contained 997 genes in total, which were mainly regulated at the late time point (24 h) after treatment. More than 500 genes in Pattern 14 were down-regulated at 24 h post IR or UV treatment. This pattern included many cell cycle-regulated genes functioning in the G1/S and G2/M transitions, such as CDK2, CDC2 and MCM2. These genes were strongly down-regulated in IR-treated cells but changed substantially less in UV-treated cells. Patterns 13 and 15 both show late up-regulation, but genes in the former were up-regulated after both UV and IR treatment, whereas those in the latter responded only to IR. The top biological process in Pattern 13 was complement activation, and the lysosome and lytic vacuole were the main cellular components with which the products of these genes may be associated. Finally, genes in Patterns 16 (115 genes), 17 (61 genes) and 18 (56 genes) were co-expressed, but UV and IR treatments induced opposite changes in their expression.</p>
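            <p>The subtotals quoted above can be cross-checked against Table 3. The short sketch below simply transcribes the per-pattern gene counts from the table and sums the pattern groups mentioned in the text; the helper name is illustrative.</p>

```python
# Gene counts per pattern, transcribed from Table 3.
GENES_PER_PATTERN = {
    1: 21, 2: 57, 3: 173, 4: 73, 5: 100, 6: 79, 7: 52, 8: 616, 9: 172,
    10: 23, 11: 19, 12: 47, 13: 385, 14: 563, 15: 49, 16: 115, 17: 61, 18: 56,
}


def total(patterns):
    """Sum the gene counts over an iterable of pattern numbers."""
    return sum(GENES_PER_PATTERN[p] for p in patterns)


print(total(range(1, 19)))   # all 18 patterns -> 2661
print(total(range(1, 5)))    # UV-specific up-regulation, Patterns 1-4 -> 324
print(total([10, 11]))       # p53-target patterns -> 42
print(total(range(13, 16)))  # late responses, Patterns 13-15 -> 997
```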
            <p>For comparison, we also applied CLICK to the 2726 genes identified as differentially expressed using the SNR (> 3) and magnitude of expression (> 0.5) criteria. This gene set included all 2661 co-expressed genes identified by EPIG. In this case, CLICK clustered the gene expression data into 11 groups (Figure S4 in Additional file <supplr sid="S1">1</supplr>), and the cluster centroids resembled patterns extracted by EPIG. However, CLICK did not reveal some EPIG patterns that we consider biologically important in the response to UV- and IR-induced DNA damage, including Patterns 9, 10 and 11 (see Figure <figr fid="F4">4</figr>). As described above, many of the genes in Pattern 9 were S phase-related, and the majority of genes in Patterns 10 and 11 are known to be involved in p53-dependent cell cycle control.</p>
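            <p>The SNR and magnitude criteria used to select the 2726 genes can be sketched as a simple filter. This section does not reproduce EPIG's S/N definition (Equations 1 and 3), so the sketch assumes, purely for illustration, that the SNR of an intra-group is the absolute mean log2 ratio divided by the sample standard deviation of its replicates, and that a gene is retained when at least one intra-group exceeds both thresholds. The function name and these definitions are hypothetical.</p>

```python
import numpy as np


def select_differential(data, groups, snr_min=3.0, mag_min=0.5):
    """Boolean mask of genes passing an SNR and magnitude filter.

    data   : (n_genes, n_points) array of log2 ratios relative to sham.
    groups : list of column-index lists, one per intra-group.
    ASSUMPTION: SNR = |mean| / sample SD within an intra-group, and a gene is
    kept if any intra-group exceeds both thresholds. This is an illustrative
    stand-in, not EPIG's Equations 1 and 3.
    """
    data = np.asarray(data, dtype=float)
    keep = np.zeros(data.shape[0], dtype=bool)
    for idx in groups:
        block = data[:, idx]
        mean = block.mean(axis=1)
        sd = block.std(axis=1, ddof=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            snr = np.abs(mean) / sd
        snr = np.nan_to_num(snr, nan=0.0)  # 0/0 (flat zero profile) -> SNR 0
        passed = np.logical_and(snr > snr_min, np.abs(mean) > mag_min)
        keep = np.logical_or(keep, passed)
    return keep


# Toy example: 3 genes, 2 intra-groups of 6 replicate points each.
rng = np.random.default_rng(0)
groups = [list(range(0, 6)), list(range(6, 12))]
data = rng.normal(0.0, 0.05, size=(3, 12))
data[0, 0:6] += 1.0  # gene 0: large, consistent change in intra-group 1
data[1, :] += 0.2    # gene 1: consistent but small change everywhere
print(select_differential(data, groups))  # only gene 0 passes
```

            <p>Requiring both a high SNR and a minimum magnitude discards profiles that are either noisy or consistently but negligibly changed, as gene 1 illustrates.</p>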
         </sec>
         <sec>
            <st>
               <p>Homogeneity evaluation of the patterns/clusters of gene expression profiles</p>
            </st>
            <p>It is vital that the extracted patterns and their associated genes can be inspected for biological meaning, as presented above. It is equally essential that the extracted patterns or clusters be validated objectively using evaluation indices. Although a number of cluster validation methods have been developed, each is suited to specific applications. For example, the General Silhouette (GS) can be used to measure the stability of the clustering structure <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; a higher GS score indicates better-formed clusters. However, the GS measure appears to work best when the number of clusters is small <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. For the simulated data above, we obtained GS scores of 0.77 for the EPIG patterns and 0.71 for the CLICK clusters. For the dauer recovery/L1 starvation and UV/IR DNA damage data sets, by contrast, the GS measure was deemed inappropriate for judging cluster validity because of the larger numbers of extracted clusters. Other methods, for example the Gap statistic <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, validate clusters depending on the number of groups in the data as determined a priori, whereas both EPIG and CLICK are unsupervised approaches. Therefore, to compare the methods objectively and appropriately, we calculated the overall average homogeneity <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> within clusters (<it>i.e.</it> the pattern-categorized gene expression profiles in the case of EPIG) and the averaged correlation between clusters/extracted patterns. The overall average homogeneity measures cohesion within a cluster/pattern, whereas the averaged correlation measures separation between clusters/patterns <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The results are listed in Table <tblr tid="T4">4</tblr>. For each of the three data sets, the overall average homogeneity of the EPIG patterns was consistently higher than that of the clusters from CLICK and CAST (with the affinity threshold set to 0.7, 0.8 or 0.85), suggesting that the profiles within EPIG patterns are more compact than those within the clusters generated by the other algorithms. The averaged correlations for all methods were low (within &#177;0.35), indicating that the patterns/clusters were quite dissimilar from each other. However, CLICK performed better than EPIG and CAST with respect to between-cluster correlation (i.e., CLICK had a lower average correlation between clusters/extracted patterns than EPIG and CAST did).</p>
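            <p>The formulas behind Table 4 are not reproduced in this section. As a minimal sketch, assuming CLICK-style indices (homogeneity as the average correlation of each profile with its own cluster centroid, and separation as a size-weighted average correlation between distinct centroids), the two scores can be computed as follows. The size weighting is an assumption; an unweighted mean over centroid pairs would be an equally plausible reading of "averaged correlation". Function names are hypothetical.</p>

```python
import numpy as np


def pearson_to(profiles, target):
    """Pearson correlation of each row of `profiles` with vector `target`."""
    x = profiles - profiles.mean(axis=1, keepdims=True)
    y = target - target.mean()
    return (x @ y) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y))


def homogeneity_and_separation(data, labels):
    """Assumed CLICK-style indices (requires at least two clusters).

    homogeneity: mean correlation of every profile with its own cluster centroid.
    separation : size-weighted mean correlation between distinct centroids.
    Higher homogeneity = tighter clusters; lower separation = more distinct ones.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([data[labels == k].mean(axis=0) for k in ids])
    sizes = np.array([np.sum(labels == k) for k in ids], dtype=float)

    # Homogeneity: average r(profile, own centroid) over all profiles.
    r_own = np.concatenate(
        [pearson_to(data[labels == k], centroids[i]) for i, k in enumerate(ids)]
    )
    homogeneity = r_own.mean()

    # Separation: average r(centroid_i, centroid_j) over i != j, weighted by size.
    weights = np.outer(sizes, sizes)
    np.fill_diagonal(weights, 0.0)
    separation = np.sum(weights * np.corrcoef(centroids)) / np.sum(weights)
    return homogeneity, separation


# Toy example: two noisy groups built around uncorrelated sine and cosine shapes.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 12)
data = np.vstack([
    np.sin(t) + rng.normal(0.0, 0.1, (5, 12)),
    np.cos(t) + rng.normal(0.0, 0.1, (5, 12)),
])
labels = [0] * 5 + [1] * 5
h, s = homogeneity_and_separation(data, labels)
print(round(h, 3), round(s, 3))
```

            <p>Here the homogeneity is close to 1 and the separation close to 0, the combination that the comparison in Table 4 rewards.</p>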
            <p>We also observed that CLICK always generated fewer clusters than the number of patterns extracted by EPIG, and that CAST generated about 8 clusters on average (excluding singletons) across the affinity threshold values tested. However, when the whole data set was used as input to CLICK, as was done for EPIG, CLICK produced over 60 clusters for both the dauer recovery/L1 starvation and UV/IR DNA damage data sets. A figure of merit (FOM) <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> analysis of either data set showed that the adjusted FOM values decreased only slowly once the number of clusters exceeded 20. This suggests that the number of gene clusters in these data sets could be between 5 and 20, well below the 60 clusters produced by CLICK when using the entire gene expression data set (Figure S5 in Additional file <supplr sid="S1">1</supplr>).</p>
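            <p>The adjusted FOM can be illustrated with a short sketch. In the figure-of-merit approach of Yeung and colleagues, each condition is left out in turn, genes are clustered on the remaining conditions, and the root-mean-square deviation of the left-out condition from the cluster means is summed over conditions; the adjusted version divides each term by sqrt((n - k)/n) to offset the natural decrease of FOM as k grows. The clustering algorithm used inside the FOM analysis is not stated in this section, so the sketch plugs in a plain k-means with k-means++ seeding as an assumption; all names and the toy data are hypothetical.</p>

```python
import numpy as np


def kmeans_pp(x, k, n_iter=50, seed=0):
    """Plain k-means with k-means++ seeding; returns integer labels."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Pick the next seed with probability proportional to squared distance.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def adjusted_fom(data, k, cluster=kmeans_pp):
    """Aggregate adjusted figure of merit for k clusters (lower is better)."""
    data = np.asarray(data, dtype=float)
    n, m = data.shape
    total = 0.0
    for e in range(m):
        # Cluster on all conditions except e ...
        labels = cluster(np.delete(data, e, axis=1), k)
        # ... then measure how well the cluster means predict condition e.
        sq = 0.0
        for j in np.unique(labels):
            col = data[labels == j, e]
            sq += np.sum((col - col.mean()) ** 2)
        fom = np.sqrt(sq / n)
        total += fom / np.sqrt((n - k) / n)  # adjustment for cluster count
    return total


# Toy data: 4 hypothetical patterns x 25 genes each, 10 conditions.
rng = np.random.default_rng(2)
base = rng.normal(0.0, 1.0, (4, 10))
data = np.repeat(base, 25, axis=0) + rng.normal(0.0, 0.2, (100, 10))
for k in (2, 4, 8):
    print(k, round(adjusted_fom(data, k), 2))
```

            <p>On synthetic data of this kind the adjusted FOM typically drops sharply up to the true number of patterns and then levels off; a slow decline beyond some k is the behavior used above to bound the plausible number of clusters.</p>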
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>When Pearson's correlation coefficient <it>r</it> is calculated pair-wise, co-expressed profiles form discrete local clusters, or mountains, in a correlation topomap <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The probability that two arbitrary, independently generated data sets of size 48 (e.g. the joint UV and IR data set shown in Figure <figr fid="F4">4</figr>) are correlated with an <it>r</it>-value greater than 0.8 is less than 10<sup>-12</sup> <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. However, when there are tens of thousands of gene expression profiles, many profiles that are inconsistently expressed within replicate groups (intra-groups) may appear similar by chance and form spurious local correlation clusters driven by stochastic noise. Unlike factor analysis methods such as ICA, or clustering methods such as CLICK, <it>K</it>-means or SOM, our approach, <ul>E</ul>xtracting <ul>P</ul>atterns and <ul>I</ul>dentifying co-expressed <ul>G</ul>enes (EPIG), not only calculates the similarity among profiles but also evaluates each profile by signal-to-noise (S/N) ratio measurements (Equations 1 and 3). Through a filtering procedure, EPIG removes profiles that do not fit into a pattern: only profiles with high S/N ratios and sufficient magnitudes of expression change are used to form the patterns representing co-expressed genes. With this profile evaluation strategy, EPIG is able to extract patterns of co-expressed genes without predefined seeding.</p>
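         <p>The profile-to-pattern step can be illustrated with a minimal sketch. The exact rule EPIG uses to categorize genes is not restated here, so the code assumes, for illustration only, that each filtered profile is assigned to the pattern with which it has the highest Pearson correlation, provided that correlation reaches a minimum r; the default of 0.8 merely echoes the r-value discussed above and is not necessarily EPIG's setting. The function name and toy patterns are hypothetical.</p>

```python
import numpy as np


def assign_to_patterns(profiles, patterns, r_min=0.8):
    """Assign each profile to its best-correlated pattern, or -1 if none
    reaches r_min. Returns (labels, best_r). Illustrative assumption only."""
    p = np.asarray(profiles, dtype=float)
    q = np.asarray(patterns, dtype=float)
    n_p = len(p)
    # Correlate profiles and patterns jointly; keep the profile-by-pattern block.
    r = np.corrcoef(np.vstack([p, q]))[:n_p, n_p:]
    best = r.argmax(axis=1)
    best_r = r[np.arange(n_p), best]
    return np.where(best_r >= r_min, best, -1), best_r


# Toy example with two hypothetical patterns: a rising and a falling trend.
t = np.arange(8, dtype=float)
patterns = np.vstack([t, -t])
profiles = np.vstack([2.0 * t + 1.0, -t + 0.1 * np.sin(t), np.cos(t)])
labels, best_r = assign_to_patterns(profiles, patterns)
print(labels)  # gene 0 -> pattern 0, gene 1 -> pattern 1, gene 2 unassigned (-1)
```

         <p>Because Pearson correlation ignores offset and scale, a profile such as 2t + 1 matches the rising pattern perfectly, while a profile uncorrelated with every pattern is left unassigned rather than forced into the nearest one.</p>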
         <p>In a head-to-head comparison on a simulated data set, EPIG performed competitively against CLICK and CAST by 1) extracting all of the designated patterns, 2) accurately assigning profiles to their appropriate patterns, and 3) generating patterns of profiles with higher homogeneity and greater stability (Table <tblr tid="T4">4</tblr>). However, CLICK clearly outperformed EPIG and CAST in generating clusters/extracted patterns that are more dissimilar to each other (i.e., with a lower average correlation between clusters/patterns). Furthermore, for the two experimental data sets presented above (one from the public domain), EPIG extracted more patterns of gene expression than CLICK (Tables <tblr tid="T2">2</tblr> and <tblr tid="T4">4</tblr>). The patterns extracted by EPIG that were not represented by any of the cluster centroids generated by CLICK contained genes related to key biological responses to the experimental treatments. For example, in the case of UV and IR DNA damage, the patterns extracted by EPIG contained p53 cell cycle control target genes (Patterns 10 and 11) and many S phase genes of the mitotic cell cycle (Pattern 9) (Figure <figr fid="F4">4</figr>).</p>
         <tbl id="T4">
            <title>
               <p>Table 4</p>
            </title>
            <caption>
               <p>Homogeneity of gene expression profiles within patterns and correlations between patterns</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>Data</p>
                  </c>
                  <c ca="left">
                     <p>Algorithm</p>
                  </c>
                  <c ca="left">
                     <p>Number of patterns</p>
                  </c>
                  <c ca="left">
                     <p>Overall average homogeneity within patterns</p>
                  </c>
                  <c ca="left">
                     <p>Average correlation between patterns</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>The simulated data</p>
                  </c>
                  <c ca="left">
                     <p>EPIG</p>
                  </c>
                  <c ca="left">
                     <p>5</p>
                  </c>
                  <c ca="left">
                     <p>0.95</p>
                  </c>
                  <c ca="left">
                     <p>0.29</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CLICK</p>
                  </c>
                  <c ca="left">
                     <p>3</p>
                  </c>
                  <c ca="left">
                     <p>0.72</p>
                  </c>
                  <c ca="left">
                     <p>-0.35</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CAST*</p>
                  </c>
                  <c ca="left">
                     <p>8</p>
                  </c>
                  <c ca="left">
                     <p>0.30</p>
                  </c>
                  <c ca="left">
                     <p>-0.08</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dauer recovery and L1 starvation data</p>
                  </c>
                  <c ca="left">
                     <p>EPIG</p>
                  </c>
                  <c ca="left">
                     <p>18</p>
                  </c>
                  <c ca="left">
                     <p>0.83</p>
                  </c>
                  <c ca="left">
                     <p>-0.03</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CLICK</p>
                  </c>
                  <c ca="left">
                     <p>5</p>
                  </c>
                  <c ca="left">
                     <p>0.74</p>
                  </c>
                  <c ca="left">
                     <p>-0.15</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>UV and IR DNA damage data</p>
                  </c>
                  <c ca="left">
                     <p>EPIG</p>
                  </c>
                  <c ca="left">
                     <p>18</p>
                  </c>
                  <c ca="left">
                     <p>0.84</p>
                  </c>
                  <c ca="left">
                     <p>-0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CLICK</p>
                  </c>
                  <c ca="left">
                     <p>11</p>
                  </c>
                  <c ca="left">
                     <p>0.78</p>
                  </c>
                  <c ca="left">
                     <p>-0.08</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>* Results generated with an affinity threshold value of 0.8. Two singleton profiles were neither reported nor included in the analysis of the homogeneity and correlation measures.</p>
            </tblfn>
         </tbl>
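         <p>The two summary measures in Table 4 can be approximated as below. The exact definitions behind Table 4 are not given in this section, so this sketch assumes that homogeneity is the mean Pearson correlation between each profile and the centroid of its own pattern, and that the between-pattern value is the unweighted mean correlation over all pairs of pattern centroids (a size-weighted average is another common choice). All names are hypothetical.</p>

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; returns 0.0 for a flat (zero-variance) profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of the member profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def homogeneity_and_between(patterns: Dict[str, List[List[float]]]) -> Tuple[float, float]:
    """Return (average within-pattern homogeneity, average between-pattern correlation)."""
    cents = {name: centroid(rows) for name, rows in patterns.items()}
    # Homogeneity: mean r between every member profile and its own pattern centroid.
    within = [pearson(row, cents[name]) for name, rows in patterns.items() for row in rows]
    # Between: unweighted mean r over all pairs of centroids (needs two or more patterns).
    between = [pearson(cents[a], cents[b]) for a, b in combinations(cents, 2)]
    return sum(within) / len(within), sum(between) / len(between)
```

         <p>A high first value and a low (or negative) second value correspond to tight, well-separated patterns, which is the trade-off compared across EPIG, CLICK and CAST in Table 4.</p>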
         <p>There are two main thresholds used in EPIG pattern extraction: the local cluster size threshold M<sub><it>t</it></sub> and the correlation threshold R<sub><it>t</it></sub>. R<sub><it>t</it></sub> determines how similar the extracted patterns are allowed to be to one another. Depending on the sample size, one may choose R<sub><it>t</it></sub> such that even the most similar patterns show clear differences in response. For example, in Figure <figr fid="F4">4</figr>, Patterns 5 and 6 have a correlation <it>r</it>-value of 0.77, yet the genes in the two patterns display a clear difference in their response to UV-induced DNA damage: in Pattern 5, gene expression was repressed only at 2 h post-UV, whereas in Pattern 6 it was repressed at both 2 and 6 h post-UV. M<sub><it>t</it></sub> is the minimum number of genes that a local cluster must contain for its profile candidate to be deemed a pattern, and its value affects the outcome of pattern extraction. Higher M<sub><it>t</it></sub> values may conceal a meaningful pattern that contains only a small number of co-expressed genes; lower M<sub><it>t</it></sub> values, on the other hand, may lead to the extraction of patterns that lack biological meaning. To find an optimal M<sub><it>t</it></sub> setting, we varied its value from 2 to 19 and performed EPIG analysis on the IR-treated gene expression data. Figure <figr fid="F6">6</figr> shows that the average pattern SNR increased as M<sub><it>t</it></sub> increased, while the number of extracted patterns decreased. This result seems plausible: as M<sub><it>t</it></sub> increases, more correlation local clusters are filtered out because their sizes are less than M<sub><it>t</it></sub>, and the fewer patterns that remain have higher average SNRs. 
However, when M<sub><it>t</it></sub> &#8805; 6, the SNR showed an upward shift and the number of extracted patterns a downward shift. This result prompted us to set M<sub><it>t</it></sub> to 6 for this data set. Ideally, one should vary these thresholds empirically for each data set and examine the outcomes. We have done so for many gene expression data sets analyzed with EPIG and found that M<sub><it>t</it></sub> = 6 and R<sub><it>t</it></sub> = 0.8 have worked reasonably well (data not shown). Note that some genes may have profiles that are not similar to any other gene, or whose local cluster contains fewer than M<sub><it>t</it></sub> members. These "orphan" genes (singletons) are neither considered pattern candidates nor assigned to any extracted pattern. As part of the EPIG analysis result, these orphan genes deserve attention to determine whether they play a unique role in the treatment response.</p>
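         <p>To make the roles of M<sub><it>t</it></sub> and R<sub><it>t</it></sub> concrete, the sketch below applies both thresholds in a deliberately simplified, hypothetical extraction loop; it is not EPIG's actual procedure. It assumes that a profile's local cluster is the profile itself plus every profile correlated with it at an <it>r</it> of at least r_local (a hypothetical parameter), keeps only local clusters with at least m_t members, and accepts a cluster centroid as a pattern only if its correlation with every pattern already accepted is below r_t. The sweep loosely mirrors the M<sub><it>t</it></sub> experiment in Figure 6 by counting extracted patterns for each threshold value.</p>

```python
import math
from typing import Dict, Iterable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; returns 0.0 for a flat (zero-variance) profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def extract_patterns(profiles: List[List[float]], m_t: int, r_t: float,
                     r_local: float = 0.8) -> List[List[float]]:
    """Hypothetical simplification of threshold-based pattern extraction."""
    n, width = len(profiles), len(profiles[0])
    # Local cluster of profile i: i itself plus every profile with r >= r_local.
    clusters = [
        [j for j in range(n) if j == i or pearson(profiles[i], profiles[j]) >= r_local]
        for i in range(n)
    ]
    # Only clusters with at least m_t members become candidates; largest first.
    candidates = sorted((c for c in clusters if len(c) >= m_t), key=len, reverse=True)
    patterns: List[List[float]] = []
    for members in candidates:
        cent = [sum(profiles[j][k] for j in members) / len(members) for k in range(width)]
        # Accept only if the candidate is less similar than r_t to every accepted pattern.
        if all(r_t > pearson(cent, p) for p in patterns):
            patterns.append(cent)
    return patterns


def sweep_m_t(profiles: List[List[float]], r_t: float,
              m_values: Iterable[int]) -> Dict[int, int]:
    """Number of extracted patterns for each candidate M_t value."""
    return {m: len(extract_patterns(profiles, m, r_t)) for m in m_values}
```

         <p>In this toy version, a profile whose local cluster is smaller than m_t never becomes a candidate and is never assigned to an accepted pattern, which mirrors the treatment of orphan genes described above.</p>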
         <fig id="F6">
            <title>
               <p>Figure 6</p>
            </title>
            <caption>
               <p>Optimization of the M<sub>t </sub>value</p>
            </caption>
            <text>
               <p><b>Optimization of the M<sub>t </sub>value</b>. Cluster size threshold M<sub>t </sub>(the horizontal axis) versus the average pattern SNR (A) and the number of extracted patterns (B).</p>
            </text>
            <graphic file="1471-2105-8-427-6"/>
         </fig>
         <p>EPIG is a general method for gene expression analysis when the data consist of profiles with multiple inter-groups, each containing multiple intra-group samples. The samples within an intra-group share a specific biologically relevant factor, and the inter-groups account for variation in that factor. For example, in the IR time-series data set, the intra-groups included both biological and technical replicates, so the response features common to the different cell lines were identified. If, on the other hand, the intra-groups included only technical replicates, one would reasonably expect to extract patterns representing idiosyncratic responses of individual cell lines. The responses to DNA damage that are common among biological individuals are intriguing because they are conserved, but individual-specific responses are also of interest because they point to inter-individual variation in response to external perturbations.</p>
         <p>The joined IR and UV data set had eight inter-groups, four each for IR and UV (i.e. sham, 2 h, 6 h and 24 h post-treatment). In this case, both similar and dissimilar responses to IR- and UV-induced DNA damage can be clearly observed (Figure <figr fid="F4">4</figr>). For example, UV-specific response patterns included genes functioning in transcription regulation, RNA processing, nucleotide binding, and cell growth (Patterns 1 through 8 in Figure <figr fid="F4">4</figr> and Table <tblr tid="T3">3</tblr>). The over-represented Gene Ontology categories among the 616 genes in Pattern 8 included purine nucleotide binding, protein modification, ubiquitin cycle, kinase activity, and cell growth. It appears that protein kinases may be broadly down-regulated, via phosphorylation, specifically in response to UV-induced DNA damage <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Two early-response patterns, Patterns 10 and 11 in Figure <figr fid="F4">4</figr>, showed that both UV and IR up-regulated the genes they contain, but in different ways. Many of the genes in these two patterns have been widely studied and are known to be related to p53-dependent cell cycle control. The two different patterns of response to IR imply that factors other than p53 also influence the expression of p53-target genes. Pattern 14 in Figure <figr fid="F4">4</figr> showed similar late down-regulation in response to both UV and IR treatment. The 563 genes in Pattern 14 participate in a number of important biological processes, among them the mitotic cell cycle, DNA replication, DNA repair, cell cycle checkpoint, and G<sub>0</sub>-like status transition <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
         <p>In general, the inter-group factors are not limited to time. Indeed, EPIG has been applied to many different data sets in which the variable factors include time, treatments (such as chemicals, radiation or knock-out), doses, organs (such as blood, liver or kidney), or organ sections (such as the left or right lobe of the liver). EPIG is thus a robust and flexible new pattern extraction method that is applicable to a variety of microarray data sets.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>EPIG performed competitively with CLICK and better than CAST in extracting patterns from simulated data. On real biological data, from both <it>C. elegans</it> and IR/UV-treated human fibroblasts, EPIG extracted more biologically informative patterns and co-expressed genes than CLICK. Gene Ontology analysis of the genes in the patterns extracted by EPIG from the IR/UV data revealed several key biological categories related to p53-dependent cell cycle control, among them the mitotic cell cycle, DNA replication, DNA repair, cell cycle checkpoint, and G<sub>0</sub>-like status transition. The extraction of these biologically responsive processes by EPIG provides a deeper understanding of the underlying biological mechanism(s) in the perturbed system.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Simulated data</p>
            </st>
            <p>A data set of 90 profiles and 24 objects (four inter-groups <it>i </it>(<it>i </it>= 1,...,4) and six intra-groups <it>j </it>(<it>j </it>= 1,...,6)) was simulated from six distinct probability distributions. Normal deviates were drawn at random to generate 15 profiles for each of the six distributions. Table <tblr tid="T1">1</tblr> lists the mean values of the distributions used in the simulation; the standard deviation was held constant at 0.4.</p>
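            <p>The simulation design can be reproduced in outline with the sketch below: six patterns with 15 profiles each and, for each of the four inter-groups, six normal deviates with a constant standard deviation of 0.4, giving 90 profiles of 24 values. The per-inter-group means in EXAMPLE_MEANS are hypothetical placeholders; the values actually used are those listed in Table 1, which are not reproduced here, and the random seed is arbitrary.</p>

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical means for the four inter-groups of each of six patterns.
# Substitute the values from Table 1 to mimic the article's simulation.
EXAMPLE_MEANS: List[List[float]] = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, -1.0, -2.0, -1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, -2.0, 0.0, 0.0],
    [1.0, 1.0, -1.0, -1.0],
]


def simulate(means: Sequence[Sequence[float]], profiles_per_pattern: int = 15,
             replicates: int = 6, sd: float = 0.4,
             seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Draw profiles whose columns are ordered by inter-group i, then replicate j."""
    rng = random.Random(seed)
    data: List[List[float]] = []
    labels: List[int] = []
    for pattern, group_means in enumerate(means):
        for _ in range(profiles_per_pattern):
            # For each inter-group mean, draw `replicates` normal deviates around it.
            row = [rng.gauss(mu, sd) for mu in group_means for _ in range(replicates)]
            data.append(row)
            labels.append(pattern)  # remember which distribution generated the profile
    return data, labels
```

            <p>Keeping the generating label for each profile makes it possible to check afterwards how accurately a method assigns profiles to their designated patterns.</p>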
         </sec>
         <sec>
            <st>
               <p>Microarray data</p>
            </st>
            <p>Microarray gene expression data were acquired from three logarithmically growing, telomerized normal human fibroblast lines, F1-hTERT, F3-hTERT and F10-hTERT <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, that were sham-treated or treated with 1.5 Gy ionizing radiation (IR) or 6 J/m<sup>2</sup> ultraviolet radiation (UV [from a 254 nm radiation source]) and harvested at 2, 6 or 24 h after treatment. Briefly, total RNA isolated from cells harvested at each time point was subjected to microarray analysis using Agilent 22,000-element human 1A arrays. The labelled cRNA from sham- or radiation-treated samples was hybridized to the microarray along with a labelled global reference cRNA. Hybridization of sample RNA against reference RNA was performed with a dye swap (cytofluor reversal).</p>
         </sec>
         <sec>
            <st>
               <p>Microarray data pre-processing</p>
            </st>
            <p>The extracted intensity data were pre-processed by array-based Systematic Variation Normalization (SVN) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, by profile-based dye-swap correction to remove dye-labelling effects, and by alignment of biological reference states. The alignment used the sham-treated samples as the reference state: the averaged sham-treated state of each dye-swapped pair was aligned to zero as a baseline, and all other treated states were adjusted by the same amount. It is plausible that the three human cell lines used for analysis do not share the same sham state; aligning the sham states aims to eliminate this biological variability and to focus the analysis on the relative changes upon IR or UV treatment. After pre-processing, the 2-dimensional matrix of compiled gene expression data consisted of 16,757 rows of unique genes (grouped by UniGene) and 48 columns: for each of the three cell lines, dye-swapped replicates of IR sham treatment and 2 h, 6 h and 24 h after IR, and of UV sham treatment and 2 h, 6 h and 24 h after UV.</p>
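            <p>The sham alignment step can be sketched as follows for a single gene's 48-value profile. The sketch assumes that the values are already dye-swap-corrected log<sub>2</sub> ratios, that alignment is done separately for each cell line and each treatment series (IR or UV, each with its own sham pair), and that columns are ordered as in COLUMNS; the column order and the dye labels are hypothetical.</p>

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

CELL_LINES = ["F1-hTERT", "F3-hTERT", "F10-hTERT"]
TREATMENTS = ["IR", "UV"]
STATES = ["sham", "2h", "6h", "24h"]
DYES = ["forward", "reverse"]  # hypothetical labels for the two arrays of a dye swap

# One column per (cell line, treatment, state, dye): 3 * 2 * 4 * 2 = 48 columns.
Column = Tuple[str, str, str, str]
COLUMNS: List[Column] = list(product(CELL_LINES, TREATMENTS, STATES, DYES))


def align_to_sham(row: Sequence[float], columns: Sequence[Column] = COLUMNS) -> List[float]:
    """Shift each (cell line, treatment) block so its averaged sham pair sits at zero."""
    sham: Dict[Tuple[str, str], List[float]] = {}
    for value, (line, treatment, state, _dye) in zip(row, columns):
        if state == "sham":
            sham.setdefault((line, treatment), []).append(value)
    offsets = {block: sum(vals) / len(vals) for block, vals in sham.items()}
    # Every state in a block, sham or treated, is moved by the same offset.
    return [value - offsets[(line, treatment)]
            for value, (line, treatment, _state, _dye) in zip(row, columns)]
```

            <p>Because each block is shifted by a single constant, differences between time points within a cell line and treatment are unchanged; only the baseline moves to zero.</p>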
         </sec>
         <sec>
            <st>
               <p>EPIG</p>
            </st>
            <p>A compiled microarray gene expression data set (in this study, as conventionally presented, the log<sub>2 </sub>pixel intensity ratio values) consists of a 2-dimensional matrix in which each row represents a gene expression profile and each column represents an array. Upon sample perturbation or variation in biological factors, such as agent, dose, time or tissue, a gene expression profile can be made up of inter-group and intra-group samples. The arrays in an intra-group sample have a factor in common, <it>e.g. </it>biological replicates. The arrays in inter-group samples possess different factors, e.g., sham treatment and time points post-UV or IR treatment. We denote each log<sub>2 </sub>ratio datum in a gene expression profile as g<sub><it>ij</it></sub>, where <it>i </it>is the inter-group index from 1 to <it>m</it>, <it>j </it>is the intra-group index from 1 to n<sub><it>i</it></sub>, <it>m </it>is the number of inter-groups and n<sub><it>i </it></sub>is the number of arrays in the <it>i</it><sup>th </sup>inter-group. To evaluate such a profile, we calculate each intra-group average <inline-formula><m:math name="1471-2105-8-427-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>g</m:mi><m:mo>&#175;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGafm4zaCMbaebadaWgaaWcbaGaemyAaKgabeaaaaa@2EC9@</m:annotation></m:semantics></m:math></inline-formula> and sample variance <inline-formula><m:math name="1471-2105-8-427-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaem4Cam3aa0baaSqaaiabdMgaPbqaaiabikdaYaaaaaa@2FBC@</m:annotation></m:semantics></m:math></inline-formula>. We define a gene expression profile's signal as</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-427-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>S</m:mi>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="left">
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mi>max</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                             <m:mo>,</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mi>f</m:mi>
                                             <m:mi>min</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mo>></m:mo>
                                             <m:mn>0</m:mn>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>min</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                             <m:mo>,</m:mo>
                                             <m:mi>e</m:mi>
                                             <m:mi>l</m:mi>
                                             <m:mi>s</m:mi>
                                             <m:mi>e</m:mi>
                                             <m:mi>i</m:mi>
                                             <m:mi>f</m:mi>
                                             <m:mi>max</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mo>&lt;</m:mo>
                                             <m:mn>0</m:mn>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mi>max</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>min</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mover accent="true">
                                                         <m:mi>g</m:mi>
                                                         <m:mo>&#175;</m:mo>
                                                      </m:mover>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mi>o</m:mi>
                                             <m:mi>t</m:mi>
                                             <m:mi>h</m:mi>
                                             <m:mi>e</m:mi>
                                             <m:mi>r</m:mi>
                                             <m:mi>w</m:mi>
                                             <m:mi>i</m:mi>
                                             <m:mi>s</m:mi>
                                             <m:mi>e</m:mi>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaeyypa0ZaaiqaaeaafaqaaeWacaaabaGagiyBa0MaeiyyaeMaeiiEaG3aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaGaeiilaWIaemyAaKMaemOzayMagiyBa0MaeiyAaKMaeiOBa42aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaaabaGaeyOpa4JaeGimaadabaGaeyOeI0IagiyBa0MaeiyAaKMaeiOBa42aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaGaeiilaWIaemyzauMaemiBaWMaem4CamNaemyzauMaemyAaKMaemOzayMagiyBa0MaeiyyaeMaeiiEaG3aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaaabaGaeyipaWJaeGimaadabaGagiyBa0MaeiyyaeMaeiiEaG3aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaGaeyOeI0IagiyBa0MaeiyAaKMaeiOBa42aaiWaaeaacuWGNbWzgaqeamaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaaabaGaem4Ba8MaemiDaqNaemiAaGMaemyzauMaemOCaiNaem4DaCNaemyAaKMaem4CamNaemyzaugaaaGaay5EaaGaeiOla4caaa@8741@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where 1 &#8804; <it>i </it>&#8804; <it>m</it>. We define a profile's noise estimate as the square root of the pooled variance, i.e.</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-8-427-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>N</m:mi>
                           <m:mo>=</m:mo>
                           <m:msqrt>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mi>m</m:mi>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>[</m:mo>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:msub>
                                                      <m:mi>n</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo stretchy="false">)</m:mo>
                                                   <m:mo>&#8901;</m:mo>
                                                   <m:msubsup>
                                                      <m:mi>s</m:mi>
                                                      <m:mi>i</m:mi>
                                                      <m:mn>2</m:mn>
                                                   </m:msubsup>
                                                </m:mrow>
                                                <m:mo>]</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mi>m</m:mi>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mfrac>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mi>m</m:mi>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:msqrt>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOta4Kaeyypa0ZaaOaaaeaajuaGdaWcaaqaamaaqahabaWaamWaaeaacqGGOaakcqWGUbGBdaWgaaqaaiabdMgaPbqabaGaeyOeI0IaeGymaeJaeiykaKIaeyyXICTaem4Cam3aa0baaeaacqWGPbqAaeaacqaIYaGmaaaacaGLBbGaayzxaaaabaGaemyAaKgabaGaemyBa0gacqGHris5aaqaamaaqahabaGaeiikaGIaemOBa42aaSbaaeaacqWGPbqAaeqaaiabgkHiTiabigdaXiabcMcaPaqaaiabdMgaPbqaaiabd2gaTbGaeyyeIuoaaaGcdaaeWbqaamaalaaabaGaeGymaedabaGaemOBa42aaSbaaSqaaiabdMgaPbqabaaaaaqaaiabdMgaPbqaaiabd2gaTbqdcqGHris5aaWcbeaaaaa@56BA@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where the sample variance of group <it>i</it> is</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-427-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>s</m:mi>
                              <m:mi>i</m:mi>
                              <m:mn>2</m:mn>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>g</m:mi>
                                                <m:mrow>
                                                   <m:mi>i</m:mi>
                                                   <m:mi>j</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>g</m:mi>
                                                   <m:mo>&#175;</m:mo>
                                                </m:mover>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>n</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaem4Cam3aa0baaSqaaiabdMgaPbqaaiabikdaYaaakiabg2da9KqbaoaalaaabaWaaabCaeaacqGGOaakcqWGNbWzdaWgaaqaaiabdMgaPjabdQgaQbqabaGaeyOeI0Iafm4zaCMbaebadaWgaaqaaiabdMgaPbqabaGaeiykaKYaaWbaaeqabaGaeGOmaidaaaqaaiabdQgaQbqaaiabd6gaUnaaBaaabaGaemyAaKgabeaaaiabggHiLdaabaGaemOBa42aaSbaaeaacqWGPbqAaeqaaiabgkHiTiabigdaXaaakiabc6caUaaa@489A@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
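            <p>To make Equation 2 and the sample variance concrete, the following worked example uses a small, made-up profile with <it>m</it> = 2 groups. The numbers are purely illustrative and are not taken from the data analyzed in this paper.</p>

```latex
% Hypothetical data: group 1 = {1, 2, 3} (n_1 = 3, mean 2); group 2 = {2, 4} (n_2 = 2, mean 3)
s_1^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3-1} = 1, \qquad
s_2^2 = \frac{(2-3)^2 + (4-3)^2}{2-1} = 2

N = \sqrt{\frac{(3-1)\cdot 1 + (2-1)\cdot 2}{(3-1) + (2-1)}\left(\frac{1}{3} + \frac{1}{2}\right)}
  = \sqrt{\frac{4}{3}\cdot\frac{5}{6}} = \sqrt{\frac{10}{9}} \approx 1.054
```
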
            <p>From Equations 1 and 2, we define a profile's signal-to-noise ratio as</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-8-427-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>S</m:mi>
                           <m:mi>N</m:mi>
                           <m:mi>R</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac bevelled="true">
                              <m:mi>S</m:mi>
                              <m:mi>N</m:mi>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemOta4KaemOuaiLaeyypa0tcfa4aaSGaaeaacqWGtbWuaeaacqWGobGtaaaaaa@339C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>When <it>m </it>= 1, Equation 3 is equivalent to a two-sample <it>t</it>-test, since by default each log<sub>2 </sub>pixel intensity ratio compares a treated sample against its control. Equation 3 also covers the case <it>m </it>> 1, i.e. multiple groups.</p>
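            <p>The following Python sketch is an illustration, not the EPIG implementation. It evaluates Equations 2 and 3 and checks the <it>m</it> = 1 case numerically. Equation 1 is not reproduced in this section, so the function magnitude is a hypothetical stand-in that takes S to be the largest absolute group mean. The function names and the example ratios are also assumptions.</p>

```python
import math
import statistics


def pooled_noise(groups):
    """N (Equation 2): pooled standard deviation times sqrt(sum_i 1/n_i).

    groups: list of m lists, each holding the n_i replicate log2 ratios of a group
    (at least two replicates per group).
    """
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted / dof * sum(1.0 / len(g) for g in groups))


def magnitude(groups):
    """Stand-in for S (Equation 1, defined elsewhere in the paper).

    ASSUMPTION: S is taken to be the largest absolute group mean.
    """
    return max(abs(statistics.fmean(g)) for g in groups)


def snr(groups):
    """Equation 3: SNR = S / N."""
    return magnitude(groups) / pooled_noise(groups)


# One group (m = 1) of treated-vs-control log2 ratios: N reduces to s / sqrt(n),
# so SNR equals the familiar t statistic |mean| / (s / sqrt(n)).
ratios = [0.9, 1.4, 1.1, 1.6]
t_stat = abs(statistics.fmean(ratios)) / (statistics.stdev(ratios) / math.sqrt(len(ratios)))
print(snr([ratios]), t_stat)
```

            <p>With a single group, the pooled noise reduces to s/sqrt(n), so the two printed values agree up to floating-point rounding. This is how the <it>m</it> = 1 remark above can be read.</p>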
            <p>In extracting gene expression patterns, EPIG uses a filtering process in which all profiles are initially considered pattern candidates. The pseudocode for the algorithm can be found in the Appendix. Briefly, EPIG uses all pair-wise correlations to compute the local cluster size M of each profile, i.e. the number of other profiles whose correlation with it exceeds R<sub><it>t</it></sub>. A candidate profile is removed from pattern construction if its local cluster size is less than a predefined size M<sub><it>t</it></sub>. It is also removed if it is highly correlated (<it>r </it>> R<sub><it>t</it></sub>) with another profile that has a larger local cluster size. For each remaining profile, EPIG then creates a representative profile of the corresponding local cluster. It removes those representatives with an SNR (Equation 3) less than 3 or a magnitude S (Equation 1) less than 0.5. The profiles that survive this filtering constitute the extracted patterns, each serving as the representative of its local cluster. Each pattern has the largest local cluster size among the highly similar profiles (e.g. correlation greater than 0.8) in its local cluster.</p>
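            <p>The first two filtering steps can be sketched in Python as below: computing local cluster sizes, then removing small or redundant candidates. This is a minimal illustration, not code from the EPIG source. It assumes Pearson correlation between profiles and the default thresholds R<sub><it>t</it></sub> = 0.8 and M<sub><it>t</it></sub> = 6 from the Appendix. The function names are hypothetical.</p>

```python
import math


def pearson(x, y):
    """Pearson correlation r between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_patterns(profiles, r_t=0.8, m_t=6):
    """Return indices of profiles that survive the local-cluster-size filters.

    profiles: list of equal-length lists of log2 ratios (one per gene).
    r_t, m_t: correlation threshold R_t and local cluster size threshold M_t.
    """
    n = len(profiles)
    r = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    # Local cluster size M_i: number of OTHER profiles with r_ij above R_t.
    size = [sum(1 for j in range(n) if j != i and r[i][j] > r_t) for i in range(n)]
    # Drop profiles whose local cluster size M_i is smaller than M_t.
    kept = [i for i in range(n) if size[i] >= m_t]
    # Visit survivors from the largest local cluster down; a profile is dropped
    # when it is highly correlated with an already accepted, larger-cluster profile.
    kept.sort(key=lambda i: size[i], reverse=True)
    accepted = []
    for i in kept:
        if not any(r[i][a] > r_t for a in accepted):
            accepted.append(i)
    return accepted
```

            <p>For example, given two groups of three mutually correlated profiles and one noisy profile, candidate_patterns(profiles, r_t=0.8, m_t=2) keeps one profile from each group. The sketch stores the full correlation matrix, which is simple but uses memory quadratic in the number of profiles.</p>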
            <p>Subsequently, EPIG assigns each gene to the pattern with which the gene's profile has the highest correlation. A gene whose highest correlation <it>r</it>-value is lower than a given threshold R<sub><it>c</it></sub> is not assigned to any extracted pattern and is considered an "orphan". Typically, R<sub><it>c </it></sub>is set to the value that corresponds to a correlation <it>p</it>-value of 10<sup>-4 </sup>to ensure that the co-expression is significant. A Java-based software tool and the source code for EPIG are publicly available <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
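            <p>Gene assignment and the orphan rule can be sketched as follows. The text does not state how R<sub><it>c</it></sub> is derived from the <it>p</it>-value. The function critical_r therefore uses the Fisher <it>z</it>-transform as a stand-in approximation; an exact test would use the <it>t</it> distribution with <it>n</it> - 2 degrees of freedom. All names here are hypothetical.</p>

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation r between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(p_value, n_points):
    """Approximate R_c: the r giving a two-sided correlation p-value of p_value.

    Uses the Fisher approximation atanh(r) ~ Normal(0, 1 / (n_points - 3)),
    not an exact t-distribution calculation; needs n_points greater than 3.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return math.tanh(z / math.sqrt(n_points - 3))


def assign_genes(genes, patterns, r_c):
    """Map each gene name to the index of its best-correlated pattern, or None."""
    assignment = {}
    for name, profile in genes.items():
        scores = [pearson(profile, p) for p in patterns]
        best = max(range(len(patterns)), key=lambda k: scores[k])
        # Orphan: even the best-matching pattern falls short of R_c.
        assignment[name] = best if scores[best] >= r_c else None
    return assignment
```

            <p>For example, for profiles measured over 20 conditions, critical_r(1e-4, 20) is approximately 0.74. The approximation may differ slightly from a threshold computed with the exact <it>t</it> distribution.</p>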
         </sec>
         <sec>
            <st>
               <p>CLICK</p>
            </st>
            <p>CLuster Identification via Connectivity Kernels (CLICK) analysis of the gene expression data was performed using version 2 of the EXpression ANalyzer and DisplayER (EXPANDER) analysis and visualization tool <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The default settings for CLICK were used for all analyses.</p>
         </sec>
         <sec>
            <st>
               <p>CAST</p>
            </st>
            <p>The Cluster Affinity Search Technique (CAST) clusters data using the average similarity (affinity) between gene expression patterns and the current cluster core in the iterative portion of the algorithm. It adds elements to, and removes elements from, that core one at a time <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. An affinity threshold specifies the cluster quality; it influences the number and size of the clusters produced. The CAST implementation used to analyze the data in this paper was based on Java applet source code <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
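            <p>For readers unfamiliar with CAST, the following Python sketch follows the add/remove scheme described above. It is not the Java applet code used in this paper. It assumes a symmetric similarity matrix with entries in [0, 1] and ones on the diagonal, and an affinity threshold t between 0 and 1. It also adds a hypothetical max_rounds guard in case additions and removals keep alternating.</p>

```python
def cast(similarity, t, max_rounds=100):
    """Greedy CAST clustering of items 0..n-1 (illustrative sketch).

    similarity: symmetric n x n list of lists, entries in [0, 1], diagonal 1.
    t: affinity threshold in [0, 1]; an item fits the open cluster C when its
       summed similarity to the members of C is at least t * |C|.
    """
    unassigned = set(range(len(similarity)))
    clusters = []
    while unassigned:
        open_c = set()
        # affinity[x] = sum of similarity[x][y] over members y of open_c
        affinity = {x: 0.0 for x in unassigned}
        for _ in range(max_rounds):
            changed = False
            # ADD: bring in the highest-affinity outsider while it clears t * |C|.
            while True:
                outside = [u for u in unassigned if u not in open_c]
                if not outside:
                    break
                u = max(outside, key=lambda x: affinity[x])
                if t * len(open_c) > affinity[u]:
                    break
                open_c.add(u)
                for x in unassigned:
                    affinity[x] += similarity[x][u]
                changed = True
            # REMOVE: drop the lowest-affinity member while it falls below t * |C|.
            while open_c:
                v = min(open_c, key=lambda x: affinity[x])
                if affinity[v] >= t * len(open_c):
                    break
                open_c.remove(v)
                for x in unassigned:
                    affinity[x] -= similarity[x][v]
                changed = True
            if not changed:
                break
        if not open_c:
            # Guard against an empty cluster (e.g. rounding): close a singleton.
            open_c = {min(unassigned)}
        clusters.append(sorted(open_c))
        unassigned -= open_c
    return clusters
```

            <p>With two well-separated blocks of mutually similar items and t = 0.5, the sketch returns one cluster per block. Raising t toward 1 tends to produce more, smaller clusters.</p>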
         </sec>
      </sec>
      <sec>
         <st>
            <p>Appendix</p>
         </st>
         <p>EPIG algorithm</p>
         <p>Set thresholds R<sub><it>t </it></sub>to 0.8, M<sub><it>t </it></sub>to 6, SNR<sub><it>t </it></sub>to 3, and S<sub><it>t </it></sub>to 0.5</p>
         <p>FOR each profile <it>i</it></p>
         <p>&#160;&#160;&#160;M<sub><it>i </it></sub>= 0</p>
         <p>END FOR</p>
         <p>FOR each profile <it>i</it></p>
         <p>&#160;&#160;&#160;FOR each profile <it>j</it>, <it>j </it>&lt;<it>i</it></p>
         <p>&#160;&#160;&#160;&#160;&#160;&#160;IF correlation <it>r</it>-value r<sub><it>ij </it></sub>> R<sub><it>t </it></sub>THEN M<sub><it>i</it></sub>++, M<sub><it>j</it></sub>++</p>
         <p>&#160;&#160;&#160;END FOR</p>
         <p>END FOR</p>
         <p>REMOVE each profile <it>i </it>with M<sub><it>i </it></sub>&lt; M<sub><it>t</it></sub></p>
         <p>SORT remaining profiles in DESCENDING order of M<sub><it>i</it></sub></p>
         <p>FOR each remaining profile <it>i</it>, in sorted order</p>
         <p>&#160;&#160;&#160;FOR each remaining profile <it>j </it>ranked after <it>i</it></p>
         <p>&#160;&#160;&#160;&#160;&#160;&#160;REMOVE profile <it>j </it>IF r<sub><it>ij </it></sub>> R<sub><it>t</it></sub></p>
         <p>&#160;&#160;&#160;END FOR</p>
         <p>END FOR</p>
         <p>FOR each remaining profile <it>i</it></p>
         <p>&#160;&#160;&#160;REPLACE profile <it>i </it>WITH the average of the top 5 profiles (out of all profiles) having the highest <it>r</it>-value with profile <it>i</it></p>
         <p>&#160;&#160;&#160;REMOVE profile <it>i </it>IF its SNR &lt; SNR<sub><it>t </it></sub>OR S &lt; S<sub><it>t</it></sub></p>
         <p>END FOR</p>
         <p>RETURN remaining profiles as the extracted patterns</p>
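         <p>The last loop of the pseudocode above can be sketched in Python as follows. It replaces each surviving candidate by the average of its most correlated profiles, then applies the SNR<sub><it>t</it></sub> and S<sub><it>t</it></sub> thresholds. This is an illustrative sketch, not the EPIG source. It assumes each profile's columns are partitioned into replicate groups (the hypothetical groups argument) and counts profile <it>i</it> itself among its top 5, since its self-correlation is 1. Because Equation 1 is not reproduced here, it takes S to be the largest absolute group mean.</p>

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation r between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_snr_and_magnitude(profile, groups):
    """Return (SNR, S) for a profile whose columns are partitioned by `groups`.

    ASSUMPTION: S (Equation 1) is approximated by the largest absolute group
    mean; N follows Equation 2 (each group needs at least two columns).
    """
    values = [[profile[c] for c in cols] for cols in groups]
    s = max(abs(statistics.fmean(v)) for v in values)
    weighted = sum((len(v) - 1) * statistics.variance(v) for v in values)
    dof = sum(len(v) - 1 for v in values)
    noise = math.sqrt(weighted / dof * sum(1.0 / len(v) for v in values))
    return s / noise, s


def finalize_patterns(candidates, profiles, groups, top_k=5, snr_t=3.0, s_t=0.5):
    """Average each candidate's top_k most correlated profiles, then filter."""
    patterns = []
    for i in candidates:
        # Rank all profiles (including i itself, r = 1) by correlation with i.
        ranked = sorted(
            range(len(profiles)),
            key=lambda j: pearson(profiles[i], profiles[j]),
            reverse=True,
        )
        top = [profiles[j] for j in ranked[:top_k]]
        representative = [statistics.fmean(column) for column in zip(*top)]
        snr, s = profile_snr_and_magnitude(representative, groups)
        # Keep only representatives that pass both the SNR_t and S_t thresholds.
        if snr >= snr_t and s >= s_t:
            patterns.append(representative)
    return patterns
```
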
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JWC conceived the research concept and the analysis methodology, performed the data analyses, developed the algorithm and wrote part of the manuscript. PRB provided valuable suggestions on the research concept and on all aspects of the methodology, and also wrote part of the manuscript. TZ and WKK provided the UV/IR data and offered helpful suggestions on the biological themes related to the extracted patterns. RSP provided useful suggestions on the utility of the EPIG algorithm and software. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Dr. Dennis Simpson and Ms. Yingchun Zhou for sample preparation. We also thank Dr. John Tomfohr for critical review of this manuscript and Drs. Alexandra Heinloth, Todd Auman and Shyama Peddada and Mr. Jianying Li for helpful discussions. This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>DNA arrays for analysis of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1999</pubdate>
            <volume>303</volume>
            <fpage>179</fpage>
            <lpage>205</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10349646</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <issue>25</issue>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24541</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Systematic determination of genetic network architecture</p>
            </title>
            <aug>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <issue>3</issue>
            <fpage>281</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/10343</pubid>
                  <pubid idtype="pmpid" link="fulltext">10391217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kitareewan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dmitrovsky</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>6</issue>
            <fpage>2907</fpage>
            <lpage>2912</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077610</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.6.2907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Clustering gene expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Ben-Dor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yakhini</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1999</pubdate>
            <volume>6</volume>
            <issue>3-4</issue>
            <fpage>281</fpage>
            <lpage>297</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652799318274</pubid>
                  <pubid idtype="pmpid" link="fulltext">10582567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Finding groups in data: an introduction to cluster analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rousseeuw</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
             <publisher>New York, John Wiley and Sons, Inc.</publisher>
            <pubdate>1990</pubdate>
         </bibl>
         <bibl id="B7">
            <title>
               <p>CLICK and EXPANDER: a system for clustering and visualizing gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Maron-Katz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>14</issue>
            <fpage>1787</fpage>
            <lpage>1799</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg232</pubid>
                  <pubid idtype="pmpid" link="fulltext">14512350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>CLICK: a clustering algorithm with applications to gene expression analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>8</volume>
            <fpage>307</fpage>
            <lpage>316</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10977092</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Evaluation and comparison of gene clustering methods in microarray analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Thalamuthu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mukhopadhyay</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>GC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>19</issue>
            <fpage>2405</fpage>
            <lpage>2412</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl406</pubid>
                  <pubid idtype="pmpid" link="fulltext">16882653</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Computational cluster validation in post-genomic data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Handl</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Knowles</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kell</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>15</issue>
            <fpage>3201</fpage>
            <lpage>3212</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti517</pubid>
                  <pubid idtype="pmpid" link="fulltext">15914541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes</p>
            </title>
            <aug>
               <au>
                  <snm>Mootha</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Lindgren</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Eriksson</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sihag</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lehar</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Puigserver</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Carlsson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ridderstrale</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Laurila</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Houstis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Spiegelman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Hirschhorn</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Groop</snm>
                  <fnm>LC</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <issue>3</issue>
            <fpage>267</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1180</pubid>
                  <pubid idtype="pmpid" link="fulltext">12808457</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference</p>
            </title>
            <aug>
               <au>
                  <snm>Peddada</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Lobenhofer</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Afshari</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Weinberg</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Umbach</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>7</issue>
            <fpage>834</fpage>
            <lpage>841</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg093</pubid>
                  <pubid idtype="pmpid" link="fulltext">12724293</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Algorithms for clustering data</p>
            </title>
            <aug>
               <au>
                  <snm>Jain</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Dubes</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
             <publisher>Englewood Cliffs, NJ, Prentice Hall</publisher>
             <pubdate>1988</pubdate>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Linear modes of gene expression determined by independent component analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Liebermeister</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>1</issue>
            <fpage>51</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.1.51</pubid>
                  <pubid idtype="pmpid" link="fulltext">11836211</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Independent component analysis of microarray data in the study of endometrial cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Saidi</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Holland</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kreil</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>MacKay</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Charnock-Jones</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Print</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <issue>39</issue>
            <fpage>6677</fpage>
            <lpage>6683</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1207562</pubid>
                  <pubid idtype="pmpid" link="fulltext">15247901</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Multi-class cancer classification via partial least squares with gene expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>9</issue>
            <fpage>1216</fpage>
            <lpage>1226</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.9.1216</pubid>
                  <pubid idtype="pmpid" link="fulltext">12217913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Global analysis of dauer gene expression in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2003</pubdate>
            <volume>130</volume>
            <issue>8</issue>
            <fpage>1621</fpage>
            <lpage>1634</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/dev.00363</pubid>
                  <pubid idtype="pmpid" link="fulltext">12620986</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>A gene expression map for Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Lund</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kiraly</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Duke</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Eizinger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wylie</snm>
                  <fnm>BN</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>GS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <issue>5537</issue>
            <fpage>2087</fpage>
            <lpage>2092</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1061603</pubid>
                  <pubid idtype="pmpid" link="fulltext">11557892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Identifying biological themes within lists of genes with EASE</p>
            </title>
            <aug>
               <au>
                  <snm>Hosack</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Dennis</snm>
                  <fnm>G</fnm>
                  <suf>Jr.</suf>
               </au>
               <au>
                  <snm>Sherman</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Lane</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Lempicki</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>10</issue>
            <fpage>R70</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">328459</pubid>
                  <pubid idtype="pmpid" link="fulltext">14519205</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-10-r70</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Profiles of global gene expression in ionizing radiation-damaged human diploid fibroblasts reveal synchronization behind the G1 checkpoint in a G0-like state of quiescence</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mullen</snm>
                  <fnm>TE</fnm>
               </au>
               <au>
                  <snm>Medeiros</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bushel</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hurban</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lobenhofer</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Kaufmann</snm>
                  <fnm>WK</fnm>
               </au>
            </aug>
            <source>Environ Health Perspect</source>
            <pubdate>2006</pubdate>
            <volume>114</volume>
            <issue>4</issue>
            <fpage>553</fpage>
            <lpage>559</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1440780</pubid>
                  <pubid idtype="pmpid" link="fulltext">16581545</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>PhD Thesis</p>
            </title>
            <aug>
               <au>
                  <snm>Bushel</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <publisher>Raleigh, NC , North Carolina State University</publisher>
            <pubdate>2005</pubdate>
            <volume>PhD</volume>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Estimating the number of clusters in a dataset via the Gap statistic</p>
            </title>
            <aug>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Walther</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Technical Report 208</source>
            <publisher>Department of Statistics, Stanford University</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Introduction to data mining</p>
            </title>
            <aug>
               <au>
                  <snm>Tan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Steinbach</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <publisher>Boston, MA , Addison-Wesley</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Validating clustering for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Yeung</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Haynor</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>4</issue>
            <fpage>309</fpage>
            <lpage>318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.4.309</pubid>
                  <pubid idtype="pmpid" link="fulltext">11301299</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Probability and Statistics</p>
            </title>
            <aug>
               <au>
                  <snm>Spiegel</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Schiller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <publisher>New York , McGraw-Hill</publisher>
            <edition>2nd ed</edition>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Cell cycle checkpoint signaling through the ATM and ATR kinases</p>
            </title>
            <aug>
               <au>
                  <snm>Abraham</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2001</pubdate>
            <volume>15</volume>
            <issue>17</issue>
            <fpage>2177</fpage>
            <lpage>2196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gad.914401</pubid>
                  <pubid idtype="pmpid" link="fulltext">11544175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Inhibition of cyclin-dependent kinase 2 by p21 is necessary for retinoblastoma protein-mediated G1 arrest after gamma-irradiation</p>
            </title>
            <aug>
               <au>
                  <snm>Brugarolas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moberg</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Boyd</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Taya</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jacks</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lees</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>3</issue>
            <fpage>1002</fpage>
            <lpage>1007</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15340</pubid>
                  <pubid idtype="pmpid" link="fulltext">9927683</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.3.1002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The human decatenation checkpoint</p>
            </title>
            <aug>
               <au>
                  <snm>Deming</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Cistulli</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Graves</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Piwnica-Worms</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Downes</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Kaufmann</snm>
                  <fnm>WK</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>21</issue>
            <fpage>12044</fpage>
            <lpage>12049</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">59764</pubid>
                  <pubid idtype="pmpid" link="fulltext">11593014</pubid>
                  <pubid idtype="doi">10.1073/pnas.221430898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>An ATR- and Chk1-dependent S checkpoint inhibits replicon initiation following UVC-induced DNA damage</p>
            </title>
            <aug>
               <au>
                  <snm>Heffernan</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Heinloth</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Cordeiro-Stone</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kaufmann</snm>
                  <fnm>WK</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2002</pubdate>
            <volume>22</volume>
            <issue>24</issue>
            <fpage>8552</fpage>
            <lpage>8561</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">139882</pubid>
                  <pubid idtype="pmpid" link="fulltext">12446774</pubid>
                  <pubid idtype="doi">10.1128/MCB.22.24.8552-8561.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Systematic variation normalization in microarray data to get gene expression comparison unbiased</p>
            </title>
            <aug>
               <au>
                  <snm>Chou</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Bushel</snm>
                  <fnm>PR</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <issue>2</issue>
            <fpage>225</fpage>
            <lpage>241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720005001028</pubid>
                  <pubid idtype="pmpid" link="fulltext">15852502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>EPIG application</p>
            </title>
            <url>http://www.niehs.nih.gov/research/resources/software/epig</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>CAST application</p>
            </title>
            <url>http://acg.media.mit.edu/people/fry/clustering/source/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
