<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-335</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Combining transcriptional datasets using the generalized singular value decomposition</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Schreiber</snm>
               <mi>W</mi>
               <fnm>Andreas</fnm>
               <insr iid="I1"/>
               <email>andreas.schreiber@adelaide.edu.au</email>
            </au>
            <au id="A2">
               <snm>Shirley</snm>
               <mi>J</mi>
               <fnm>Neil</fnm>
               <insr iid="I1"/>
               <email>neil.shirley@adelaide.edu.au</email>
            </au>
            <au id="A3">
               <snm>Burton</snm>
               <mi>A</mi>
               <fnm>Rachel</fnm>
               <insr iid="I1"/>
               <email>rachel.burton@adelaide.edu.au</email>
            </au>
            <au id="A4">
               <snm>Fincher</snm>
               <mi>B</mi>
               <fnm>Geoffrey</fnm>
               <insr iid="I1"/>
               <email>geoff.fincher@adelaide.edu.au</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Australian Centre for Plant Functional Genomics, School of Agriculture and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA 5064, Australia</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>335</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/335</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18687147</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-335</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>02</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>08</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>08</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Schreiber et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Both microarrays and quantitative real-time PCR are convenient tools for studying the transcriptional levels of genes. The former is preferable for large scale studies while the latter is a more targeted technique. Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. These difficulties are exacerbated if there is only partial overlap between the experimental conditions and genes probed in the two datasets.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We show here that the generalized singular value decomposition provides a practical tool for merging a small, targeted dataset obtained by quantitative real-time PCR of specific genes with a much larger microarray dataset. The technique permits, for the first time, the identification of genes present in only one dataset that are co-expressed with a target gene present exclusively in the other dataset, even when experimental conditions for the two datasets are not identical. With the rapidly increasing number of publicly available large scale microarray datasets the latter is frequently the case. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-<it>&#946;</it>-<smcaps>D</smcaps>-glucan polysaccharide found in plant cell walls.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We show that the generalized singular value decomposition provides a viable tool for a combined analysis of two gene expression datasets with only partial overlap of both gene sets and experimental conditions. We illustrate how the decomposition can be optimized self-consistently by using a judicious choice of genes to define it. The ability of the technique to seamlessly define a concept of "co-expression" across both datasets provides an avenue for meaningful data integration. We believe that it will prove to be particularly useful for exploiting large, publicly available, microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>Historical background</p>
            </st>
            <p>Measurements and comparisons of transcriptional activities of genes provide important information on the biological state of a cell. For example, enhanced transcription of a gene of unknown function in response to an imposed stress may be used to infer a possible biological function for the gene or, conversely, altered activity of a gene of known function may serve as a useful diagnostic indicator of the biological state. Several methods for measuring transcriptional activities of genes are in common use, such as those based on two-colour <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and genechip <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> microarrays, serial analysis of gene expression <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, MPSS and other sequencing technologies <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> or on the real-time quantitative polymerase chain reaction (Q-PCR) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Frequently, complementary data from several of these technologies are available for a particular biological system or process. This raises the question of how to perform a meaningful comparison and/or integration of transcriptional datasets from multiple sources.</p>
            <p>Numerous approaches to transcriptomic data integration have been developed in recent years. For example, if the data originate from several sources using the same or similar platforms, a direct integration of expression values may be feasible <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. For genes in common among the individual studies this can lead to increased significance of results upon integration simply by virtue of the larger combined sample size. At the other extreme, if the platforms are dissimilar (e.g. two-colour cDNA and one-channel oligonucleotide arrays) then simple comparisons of expression values become meaningless. In this case a 'meta-analysis' of summary statistics such as fold-changes, p-values, ranks or effect sizes rather than expression values is more appropriate <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Choi <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr></abbrgrp> used this type of approach to compare two tumour-datasets, explicitly taking into account interstudy variation. Subsequently this approach was developed further in order to move beyond gene-by-gene analysis by constructing co-expression networks <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. In a comprehensive work, Rhodes <it>et al</it>. used data from up to 40 published studies to identify common transcriptional signatures in diverse cancer microarray datasets <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. More recently, Bayesian approaches for estimating model parameters within comparative analyses have been proposed (see, for example, <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B20">20</abbr></abbrgrp> and references therein).</p>
            <p>For the most part the above studies are concerned with improved diagnostic power through the integration of data, for example by decreasing p-values indicating differential expression of sets of genes desired as prognostic signatures for the detection of cancer. We are interested in quite a different line of inquiry, namely discovery of gene-function through co-expression across datasets. So, on the one hand we would like to work directly with expression values but, on the other hand, the expression data of interest are obtained from two very diverse platforms, in our case an Affymetrix array and a Q-PCR tissue set. Apart from obvious differences on the experimental side, the datasets obtained from these two platforms are themselves quite heterogeneous. In contrast to most integrative microarray studies, there is only a small overlap in gene content between our two datasets with only a few genes in common: a microarray dataset usually contains expression information for 10<sup>3</sup>&#8211;10<sup>5 </sup>genes while a Q-PCR dataset typically consists of corresponding information for at most a hundred genes. As described in detail below, various other aspects of these datasets conspire to complicate a combined analysis even further. We do note, however, that both datasets correspond to measurements of absolute rather than relative gene expression.</p>
         </sec>
         <sec>
            <st>
               <p>Experimental background</p>
            </st>
            <p>The organism we consider is barley (<it>Hordeum vulgare </it>L), where transcriptional information is often used to guide hypotheses about gene function and cellular processes because the regulation and function of only a small fraction of its genome is understood. The microarray data for this species, obtained with Affymetrix's Barley1 chip <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, is available through the barley reference experiment <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> (the data itself can be found at PlexDB <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>), which covers 15 different tissues and developmental stages. This dataset contains data from two barley cultivars; we make use of only one of these, namely 'Morex'. The barley microarray dataset is potentially useful for gene discovery because, of the approximately 21400 non-redundant probesets on the Barley1 chip, about 16500 correspond to genes whose function cannot be reliably surmised from sequence comparisons with genes of known function in other species. However, it is a 'closed' dataset in the sense that the genes interrogated by the chip comprise a fixed fraction, perhaps half, of the genes in the genome <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and this selection of genes is determined at the time of the design of the chip. Furthermore, because the probes on the chip were mostly designed from information available in the public EST databases, there is an inbuilt bias toward genes expressed at a significant level in at least one tissue, while genes transcribed at low levels are often missing from the chip.</p>
            <p>The Q-PCR based dataset, on the other hand, was taken from a series of 11 barley tissues, from the cultivar 'Sloop'. This dataset contains expression data for almost 80 genes that are mostly related to the synthesis or modification of cell wall polysaccharides. A number of these genes are only transcribed at relatively low levels and so it is not surprising that, while some of them are represented on the Barley1 chip, quite a few are unique to the Q-PCR dataset. The Q-PCR technique is more suited for detailed, targeted studies of genes of particular interest and, in contrast to the microarray dataset, it can be considered to be an 'open' dataset: it can easily be enlarged through the design of additional primers. For details on this Q-PCR data we refer to the Methods section as well as to the additional material [see Additional file <supplr sid="S1">1</supplr>]. The relationship between the genes and tissues probed in the two datasets is summarized in Figure <figr fid="F1">1</figr>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Table S1. This Excel spreadsheet contains Table S1 with the Q-PCR data used in the paper.</p>
               </text>
               <file name="1471-2105-9-335-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>The microarray and Q-PCR datasets</p>
               </caption>
               <text>
                  <p><b>The microarray and Q-PCR datasets</b>. The potential overlap of the microarray and Q-PCR datasets is depicted here. 'Region A' generically refers to the overlap between the datasets, 'Region B' to the part unique to the microarray and 'Region C' to that part unique to the Q-PCR series. It is clear from the context whether references to these regions in the main text refer to genes (top panel) or tissues probed (bottom panel). While up to 59 genes are simultaneously probed by both platforms, one would not necessarily expect their expression profiles to be identical in every case: differential contributions from (unknown) paralogs and/or closely related gene family members, as well as alternative splicing, can lead to distortions because the probes on the microarray and the primers used for the Q-PCR generally target different regions of a gene. Similarly, some tissues are probed simultaneously using both platforms, while others can be found in only one of the datasets. For a few tissues the overlap is hard to determine due to possible differences in developmental stage (shown in brackets; dgs = day old germinating seedling, dap = days after pollination, dba = days before anthesis, s = 10 cm seedling, ba = before anthesis). Further details about the tissues probed with the microarray may be found in Ref. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, while details about those included in the Q-PCR series can be found in Ref. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
               </text>
               <graphic file="1471-2105-9-335-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>The experimental question</p>
            </st>
            <p>The central question that we would like to address here is the following: suppose one has identified a gene of interest that participates in a particular biological process and has collected Q-PCR data for it, but the gene is not a member of the set of genes represented on the microarray. For the reasons discussed above, this is frequently the case for species, such as most plant species, whose full genome has not been sequenced. We want to discover potential candidate genes involved in the same biological process as this gene of interest through a co-expression analysis. However, a Q-PCR dataset consists, by its very nature, of expression data for only a very limited number of genes, so the value of carrying out this sort of analysis on this dataset alone is rather limited. On the other hand, microarray datasets contain expression information for a very large number of genes and would be ideally suited for the task.</p>
            <p>For this reason we would like to use the extensive transcript data from the microarray to discover potential candidate genes involved in the same biological process as the original gene of interest. Co-transcription of these other genes identified from the microarray can then be verified in follow-up Q-PCR experiments. The stumbling block is, of course, that one needs to make a meaningful comparison between the actual expression profiles obtained from one dataset with those of the other. The difficulties with this include the following:</p>
            <p>a) Each platform has inherent systematic errors that result in measurements that, while presumably correlated in some way to the actual mRNA concentrations, are distorted representations thereof. Intensive studies of this issue have been undertaken over the years, particularly between various microarray platforms <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. It is fair to say that no clear consensus has emerged <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, with correlations between platforms ranging from 'poor' <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> to 'strong' <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Analogously, comparisons of fold-changes between microarray and Q-PCR measurements have demonstrated similar systematic, platform-dependent biases <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>It appears likely that a significant source of differences across platforms is simply due to differential hybridization to platform-specific probes. This can arise because in general the probes on each platform can be expected to be sensitive to their own particular admixtures of alternative splice forms <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> and/or gene family members with similar sequence <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. If a complete genome is available these admixtures can be identified, at least in principle, facilitating correct matching of probes between platforms <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B36">36</abbr></abbrgrp>. For barley, however, the complete complement of genes, let alone splice forms, is not known.</p>
            <p>b) The datasets considered here were obtained not only from different biological samples but also from different varieties of the same plant species.</p>
            <p>c) The experimental conditions used to obtain one dataset correspond only partially with the experimental conditions used in obtaining the second dataset (see Fig. <figr fid="F1">1</figr>). Some of the experimental conditions are unique to one dataset, some to the other and some are common to both. Specifically, the experimental variables here consist of an assortment of tissues, some of which are unique to the genechip data (e.g. mesocotyl and embryo), some unique to the Q-PCR tissue series (e.g. scutellum and stem) and some are found in both datasets (e.g. caryopsis and root), although even in the latter case the age of the tissues probed is different.</p>
            <p>d) As described above, the gene content of the two datasets is only partially overlapping and, in contrast to comparative microarray studies, highly asymmetric in size.</p>
         </sec>
         <sec>
            <st>
               <p>The computational task</p>
            </st>
            <p>In short, our aim is twofold:</p>
            <p>a) we want to establish a meaningful framework for quantifying the similarities and differences in the overlap of the two datasets (region A, Fig. <figr fid="F1">1</figr>) and</p>
            <p>b) we want to use this mathematical framework to draw inferences about the non-overlapping parts of the datasets. In particular, we want to identify candidate genes probed <it>only </it>by the microarray (i.e. genes in region B) that are 'co-expressed' with genes probed <it>only </it>in the Q-PCR dataset (i.e. genes in region C).</p>
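            <p>As a minimal sketch of the bookkeeping behind regions A, B and C of Fig. <figr fid="F1">1</figr>, the gene sets can be partitioned with ordinary set operations. The gene identifiers below are hypothetical placeholders rather than entries from either dataset, and Python is used purely for illustration.</p>

```python
# Hypothetical gene identifiers; the real datasets are far larger.
microarray_genes = {"geneA1", "geneA2", "geneB1", "geneB2", "geneB3"}
qpcr_genes = {"geneA1", "geneA2", "geneC1"}

# Region A: genes probed by both platforms.
region_a = microarray_genes.intersection(qpcr_genes)
# Region B: genes probed only by the microarray.
region_b = microarray_genes.difference(qpcr_genes)
# Region C: genes probed only by Q-PCR.
region_c = qpcr_genes.difference(microarray_genes)

print(sorted(region_a), sorted(region_b), sorted(region_c))
```

            <p>In these terms, aim (a) concerns region A alone, whereas aim (b) asks which genes in region B have expression profiles resembling those of genes in region C.</p>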
         </sec>
         <sec>
            <st>
               <p>The Generalized Singular Value Decomposition</p>
            </st>
            <p>At first sight, the latter aim in particular poses a formidable, even impossible, challenge and, indeed, we are unaware of any existing methodology that would be suitable for realizing this goal. However, the datasets <it>do </it>contain some overlapping information (region A) and in this paper we show how to exploit this fact to achieve both tasks by using the matrix decomposition known as the generalized singular value decomposition (GSVD) <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
            <p>The situation described is somewhat analogous to one that arises in the comparison of transcriptomes from two species <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In that case, certain genes rather than experimental conditions may be involved in processes common to the two organisms, while others may be involved in processes unique to either one or the other organism. The number of genes probed in the one species is in general different to that probed in the other and, of course, the ubiquitous systematic artefacts are present here as well. The latter problem has been addressed, using the GSVD, by Alter <it>et al</it>. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> in a comparison of the cell cycle in humans and yeast. While here we are dealing with the orthogonal problem, it does have some similarity to the one considered in the pioneering study of Alter <it>et al</it>. and so a number of the conceptual ideas of the approach have already been introduced <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Here we concentrate on additional developments essential for the application of this novel matrix decomposition in its present setting.</p>
            <p>More recently, Berger <it>et al</it>. <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> have used the GSVD approach to combine transcriptomic and copy number data obtained from genome-wide breast cancer studies. The aim in that work was similar to ours in that these authors set out to combine datasets collected from the same species from different experimental platforms. However, their approach is equivalent to that of Alter <it>et al</it>. in that the link between the two datasets was provided through coinciding experimental conditions &#8211; namely, identical time points in the cell cycle in Ref. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> and identical cell lines in Ref. <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Because, as discussed above, the merging of transcriptomic datasets in general involves differing experimental conditions in the two datasets, the approach of Alter <it>et al</it>. and Berger <it>et al</it>. does not suffice for the problem addressed here. In addition, for the identification of candidate genes, we want to extend the use of the GSVD beyond a simple comparison of expression profiles of genes in common to the two datasets. In what follows we describe how to modify the approach to meet the more complex requirements of the present setting.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>In this section we describe the application of the GSVD approach to a comparison of the two transcript datasets. The discussion is restricted to issues arising in a comparison of microarray and Q-PCR data, although it is straightforward to generalize it to other applications such as comparisons between genechip and two-colour arrays. The method is subsequently tested by comparing transcriptional profiles of genes common to the two datasets. Finally, the technique is applied by searching the microarray dataset for co-ordinately expressed genes for which only Q-PCR data are available. This leads to a testable biological hypothesis, implicating a particular glycosyl transferase in cell wall biosynthesis.</p>
         <sec>
            <st>
               <p>Algorithm: Multiple gene-expression platforms and the generalized singular value decomposition</p>
            </st>
            <p>The well-known singular value decomposition of the <it>N </it>&#215; <it>M </it>dimensional matrix <it>e</it>,</p>
            <p>
               <display-formula id="M1"><it>e </it>= <it>u </it>&#183; <it>&#949; </it>&#183; <it>v</it><sup><it>T</it></sup></display-formula>
            </p>
            <p>has become a popular tool in the analysis of large-scale gene expression datasets <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> because it can be used to reorganize thousands of individual gene expression profiles, as measured by transcript abundance, into a small number of linearly independent processes involving linearly independent combinations of genes. The matrix <it>e </it>contains rows of 'gene-expression vectors' <b><it>e</it></b><sub><b><it>n</it></b></sub>, <it>1 </it>&#8804; <it>n </it>&#8804; <it>N</it>, with each component <it>e</it><sub><it>nm </it></sub>indicating the level of transcription of gene <it>n </it>in array <it>m</it>, <it>1 </it>&#8804; <it>m </it>&#8804; <it>M</it>. In keeping with the nomenclature used by Alter <it>et al</it>. <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, we refer to the set of all gene expression values collected for a particular environmental condition as an 'array', even for the Q-PCR data; in our particular case one could, of course, simply refer to these as individual 'tissues'. As illustrated in the online Additional Material [see Additional file <supplr sid="S2">2</supplr>], it is useful to think of the <it>N </it>&#215; <it>N </it>matrix <it>u </it>and the <it>M </it>&#215; <it>M </it>matrix <it>v </it>as rotation matrices (hence <it>u</it><sup><it>T</it></sup>&#183;<it>u = I </it>and <it>v</it><sup><it>T</it></sup>&#183;<it>v = I</it>), rotating the original orthonormal coordinate systems spanned by individual genes and arrays to new coordinate systems <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Strictly speaking these matrices, being orthogonal, may involve reflections as well as rotations. A reflection corresponds to a change in the handedness of the new coordinate system, but the handedness is immaterial within the context of the present discussion. In the following, therefore, we take it as understood that our use of the term 'rotations' may include reflections as well.</p>
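            <p>The decomposition in Eq. 1 can be illustrated numerically. The following sketch uses NumPy, an illustrative choice that is not part of the original analysis, on a small random matrix standing in for expression data; the dimensions and variable names are assumptions made for the example.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 3                      # assumed sizes: 5 genes, 3 arrays
e = rng.normal(size=(N, M))      # random stand-in for an expression matrix

# Full SVD: u is N x N, v_t is the transpose of the M x M matrix v.
u, singular_values, v_t = np.linalg.svd(e, full_matrices=True)

# Build the N x M matrix whose only non-zero entries lie on the diagonal.
k = min(N, M)
eps = np.zeros((N, M))
eps[:k, :k] = np.diag(singular_values)

# Reassemble e = u . eps . v^T.
reconstructed = u @ eps @ v_t
print(np.allclose(reconstructed, e))
```

            <p>Because <it>u </it>and <it>v </it>are orthogonal, the products <it>u</it><sup><it>T</it></sup>&#183;<it>u </it>and <it>v</it><sup><it>T</it></sup>&#183;<it>v </it>computed from this example equal the identity matrix to within rounding error.</p>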
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>The geometrical interpretations of the singular and generalized singular value decompositions. This Word document contrasts the geometric interpretations of the singular and generalized singular value decompositions.</p>
               </text>
               <file name="1471-2105-9-335-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The matrix <it>&#949; </it>contains the expression patterns as viewed from the new, rotated, coordinate systems and by construction it is very simple in that only its diagonal entries are non-zero. Singular value decompositions may of course be carried out individually for two datasets (labelled p and q), i.e. <it>e</it><sup>(<it>p</it>) </sup>= <it>u</it><sup>(<it>p</it>)</sup>&#183;<it>&#949;</it><sup>(<it>p</it>)</sup>&#183;<it>v</it><sup>(<it>p</it>)<it>T </it></sup>and <it>e</it><sup>(<it>q</it>) </sup>= <it>u</it><sup>(<it>q</it>)</sup>&#183;<it>&#949;</it><sup>(<it>q</it>)</sup>&#183;<it>v</it><sup>(<it>q</it>)<it>T</it></sup>, where the expression of the same set of genes has been measured in different sets of experiments. However, it is not possible to subsequently compare the expression matrices <it>&#949;</it><sup>(<it>p</it>) </sup>and <it>&#949;</it><sup>(<it>q</it>) </sup>directly, because the separate rotations <it>u</it><sup>(<it>p</it>) </sup>and <it>u</it><sup>(<it>q</it>) </sup>of the coordinate systems spanned by the genes have removed the information that there is a connection between genes in the two experiments. A simultaneous diagonalization may be achieved, however, through the use of the GSVD, defined by</p>
            <p>
               <display-formula id="M2"><it>e</it><sup>(<it>i</it>) </sup>= <it>y</it>&#183;<it>&#949;</it><sup>(<it>i</it>) </sup><it>&#183;v</it><sup>(<it>i</it>)<it>T</it></sup><it>   i </it>= <it>p</it>, <it>q</it></display-formula>
            </p>
            <p>The <it>N </it>&#215; <it>N </it>dimensional matrix <it>y </it>again parameterizes the connection between the original and transformed genes, termed 'genelets' by Alter <it>et al</it>. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, but now it is the same for both decompositions <it>p </it>and <it>q</it>. While this retains the desired common coordinate system for the genelet space, the price to pay is that the new 'genelet' axes are no longer orthogonal, that is, the matrix <it>y </it>is no longer purely a rotation/reflection matrix (hence <it>y</it><sup><it>T</it></sup>&#183;<it>y </it>&#8800; <it>I</it>). This geometrical interpretation of the SVD and GSVD is illustrated in the mathematical appendix contained in the online Additional Material [see Additional file <supplr sid="S2">2</supplr>].</p>
            <p>The <it>M</it><sup>(<it>i</it>) </sup>&#215; <it>M</it><sup>(<it>i</it>) </sup>dimensional matrices <it>v</it><sup>(<it>i</it>) </sup>define rotations from spaces spanned by arrays to spaces spanned by 'arraylets'. These rotations are necessarily different in the two datasets because the sets of experimental variables (in our case, the individual tissues) are unique to each dataset. As before, the <it>N </it>&#215; <it>M</it><sup>(<it>i</it>) </sup>dimensional matrices <it>&#949;</it><sup>(<it>i</it>) </sup>have non-vanishing entries <it>&#949;</it><sub><it>nm </it></sub><sup>(<it>i</it>) </sup>only if <it>n </it>= <it>m</it>, so each genelet is only expressed in its corresponding arraylet. By convention the singular values <it>&#949;</it><sub><it>nn </it></sub><sup>(<it>i</it>) </sup>are positive, decrease with increasing <it>n </it>in <it>&#949;</it><sup>(<it>p</it>) </sup>and increase with increasing <it>n </it>in <it>&#949;</it><sup>(<it>q</it>) </sup><abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>.</p>
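            <p>A decomposition of the form of Eq. 2 can be sketched in a few lines of NumPy. The sketch below is an illustrative construction based on a QR factorization followed by an SVD, applied to the transposed expression matrices; it is not claimed to be the algorithm used in this work. It is a 'thin' variant that assumes the number of shared genes <it>N </it>does not exceed the number of arrays in either dataset and that the stacked data have full column rank. The function name <it>thin_gsvd </it>and the random test data are hypothetical.</p>

```python
import numpy as np

def thin_gsvd(e_p, e_q):
    """Thin GSVD sketch: e_i = y @ diag(eps_i) @ v_i.T with a shared y.

    Assumes both inputs have the same N rows (genes), each has at least
    N columns (arrays), and the stacked transposes have full column rank.
    """
    n_arrays_p = e_p.shape[1]
    # Work with the transposes so that the shared factor sits on the gene side.
    stacked = np.vstack([e_p.T, e_q.T])
    q, r = np.linalg.qr(stacked)             # reduced QR; r is N x N
    q_p, q_q = q[:n_arrays_p], q[n_arrays_p:]
    # SVD of the upper block gives the arraylets and singular values for p.
    v_p, eps_p, w_t = np.linalg.svd(q_p, full_matrices=False)
    w = w_t.T
    # Columns of q_q @ w are mutually orthogonal; their norms are eps_q.
    scaled = q_q @ w
    eps_q = np.linalg.norm(scaled, axis=0)
    v_q = scaled / eps_q
    y = r.T @ w                              # shared, non-orthogonal genelet matrix
    return y, eps_p, v_p, eps_q, v_q

rng = np.random.default_rng(1)
e_p = rng.normal(size=(4, 6))    # 4 shared genes, 6 arrays in dataset p
e_q = rng.normal(size=(4, 5))    # the same 4 genes, 5 arrays in dataset q
y, eps_p, v_p, eps_q, v_q = thin_gsvd(e_p, e_q)

print(np.allclose(y @ np.diag(eps_p) @ v_p.T, e_p))
print(np.allclose(y @ np.diag(eps_q) @ v_q.T, e_q))
```

            <p>In this sketch only the first <it>N </it>arraylets of each dataset are returned, since the remaining ones multiply vanishing entries of <it>&#949;</it><sup>(<it>i</it>)</sup>. The singular values are normalized so that the squares of corresponding entries of the two sets sum to one, which is one possible convention; with it, the values for <it>p </it>come out in decreasing order and those for <it>q </it>in increasing order.</p>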
            <p>The GSVD defined by Eq. 2 should be contrasted to that used in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In that work, the GSVD was defined through <it>e</it><sup>(<it>i</it>) </sup>= <it>u</it><sup>(<it>i</it>)</sup>&#183;<it>&#949; </it><sup>(<it>i</it>)</sup>&#183;<it>x</it><sup> -1</sup>, where <it>u</it><sup>(<it>i</it>) </sup>were rotation matrices connecting the gene and genelet co-ordinate systems while the non-orthogonal <it>M </it>&#215; <it>M </it>matrix <it>x </it><sup>-1 </sup>was the matrix connecting array and arraylet co-ordinate systems. The reason for the difference between this definition and our Eq. 2 is clear: in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, the connection between the two datasets is made through coinciding experimental conditions, i.e. time-points in the cell cycle, while in the present case the connection between the datasets is imposed through coinciding genes. Hence in the former case a common transformation <it>x</it><sup>-1 </sup>from arrays to arraylets was required, while in the latter a common transformation <it>y </it>from genes to genelets is appropriate. Notwithstanding these differences in detail, transposition of <it>e</it><sup>(<it>i</it>) </sup>allows the same algorithm employed in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> to be used for performing the decomposition in Eq. 2. Furthermore, as in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, we use the angles</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-9-335-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>&#952;</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mi>tan</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msup>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#949;</m:mi>
                                    <m:mrow>
                                       <m:mi>k</m:mi>
                                       <m:mi>k</m:mi>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:msubsup>
                              </m:mrow>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#949;</m:mi>
                                    <m:mrow>
                                       <m:mi>k</m:mi>
                                       <m:mi>k</m:mi>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>q</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>&#8722;</m:mo>
                           <m:mfrac>
                              <m:mi>&#960;</m:mi>
                              <m:mn>4</m:mn>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiUde3aaSbaaSqaaiabdUgaRbqabaGccqGH9aqpcyGG0baDcqGGHbqycqGGUbGBdaahaaWcbeqaaiabgkHiTiabigdaXaaajuaGdaWcaaqaaiabew7aLnaaDaaabaGaem4AaSMaem4AaSgabaGaeiikaGIaemiCaaNaeiykaKcaaaqaaiabew7aLnaaDaaabaGaem4AaSMaem4AaSgabaGaeiikaGIaemyCaeNaeiykaKcaaaaakiabgkHiTKqbaoaalaaabaGaeqiWdahabaGaeGinaqdaaaaa@4ACA@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>as a measure of the relative contribution of the <it>k</it><sup>th </sup>arraylet and genelet to the first and second datasets. Those arraylets for which this angle is close to zero characterize processes that are common to the two datasets, while those arraylets for which <it>&#952;</it><sub><it>k </it></sub>is close to <it>&#960;</it>/4 or -<it>&#960;</it>/4 characterize processes exclusive to the first or second dataset, respectively. The crucial observation is, therefore, that a sensible comparison between the datasets that avoids platform-dependent biases should involve only those processes for which <it>&#952; </it><sub><it>k</it></sub>&#8776;<it>0</it>.</p>
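            <p>As a minimal illustration of the angle defined above, the following Python sketch computes <it>&#952;</it><sub><it>k</it></sub> from two vectors of generalized singular values and selects the arraylets whose angle lies near zero. The numerical values, the function name and the threshold of 0.2 rad are hypothetical and serve only to show the sign convention.</p>

```python
import numpy as np

def angular_distance(eps_p, eps_q):
    """Return theta_k = arctan(eps_p[k] / eps_q[k]) - pi/4 for each arraylet k."""
    eps_p = np.asarray(eps_p, dtype=float)
    eps_q = np.asarray(eps_q, dtype=float)
    # arctan2 equals arctan(eps_p / eps_q) for positive eps_q and stays
    # finite if an entry of eps_q vanishes.
    return np.arctan2(eps_p, eps_q) - np.pi / 4

# Hypothetical singular values: decreasing in dataset p, increasing in dataset q.
eps_p = np.array([0.99, 0.80, 0.7071, 0.60, 0.10])
eps_q = np.array([0.14, 0.60, 0.7071, 0.80, 0.995])

theta = angular_distance(eps_p, eps_q)
# Arraylets shared by both datasets: |theta_k| below an illustrative threshold.
common = np.flatnonzero(0.2 > np.abs(theta))
print(theta)
print(common)
```

            <p>Positive angles mark arraylets dominated by the first dataset and negative angles those dominated by the second; in practice the cut-off for "close to zero" is a choice left to the analyst.</p>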
            <p>The proof of the GSVD in <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B40">40</abbr></abbrgrp>, and the corresponding algorithms implementing this decomposition, rely on the inequality <it>N </it>&#8804; <it>min(M</it><sup>(<it>p</it>)</sup>, <it>M</it><sup>(<it>q</it>)</sup>). Given that datasets, microarray datasets in particular, typically contain many more genes than arrays, i.e. <it>N </it>>> <it>M</it><sup>(<it>i</it>)</sup>, this implies that only a subset of probed genes may be used in Eq. 2. This is in contrast to the decomposition used by Alter <it>et al</it>. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, where the inequality required the number of genes to be larger than the number of arrays, which is usually the case. While in principle the GSVD can be generalized to arbitrary <it>N</it>, <it>M</it><sup>(<it>p</it>) </sup>and <it>M</it><sup>(<it>q</it>) </sup><abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, only those genes in common between the two datasets (region A of Fig. <figr fid="F1">1</figr>) are represented in Eq. 2. Because part of our aim is to make use of gene expression profiles contained in only one or the other of the two datasets (regions B &amp; C), we require further conceptual extensions to the analysis carried out in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
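            <p>The role of transposition and of the size restriction can be sketched in Python. The code below assumes that Eq. 2 has the form <it>e</it><sup>(<it>i</it>)</sup> = <it>y</it>&#183;<it>&#949;</it><sup>(<it>i</it>)</sup>&#183;<it>v</it><sup>(<it>i</it>)T</sup>, and it uses a textbook construction (a QR factorization of the stacked, transposed matrices followed by an SVD of the upper block). This is not necessarily the algorithm employed in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>; the function name is hypothetical, and only the first <it>N</it> columns of each <it>v</it><sup>(<it>i</it>)</sup> are returned.</p>

```python
import numpy as np

def gsvd_shared_genes(e_p, e_q):
    """GSVD of two gene-by-array matrices that share their N rows (genes).

    Returns y, c, s, v_p, v_q with e_p = y @ diag(c) @ v_p.T and
    e_q = y @ diag(s) @ v_q.T, where c decreases and s increases with k.
    """
    n_genes, m_p = e_p.shape
    m_q = e_q.shape[1]
    if e_q.shape[0] != n_genes or n_genes > min(m_p, m_q):
        raise ValueError("need the same N genes in both datasets, "
                         "with N no larger than min(M_p, M_q)")
    # Work with the transposed (array-by-gene) matrices, stacked vertically.
    q, r = np.linalg.qr(np.vstack([e_p.T, e_q.T]))
    q_p, q_q = q[:m_p], q[m_p:]
    # The SVD of the upper block gives the cosines c; the lower block then
    # has sines s = sqrt(1 - c**2) on the same right singular vectors.
    v_p, c, w_t = np.linalg.svd(q_p, full_matrices=False)
    scaled = q_q @ w_t.T
    s = np.linalg.norm(scaled, axis=0)
    v_q = scaled / s          # assumes no vanishing s (generic data)
    y = (w_t @ r).T           # common, non-orthogonal gene-to-genelet matrix
    return y, c, s, v_p, v_q

# Synthetic example with hypothetical sizes: 5 genes, 8 and 7 arrays.
rng = np.random.default_rng(0)
e_p = rng.normal(size=(5, 8))
e_q = rng.normal(size=(5, 7))
y, c, s, v_p, v_q = gsvd_shared_genes(e_p, e_q)
print(np.allclose(e_p, y @ np.diag(c) @ v_p.T))
print(np.allclose(e_q, y @ np.diag(s) @ v_q.T))
```

            <p>In this normalization the two sets of singular values satisfy <it>c</it><sup>2</sup> + <it>s</it><sup>2</sup> = 1 for every <it>k</it>, so the angle defined above depends only on their ratio.</p>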
         </sec>
         <sec>
            <st>
               <p>The definition of the subspace in common to both datasets</p>
            </st>
            <p>The GSVD retains its utility in spite of these complications because it provides the transformations from the 'array'-space to 'arraylet'-space, i.e. <it>v</it><sup>(<it>p</it>) </sup>and <it>v</it><sup>(<it>q</it>)</sup>, and at the same time identifies arraylets, for which <it>&#952;</it><sub><it>k</it></sub>&#8776;<it>0</it>, spanning the subspace of relevance to a comparison between the two datasets. It, therefore, provides a mathematical mapping from expression profiles in two disparate spaces, spanned by arrays, to a common space, spanned by arraylets. Ultimately, it is this feature of the GSVD that allows one to make comparisons of expression profiles for genes contained in only one or the other of the datasets (i.e. genes in regions B and C).</p>
            <p>The transformation between arrays and arraylets needs to be defined through the use of a suitable subset of genes common to both datasets (i.e. from region A; we shall refer to these as 'gene-pairs'). This subset defines arraylets characterizing common processes in the two datasets, relevant to this subset of genes. If one tabulates the expression of the <it>complete </it>set of genes in both datasets in the matrices <it>e</it><sup>(<it>p</it>) </sup><sub><it>full </it></sub>and <it>e</it><sup>(<it>q</it>) </sup><sub><it>full</it></sub>, one may write the expression of all genes (i.e. regions A, B and C) in the arraylets defined by the GSVD as</p>
            <p>
               <display-formula id="M4">
                  <m:math name="1471-2105-9-335-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mi>y</m:mi>
                                          <m:mrow>
                                             <m:mi>f</m:mi>
                                             <m:mi>u</m:mi>
                                             <m:mi>l</m:mi>
                                             <m:mi>l</m:mi>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:msubsup>
                                       <m:mo>=</m:mo>
                                       <m:msubsup>
                                          <m:mi>e</m:mi>
                                          <m:mrow>
                                             <m:mi>f</m:mi>
                                             <m:mi>u</m:mi>
                                             <m:mi>l</m:mi>
                                             <m:mi>l</m:mi>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:msubsup>
                                       <m:mo>&#8901;</m:mo>
                                       <m:msup>
                                          <m:mi>v</m:mi>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:msup>
                                       <m:mo>.</m:mo>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msup>
                                                <m:mi>&#949;</m:mi>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>i</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                             </m:msup>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mi>p</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>q</m:mi>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeqabeGaaaqaaiabdMha5naaDaaaleaacqWGMbGzcqWG1bqDcqWGSbaBcqWGSbaBaeaacqGGOaakcqWGPbqAcqGGPaqkaaGccqGH9aqpcqWGLbqzdaqhaaWcbaGaemOzayMaemyDauNaemiBaWMaemiBaWgabaGaeiikaGIaemyAaKMaeiykaKcaaOGaeyyXICTaemODay3aaWbaaSqabeaacqGGOaakcqWGPbqAcqGGPaqkaaGccqGGUaGlcqGGOaakcqaH1oqzdaahaaWcbeqaaiabcIcaOiabdMgaPjabcMcaPaaakiabcMcaPmaaCaaaleqabaGaeyOeI0IaeGymaedaaaGcbaGaemyAaKMaeyypa0JaemiCaaNaeiilaWIaemyCaehaaaaa@584D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Here (<it>&#949;</it><sup>(<it>i</it>)</sup>)<sup>-1 </sup>is the pseudo-inverse of <it>&#949;</it><sup>(<it>i</it>) </sup><abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. This equation is the key result that we use in the present study.</p>
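            <p>A short Python sketch may help to make this projection concrete. It assumes, consistently with the equation above, that Eq. 2 reads <it>e</it><sup>(<it>i</it>)</sup> = <it>y</it>&#183;<it>&#949;</it><sup>(<it>i</it>)</sup>&#183;<it>v</it><sup>(<it>i</it>)T</sup>; the matrices are synthetic and the sizes and the function name are hypothetical.</p>

```python
import numpy as np

def project_to_arraylets(e_full, v, eps):
    """Projection y_full = e_full . v . pinv(eps) for one dataset.

    e_full: (genes x M) expression matrix, v: (M x M) rotation,
    eps: (N x M) matrix that is non-zero only where row index equals
    column index.
    """
    return e_full @ v @ np.linalg.pinv(eps)

# Synthetic check with hypothetical sizes: N = 3 gene-pairs, M = 5 arrays.
rng = np.random.default_rng(1)
n, m = 3, 5
y = rng.normal(size=(n, n))
eps = np.zeros((n, m))
eps[np.arange(n), np.arange(n)] = [0.9, 0.6, 0.3]
v, _ = np.linalg.qr(rng.normal(size=(m, m)))   # random orthogonal matrix
e = y @ eps @ v.T                                # assumed form of Eq. 2

# Genes used to define the decomposition are mapped back onto y.
recovered = project_to_arraylets(e, v, eps)
print(np.allclose(recovered, y))

# Any further gene measured on the same M arrays can be projected as well.
extra = rng.normal(size=(7, m))
print(project_to_arraylets(extra, v, eps).shape)
```

            <p>Because the pseudo-inverse of the rectangular <it>&#949;</it><sup>(<it>i</it>)</sup> is an <it>M</it><sup>(<it>i</it>)</sup> &#215; <it>N</it> matrix, every gene ends up with <it>N</it> arraylet coordinates regardless of which dataset it came from.</p>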
            <p>The matrices <b><it>y</it></b><sup>(<it>i</it>)</sup><sub><it>n</it>, <it>full </it></sub>contain the expression profiles of all genes in the two datasets. Each column contains the expression information for a particular arraylet and the relative contribution the arraylet <it>k </it>receives from each dataset is characterized by its angle <it>&#952;</it><sub><it>k </it></sub>. Expression profiles of different genes (rows) in <b><it>y</it></b><sup>(<it>i</it>) </sup><sub><it>n</it>, <it>full </it></sub>may be directly compared, irrespective of whether they originate from regions A, B or C in Fig. <figr fid="F1">1</figr>.</p>
            <p>Those genes actually used to define the GSVD will have identical expression profiles in the matrices <it>y</it><sup>(<it>i</it>)</sup><sub><it>full</it></sub>, i.e., <b><it>y</it></b><sup>(<it>p</it>)</sup><sub><it>n</it>, <it>full </it></sub>= <b><it>y</it></b><sup>(<it>q</it>)</sup><sub><it>n</it>, <it>full</it></sub>. Those genes contained in both datasets but not used to define the GSVD should have similar, but generally not identical, expression profiles <b><it>y</it></b><sup>(<it>p</it>)</sup><sub><it>n</it>, <it>full </it></sub>and <b><it>y</it></b><sup>(<it>q</it>)</sup><sub><it>n</it>, <it>full </it></sub>in the subspace (i.e. columns) characterized by <it>&#952;</it><sub><it>k</it></sub>&#8776;<it>0</it>. The degree to which these expression profiles correlate within this space provides a convenient measure of the utility of the GSVD and the suitability of those genes used to define it. Finally, expression profiles in the subspace characterized by <it>&#952; </it><sub><it>k</it></sub>&#8776;<it>0 </it>for genes present in only one or the other dataset alone (regions B and C) can also be compared, allowing the identification of putatively co-regulated genes.</p>
            <p>These features suggest an iterative approach, illustrated in Fig. <figr fid="F2">2</figr>, for using the GSVD in a search for co-expressed genes across the two datasets. This approach is described in detail in the following sections.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Using the GSVD to identify candidate co-expressing genes</p>
               </caption>
               <text>
                  <p><b>Using the GSVD to identify candidate co-expressing genes</b>. This schematic flowchart shows the procedures used to identify a) an overlapping region between the two datasets as well as b) candidate genes probed by the microarray co-expressing with genes of interest from the Q-PCR dataset. Regions A, B and C refer to those defined in Fig. 1. In order to reduce the number of false positives we have repeated the entire procedure a number of times and only examine in detail genes that co-express consistently among these repeats.</p>
               </text>
               <graphic file="1471-2105-9-335-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Testing the GSVD defined by random subsets of genes</p>
            </st>
            <p>We begin by illustrating the procedure using, at this stage, a <it>random </it>selection of gene-pairs from the overlapping region A in Figure <figr fid="F1">1</figr> to define a GSVD of our microarray and Q-PCR data. The purpose here is twofold: firstly, we want to check that, as one would expect from the preceding discussion, the expression profiles of the remaining gene-pairs from region A indeed show greater co-expression in the subspace spanned by the central arraylets than those spanned by peripheral arraylets. Secondly, this illustration provides a vehicle for introducing the particular measure that we shall adopt for quantifying "co-expression". This measure will be used in the subsequent analysis.</p>
            <p>We use a random selection of 10 gene-pairs in the definition of the GSVD. The size of this set is dictated by the requirement <it>N </it>&#8804; <it>min(M</it><sup>(<it>p</it>)</sup>, <it>M</it><sup>(<it>q</it>)</sup>) discussed above, i.e. in our case we need <it>N </it>&#8804; 11. We have used one gene-pair fewer than this because, for convenience, the datasets have been standardized by centering each gene's transcription profile and scaling its variance across the tissues to unity. The centering results in one column in each matrix becoming linearly dependent on the other 10, leaving only 10 independent columns. Finally, in keeping with standard practice, we have worked with the <it>log</it><sub>2 </sub>of the expression intensities.</p>
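            <p>The standardization and its effect on the rank can be illustrated with a few lines of Python. This is a sketch on random, hypothetical intensities; the function name is an assumption, as is the use of the population variance (the text does not specify which variance estimator was used).</p>

```python
import numpy as np

def standardize(intensities):
    """log2-transform, then centre each gene (row) and scale it to unit variance."""
    log_expr = np.log2(np.asarray(intensities, dtype=float))
    centred = log_expr - log_expr.mean(axis=1, keepdims=True)
    # Population standard deviation (ddof=0); an assumption of this sketch.
    return centred / centred.std(axis=1, keepdims=True)

# Hypothetical raw intensities: 12 genes measured in 11 tissues.
rng = np.random.default_rng(2)
raw = rng.uniform(50.0, 5000.0, size=(12, 11))
z = standardize(raw)

# Every row now sums to zero, so the 11 columns add up to the zero vector
# and only 10 of them are linearly independent.
print(np.allclose(z.sum(axis=1), 0.0))
print(np.linalg.matrix_rank(z))
```

            <p>The loss of one independent column is what reduces the usable number of gene-pairs from 11 to 10 in the analysis described here.</p>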
            <p>The random set of 10 genes used for the GSVD is indicated by asterisks in the table available in the online Additional Material [see Additional file <supplr sid="S1">1</supplr>]. The resulting range of angles <it>&#952; </it><sub><it>k </it></sub>is shown in Fig. <figr fid="F3">3</figr>. It is evident from this figure that arraylets <it>k </it>= 5, <it>k </it>= 6 and <it>k </it>= 7 contribute a similar amount to both datasets, while arraylets <it>k </it>&#8804; <it>4 </it>increasingly dominate in the microarray dataset and arraylets <it>k </it>&#8805; <it>8 </it>increasingly dominate in the Q-PCR dataset. One would expect, therefore, to have the greatest success in making an identification of genes between the two datasets if the overlapping subspace included the central arraylets <it>k </it>= 5, <it>k </it>= 6 and <it>k </it>= 7.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Relative contribution of arraylets/genelets to the two datasets</p>
               </caption>
               <text>
                  <p><b>Relative contribution of arraylets/genelets to the two datasets</b>. Genelets with <it>&#952;</it><sub><it>k </it></sub>> 0 (i.e. small k) correspond to those expressed predominantly in the microarray dataset while genelets with <it>&#952;</it><sub><it>k </it></sub>&lt; 0 (i.e. large <it>k</it>) correspond to those expressed predominantly in the Q-PCR dataset. The angles <it>&#952;</it><sub><it>k </it></sub>result from a GSVD defined by the genes marked by asterisks in the Table in the online Additional Material [see Additional file <supplr sid="S1">1</supplr>].</p>
               </text>
               <graphic file="1471-2105-9-335-3"/>
            </fig>
            <p>There is of course arbitrariness in how one actually defines the "identification of genes". A convenient procedure adopted here consists of calculating, for each microarray gene <inline-formula><m:math name="1471-2105-9-335-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>M</m:mi><m:mi>p</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xta00aaSbaaSqaaiabdchaWbqabaaaaa@35F2@</m:annotation></m:semantics></m:math></inline-formula> in turn [see Additional File <supplr sid="S1">1</supplr>], the Euclidean distance <it>d </it>within the central arraylets for all Q-PCR gene transcripts (<inline-formula><m:math name="1471-2105-9-335-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>q</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xuae1aaSbaaSqaaiabdghaXbqabaaaaa@35FC@</m:annotation></m:semantics></m:math></inline-formula>), i.e. for each <it>p </it>we calculated <it>d</it><sub><it>c</it>.<it>a</it>.</sub>(<inline-formula><m:math name="1471-2105-9-335-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>M</m:mi><m:mi>p</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xta00aaSbaaSqaaiabdchaWbqabaaaaa@35F2@</m:annotation></m:semantics></m:math></inline-formula>, <inline-formula><m:math name="1471-2105-9-335-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>q</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xuae1aaSbaaSqaaiabdghaXbqabaaaaa@35FC@</m:annotation></m:semantics></m:math></inline-formula>) for all <it>q</it>. We chose to define a "successful match" to be one where the appropriate gene from the Q-PCR dataset (<it>p </it>= <it>q</it>) is one of the seven 'closest' to the microarray gene, i.e. rank(<it>d</it><sub><it>c</it>.<it>a</it>.</sub>(<inline-formula><m:math name="1471-2105-9-335-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>M</m:mi><m:mi>p</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xta00aaSbaaSqaaiabdchaWbqabaaaaa@35F2@</m:annotation></m:semantics></m:math></inline-formula>, <inline-formula><m:math name="1471-2105-9-335-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>q</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqefiuzSXgih9gDOL2yGmfDKbIqSf2yRbaceiGae8xuae1aaSbaaSqaaiabdghaXbqabaaaaa@35FC@</m:annotation></m:semantics></m:math></inline-formula>) &#8804; <it>7</it>). While the absolute number of "successes" is naturally sensitive to this arbitrary choice of the cut-off, comparisons between them are less so.</p>
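            <p>The matching criterion can be written down compactly. The following Python sketch counts "successful matches" for profiles that have already been projected into arraylet space; the function and argument names are hypothetical, and the toy data are random numbers rather than expression measurements.</p>

```python
import numpy as np

def count_matches(y_p, y_q, partner, arraylets, cutoff=7):
    """Count microarray genes whose true Q-PCR partner ranks within `cutoff`.

    y_p: (J x K) microarray profiles in arraylet space,
    y_q: (S x K) Q-PCR profiles in arraylet space,
    partner[j]: row of y_q holding the same gene as row j of y_p,
    arraylets: column indices of the chosen (e.g. central) arraylets.
    """
    a = y_p[:, arraylets]
    b = y_q[:, arraylets]
    # Euclidean distance between every microarray and every Q-PCR profile.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    true_dist = dist[np.arange(len(partner)), partner]
    # Rank 1 means that the partner is the closest Q-PCR transcript.
    rank = 1 + (true_dist[:, None] > dist).sum(axis=1)
    return int((cutoff >= rank).sum())

# Toy data: 6 Q-PCR transcripts, 4 of which have a nearly identical
# microarray counterpart.
rng = np.random.default_rng(3)
y_q = rng.normal(size=(6, 4))
y_p = y_q[:4] + 1e-6 * rng.normal(size=(4, 4))
print(count_matches(y_p, y_q, partner=np.arange(4), arraylets=[1, 2], cutoff=1))
```

            <p>Restricting the distance to a subset of columns is what allows the comparison between central and peripheral arraylets.</p>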
            <p>The results from this illustrative exercise are shown in Table <tblr tid="T1">1</tblr>. In this instance the greatest success is achieved using either arraylets 4 to 8, 5 to 9 or 3 to 9. In these cases 17 out of 49 genes are successfully matched. The success rate decreases, as one would expect, if non-central arraylets are chosen. For example, using the Q-PCR dominated arraylets 8&#8211;10, only 10 genes are successfully matched. Similarly, using the microarray dominated arraylets 1&#8211;3 only 4 genes are matched. Successful matches may of course occur purely by chance, with a binomial probability distribution given by <it>Pr</it>(<it>j</it>, <it>J</it>; <it>x </it>= <it>s</it>/<it>S</it>) = <it>J</it>!/[<it>j</it>! (<it>J</it>-<it>j</it>)!] <it>x</it><sup><it>j</it></sup>(1-<it>x</it>)<sup><it>J</it>-<it>j</it></sup>, where <it>Pr</it>(<it>j</it>, <it>J</it>; <it>x</it>) is the probability of having <it>j </it>successes in <it>J </it>= 49 trials by randomly picking <it>s </it>= 7 genes from a list of <it>S </it>= 59. The p-values associated with this null-hypothesis are shown in brackets in Table <tblr tid="T1">1</tblr>. The rate of success achieved by matching gene expression in the central arraylets of the GSVD is clearly far greater than one would expect by chance, with typical p-values in the range of 10<sup>-5 </sup>to 10<sup>-3</sup>. This success rate decreases, as expected, to around background levels (i.e. p-value of order 1) in the peripheral arraylets.</p>
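            <p>The null-hypothesis calculation can be reproduced with the Python standard library. The sketch below assumes that the quoted p-value is the upper-tail probability of obtaining <it>j</it> or more matches by chance; the function names are hypothetical.</p>

```python
from math import comb

def binomial_pmf(j, J, x):
    """Pr(j, J; x) = J! / [j! (J - j)!] x^j (1 - x)^(J - j)."""
    return comb(J, j) * x**j * (1 - x) ** (J - j)

def p_value(j, J=49, s=7, S=59):
    """Probability of j or more chance matches (upper tail; an assumption)."""
    x = s / S
    return sum(binomial_pmf(i, J, x) for i in range(j, J + 1))

# Print the chance probability for a few match counts.
for j in (4, 6, 10, 17):
    print(j, p_value(j))
```

            <p>With <it>J </it>= 49, <it>s </it>= 7 and <it>S </it>= 59 this convention gives approximately 0.85 for 4 matches, 0.53 for 6 matches and 2.7 &#215; 10<sup>-5</sup> for 17 matches.</p>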
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The number of correctly identified genes as a function of both subspace location and dimension</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>k</p>
                     </c>
                     <c ca="left">
                        <p>&#916;k = 0</p>
                     </c>
                     <c ca="left">
                        <p>&#916;k = 1</p>
                     </c>
                     <c ca="left">
                        <p>&#916;k = 2</p>
                     </c>
                     <c ca="left">
                        <p>&#916;k = 3</p>
                     </c>
                     <c ca="left">
                        <p>&#916;k = 4</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>4 (0.85)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>6 (0.53)</p>
                     </c>
                     <c ca="left">
                        <p>4 (0.85)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>5 (0.71)</p>
                     </c>
                     <c ca="left">
                        <p>6 (0.53)</p>
                     </c>
                     <c ca="left">
                        <p>5 (0.71)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>3 (0.94)</p>
                     </c>
                     <c ca="left">
                        <p>6 (0.53)</p>
                     </c>
                     <c ca="left">
                        <p>10 (5.9 &#215; 10<sup>-2</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>5 (0.71)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>8 (0.22)</p>
                     </c>
                     <c ca="left">
                        <p>14 (1.3 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>13 (3.8 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>13 (3.8 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>8 (0.22)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>12 (1.1 &#215; 10<sup>-2</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>13 (3.8 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>17 (2.7 &#215; 10<sup>-5</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>17 (2.7 &#215; 10<sup>-5</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>5 (0.71)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>9 (0.12)</p>
                     </c>
                     <c ca="left">
                        <p>14 (1.3 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>17 (2.7 &#215; 10<sup>-5</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>14 (1.3 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>9 (0.12)</p>
                     </c>
                     <c ca="left">
                        <p>15 (3.9 &#215; 10<sup>-4</sup>)</p>
                     </c>
                     <c ca="left">
                        <p>13 (3.8 &#215; 10<sup>-3</sup>)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>8 (0.22)</p>
                     </c>
                     <c ca="left">
                        <p>10 (5.9 &#215; 10<sup>-2</sup>)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>8 (0.22)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Arraylets ranging from <it>k-&#916;k </it>to <it>k+&#916;k </it>were used to define the subspace. The genes used to define the GSVD are indicated by asterisks in the online Additional Material [see Additional file <supplr sid="S1">1</supplr>]. The numbers in brackets are the corresponding p-values.</p>
               </tblfn>
            </tbl>
            <p>Naturally, the results shown in Table <tblr tid="T1">1</tblr> depend on the particular set of gene-pairs used to define the GSVD and, indeed, considerable fluctuations around these numbers may be observed when a different set of genes is chosen to define the GSVD. In view of this, one may well ask to what extent the results in Table <tblr tid="T1">1</tblr> are 'typical'. We have investigated this by using the fact that random fluctuations may be averaged out by performing a large number of GSVDs, each defined by a randomly chosen set of 10 gene-pairs. The results of a series of 1000 GSVDs defined in this way are shown in Fig. <figr fid="F4">4</figr>, indicating that the general trends observed in Table <tblr tid="T1">1</tblr> are robust: the greatest success in matching Q-PCR and microarray profiles is achieved in the central arraylets, with an <it>average </it>of around 14 positive matches. Also shown in Fig. <figr fid="F4">4</figr> are the p-values associated with the null hypothesis for a single GSVD (see Table <tblr tid="T1">1</tblr>) as well as, shaded dark, an additional check that the results are not being over-interpreted. Here the expression profiles in the microarray data not used in defining the GSVD were randomized, thus destroying all remaining inherent biological connections between the two datasets, before performing the matching procedure. Similar results are obtained if the microarray expression profiles are randomized before defining the GSVDs (data not shown). For all but the smallest <it>k </it>these results are consistent with the average expected background, <it>Js/S &#8776; 5.81</it>.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The <it>average </it>number of successfully identified microarray genes, using distance within three arraylets as the measure of similarity (light bars)</p>
               </caption>
               <text>
                  <p><b>The <it>average </it>number of successfully identified microarray genes, using distance within three arraylets as the measure of similarity (light bars)</b>. On the right-hand axis the calculated p-values characterizing the expected number of false positives for a single GSVD are shown. The dark bars indicate the result obtained if those microarray genes not used in the GSVD are randomized.</p>
               </text>
               <graphic file="1471-2105-9-335-4"/>
            </fig>
            <p>We conclude, therefore, that co-expression of known gene-pairs is indeed strongest, and highly significant (p-value &lt; 10<sup>-3</sup>), in the subspace spanned by the central arraylets. In peripheral arraylets, on the other hand, co-expression of known gene-pairs occurs at background levels. This provides strong empirical evidence that a search for co-expression in the subspace spanned by the central arraylets indeed provides a tool for identifying candidates for co-expressed genes across the two datasets.</p>
         </sec>
         <sec>
            <st>
               <p>Improving the GSVD through a judicious choice of defining gene-pairs</p>
            </st>
            <p>While the discussion so far addresses the utility of the GSVD in dealing with partially overlapping experimental conditions in the two datasets, we shall now address the second problem illustrated in Figure <figr fid="F1">1</figr>: because the primers used for the Q-PCR target different regions to the probes on the microarray, there is some uncertainty in defining the set of genes that are part of region A in the first place. It could well be that alternative splice forms, unknown paralogs and/or gene family members with closely related sequence contribute differently to the signal obtained with the two platforms. Clearly, it would not be wise to make use of cases like this in defining the common subspace between the two experiments.</p>
            <p>We contend that the freedom one has in selecting the set of gene-pairs defining the GSVD allows one to test whether or not the set contains 'contamination' of this sort and that this freedom, therefore, provides a solution to this problem. In particular, it should be noted that although the results shown above are typical, some sets of gene-pairs dramatically improve the performance of the GSVD in matching genes in the two datasets. We have found over a dozen sets of <it>10 </it>gene-pairs that result in <it>35 </it>to <it>39 </it>out of <it>49 </it>successful matches. While one naturally expects some fluctuations in this success rate, it is easy to check, using the binomial distribution discussed earlier, that fluctuations of this magnitude and frequency go significantly beyond what one would expect by chance alone. Given the difficulties that have been encountered in previous cross-platform comparisons of gene expression data <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>, this is a notable result. A natural interpretation of this success is that these sets of gene-pairs define GSVDs that are able to cope particularly well with systematic platform-dependent artefacts, that the expression of these genes shows little or no cultivar dependence, and that differential sensitivity to alternative splice forms etc. is not an issue for these gene-pairs.</p>
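            <p>The binomial check mentioned above can be sketched as follows. This is an illustrative calculation only: it assumes that each of the 49 remaining gene-pairs matches by chance independently with probability (Js/S)/49, taking the average expected background of about 5.81 quoted earlier. The function name and this parametrisation are assumptions made for illustration and may differ in detail from the null model used for Table 1.</p>

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Assumed null model: 49 independent trials whose expected number of
# chance matches equals the quoted background of about 5.81.
n_pairs = 49
p_chance = 5.81 / n_pairs

# Probability that a single random GSVD reaches 35 or more matches by chance.
p_35 = binom_upper_tail(35, n_pairs, p_chance)
print(f"P(X >= 35) = {p_35:.3g}")
```

            <p>Because many candidate sets are examined, a single-GSVD tail probability of this kind would still need to be weighed against the number of sets tried.</p>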
            <p>A corollary of this line of reasoning is that expression signals of gene-pairs that are strongly affected by any of these artefacts, when used to define the GSVD, consistently lead to poor results. Indeed, this is found to be the case. For example, inclusion of the barley cellulose synthase-like gene <it>HvCslE2 </it>in the set of 10 genes used to define the GSVD invariably leads to a low number of successful matches. At the same time, direct comparison of expression profiles of <it>HvCslE2 </it>in both the microarray and Q-PCR tissue series indicates that, while on the microarray expression of this gene in caryopsis both 8&#8211;10 and 14&#8211;16 days after pollination is somewhat down-regulated as compared to the average across all tissues, in the Q-PCR dataset it is strongly up-regulated in the tissue that roughly corresponds to these, namely developing grain 10&#8211;13 days after pollination. While the origin of this apparent discrepancy is not known, it illustrates how one can gain information on the (in)consistency of the expression profiles of individual gene-pairs in the two datasets by using the GSVD. In summary, we conclude that in addition to using subspaces defined only by central arraylets of the GSVD, one can further greatly improve the efficacy of the method by defining the GSVD using sets of genes that maximize its success rate for matching genes in common in the two datasets. This procedure is summarized in Fig. <figr fid="F2">2</figr>.</p>
            <p>Various strategies for selecting suitable gene-pairs may be employed. In most cases an exhaustive brute force search for the optimum set is not feasible: in our case this would have entailed testing <inline-formula><m:math name="1471-2105-9-335-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mn>59</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mn>10</m:mn></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqcfa4aaeWaaeaafaqabeGabaaabaGaeGynauJaeGyoaKdabaGaeGymaeJaeGimaadaaaGaayjkaiaawMcaaaaa@31CF@</m:annotation></m:semantics></m:math></inline-formula> &#8776; 6 &#215; 10<sup>10 </sup>combinations of gene-pairs. As an alternative heuristic method one may start with a random set of gene-pairs and progressively swap new gene-pairs from region A into this set, keeping those that lead to improved gene-pair matching. We elected to implement a combination of these approaches: first, we narrowed down the choice of suitable gene-pairs to 20 through a heuristic search for gene-pairs that tended to improve performance and then exhaustively tested all selections of length 10 (i.e. <inline-formula><m:math name="1471-2105-9-335-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mn>20</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mn>10</m:mn></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqcfa4aaeWaaeaafaqabeGabaaabaGaeGOmaiJaeGimaadabaGaeGymaeJaeGimaadaaaGaayjkaiaawMcaaaaa@31B7@</m:annotation></m:semantics></m:math></inline-formula> = 184,756 of them) picked from this narrowed down set.</p>
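            <p>The size of the two search spaces, and the swap heuristic described above, can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the <it>score </it>argument is a hypothetical callable standing in for "number of successful matches obtained with the GSVD defined by this set", and the iteration count and random seed are arbitrary choices.</p>

```python
import math
import random

# Size of the exhaustive search spaces quoted in the text.
n_full = math.comb(59, 10)      # all selections of 10 from 59 gene-pairs
n_reduced = math.comb(20, 10)   # all selections of 10 from a shortlist of 20

def swap_search(candidates, score, size=10, n_iter=500, seed=0):
    """Hill-climb over subsets: swap one member at a time, keep improvements.

    `score` is a hypothetical callable mapping a frozenset of gene-pair ids
    to a number (for example, the match count of the resulting GSVD).
    """
    rng = random.Random(seed)
    current = set(rng.sample(sorted(candidates), size))
    best = score(frozenset(current))
    for _ in range(n_iter):
        outside = sorted(set(candidates) - current)
        if not outside:
            break
        drop = rng.choice(sorted(current))
        add = rng.choice(outside)
        trial = (current - {drop}) | {add}
        trial_score = score(frozenset(trial))
        if trial_score > best:   # accept only strict improvements
            current, best = trial, trial_score
    return current, best
```

            <p>A search of this kind only finds a local optimum, which is one reason to follow it with an exhaustive pass over a shortlist, as was done here.</p>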
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Implementation of the GSVD</p>
            </st>
            <p>Finally, we turn to applying the methodology developed in this paper to a real biological problem. We are interested in a particular gene for which Q-PCR expression data has been collected but for which microarray information is not available (i.e. a gene in region C of Figure <figr fid="F1">1</figr>). This gene is a member of a barley cellulose synthase-like gene family and is designated <it>HvCslF3 </it>(GenBank Acc. No. <ext-link ext-link-type="gen" ext-link-id="EU267179">EU267179</ext-link>; for details of the biological methods as well as the numerical results of the Q-PCR experiments, see the Methods section as well as the online Additional Material [see Additional file <supplr sid="S1">1</supplr>]). It has recently been implicated in the biosynthesis of the polysaccharide (1,3;1,4)-<it>&#946;</it>-<smcaps>D</smcaps>-glucan, which is a major constituent of cell walls of commelinoid monocotyledons, including barley <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. However, given the presence of two distinct linkage types and the general structural complexity of barley and other (1,3;1,4)-<it>&#946;</it>-<smcaps>D</smcaps>-glucans, it might be anticipated that additional enzymes could be required for the biosynthesis of the polysaccharide and for its post-synthetic modification, either during transport to the cell wall or following its deposition into it <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>. For example, in cellulose biosynthesis, groups of at least three cellulose synthase enzymes (HvCesA's) are thought to be required for the formation of the active terminal rosette complex through which cellulose microfibres are secreted into the cell wall <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. 
Furthermore, the mRNAs encoding the cellulose synthase-like HvCslF proteins are often of relatively low abundance <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and corresponding gene sequences are generally under-represented in EST databases. As a result, only one representative of seven known members of the <it>HvCslF </it>gene family is found on the Barley1 microarray <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, despite the fact that the chip includes over 22,000 contigs <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Thus, Q-PCR data obtained for the <it>HvCslF3 </it>gene was combined with the microarray data, using the GSVD, to identify co-transcribed genes from the chip that might provide clues to the identities of ancillary proteins or enzymes required for (1,3;1,4)-<it>&#946;</it>-<smcaps>D</smcaps>-glucan biosynthesis.</p>
            <p>A GSVD analysis was performed using a particular set of 10 gene-pairs from region A in Figure <figr fid="F1">1</figr> (<it>HvCesA1, HvCesA2, HvLimit-Dextrinase Inhibitor, HvCesA4, HvGlyT5, HvCesA8, HvUXS3, HvCslC4, HvEndogluII, HvGSL3</it>). This set was chosen because it resulted in a large number (39/49) of matches for the remaining gene-pairs in region A ("matches" being defined as gene-pairs sufficiently close in Euclidean distance in the subspace spanned by the central arraylets 5 to 7, as described in detail earlier on). Using Equation (4), this GSVD provides the mapping from the space spanned by arrays to the space spanned by arraylets for the remaining genes in regions B and C of Figure <figr fid="F1">1</figr>. Transcripts from the microarray (i.e. from region B) co-ordinately transcribed with the <it>HvCslF3 </it>gene (from region C), within the subspace spanned by the central arraylets, could then be identified.</p>
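            <p>The final identification step can be sketched as a nearest-neighbour ranking. The sketch assumes that arraylet-space coordinates have already been obtained for every gene (the mapping of Equation (4) is not reproduced here), and that arraylets 5 to 7 correspond to zero-based indices 4, 5 and 6. The function name and the toy coordinates are invented for illustration.</p>

```python
import math

def rank_by_central_distance(query, profiles, central=(4, 5, 6)):
    """Rank profiles by Euclidean distance to `query` within the central arraylets.

    query    : sequence of arraylet-space coordinates for the gene of interest
    profiles : dict mapping a probeset name to its arraylet-space coordinates
    central  : zero-based indices of the arraylets spanning the subspace
               (4, 5, 6 corresponds to arraylets 5 to 7 in one-based counting)
    """
    q = [query[i] for i in central]
    ranked = []
    for name, coords in profiles.items():
        d = math.dist(q, [coords[i] for i in central])
        ranked.append((d, name))
    ranked.sort()
    return [(name, d) for d, name in ranked]

# Toy coordinates in a space of 10 arraylets (invented for illustration only).
query = [0.0] * 10
toy = {
    "probeA": [9.0] * 4 + [0.1, 0.0, 0.0] + [9.0] * 3,   # close in the centre
    "probeB": [0.0] * 4 + [1.0, 1.0, 1.0] + [0.0] * 3,   # far in the centre
}
ranking = rank_by_central_distance(query, toy)
```

            <p>In the toy data the first probe differs strongly from the query in the peripheral arraylets yet is ranked first, because only the central coordinates enter the distance.</p>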
            <p>It is illustrative to compare this co-expression in the space spanned by arraylets to the expression profiles obtained directly from the microarray. In Fig. <figr fid="F5">5A</figr> we show a heatmap of 200 transcript abundances obtained with the microarray, ordered so that those co-expressing most closely with <it>HvCslF3 </it>in the central arraylets are at the top of the plot. The co-expression in the central arraylets is clearly visible. On the other hand, little or no co-expression in the peripheral arraylets characterising expression in non-overlapping parts of the datasets is apparent. For comparison, the corresponding expression profiles in the original space spanned by the arrays of the microarray experiment are shown in Fig. <figr fid="F5">5B</figr>. Some overall trends are apparent: expression in anther, caryopsis and endosperm tends to be low for these genes, while expression in root-like tissues and coleoptile tends to be high. More interesting, however, is the variation in expression among these genes. Co-expression in central arraylets should be reflected in stronger co-expression in tissues that are in common between the two platforms than those tissues that are not. As a measure of this variation we have listed, along the top of Fig. <figr fid="F5">5B</figr>, the standard deviation of expression among these 200 genes, scaled by the corresponding quantity for the whole dataset. We see that a selection of genes based on co-expression within central arraylets has resulted in a gene-set that is most tightly co-expressed in anther, caryopsis (5 dap), crown, inflorescence, pistil and radicle, but co-expressed less than average in caryopsis (8&#8211;10 &amp; 14&#8211;16 dap) and mesocotyl. Comparing with Figure <figr fid="F1">1</figr> we see that the former tissues are mostly those probed by both tissue series, while the latter are among those probed by the microarray alone. 
It appears therefore that, as expected, the central arraylets of the GSVD are indeed associated with those tissues for which there is some overlap between the two series.</p>
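            <p>The variability measure listed along the top of Fig. 5B can be sketched as a ratio of standard deviations per tissue. The sketch assumes the population standard deviation and a simple dictionary layout for the expression data; both are illustrative assumptions, as are the function name and the toy values.</p>

```python
from statistics import pstdev

def scaled_tissue_sd(expr, selected):
    """Per-tissue SD among `selected` genes divided by the SD over all genes.

    expr     : dict mapping gene id to a list of expression values, one per tissue
    selected : iterable of gene ids forming the co-expressed subset
    Ratios below one mean the subset varies less than the dataset as a whole.
    """
    selected = list(selected)
    n_tissues = len(next(iter(expr.values())))
    ratios = []
    for t in range(n_tissues):
        sd_all = pstdev([values[t] for values in expr.values()])
        sd_sel = pstdev([expr[g][t] for g in selected])
        ratios.append(sd_sel / sd_all)
    return ratios

# Invented toy data: three genes measured in two tissues.
toy_expr = {"g1": [0.0, 0.0], "g2": [2.0, 4.0], "g3": [4.0, 8.0]}
toy_ratios = scaled_tissue_sd(toy_expr, ["g1", "g2"])
```

            <p>A tissue in which every gene has identical expression would give a zero denominator, so real data would need a guard for that case.</p>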
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Relation between co-expression within the central arraylets and co-expression in the microarray data</p>
               </caption>
               <text>
                  <p><b>Relation between co-expression within the central arraylets and co-expression in the microarray data</b>. Panel A shows gene expression as measured in the arraylets defined by the GSVD using a set of genes described in the main text (green &#8211; low expression, red &#8211; high expression). Only the 200 transcripts whose expression profile in the central arraylets 5&#8211;7 (boxed) is closest to that of <it>HvCslF3 </it>(as measured by Euclidean distance) are shown. The expression profiles for the same genes, in the space spanned by arrays, are shown in panel B. Approximate co-expression can be seen for some tissues (e.g. expression in anther, caryopsis and endosperm tends to be low, while expression in root, radicle and coleoptile tends to be higher). At the top of panel B we indicate, for each tissue, the standard deviation of expression values among the genes shown on the plot, scaled by the corresponding quantity for the whole dataset; i.e. values larger (smaller) than one indicate larger (smaller) variability than average. As expected (see text), variability in expression is smallest in those tissues represented in both datasets.</p>
               </text>
               <graphic file="1471-2105-9-335-5"/>
            </fig>
            <p>Naturally, there are many 'co-expressed' genes and, in principle, one could perform follow-up analyses on all of those that are sufficiently co-expressed with <it>HvCslF3 </it>within the subspace under consideration. At this stage, we have focussed our attention only on those genes that, in addition to being co-expressed, are also already suspected to participate in cell wall synthesis on the basis of their annotation. A list of 20 of these genes, with the highest scores for co-expression with <it>HvCslF3</it>, is shown in Table <tblr tid="T2">2</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Microarray probesets with profiles closest to the Q-PCR profile for the cellulose synthase-like gene <it>HvCslF3</it>.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>Barley1 probeset</p>
                     </c>
                     <c ca="left">
                        <p>Dist.</p>
                     </c>
                     <c ca="left">
                        <p>Annotation (E-value)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig12242</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.17</p>
                     </c>
                     <c ca="left">
                        <p>UDP-glucose:sterol Gt [As] (1 &#215; 10<sup>-67</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig14077</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.40</p>
                     </c>
                     <c ca="left">
                        <p>putative glycosyltransferase [Hv]; (1 &#215; 10<sup>-160</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HVSMEa0015K08r2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.42</p>
                     </c>
                     <c ca="left">
                        <p>putative XTH [Os] (3 &#215; 10<sup>-13</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HV06O09u</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.46</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyltransferase [Os] (1 &#215; 10<sup>-35</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig11619</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.48</p>
                     </c>
                     <c ca="left">
                        <p>ceramide glucosyltransferase [Ga] (8 &#215; 10<sup>-51</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig6602</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.52</p>
                     </c>
                     <c ca="left">
                        <p>putative glycoprotein 3-<it>&#945;</it>-<smcaps>L</smcaps> Ft [Hv] (0)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig14830</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.55</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyltransferase [Os] (1 &#215; 10<sup>-113</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HE01I24u</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.57</p>
                     </c>
                     <c ca="left">
                        <p>xyloglucan endo-1,4-<it>&#946;</it>-<smcaps>D</smcaps>-Gl [Hv] (6 &#215; 10<sup>-27</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig11983</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.64</p>
                     </c>
                     <c ca="left">
                        <p>galactosyltransferase family [At] (1 &#215; 10<sup>-113</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HV12D17u</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.70</p>
                     </c>
                     <c ca="left">
                        <p>putative GDP-fucose protein-Ft [Os](2 &#215; 10<sup>-52</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Rbags19k14</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.74</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyltransferase [Os] (2 &#215; 10<sup>-20</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig2958</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.74</p>
                     </c>
                     <c ca="left">
                        <p>XTH [Hv] (1 &#215; 10<sup>-170</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig18221</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.75</p>
                     </c>
                     <c ca="left">
                        <p>XYLT [At] (1 &#215; 10<sup>-111</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig23070</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.76</p>
                     </c>
                     <c ca="left">
                        <p>GALT family-like protein [Os] (3 &#215; 10<sup>-70</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HVSMEl0008B06r2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.76</p>
                     </c>
                     <c ca="left">
                        <p>putative GT family [Os] (5 &#215; 10<sup>-28</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>HVSMEl0013E16r2_s</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.77</p>
                     </c>
                     <c ca="left">
                        <p>putative xyloglucan Ft [At] (8 &#215; 10<sup>-10</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig14826</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.78</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyltransferase [Os] (6 &#215; 10<sup>-58</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig15434</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.79</p>
                     </c>
                     <c ca="left">
                        <p>glycogenin GT [Os] (4 &#215; 10<sup>-53</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig15291</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.80</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyltransferase [Os] (7 &#215; 10<sup>-48</sup>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Contig5876</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.81</p>
                     </c>
                     <c ca="left">
                        <p>putative glucosyl transferase [Os] (0)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The distance in the second column is the Euclidean distance in the subspace spanned by arraylets 5&#8211;7 of the GSVD defined by 10 genes in common between the two datasets (<it>HvCesA1, HvCesA2, HvLimit-Dextrinase Inhibitor, HvCesA4, HvGlyT5, HvCesA8, HvUXS3, HvCslC4, HvEndogluII, HvGSL3</it>).</p>
                  <p>Abbreviations: As &#8211; Avena sativa; At &#8211; Arabidopsis thaliana; Ga &#8211; Gossypium arboreum; Hv &#8211; Hordeum vulgare; Os &#8211; Oryza sativa; Ft &#8211; fucosyltransferase; GALT &#8211; galactosyltransferase; Gl &#8211; glucanase; Gt &#8211; glucosyltransferase; XTH &#8211; xyloglucan endo-transglycosylase; XYLT &#8211; beta-(1,2)-xylosyltransferase.</p>
               </tblfn>
            </tbl>
            <p>Furthermore, one may wonder to what degree the results in Table <tblr tid="T2">2</tblr> reflect the choice of 10 genes used to define the GSVD in the first place. Another choice for this set of genes may well lead to some changes to the list of putative co-expressors listed in Table <tblr tid="T2">2</tblr>. In other words, it may well be that subsequent analyses could expose some of these candidate genes as being false positives. In order to reduce the number of false positives we elected to repeat the procedure shown in Fig. <figr fid="F2">2</figr> a number of times, each time using another set of 10 genes to define the GSVD. Each of these sets (16 in total) was chosen because it resulted in a similarly large number of matches of pairs of genes in common in the datasets as the first one. It was comforting to find that quite a number of genes in the lists of co-expressors are insensitive to the choice of gene-pairs used to define the GSVD. We found that in <it>all </it>cases <it>Contig11619 </it>(annotated as a ceramide glucosyltransferase) is co-transcribed with <it>HvCslF3</it>; that in 15 out of 16 GSVD analyses <it>Contig14830 </it>(annotated as a putative glucosyltransferase), <it>Contig15434 </it>(the cellulose-synthase-like gene <it>HvCslA4</it>) and <it>Contig18825 </it>(the cellulose-synthase-like gene <it>HvCslC1</it>) were co-expressed; and that in 14 out of 16 analyses <it>Contig16931 </it>(annotated as a galactoside 2-<it>&#945;</it>-<smcaps>L</smcaps>-fucosyltransferase) was co-expressed. While our subsequent analysis concentrated on these genes, it could well be that other transcripts in Table <tblr tid="T2">2</tblr> (or, for that matter, other transcripts not annotated as cell wall related) may also be worthy of further investigation.</p>
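            <p>The consistency check across repeated analyses amounts to counting how often each candidate recurs. A minimal sketch is given below; the function names, the input layout (one list of candidate identifiers per GSVD analysis) and the default cut-off of 0.85 are assumptions chosen for illustration, the latter so that 14 out of 16 would pass.</p>

```python
from collections import Counter

def recurrence(candidate_lists):
    """Count in how many analyses each candidate appears (once per analysis)."""
    counts = Counter()
    for candidates in candidate_lists:
        counts.update(set(candidates))
    return counts

def stable_candidates(candidate_lists, min_fraction=0.85):
    """Return candidates present in at least `min_fraction` of the analyses."""
    n = len(candidate_lists)
    counts = recurrence(candidate_lists)
    return sorted(g for g, c in counts.items() if c >= min_fraction * n)

# Invented toy input: four analyses, each yielding a short candidate list.
toy_lists = [["x", "y"], ["x"], ["x", "z"], ["x", "y"]]
toy_counts = recurrence(toy_lists)
toy_stable = stable_candidates(toy_lists, min_fraction=0.75)
```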
         </sec>
         <sec>
            <st>
               <p>Confirmation of co-expression using Q-PCR</p>
            </st>
            <p>In order to confirm the apparent co-regulation of <it>HvCslF3 </it>with this selection of genes probed by the microarray, primers were designed so that the transcript abundance of these genes in the 11 barley tissues of the Q-PCR dataset could be checked directly using Q-PCR. The resulting expression profile of the most consistently co-expressed candidate (correlation coefficient 0.72), the putative ceramide glucosyltransferase <it>Contig11619</it>, is shown in red in Figure <figr fid="F6">6A</figr> alongside the corresponding expression profile of <it>HvCslF3 </it>(black), confirming that the GSVD procedure has indeed correctly identified a hitherto unknown gene that is co-expressed with this cellulose synthase-like gene. Similar cross-checks were carried out for <it>Contig14830 </it>(corr. coeff. 0.29), <it>Contig16931 </it>(0.68), <it>Contig15434 </it>(0.75) and <it>Contig18825 </it>(0.71), the latter two being already present in the Q-PCR dataset (i.e. region A). The expression profiles for these genes are also shown in Fig. <figr fid="F6">6A</figr>. As can be seen, all but <it>Contig14830 </it>show significant co-expression with <it>HvCslF3 </it>in the tissues probed by the Q-PCR dataset.</p>
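            <p>The correlation coefficients quoted above compare two expression profiles measured over the same tissues. The text does not state which coefficient was used; the sketch below assumes the ordinary Pearson coefficient, and the function name is illustrative.</p>

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

            <p>Applied to two standardized Q-PCR profiles over the 11 tissues, a value near one indicates close co-expression, while a profile that is constant across tissues leaves the coefficient undefined.</p>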
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Confirmation of co-expression of <it>HvCslF3 </it>and candidate genes</p>
               </caption>
               <text>
                  <p><b>Confirmation of co-expression of <it>HvCslF3 </it>and candidate genes</b>. In panel A the Q-PCR expression profiles of the cellulose synthase-like gene <it>HvCslF3 </it>and the candidate genes identified in this study are compared. Expression profiles have been standardized as described in the text. As can be seen, the genes indeed co-express in the tissues probed in both the Q-PCR and microarray datasets. Panel B shows an additional comparison of Q-PCR coleoptile time course expression profiles of <it>HvCslF3 </it>and <it>Contig11619</it>. The two genes appear to remain roughly co-expressed in this time-course as well.</p>
               </text>
               <graphic file="1471-2105-9-335-6"/>
            </fig>
            <p>It is noteworthy that the co-expression of <it>HvCslF3 </it>with <it>Contig11619 </it>breaks down in scutellum. This tissue is part of the Q-PCR dataset but not of the microarray dataset. Quite correctly, therefore, the central arraylets that were searched for co-expressed genes were insensitive to the expression level in this tissue (the analogous behaviour for tissues probed in the microarray dataset but not the Q-PCR dataset has already been noted in Figure <figr fid="F5">5B</figr>). While the origin of the lack of co-expression is not known at present, it should be noted that in a further series of Q-PCR based measurements, using coleoptiles at different stages of development (R. A. Burton, unpublished data), close coordinate transcription of <it>HvCslF3 </it>and the ceramide glucosyltransferase gene persisted (Figure <figr fid="F6">6B</figr>). Similarly, the apparent lack of co-expression of the Q-PCR derived profiles of <it>Contig14830 </it>and <it>HvCslF3 </it>is most noticeable in those tissues where the Q-PCR series indicates significant sub-tissue dependence (leaf-tip vs. leaf-base, root-tip vs. root midzone), sub-tissues that were not probed individually in the microarray experiment.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In summary, we have applied the generalized singular value decomposition to the combined analysis of two expression datasets that overlap only partially in both gene content and experimental conditions. This adapts and significantly extends the use of the GSVD beyond its original use in gene expression analysis, namely a comparative study of cell cycles of two species where the experimental conditions were identical. The extension makes use of a selection procedure that adjusts the set of genes used to define the GSVD in order to maximize expression-profile matching of known gene-pairs in the two datasets. In this way, one effectively uses the information contained in the expression data itself (rather than probe-matching to a reference sequence) to eliminate gene-pairs whose expression signal may be affected by differential sensitivity to alternative splice forms and/or other gene family members. Furthermore, we have demonstrated that the resulting decomposition provides an effective framework for conducting searches for candidate genes in one dataset that are likely to co-express with genes contained only in the other dataset.</p>
         <p>The methodology developed here has provided testable leads for the identification of genes that might be co-ordinately transcribed with the <it>HvCslF3 </it>gene. Indeed, the association of the most consistently co-expressed candidate, the ceramide glucosyltransferase, with <it>HvCslF3 </it>is quite plausible. It might form part of the cellular machinery necessary for the biosynthesis of cell wall (1,3;1,4)-<it>&#946;</it>-<smcaps>D</smcaps>-glucans. Ceramide mono- and oligoglucosides are members of the glycosphingolipid group of plant plasma membrane components that are believed to separate laterally to form specialized microdomains in the membrane <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>. These so-called 'lipid rafts' are believed to recruit groups of proteins, including GPI-anchored proteins and integral membrane proteins, that assemble in localized areas for specialized membrane processes <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Furthermore, it has recently been shown that GPI-anchored proteins are required for cell wall biosynthesis and morphogenesis in <it>Arabidopsis </it><abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, and it has earlier been suggested that glycolipids or steryl glycosides might act as intermediates in the biosynthesis of wall polysaccharides <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>.</p>
         <p>The GSVD procedure described here has allowed the combination of Q-PCR and microarray transcript datasets and, through this integration, the development of testable hypotheses as to which genes might be involved in specific cellular processes. More generally, the procedure dramatically extends the utility of a limited dataset of Q-PCR analyses, for a small number of genes of interest, through combination with much larger microarray datasets. The GSVD analysis should be of similar value in combining other types of transcript datasets in any biological system for which microarray, MPSS or other large transcript datasets are available.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Real Time Quantitative-PCR</p>
            </st>
            <p>Barley tissues were prepared, RNA extracted and cDNA synthesized as detailed in Burton <it>et al</it>. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. The amount of cDNA required to perform the experiments described here meant that two aliquots of cDNAs were prepared and combined for all tissues, using the same RNA preparations.</p>
            <p>Stock solutions of the PCR product for the preparation of a dilution series were prepared from the cDNAs and purified and quantified by HPLC <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. A dilution series covering seven orders of magnitude was prepared from 10<sup>9 </sup>copies/<it>&#956;</it>l stock solution <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. Three replicates each of seven standard concentrations were included with every Q-PCR experiment together with a minimum of three 'no template' controls. Some Q-PCR experiments were assembled by hand and others were assembled using a CAS-1200 liquid handling robot. Three replicate PCRs for each of the cDNAs were included in every analysis.</p>
            <p>Reactions were performed in an RG 3000 Rotor-Gene Real Time Thermal Cycler as follows: 15 min at 95&#176;C followed by 45 cycles of 20 s at 95&#176;C, 30 s at 55&#176;C, 30 s at 72&#176;C and 15 s at an optimized acquisition temperature. A melt curve was obtained for the final product by heating from 70&#176;C to 99&#176;C. The optimal cycle threshold (CT) was determined from the dilution series using the Rotor-Gene V6 software, and the raw expression data were derived. The mean expression levels and standard deviations for each set of four replicates for each cDNA were calculated and were normalized using the procedure described in Burton <it>et al</it>. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
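            <p>The way a dilution series is typically turned into copy-number estimates can be sketched as follows. This is a generic illustration only. It assumes the usual linear relation between CT and the logarithm of the starting copy number, and it is not a description of the Rotor-Gene software's internal procedure or of the normalization of Burton <it>et al</it>. All function names and the numbers in the example are hypothetical.</p>

```python
import math
import statistics


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of CT = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ct_values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies from a measured CT."""
    return 10 ** ((ct - intercept) / slope)


def summarize_replicates(ct_replicates, slope, intercept):
    """Mean and sample standard deviation of replicate copy estimates."""
    estimates = [copies_from_ct(ct, slope, intercept) for ct in ct_replicates]
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Hypothetical seven-point dilution series with an idealized response.
standards = [10 ** k for k in range(3, 10)]
standard_cts = [40.0 - 3.3 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, standard_cts)
```

            <p>In this sketch the replicate summary is computed on the copy-number scale rather than on the CT scale; which scale is appropriate depends on the normalization that follows.</p>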
         </sec>
         <sec>
            <st>
               <p>Microarray Data</p>
            </st>
            <p>Sequences for the Q-PCR products of all primers used in this study were made available to us by members of our laboratory. These sequences were compared with the Affymetrix Barley1 genechip sequences using the Blast-n algorithm <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Matched sequences were defined by demanding an E-value better than 10<sup>-38 </sup>and a percent identity greater than 93%. The precise value of these cut-offs is not crucial: increasing the stringency to E-value &lt; 10<sup>-50 </sup>and percent identity > 95% eliminates only 2 matched sequence pairs. In a small number of cases the matching was ambiguous in that several different genechip sequences with similar E-values were found. These cases were not included in the matched set. The results are summarized in the online Additional Material [see Additional file <supplr sid="S1">1</supplr>].</p>
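            <p>The matching rule above can be sketched as a filter over alignment hits. The sketch assumes that each hit is available as a tuple of query identifier, genechip sequence identifier, E-value and percent identity. The text does not quantify what counts as "similar" E-values, so the <it>ambiguity_factor </it>parameter below is purely an illustrative assumption, as are the function name and the toy hits.</p>

```python
def match_sequences(hits, max_evalue=1e-38, min_identity=93.0,
                    ambiguity_factor=1e5):
    """Map each query to a single genechip sequence, or drop it.

    hits: iterable of (query_id, subject_id, evalue, percent_identity).
    A query is dropped as ambiguous when the runner-up subject has an
    E-value within `ambiguity_factor` of the best one (an assumed rule).
    """
    best_per_subject = {}
    for query, subject, evalue, identity in hits:
        if max_evalue > evalue and identity > min_identity:
            subjects = best_per_subject.setdefault(query, {})
            # Keep only the best E-value seen for each subject.
            subjects[subject] = min(subjects.get(subject, evalue), evalue)

    matches = {}
    for query, subjects in best_per_subject.items():
        ranked = sorted((evalue, subject) for subject, evalue in subjects.items())
        best_evalue, best_subject = ranked[0]
        if len(ranked) > 1 and best_evalue * ambiguity_factor >= ranked[1][0]:
            continue  # several subjects with similar E-values: ambiguous
        matches[query] = best_subject
    return matches


# Hypothetical hits, for illustration only.
toy_hits = [
    ("q1", "s1", 1e-60, 98.0),
    ("q1", "s2", 1e-40, 94.0),  # passes the cut-offs but is clearly weaker
    ("q2", "s3", 1e-50, 97.0),
    ("q2", "s4", 1e-49, 96.0),  # similar E-value, so q2 is ambiguous
    ("q3", "s5", 1e-20, 99.0),  # fails the E-value cut-off
    ("q4", "s6", 1e-45, 90.0),  # fails the percent identity cut-off
]
matched = match_sequences(toy_hits)
print(matched)
```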
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AWS and NJS jointly conceived the methodology described in this paper. AWS carried out the GSVD calculations and drafted the manuscript. NJS and RAB created the Q-PCR dataset. GBF provided the motivation for this work and was involved in drafting the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by grants from the Australian Research Council and the Grains Research and Development Corporation.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Quantitative monitoring of gene expression patterns with a complementary-DNA microarray</p>
            </title>
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>467</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7569999</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Expression monitoring by hybridization to high-density oligonucleotide arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Lockhart</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Byrne</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Follettie</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Gallo</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Chee</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Mittmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Horton</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>1996</pubdate>
            <volume>14</volume>
            <fpage>1675</fpage>
            <lpage>1680</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9634850</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Accessing genetic information with high-density DNA arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Chee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hubbell</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Berno</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>XC</fnm>
               </au>
               <au>
                  <snm>Stern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Winkler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lockhart</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Fodor</snm>
                  <fnm>SPA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>274</volume>
            <fpage>610</fpage>
            <lpage>614</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8849452</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Serial analysis of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>484</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7570003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bridgham</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Golda</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McCurdy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Foy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ewan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eletr</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Albrecht</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vermaas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Moon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Burcham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pallas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>DuBridge</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Kirchner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fearon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Corcoran</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <fpage>630</fpage>
            <lpage>634</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10835600</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Vermaas</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Storck</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>McCollum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>JI</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kirchner</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Eletr</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>DuBridge</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Burcham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Albrecht</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>1665</fpage>
            <lpage>1670</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">26493</pubid>
                  <pubid idtype="pmpid" link="fulltext">10677516</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Real-time quantitative RT-PCR after laser-assisted cell picking</p>
            </title>
            <aug>
               <au>
                  <snm>Fink</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Seeger</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ermert</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>H&#228;nze</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stahl</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Grimminger</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kummer</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bohle</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Nat Med</source>
            <pubdate>1998</pubdate>
            <volume>4</volume>
            <fpage>1329</fpage>
            <lpage>1333</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9809560</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Pooling information across different studies and oligonucleotide microarray chip types to identify prognostic genes for lung cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Morris</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Yin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Baggerly</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Methods of Microarray Data Analysis IV</source>
            <publisher>New York: Springer-Verlag</publisher>
            <editor>Shoemaker JS, Lin SM</editor>
            <pubdate>2005</pubdate>
            <fpage>51</fpage>
            <lpage>66</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Combining multiple microarrays in the presence of controlling variables</p>
            </title>
            <aug>
               <au>
                  <snm>Park</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yi</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>YK</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>1682</fpage>
            <lpage>1689</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16705015</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Dysregulation of the Annexin Family Protein Family Is Associated with Prostate Cancer Progression</p>
            </title>
            <aug>
               <au>
                  <snm>Xin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rhodes</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Ingold</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chinnaiyan</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Am J Pathol</source>
            <pubdate>2003</pubdate>
            <volume>162</volume>
            <issue>1</issue>
            <fpage>255</fpage>
            <lpage>261</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1851111</pubid>
                  <pubid idtype="pmpid" link="fulltext">12507908</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Combining multiple microarray studies and modeling inter-study variation</p>
            </title>
            <aug>
               <au>
                  <snm>Choi</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>OJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>i84</fpage>
            <lpage>i90</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1010</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855442</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Differences in gene expression between B-cell chronic lymphocytic leukemia and normal B cells: a meta-analysis of three microarray studies</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Coombes</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Highsmith</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Keating</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Abruzzo</snm>
                  <fnm>LV</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>3166</fpage>
            <lpage>3178</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15231529</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Integrative analysis of multiple gene expression profiles applied to liver cancer study</p>
            </title>
            <aug>
               <au>
                  <snm>Choi</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>BY</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Yeom</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>OJ</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2004</pubdate>
            <volume>565</volume>
            <fpage>93</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15135059</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A cross-study comparison of gene expression studies for the molecular classification of lung cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Parmigiani</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Garrett-Mayer</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Anbazhagan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gabrielson</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Clinical Cancer Research</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <fpage>2922</fpage>
            <lpage>2927</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15131026</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tao</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sha</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tsai</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>81</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">476733</pubid>
                  <pubid idtype="pmpid" link="fulltext">15217521</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Greenwood</snm>
                  <fnm>CMT</fnm>
               </au>
               <au>
                  <snm>Beyene</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>128</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1173085</pubid>
                  <pubid idtype="pmpid" link="fulltext">15921507</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Differential coexpression analysis using microarray data and its application to human cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Choi</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>OJ</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>4348</fpage>
            <lpage>4355</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16234317</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Meta-analysis of microarrays: inter-study validation of gene expression profiles reveals pathway dysregulation in prostate cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Rhodes</snm>
                  <fnm>DR</fnm>
             