<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-364</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Choi</snm>
               <fnm>Hyungwon</fnm>
               <insr iid="I1"/>
               <email>hwchoi@umich.edu</email>
            </au>
            <au id="A2">
               <snm>Shen</snm>
               <fnm>Ronglai</fnm>
               <insr iid="I1"/>
               <email>rlshen@umich.edu</email>
            </au>
            <au id="A3">
               <snm>Chinnaiyan</snm>
               <mi>M</mi>
               <fnm>Arul</fnm>
               <insr iid="I2"/>
               <email>arul@umich.edu</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Ghosh</snm>
               <fnm>Debashis</fnm>
               <insr iid="I3"/>
               <email>ghoshd@psu.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA</p>
            </ins>
            <ins id="I2">
               <p>Departments of Pathology and Urology, University of Michigan, Ann Arbor, MI, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Statistics and Huck Institute for Life Sciences, Penn State University, University Park, PA, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>364</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/364</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17900369</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-364</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>26</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Choi et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The statistical methods described in this paper are available as the R package metaArray (version 1.8.1) from Bioconductor (<url>http://www.bioconductor.org/</url>).</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>With the increasing availability of published microarray data sets, there is a tremendous need for developing approaches to validate and integrate results across multiple studies. One major issue to deal with in the meta-analysis of DNA microarrays is the lack of a single standard experimental platform for data generation. The dominant technologies so far have been two-color microarrays and oligonucleotide (e.g., Affymetrix GeneChip) arrays. Because these technologies measure fundamentally differing genetic materials designed to represent identical targets, many properties of expression measurements may vary across platforms including scale of measurements, sensitivity in detecting fold changes, control of cross-hybridization, and so forth. The heterogeneity in array design poses a great challenge for cross-platform comparisons and integration of results across independent microarray studies. The general area of combining data across multiple studies is referred to as meta-analysis <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Many approaches have been proposed for meta-analysis of microarray data. Rhodes <it>et al</it>. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> combined evidence of differential expression using a summary statistic involving the p-values from comparing cancer versus normal samples across multiple gene profiling studies and adjusted for multiple testing using q-values <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Choi <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> proposed a Bayesian model for the effect size for genes from multiple microarray experiments. In a more recent study <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, data from one study were used to generate a prior distribution of the differences in logarithm of gene expression between diseased and normal groups, whose distribution was then updated using other microarray studies. These methods all model the effect size <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, or a transformation thereof, across multiple studies.</p>
         <p>Recently, we proposed a Bayesian mixture model-based transformation of DNA microarray data, building on a proposal of Parmigiani <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and applied it to develop a signature of breast cancer recurrence across multiple microarray experiments from different platforms <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The scale on which the data were combined across studies is termed the probability of expression (POE). The focus of Shen <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> was on the breast cancer application; here, we wish to examine the technical aspects of the modelling used there. Based on the probabilistic model that underlies the POE methodology, one can use latent variables to combine genomic data from multiple studies. This idea has more general applications than those considered by Parmigiani <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In <b>Methods</b>, we describe the data structure and define two general probabilistic models for quantities that are combinable across studies. The first is the model used in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; we present it here for completeness. The second is a two-component mixture model that can be fit using an expectation-maximization algorithm. We also relate the latent variables to recent statistical methodologies for differential expression as well as false discovery rate <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B9">9</abbr></abbrgrp>. We then illustrate the proposed methods with an application to a meta-analysis of data comparing metastatic to localized cancer across multiple microarray studies in the <b>Results</b> section.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Metastatic Cancer Study</p>
            </st>
            <p>We now discuss the application of the proposed methodology to a study looking at metastatic cancer. Based on the availability of expression data for metastatic samples and clinical information regarding the distinction of primary and metastatic tumors, we selected three studies from publicly available data sources <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. These three studies were selected based on two criteria: 1) both localized and metastatic samples are profiled, and 2) a reasonable number of common genes appear across datasets. It should be noted that generally only a small number of metastatic samples are profiled, which was the case in all three datasets. Throughout the article, the terms primary and localized will be used interchangeably.</p>
            <p>The goal of this meta-analysis is to identify the set of genes that best distinguishes metastatic tumors from primary tumors in human cancer tissue samples across distinct organ sites. The method mentioned in the previous section is applied to the three training sets to transform the data to POE using both the EM and MCMC algorithms, and an optimal signature is obtained within a leave-one-out cross-validation logistic regression framework. The method is compared with two alternative meta-analytic approaches (<abbrgrp><abbr bid="B5">5</abbr><abbr bid="B13">13</abbr></abbrgrp>) in terms of the selected gene signatures and the clustering of primary and metastatic tumors based on them. Although validating the methodology is challenging, we used our gene signature to predict metastasis-free survival time in the breast cancer study of van't Veer <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> as a possible validation. The underlying hypothesis is that a profile distinguishing metastatic from nonmetastatic tumors can also be used to predict aggressive cancer prognosis.</p>
         </sec>
         <sec>
            <st>
               <p>Data Description</p>
            </st>
            <p>Chen <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> mainly focus on characterizing the global gene expression patterns that distinguish hepatocellular carcinoma (HCC) from non-HCC samples using cDNA microarrays. Our sample sizes (see Table <tblr tid="T1">1</tblr>) differ from theirs because we excluded non-tumor samples as well as repeat samples from the same patient. Removing these samples leaves 69 unique primary tumors and 9 liver tumors that have metastasized.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Description of data used in meta-analysis</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Data Source</p>
                     </c>
                     <c ca="center">
                        <p>Array Type</p>
                     </c>
                     <c ca="center">
                        <p>Organ Site</p>
                     </c>
                     <c ca="center">
                        <p>Sample</p>
                     </c>
                     <c ca="center">
                        <p># Metastatic</p>
                     </c>
                     <c ca="center">
                        <p># Primary</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chen <it>et al</it>.</p>
                     </c>
                     <c ca="center">
                        <p>cDNA</p>
                     </c>
                     <c ca="center">
                        <p>Liver</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Garber <it>et al</it>.</p>
                     </c>
                     <c ca="center">
                        <p>cDNA</p>
                     </c>
                     <c ca="center">
                        <p>Lung</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Latulippe <it>et al</it>.</p>
                     </c>
                     <c ca="center">
                        <p>Affy U95 Human</p>
                     </c>
                     <c ca="center">
                        <p>Prostate</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Garber <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> describe the diversity of gene expression patterns in squamous cell carcinomas (SCC), large cell lung carcinomas (LCLC), small cell lung carcinomas (SCLC), and adenocarcinomas (AC) using cDNA microarrays. These four subtypes of lung cancer are often detected in epithelial cells that line different sections of airways in the lung, and treatment options differ among the subtypes because of their pathological distinctions. We first selected all 6 unique metastatic tumors and removed their paired samples profiled at primary stage. Duplicate samples were identified and removed in the same way as for the Chen <it>et al</it>. data. The subset included in our meta-analysis comprised 27 primary adenocarcinoma samples and 6 samples with lymph node metastases.</p>
            <p>Finally, the Latulippe <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> study identifies genes that differentiate primary and metastatic cancers in the prostate. Using Affymetrix U95 human oligonucleotide arrays, they reported gene expression profiles of nearly 25,000 genes/ESTs. All samples were included in our meta-analysis. The details for the three studies are summarized in Table <tblr tid="T1">1</tblr>.</p>
            <p>An important aspect of this collection of data is that the organ sites are different. We hypothesize that there is a common profile separating localized tumors from metastatic tumors across the three sites. Similar evidence for this type of hypothesis has been suggested before <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The microarray platform differs across studies, so we mapped clone/probeset IDs to Unigene cluster IDs (UGIDs) of the most recent build through SOURCE <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. UGIDs are constantly updated. Because our initial mapping was done in 2004, we translated these UGIDs to the June 2006 build (No. 191) in the NCBI database. The genes we report here and their annotations in the remainder of the paper are consistent with the annotations associated with the most up-to-date Unigene clusters. When multiple clones mapped to the same UGID, we averaged the expression over the clones within each sample. This mapping produced 1633 common UGIDs.</p>
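            <p>The averaging of clones within a UGID and the restriction to UGIDs shared by all studies can be sketched as follows. This is a minimal illustration in Python, not the code used for the analysis: the function names <it>average_by_ugid</it> and <it>common_ugids</it>, the clone identifiers and the toy expression values are hypothetical, and missing values are assumed to be absent.</p>
            <p><![CDATA[
```python
from collections import defaultdict


def average_by_ugid(expr, clone_to_ugid):
    """Collapse clone-level rows to one row per UGID.

    expr: dict clone_id -> list of expression values (one per sample)
    clone_to_ugid: dict clone_id -> UGID; unmapped clones are skipped
    """
    groups = defaultdict(list)
    for clone, values in expr.items():
        ugid = clone_to_ugid.get(clone)
        if ugid is not None:
            groups[ugid].append(values)
    # zip(*rows) walks over samples; each column is averaged over clones
    return {
        ugid: [sum(col) / len(col) for col in zip(*rows)]
        for ugid, rows in groups.items()
    }


def common_ugids(*studies):
    """Return the sorted UGIDs present in every study."""
    return sorted(set.intersection(*(set(s) for s in studies)))


# Toy data with hypothetical clone and probeset identifiers
study_a = average_by_ugid(
    {"c1": [1.0, 3.0], "c2": [3.0, 5.0], "c3": [0.5, 0.7]},
    {"c1": "Hs.155218", "c2": "Hs.155218", "c3": "Hs.410092"},
)
study_b = average_by_ugid({"p1": [2.0, 2.0, 4.0]}, {"p1": "Hs.155218"})
shared = common_ugids(study_a, study_b)
```
]]></p>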
         </sec>
         <sec>
            <st>
               <p>POE</p>
            </st>
            <p>Before combining 140 samples from different sources into a single dataset, we transformed the raw data from each study to POE by normalizing the distribution of expression values in metastatic samples to that of localized samples. Note that the localized or primary tumors represent the baseline group, since our goal is to select a gene signature that distinguishes metastatic tumors from localized tumors, for which many conflicting hypotheses have been postulated. The POE output from each study was then combined to form a single expression dataset with 1633 genes and 140 samples.</p>
            <p>In the following, the POE data transformations by the EM and MCMC algorithms will be analyzed in parallel for the sake of comparison. All primary tumors are color-coded in red and metastatic tumors in green. In terms of computational speed, estimation of POE based on the EM algorithm takes less than a minute for 1633 genes per dataset, while that using MCMC takes about 50 minutes for 2000 iterations and 4 periodic skips in the sampler. As the numbers of genes and samples grow, this difference will be substantial. For example, it usually takes 4 hours to fit POE for a dataset with 10,000 genes using full Bayesian modelling as opposed to 3 minutes for the maximum likelihood approach using the EM algorithm. The reason for the computational difference is that the EM algorithm is fit to one gene at a time, while the MCMC algorithm involves fitting to expression measurements for all genes simultaneously.</p>
            <p>Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr> show the POE transformation for two genes using both the EM and MCMC algorithms. In both plots, the top panel shows the expression levels on the raw scale, followed by those on the POE scale from the EM and MCMC algorithms, respectively. The gene in Figure <figr fid="F1">1</figr> is TGFB1 (UGID Hs.155218), which controls proliferation and differentiation in many cell types. The gene in Figure <figr fid="F2">2</figr> is F2 (UGID Hs.410092), coagulation factor II, whose mutation leads to various forms of thrombosis and which is often expressed in liver tissues.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Expression of TGFB1</p>
               </caption>
               <text>
                  <p><b>Expression of TGFB1</b>. Transforming growth factor beta 1 (TGFB1) gene expression on raw (upper), POE EM (middle) and POE MCMC (lower) scales. This gene is uniformly underexpressed in metastatic samples. Open circles indicate primary tumor samples, and stars indicate metastatic samples.</p>
               </text>
               <graphic file="1471-2105-8-364-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Expression of F2</p>
               </caption>
               <text>
                  <p><b>Expression of F2</b>. F2 coagulation factor II (F2) gene expression on raw (upper), POE EM (middle) and POE MCMC (lower) scales. This gene is underexpressed primarily in metastatic samples of the Chen liver study. Open circles indicate primary tumor samples, and stars indicate metastatic samples.</p>
               </text>
               <graphic file="1471-2105-8-364-2"/>
            </fig>
            <p>Although both genes are in the signature obtained by our methods, they clearly represent different types of genes. Based on Figure <figr fid="F2">2</figr>, F2 is under-expressed in the metastatic liver samples of Chen <it>et al</it>., weakly under-expressed in the lung samples of Garber <it>et al</it>., and not differentially expressed in the Latulippe <it>et al</it>. data. It was found significant only in the liver study among the three studies we considered here. On the other hand, TGFB1 is uniformly under-expressed in metastatic samples across all three studies (Figure <figr fid="F1">1</figr>).</p>
            <p>This observation of two types of expression pattern on the POE scale suggests that our signature will contain both types of genes. As will be shown later, a conventional meta-analytic approach that combines strength of differential expression across studies on the raw scale tends to select genes that behave similarly to TGFB1, whereas our method picks up both types of genes. Unless genes with expression patterns similar to F2 dominate the entire signature, the gene set from our method tends not to be influenced by a single study.</p>
         </sec>
         <sec>
            <st>
               <p>Signature Selection</p>
            </st>
            <p>As we proposed POE transformations using two different implementations, we will refer to the signatures from the data transformed by the EM and MCMC algorithms as the POE EM signature and the POE MCMC signature, respectively.</p>
            <p>To obtain a gene signature that distinguishes metastatic samples from localized samples, we calculated risk indices for all samples; the risk index is described in the Methods section. A logistic regression is fitted for each gene with one sample held out at a time. The response variable is metastasis status (1 = metastatic, 0 = localized). For all genes we repeated the procedure, holding out each sample in turn and recording the coefficients <it>&#946; </it>and <it>p</it>-values. Following the risk index approach for classification explained in the Methods section, we calculated risk indices for all 140 subjects at various sizes of the gene signature. The optimal signature size <it>p </it>was then determined based on classification performance.</p>
            <p>For classification purposes, we predicted the subjects with positive risk index to be metastatic and those with negative risk index to be localized cancer. Using Figure <figr fid="F3">3</figr>, we took the optimal size to be 80 for the POE EM signature as the error rates in metastatic and primary tumor samples collectively reach a minimum and do not decrease further as more genes are added beyond 80. A similar criterion was applied to obtain a 70-gene POE MCMC signature. A plot of the risk indices and the optimal cutpoint is given in Figure <figr fid="F4">4</figr>. The POE EM and POE MCMC signatures share 52 common UGIDs.</p>
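            <p>The sign-based classification rule and the choice of signature size can be illustrated with a short Python sketch. Several points are assumptions made for illustration only: the risk index is taken to be the sum of coefficient times POE value over the signature genes (its actual definition is given in Methods), the collective error is taken to be the sum of the two class-specific error rates, and the coefficients and POE values are toy numbers. The leave-one-out refitting of the logistic regressions is omitted, and all function names are hypothetical.</p>
            <p><![CDATA[
```python
def risk_index(poe_values, coefs):
    """Assumed form: sum of coefficient times POE value over the genes."""
    return sum(b * x for b, x in zip(coefs, poe_values))


def class_error_rates(samples, labels, coefs):
    """Classify by the sign of the risk index; return error rate per class.

    labels: 1 = metastatic, 0 = localized
    """
    errors = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for x, y in zip(samples, labels):
        predicted = 1 if risk_index(x, coefs) > 0 else 0
        counts[y] += 1
        errors[y] += predicted != y
    return {y: errors[y] / counts[y] for y in counts if counts[y]}


def best_signature_size(samples, labels, ranked_coefs, sizes):
    """Smallest size (sizes given in ascending order) that minimizes the
    sum of the two class error rates; genes are assumed pre-ranked."""
    def total_error(k):
        truncated = [x[:k] for x in samples]
        return sum(class_error_rates(truncated, labels, ranked_coefs[:k]).values())
    return min(sizes, key=total_error)


# Toy example: two ranked genes, four samples (values are made up)
coefs = [-1.5, -0.8]
samples = [[-0.9, -0.2], [0.4, 0.1], [0.1, -0.6], [-0.7, 0.3]]
labels = [1, 0, 0, 1]

rates = class_error_rates(samples, labels, coefs)
best = best_signature_size(samples, labels, coefs, [1, 2])
```
]]></p>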
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Misclassification Error</p>
               </caption>
               <text>
                  <p><b>Misclassification Error</b>. Misclassification error rates in metastatic (starred line) and primary tumors (open circled line). The upper panel shows the error rates from the data transformed by the EM algorithm, and the lower panel shows those from the data transformed by the MCMC algorithm.</p>
               </text>
               <graphic file="1471-2105-8-364-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Risk index</p>
               </caption>
               <text>
                  <p><b>Risk index</b>. Derived risk indices from the data transformed by the EM and MCMC algorithms. Primary and metastatic tumors are represented by open circles and stars respectively. The y-axis is risk index.</p>
               </text>
               <graphic file="1471-2105-8-364-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison and Validation</p>
            </st>
            <p>We performed other analyses for the sake of comparison. First, we compared the classification performance of the signatures found using meta-analyses with that in which the classifiers were constructed on one dataset only and tested on the other two datasets. The performance is summarized in Table <tblr tid="T2">2</tblr>. While such individual study-specific signatures tended to perform well on the training dataset, their performance did not generalize well to other datasets. The consistently poor performance of all signatures on the Garber dataset, including its own signature, suggests that this dataset might have poorer reliability than the others within the common subset of 1633 genes used in this analysis.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Classification error rates</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Chen</p>
                     </c>
                     <c ca="center">
                        <p>Garber</p>
                     </c>
                     <c ca="center">
                        <p>Latulippe</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chen (50)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Garber (25)</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Latulippe (30)</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>POE EM (80)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>POE MCMC (80)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Effect Size (80)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Conlon <it>et al</it>. (80)</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Table entries are misclassification error rates (in percent) obtained when the classifier from the study in the row is used to predict the study in the column. The number in parentheses after each signature name is the number of genes at which classification accuracy was optimized. Effect Size refers to the method of [5].</p>
               </tblfn>
            </tbl>
            <p>We also compared our methods with two meta-analysis techniques developed in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The former performs Bayesian inference on the classical Hedges-Olkin pooled effect sizes for each gene from multiple studies; the latter uses a Bayesian hierarchical model to pool datasets across studies through group-specific mean and variance parameterization and selects a gene signature based on a Bayesian estimate of the FDR.</p>
            <p>First, since the method of <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> pools the differential expression statistics from a collection of raw-scale data, there is no analogue of a risk index-based classification method available using their signatures. Instead, we first obtained a signature of size 80 based on univariate gene selection. Size 80 was chosen for all signatures to provide a fair comparison of class prediction power with the POE signatures. This corresponds to controlling the FDR at 0.02 in the method of <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. We call this the effect size (ES) signature. We also fitted the hierarchical model from <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> using WinBUGS software <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, with a prior specification reflecting vague prior information as in the original paper. The fitted model was obtained from a simulation of 12,000 iterations with the initial 2,000 iterations used for burn-in. The estimated probabilities of differential expression were surprisingly low, the highest being 0.003. This implies that the 80-gene signature has an FDR of about 99%. For the sake of comparison, we nevertheless took the top 80 genes to assess their class prediction ability. We call this the Conlon signature. Since both POE and the latter method report the probability of differential expression of individual genes, we examined the concordance between the two sets of probabilities. Figure <figr fid="F5">5</figr> shows the probability in Conlon <it>et al</it>. plotted against that in POE EM. The ES signature shared only 15 UGIDs with the POE EM signature and 18 with the POE MCMC signature, which suggests that the signatures have different characteristics.</p>
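            <p>The link between the low probabilities of differential expression and the high FDR can be made concrete with a small Python sketch. It assumes that the Bayesian FDR of a gene list is estimated as the average posterior probability of no differential expression among the selected genes, which is a common estimator but is not stated explicitly here. The function name <it>bayesian_fdr</it> and the toy probabilities are hypothetical; only the maximum value of 0.003 comes from the text.</p>
            <p><![CDATA[
```python
def bayesian_fdr(probs_de, size):
    """Estimated FDR of the `size` genes with the highest probability of
    differential expression: the mean of (1 - probability) over that list."""
    top = sorted(probs_de, reverse=True)[:size]
    return sum(1.0 - p for p in top) / len(top)


# Toy probabilities: no gene exceeds 0.003, as reported for the fitted model
toy_probs = [0.003] * 80 + [0.001] * 20
fdr = bayesian_fdr(toy_probs, 80)
```
]]></p>
            <p>Under this assumption, any 80-gene list whose probabilities are all at most 0.003 has an estimated FDR of at least 0.997, which agrees with the figure of about 99% quoted above.</p>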
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>POE and Conlon <it>et al</it>. comparison</p>
               </caption>
               <text>
                  <p><b>POE and Conlon <it>et al</it>. comparison</b>. Concordance of the gene specific probability of differential expression between the POE (EM) and the method in [13].</p>
               </text>
               <graphic file="1471-2105-8-364-5"/>
            </fig>
            <p>Meanwhile, the Conlon signature had an overlap of two genes with the ES signature and one gene with each of the POE EM and MCMC signatures. The poor overlap of the Conlon signature with the others is consistent with the high Bayesian FDR estimated above.</p>
            <p>To assess the classification performance, we performed hierarchical clustering of tissue samples from the individual studies using the ES signature. Figures <figr fid="F6">6</figr> through <figr fid="F8">8</figr> show the heatmaps of the ES signature in the individual studies, together with the clustering tree. These were drawn separately because the raw-scale data cannot be directly combined as in POE. Figures <figr fid="F9">9</figr> and <figr fid="F10">10</figr> are the heatmaps of the POE EM and MCMC signatures. To highlight the sample labels in each plot, a yellow/blue color strip was added to the dendrograms in Figures <figr fid="F6">6</figr>, <figr fid="F7">7</figr>, <figr fid="F8">8</figr>, <figr fid="F9">9</figr>, <figr fid="F10">10</figr>, which should be viewed along with the breakdown of the clustering tree. For all plots, we used average linkage clustering with Euclidean distance. This was also done for the Conlon signature [see Additional Files <supplr sid="S1">1</supplr>, <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>]. We found that the clustering performance of this signature was similar to that of the ES signature, with most of the errors committed in the Garber lung study. The overall classification performance across all signatures is provided in Table <tblr tid="T2">2</tblr>. Based on the classification table, the proposed methods (EM and MCMC) greatly outperform the Conlon signature; they are also superior to the ES method, although by a smaller margin.</p>
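            <p>For readers unfamiliar with the clustering rule, the following Python sketch shows average linkage agglomeration with Euclidean distance on toy data. It is an illustration only: the function name, the sample coordinates and the choice to stop at a fixed number of clusters are assumptions made here, whereas the figures in this paper display the full dendrograms.</p>

```python
import math


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise Euclidean distance is smallest."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs of samples.
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy samples (hypothetical): three near the origin, two near (5, 5).
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.5, 0.5)]
groups = sorted(sorted(c) for c in average_linkage(samples, 2))
print(groups)
```

            <p>In practice a library routine would be used; the loop above only makes the merge criterion explicit.</p>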
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Chen <it>et al</it>. data</p>
               </caption>
               <text>
                  <p><b>Chen <it>et al</it>. data</b>. Hierarchical clustering of tumors in Chen <it>et al</it>. data using the effect size signature. The expression here is on the raw scale. The color strip below the heatmap indicates primary tumors (blue) and metastatic tumors (yellow).</p>
               </text>
               <graphic file="1471-2105-8-364-6"/>
            </fig>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Garber <it>et al</it>. data</p>
               </caption>
               <text>
                  <p><b>Garber <it>et al</it>. data</b>. Hierarchical clustering of tumors in Garber <it>et al</it>. using the ES signature. The expression here is on the raw scale. The color strip in blue and yellow below the heatmap indicates primary and metastatic tumors, respectively.</p>
               </text>
               <graphic file="1471-2105-8-364-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Latulippe <it>et al</it>. data</p>
               </caption>
               <text>
                  <p><b>Latulippe <it>et al</it>. data</b>. Hierarchical clustering of tumors in Latulippe <it>et al</it>. using the ES signature. The expression here is on the raw scale. The color strip in blue and yellow below the heatmap indicates primary and metastatic tumors, respectively.</p>
               </text>
               <graphic file="1471-2105-8-364-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>POE EM for all three datasets</p>
               </caption>
               <text>
                  <p><b>POE EM for all three datasets</b>. Hierarchical clustering of tumors of all three studies using the POE EM signature. The expression is on the POE scale.</p>
               </text>
               <graphic file="1471-2105-8-364-9"/>
            </fig>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>POE MCMC for all three datasets</p>
               </caption>
               <text>
                  <p><b>POE MCMC for all three datasets</b>. Hierarchical clustering of tumors of all three studies using the POE MCMC signature. The expression is on the POE scale.</p>
               </text>
               <graphic file="1471-2105-8-364-10"/>
            </fig>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Dendrogram of samples from liver data using the Conlon signature. This is a heatmap representing the hierarchical clustering results of the data in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> using the genes selected by the method of <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-8-364-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Dendrogram of samples from lung data using the Conlon signature. This is a heatmap representing the hierarchical clustering results of the data in <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> using the genes selected by the method of <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-8-364-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Dendrogram of samples from prostate data using the Conlon signature. This is a heatmap representing the hierarchical clustering results of the data in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> using the genes selected by the method of <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-8-364-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>We note that clustering with all signatures gives fairly accurate results in all three studies. With the ES signature, only a few metastatic samples are grouped together with two primary tumors in the Chen liver study (Figure <figr fid="F6">6</figr>). Two metastatic samples are situated under the same node as primary tumors in the Garber lung study (Figure <figr fid="F7">7</figr>). Finally, one primary and three metastatic samples fall in the opposite clusters in the Latulippe prostate study (Figure <figr fid="F8">8</figr>). Overall, the clustering can differentiate metastatic tumors from primary tumors, although some metastatic tumors were grouped with primary tumors. The Conlon signature had no classification error in the Latulippe prostate study, but there was essentially no tight clustering in the Garber lung study, although 4 out of 6 metastatic samples were clustered together in a local tree.</p>
            <p>The POE EM and MCMC signatures give comparably good clusterings of the two types of tumors across all studies. In Figures <figr fid="F9">9</figr> and <figr fid="F10">10</figr>, all metastatic tumors except for two samples from the Garber lung study are grouped together, and some primary tumors from the Chen liver study are separated from other primary tumors. Furthermore, the edges to the leaf nodes in the dendrogram are shorter than those for the ES signature, which suggests that the clustering of primary tumors is tighter than that obtained with the ES signature. This is a consequence of normalizing the expression level of metastatic tumors to the distribution of primary tumors by utilizing phenotypic information in the estimation of POE. The heatmaps visually demonstrate the difference between the ES signatures and the POE signatures. We next used NIH DAVID <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> to determine whether any functional groups were enriched in our gene expression signatures. In terms of gene annotation, the POE EM and MCMC signatures have many UGIDs in common and therefore share many functional categories, such as response to stress, immune response, endopeptidase and enzyme inhibitor activity, cell organization and biogenesis, and regulation of cell cycle. The class of functions common to the POE and ES signatures is cell cycle processes. GO terms such as antigen processing, endogenous antigen via MHC class I, DNA repair, and many metabolism and transport activities appear in the ES signatures only. Also, a literature search has suggested an association of POE signature genes and their corresponding GO terms with tumor invasion and metastasis in various cancer types. 
For example, ALDH1A1 (stress response) and MAPK3 (cell proliferation) are targets of the HGF/MET signaling pathway, which has been associated with tumor metastasis and poor prognosis in human hepatocellular carcinomas <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In another example, overexpression of PFN2 (regulation of actin cytoskeleton) and UBS (stress response) has been associated with lymph node metastasis of gastric cancer <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and colon cancer <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, respectively. These observations indicate that the POE signatures lead to relevant findings toward understanding the potential mechanism of differentiation of metastatic tumors from primary tumors.</p>
            <p>Finally, we attempted an additional validation of the method to see whether the resulting gene expression signature can discriminate lethal from nonlethal cancers in an early detected population of cancers. Note that the signature selection was primarily oriented toward the distinction of metastatic tumors from primary tumors. Thus validation here is based on the conjecture that many metastatic tumors are highly likely to lead to a lethal condition. We addressed this issue by using the data from the van't Veer <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> study. Their study profiled 98 primary breast cancer samples on Hu25K inkjet arrays. Among these patients, 34 developed distant metastases within 5 years and 44 continued to be disease-free after a period of at least 5 years. The other 20 patients either had BRCA1 germline mutations or were BRCA2 carriers; we excluded these samples from the analysis.</p>
            <p>The study was based on a large inkjet microarray profiling over 25,000 probes. About two-thirds of the 1633 genes used in the three cancer studies appear in the van't Veer <it>et al</it>. data. Based on the classifier trained on the three cancer datasets described above, we mapped the genes from the signatures to those in the van't Veer <it>et al</it>. data and generated risk indices for the subjects in that study. Specifically, we first transformed the van't Veer <it>et al</it>. data to the POE scale using both the EM and MCMC algorithms, without using the phenotypic information, to prevent overfitting. Note that we did not consider the effect size and Conlon signatures here. We then calculated the log odds ratio for each patient using the newly generated POE data and the regression coefficients estimated from the training set. As expected, the derived risk indices using the data from the EM and MCMC algorithms are highly correlated (Pearson correlation 0.83).</p>
            <p>A proportional hazards model <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> relating metastasis-free survival to the risk index, adjusting for covariates, was fit to the data. Tables <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr> show the results. The analyses using data from the EM and MCMC algorithms concur in that the derived risk indices are strong predictors of metastasis-free survival times. This association remains strong even after adjusting for estrogen receptor status and age. Since we are interested in risk prediction, we calculated the C-index <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> to see if the gene expression signature adds discriminatory information relative to estrogen status and age. For the model with just age and estrogen status, the C-index is 0.714. For the EM-based POE signature, the C-index with all three variables (Multivariate model in Table <tblr tid="T3">3</tblr>) is 0.722. For the MCMC-based POE signature, the C-index with all three variables (Multivariate model in Table <tblr tid="T4">4</tblr>) is 0.748.</p>
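            <p>As an illustration of how a C-index is obtained, the Python sketch below computes a concordance index for right-censored data, assuming the usual Harrell-type definition: among usable pairs of patients, the fraction in which the patient with the higher risk score has the earlier observed event, with tied scores counted as one half. The function name and the toy data are hypothetical and are not taken from the van't Veer study.</p>

```python
def c_index(times, events, risks):
    """Concordance index for right-censored survival data.

    times  : follow-up times
    events : 1 if the event (e.g. metastasis) was observed, 0 if censored
    risks  : risk scores; a higher score should mean an earlier event
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has an observed event
            # strictly before the follow-up time of subject j.
            if events[i] == 1 and times[j] > times[i]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c = c_index(times, events, risks)
print(c)
```

            <p>A value of 0.5 corresponds to random ordering and 1 to perfect ordering, which is the scale on which the values 0.714, 0.722 and 0.748 above should be read.</p>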
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Results of EM POE-based survival analysis</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Analysis</p>
                     </c>
                     <c ca="left">
                        <p>Variable</p>
                     </c>
                     <c ca="center">
                        <p>Coef</p>
                     </c>
                     <c ca="center">
                        <p><it>p</it>-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Univariate</p>
                     </c>
                     <c ca="left">
                        <p>Risk Index</p>
                     </c>
                     <c ca="center">
                        <p>0.015</p>
                     </c>
                     <c ca="center">
                        <p>0.005</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multivariate</p>
                     </c>
                     <c ca="left">
                        <p>Risk Index</p>
                     </c>
                     <c ca="center">
                        <p>0.010</p>
                     </c>
                     <c ca="center">
                        <p>0.049</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Estrogen Receptor Pos</p>
                     </c>
                     <c ca="center">
                        <p>-0.697</p>
                     </c>
                     <c ca="center">
                        <p>0.058</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Age</p>
                     </c>
                     <c ca="center">
                        <p>-0.059</p>
                     </c>
                     <c ca="center">
                        <p>0.021</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Cox proportional hazards model fitted to time to distant metastasis in lymph node from the study by [14]. Risk indices were derived based on the POE data transformed by the EM algorithm.</p>
               </tblfn>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Results of MCMC POE-based survival analysis</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Analysis</p>
                     </c>
                     <c ca="left">
                        <p>Variable</p>
                     </c>
                     <c ca="center">
                        <p>Coef</p>
                     </c>
                     <c ca="center">
                        <p><it>p</it>-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Univariate</p>
                     </c>
                     <c ca="left">
                        <p>Risk Index</p>
                     </c>
                     <c ca="center">
                        <p>0.036</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 0.001</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multivariate</p>
                     </c>
                     <c ca="left">
                        <p>Risk Index</p>
                     </c>
                     <c ca="center">
                        <p>0.027</p>
                     </c>
                     <c ca="center">
                        <p>0.008</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Estrogen Receptor Pos</p>
                     </c>
                     <c ca="center">
                        <p>-0.580</p>
                     </c>
                     <c ca="center">
                        <p>0.120</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Age</p>
                     </c>
                     <c ca="center">
                        <p>-0.056</p>
                     </c>
                     <c ca="center">
                        <p>0.030</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Cox proportional hazards model fitted to time to distant metastasis in lymph node from the study by [14]. Risk indices were derived based on the POE data transformed by the MCMC algorithm.</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Ideally, we wish to use all common genes from the available studies for meta-analysis. However, one issue that has been debated recently is that of reproducibility of genes across studies <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. A technique that has proven to be useful as a filtering device to enhance comparability across arrays of different platforms is the integrative correlation coefficient or correlation of correlation coefficients <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. The idea underlying this method is that while raw expression values vary from study to study, the intergene correlations do not vary as much. The intergene correlations are calculated across all samples; this yields an <it>N &#215; N </it>matrix for each study. The row-wise averages are taken for each study, and the correlations of these averages between the studies are then calculated. Thus, one would consider combining genes that have similar intergene correlations across the studies. For normally distributed data, the sample correlation is independent of the sample mean. Thus, genes selected based on an integrative correlation filter need not necessarily be highly expressed genes. We could perform this as a filtering step before applying the proposed meta-analysis methodologies; we did not do so here. The drawback to such a measure is that the filtering of genes might reduce the chance of finding subtypes in the datasets because the genes that define such subtypes may be excluded based on the integrative correlation coefficient.</p>
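         <p>The filtering idea can be sketched in Python for two studies. The sketch assumes one common formulation of the gene-specific integrative correlation, in which each gene is scored by the correlation between its row of intergene correlations in one study and the same row in the other; the exact definition used in the cited work may differ. All function names and the toy expression values are hypothetical.</p>

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intergene_correlations(expr):
    """N x N matrix of correlations between the gene rows of one study."""
    return [[pearson(gi, gj) for gj in expr] for gi in expr]


def integrative_correlation(expr_a, expr_b):
    """Per-gene correlation between matching rows of the two
    intergene correlation matrices (diagonal entries excluded)."""
    corr_a = intergene_correlations(expr_a)
    corr_b = intergene_correlations(expr_b)
    scores = []
    for g in range(len(expr_a)):
        row_a = [c for h, c in enumerate(corr_a[g]) if h != g]
        row_b = [c for h, c in enumerate(corr_b[g]) if h != g]
        scores.append(pearson(row_a, row_b))
    return scores


# Toy data (hypothetical): 4 genes x 5 samples. Study B is study A on a
# different measurement scale, so intergene correlations are preserved.
study_a = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 6],
    [5, 3, 4, 1, 2],
    [1, 3, 2, 5, 4],
]
study_b = [[10 * v + 3 for v in row] for row in study_a]
scores = integrative_correlation(study_a, study_b)
print([round(s, 6) for s in scores])
```

         <p>Genes whose score falls below a chosen threshold would be dropped before the meta-analysis; the toy example shows that a pure change of measurement scale leaves the scores at their maximum.</p>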
         <p>One limitation of our methodology is that it is still subject to the usual meta-analysis assumption that the transformed expression measurements (the POE values) are directly combinable across studies. If we are in fact trying to combine measurements of fundamentally different quantities across multiple studies, then the meta-analysis is invalid. However, in that situation most meta-analytical approaches are invalid, and one would need more sophisticated modelling assumptions.</p>
         <p>A related issue is that of heterogeneity. The results of the analysis here should be interpreted with some caution in that we are comparing metastatic versus nonmetastatic tumors across a variety of tissue types. We made the assumption that the differences between the two types of tumors are the same across the three studies. If this is not true, then it is quite possible that what we are detecting are in fact tissue-specific differences between metastatic and non-metastatic tissues. It is of interest to develop methods for assessing heterogeneity in meta-analyses of genomic data so that they may be applied before using the methodology proposed in this paper.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>With the proliferation of genomic datasets from related studies by different scientific groups, an important method for increasing power will be to combine results across the different studies. In this article, we have proposed a model-based approach to doing this. Being able to integrate and interpret multiple genomic datasets will be an important enterprise for data analysts working in bioinformatics to address in the future.</p>
         <p>A question orthogonal to that examined in this paper, but of equal scientific importance, is that of identifying genes whose expression is correlated across a subset of the samples. This is referred to as molecular subtype analysis and was in fact one motivation of the POE algorithm proposed by Parmigiani et al. (2002). However, finding such gene signatures would require the use of completely different statistical methods than those proposed here and is beyond the scope of the current paper.</p>
         <p>Several important issues to consider when integrating microarray studies include the use of different gene expression measurement scales and the varying analytical power and reliability of the results for individual studies. To address these issues in a meta-analysis framework, we proposed a two-stage mixture modeling strategy. The goal of the mixture model-based transformation is to transform the preprocessed data to the probability scale; the transformed data are then integrated across datasets. In particular, the signed probability of differential expression <it>p</it><sup><it>d </it></sup>is easily interpretable and is platform-independent. The Normal-Uniform mixture distribution under a Bayesian hierarchical model setting has several desirable properties. We have also proposed an alternative model based on a two-component mixture and estimation using the EM algorithm. We briefly compare the MCMC and EM algorithms. The advantage of the former method is that it pools information across all genes, while the latter approach does not. However, the EM algorithm is computationally much faster than the MCMC scheme. In our example, we find that there is substantial overlap between the two approaches for the metastasis data considered here. However, we also expect that for cancer studies, the EM algorithm would fare better with larger phenotypic differences (e.g., non-cancer versus cancer tissue), while the MCMC approach would be of use when the phenotypic differences in samples are subtle (e.g., Gleason score &#8804; 7 versus > 7 in prostate cancer).</p>
         <p>Combining samples on the probability scale mitigates the influence of potential artifacts from a single study. The effect is reflected on two counts. One, integrated sample cohorts improve the reliability of the findings by guarding against false positive results from a single study. Two, it increases the statistical power to detect small consistent effects that can be otherwise masked by inadequacy of the sample size of an individual data set. By implementing this modeling approach, we were able to combine information from three microarray studies to build an inter-study validated signature for discriminating metastatic cancer from non-metastatic cancer.</p>
         <p>The statistical methods described in the paper are implemented in the R package metaArray, which is available through the Bioconductor project at <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data Structures and Probabilistic Models</p>
            </st>
            <p>Let <inline-formula><m:math name="1471-2105-8-364-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG4baEdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4AaSgaaaaa@3269@</m:annotation></m:semantics></m:math></inline-formula> denote the gene expression measurement for gene <it>i </it>from sample <it>j </it>in study <it>k</it>, transformed using the base two logarithm, <it>i </it>= 1, ..., <it>N</it>, <it>j </it>= 1, ..., <it>M</it><sub><it>k</it></sub>, <it>k </it>= 1, ..., <it>K</it>. Note that we assume that there are <it>N </it>common genes in all <it>K </it>studies, but the number of arrays may vary between studies. We also assume that preprocessing has been done, either by a lowess normalization for two-channel microarray data <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> or a robust multichip average analysis for Affymetrix data <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Then the available data can be denoted by <inline-formula><m:math name="1471-2105-8-364-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mrow><m:mrow><m:mo>{</m:mo><m:mrow><m:msup><m:mi>X</m:mi><m:mi>k</m:mi></m:msup></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:mrow><m:mi>k</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>K</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGadeqaaiabdIfaynaaCaaaleqabaGaem4AaSgaaaGccaGL7bGaayzFaaWaa0baaSqaaiabdUgaRjabg2da9iabigdaXaqaaiabdUealbaaaaa@364E@</m:annotation></m:semantics></m:math></inline-formula>, where <it>X</it><sup><it>k </it></sup>is an <it>N </it>&#215; <it>M</it><sub><it>k </it></sub>matrix whose (<it>i</it>, <it>j</it>)th entry is <inline-formula><m:math name="1471-2105-8-364-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG4baEdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4AaSgaaaaa@3269@</m:annotation></m:semantics></m:math></inline-formula>. Note that the value and interpretation of <inline-formula><m:math name="1471-2105-8-364-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG4baEdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4AaSgaaaaa@3269@</m:annotation></m:semantics></m:math></inline-formula> are inherently different across array platforms and are not necessarily comparable when measured in independent studies.</p>
            <p>Corresponding to <inline-formula><m:math name="1471-2105-8-364-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG4baEdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4AaSgaaaaa@3269@</m:annotation></m:semantics></m:math></inline-formula>, let <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula> be a variable that takes one of three values {1, 0, -1}, indicating over-, baseline- or under- expression respectively for gene <it>i </it>in sample <it>j </it>for the study <it>k</it>. If <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula> were known, then this is a gene-specific quantity that would provide a platform-free scale which could be combined across multiple studies. We approach this problem by treating <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula> as a latent variable that is inferred from the data using a mixture model. We now present two probabilistic specifications for making inference about <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula>. The first was presented in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; we describe it here for the sake of completeness. The second assumes <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula> takes one of two values, 0 or 1, and involves fitting a two-component mixture model using the EM algorithm. Both specifications aim to map the original expression values to POE values within a study. We can then combine the POE values across the multiple studies. In the simplest case, a direct group comparison can be made by calculating <it>t</it>-statistics or applying significance analysis of microarrays <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> to the combined data. In the remainder of this section, we describe the two approaches for obtaining POE expression values. From here on, we suppress the study indicator <it>k </it>because estimation is performed within each study separately; the only exception is <it>M</it><sub><it>k</it></sub>, which denotes the number of samples in study <it>k</it>.</p>
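            <p>As a rough illustration of the simplest case described above, the following Python sketch pools the POE values of each gene over several studies and computes a two-sample <it>t</it>-statistic per gene. The function names, the (poe, labels) data layout, the label coding (1 for cancer, 0 for normal), the example numbers and the choice of the unequal-variance (Welch) form of the statistic are all assumptions made for this example; none of them is specified in the text.</p>
            <p>
```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (assumed form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def combine_and_test(studies):
    """Pool POE values across studies and return one t-statistic per gene.

    studies: list of (poe, labels) pairs, one pair per study (hypothetical layout).
      poe[g]    -- list of POE values of gene g, one per sample of that study
      labels[j] -- 1 if sample j is a cancer sample, 0 if it is a normal sample
    Every study is assumed to list the same genes in the same order.
    """
    n_genes = len(studies[0][0])
    t_stats = []
    for g in range(n_genes):
        cancer, normal = [], []
        for poe, labels in studies:
            for value, label in zip(poe[g], labels):
                (cancer if label == 1 else normal).append(value)
        t_stats.append(welch_t(cancer, normal))
    return t_stats


# Made-up POE values for one gene measured in two studies.
study_a = ([[0.9, 0.8, 0.1, 0.0]], [1, 1, 0, 0])
study_b = ([[0.7, 0.2, -0.1]], [1, 0, 0])
print(combine_and_test([study_a, study_b]))
```
            </p>
            <p>Because the POE values of all studies are on a common scale, the samples are simply concatenated before testing; no study-specific rescaling is applied in this sketch.</p>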
         </sec>
         <sec>
            <st>
               <p>Bayesian Model-based Approach and Algorithm</p>
            </st>
            <p>This approach was first explored for the single-study setting in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and was used in the meta-analysis setting in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; we restate it here for completeness. The estimation of the POE values involves borrowing information across all genes.</p>
            <p>We begin with the model specification. Following the approach of <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, we assume that the expression <it>x</it><sub><it>ij </it></sub>of gene <it>i </it>in sample <it>j </it>is a realization of the following mixture model:</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-364-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>x</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msubsup>
                                       <m:mo>,</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msubsup>
                                       <m:mo>,</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                       <m:munderover>
                                          <m:mo>~</m:mo>
                                          <m:mrow/>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>n</m:mi>
                                             <m:mi>d</m:mi>
                                          </m:mrow>
                                       </m:munderover>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msubsup>
                                       <m:mi mathvariant="script">U</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msubsup>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msubsup>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msubsup>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mi mathvariant="script">N</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msubsup>
                                       <m:mi mathvariant="script">U</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msubsup>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>&#945;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeWacaaabaGaemiEaG3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGG8baFiiGacqWF8oqBdaWgaaWcbaGaemyAaKgabeaakiabcYcaSiab=f7aHnaaBaaaleaacqWGQbGAaeqaaOGaeiilaWIae8NUdS2aa0baaSqaaiabdMgaPbqaaiabgUcaRaaakiabcYcaSiab=P7aRnaaDaaaleaacqWGPbqAaeaacqGHsislaaGccqGGSaalcqWFdpWCdaqhaaWcbaGaemyAaKgabaGaeGOmaidaaOWaaCbmaeaacqGG+bGFaSqaaaqaaiabdMgaPjabd6gaUjabdsgaKbaaaOqaaiab=b8aWnaaDaaaleaacqWGPbqAaeaacqGHRaWkaaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGccqGFueFvcqGGOaakcqWF8oqBdaWgaaWcbaGaemyAaKgabeaakiabgUcaRiab=f7aHnaaBaaaleaacqWGQbGAaeqaaOGaeiilaWIae8hVd02aaSbaaSqaaiabdMgaPbqabaGccqGHRaWkcqWFXoqydaWgaaWcbaGaemOAaOgabeaakiabgUcaRiab=P7aRnaaDaaaleaacqWGPbqAaeaacqGHRaWkaaGccqGGPaqkaeaaaeaacqGHRaWkcqGGOaakcqaIXaqmcqGHsislcqWFapaCdaqhaaWcbaGaemyAaKgabaGaey4kaScaaOGaeyOeI0Iae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiabcMcaPiab+1q8ojabcIcaOiab=X7aTnaaBaaaleaacqWGPbqAaeqaaOGaey4kaSIae8xSde2aaSbaaSqaaiabdQgaQbqabaGccqGGSaalcqWFdpWCdaqhaaWcbaGaemyAaKgabaGaeGOmaidaaOGaeiykaKcabaaabaGaey4kaSIae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiab+rr8vjabcIcaOiab=X7aTnaaBaaaleaacqWGPbqAaeqaaOGaey4kaSIae8xSde2aaSbaaSqaaiabdQgaQbqabaGccqGHsislcqWF6oWAdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaOGaeiilaWIae8hVd02aaSbaaSqaaiabdMgaPbqabaGccqGHRaWkcqWFXoqydaWgaaWcbaGaemOAaOgabeaakiabcMcaPaaaaaa@B01F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#956;</it><sub><it>i </it></sub>is the gene-specific effect of gene <it>i </it>and <it>&#945;</it><sub><it>j </it></sub>is the sample-specific effect of sample <it>j</it>. For identifiability, the sample effects <inline-formula><m:math name="1471-2105-8-364-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mrow><m:mrow><m:mo>{</m:mo><m:mrow><m:msub><m:mi>&#945;</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:msub><m:mi>M</m:mi><m:mi>k</m:mi></m:msub></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGadeqaaGGaciab=f7aHnaaBaaaleaacqWGPbqAaeqaaaGccaGL7bGaayzFaaWaa0baaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2eannaaBaaameaacqWGRbWAaeqaaaaaaaa@3842@</m:annotation></m:semantics></m:math></inline-formula> are constrained to sum to zero. The parameters <inline-formula><m:math name="1471-2105-8-364-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaey4kaScaaaaa@30CF@</m:annotation></m:semantics></m:math></inline-formula> and <inline-formula><m:math name="1471-2105-8-364-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaaaa@30DA@</m:annotation></m:semantics></m:math></inline-formula> determine the limits of the uniform distributions in the mixture for gene <it>i</it>; these limits are set to lie either at least 3<it>&#963;</it><sub><it>i </it></sub>away from the mean of the normal distribution or beyond the most outlying expression value. The parameters <inline-formula><m:math name="1471-2105-8-364-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mi>i</m:mi><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFapaCdaqhaaWcbaGaemyAaKgabaGaey4kaScaaaaa@30DA@</m:annotation></m:semantics></m:math></inline-formula> &#8801; <it>P</it>(<it>e</it><sub><it>ij </it></sub>= 1) and <inline-formula><m:math name="1471-2105-8-364-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mi>i</m:mi><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFapaCdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaaaa@30E5@</m:annotation></m:semantics></m:math></inline-formula> &#8801; <it>P</it>(<it>e</it><sub><it>ij </it></sub>= -1) are the multinomial probabilities for the latent variable <inline-formula><m:math name="1471-2105-8-364-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathsize="140%" displaystyle="true"><m:mi>e</m:mi></m:mstyle><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqfWaqabSqaaiabdMgaPjabdQgaQbqaaiabdUgaRbqdbaGaemyzaugaaaaa@327B@</m:annotation></m:semantics></m:math></inline-formula>. Conceptually, we can think of the expression of the <it>i</it>-th gene as arising from three types of genes in model (1). The first component represents genes that are overexpressed in the cancer samples relative to the normal samples. The second corresponds to genes whose expression does not change between cancer and normal samples, and the third to genes that are underexpressed in cancer samples relative to normal samples.</p>
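            <p>To make the three components of model (1) concrete, the following Python sketch evaluates the mixture density of a single expression value for one gene and one sample. It is only an illustration under stated assumptions: the function and argument names are ours, the parameters are treated as known constants rather than estimated, and the numerical values in the example call are invented.</p>
            <p>
```python
import math


def uniform_pdf(x, lo, hi):
    """Density of the uniform distribution on [lo, hi]; zero outside."""
    inside = (x >= lo) and (hi >= x)
    return 1.0 / (hi - lo) if inside else 0.0


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, mu, alpha, kappa_plus, kappa_minus, sigma, pi_plus, pi_minus):
    """Density of x_ij under model (1) for one gene i and one sample j.

    mu, alpha                -- gene effect mu_i and sample effect alpha_j
    kappa_plus, kappa_minus  -- widths of the two uniform components
    sigma                    -- standard deviation of the normal component
    pi_plus, pi_minus        -- prior probabilities of over- and underexpression
    """
    centre = mu + alpha
    over = pi_plus * uniform_pdf(x, centre, centre + kappa_plus)
    base = (1.0 - pi_plus - pi_minus) * normal_pdf(x, centre, sigma)
    under = pi_minus * uniform_pdf(x, centre - kappa_minus, centre)
    return over + base + under


# Invented parameter values: sigma = 1 and both uniform widths equal to 3 * sigma.
print(mixture_pdf(1.0, mu=0.0, alpha=0.0, kappa_plus=3.0, kappa_minus=3.0,
                  sigma=1.0, pi_plus=0.2, pi_minus=0.1))
```
            </p>
            <p>In this sketch the second argument of the normal component is passed as a standard deviation, whereas model (1) is written in terms of the variance <it>&#963;</it><sub><it>i</it></sub><sup>2</sup>.</p>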
            <p>Let <inline-formula><m:math name="1471-2105-8-364-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaey4kaScaaaaa@31DC@</m:annotation></m:semantics></m:math></inline-formula> &#8801; <it>P</it>(<it>e</it><sub><it>ij </it></sub>= 1|<it>x</it><sub><it>ij</it></sub>) and <inline-formula><m:math name="1471-2105-8-364-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaeyOeI0caaaaa@31E7@</m:annotation></m:semantics></m:math></inline-formula> &#8801; <it>P</it>(<it>e</it><sub><it>ij </it></sub>= -1|<it>x</it><sub><it>ij</it></sub>) be the conditional probabilities of over- and underexpression, respectively, for gene <it>i </it>in sample <it>j </it>given the microarray measurements. By Bayes' rule,</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-8-364-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                              <m:mo>+</m:mo>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>+</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>+</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                 </m:msubsup>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                 </m:msubsup>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>0</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaey4kaScaaOGaeyypa0ZaaSaaaeaaiiGacqWFapaCdaqhaaWcbaGaemyAaKgabaGaey4kaScaaOGaemOzay2aaSbaaSqaaiabigdaXiabdMgaPbqabaGccqGGOaakcqWG4baEdaWgaaWcbaGaemyAaKMaemOAaOgabeaakiabcMcaPaqaaiab=b8aWnaaDaaaleaacqWGPbqAaeaacqGHRaWkaaGccqWGMbGzdaWgaaWcbaGaeGymaeJaemyAaKgabeaakiabcIcaOiabdIha4naaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeiykaKIaey4kaSIae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiabdAgaMnaaBaaaleaacqGHsislcqaIXaqmcqWGPbqAaeqaaOGaeiikaGIaemiEaG3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGGPaqkcqGHRaWkcqGGOaakcqaIXaqmcqGHsislcqWFapaCdaqhaaWcbaGaemyAaKgabaGaey4kaScaaOGaeyOeI0Iae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiabcMcaPiabdAgaMnaaBaaaleaacqaIWaamcqWGPbqAaeqaaOGaeiikaGIaemiEaG3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGGPaqkaaaaaa@76E1@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>and</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-8-364-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                              <m:mo>&#8722;</m:mo>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>+</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                 </m:msubsup>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>+</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                 </m:msubsup>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msubsup>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                 </m:msubsup>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>0</m:mn>
                                       <m:mi>i</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaeyOeI0caaOGaeyypa0ZaaSaaaeaaiiGacqWFapaCdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaOGaemOzay2aaSbaaSqaaiabgkHiTiabigdaXiabdMgaPbqabaGccqGGOaakcqWG4baEdaWgaaWcbaGaemyAaKMaemOAaOgabeaakiabcMcaPaqaaiab=b8aWnaaDaaaleaacqWGPbqAaeaacqGHRaWkaaGccqWGMbGzdaWgaaWcbaGaeGymaeJaemyAaKgabeaakiabcIcaOiabdIha4naaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeiykaKIaey4kaSIae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiabdAgaMnaaBaaaleaacqGHsislcqaIXaqmcqWGPbqAaeqaaOGaeiikaGIaemiEaG3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGGPaqkcqGHRaWkcqGGOaakcqaIXaqmcqGHsislcqWFapaCdaqhaaWcbaGaemyAaKgabaGaey4kaScaaOGaeyOeI0Iae8hWda3aa0baaSqaaiabdMgaPbqaaiabgkHiTaaakiabcMcaPiabdAgaMnaaBaaaleaacqaIWaamcqWGPbqAaeqaaOGaeiikaGIaemiEaG3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGGPaqkaaaaaa@77E4@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>f</it><sub>0<it>i </it></sub>is the normal density function, and <it>f</it><sub>1<it>i</it></sub>, <it>f</it><sub>-1<it>i </it></sub>are the corresponding uniform densities for the differential expression categories for the gene <it>i </it>in each study. In the numerator of (2), <it>f</it><sub>1<it>i </it></sub>= 1/<inline-formula><m:math name="1471-2105-8-364-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaey4kaScaaaaa@30CF@</m:annotation></m:semantics></m:math></inline-formula> if <it>x</it><sub><it>ij </it></sub>&#8712; [<it>&#956;</it><sub><it>i </it></sub>+ <it>&#945;</it><sub><it>j</it></sub>, <it>&#956;</it><sub><it>i </it></sub>+ <it>&#945;</it><sub><it>j </it></sub>+ <inline-formula><m:math name="1471-2105-8-364-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaey4kaScaaaaa@30CF@</m:annotation></m:semantics></m:math></inline-formula>] and 0 otherwise. In the numerator of (3), <it>f</it><sub>-1<it>i </it></sub>= 1/<inline-formula><m:math name="1471-2105-8-364-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaaaa@30DA@</m:annotation></m:semantics></m:math></inline-formula> if <it>x</it><sub><it>ij </it></sub>&#8712; [-<inline-formula><m:math name="1471-2105-8-364-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#954;</m:mi><m:mi>i</m:mi><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF6oWAdaqhaaWcbaGaemyAaKgabaGaeyOeI0caaaaa@30DA@</m:annotation></m:semantics></m:math></inline-formula> + <it>&#956;</it><sub><it>i </it></sub>+ <it>&#945;</it><sub><it>j</it></sub>, <it>&#956;</it><sub><it>i </it></sub>+ <it>&#945;</it><sub><it>j</it></sub>] and 0 otherwise.</p>
            <p>Note that the supports of the two uniform distributions are disjoint. As a result, at most one of the two probabilities of differential expression is nonzero for any given observation, so the pair takes one of the following two forms:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-364-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mo>+</m:mo>
                           </m:msup>
                           <m:mo>,</m:mo>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mo>&#8722;</m:mo>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                       <m:mo>/</m:mo>
                                       <m:msup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                       <m:mo>/</m:mo>
                                       <m:msup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                       <m:mo>+</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:msub>
                                          <m:mi>f</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                                 <m:mo>,</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGGOaakcqWGWbaCdaahaaWcbeqaaiabgUcaRaaakiabcYcaSiabdchaWnaaCaaaleqabaGaeyOeI0caaOGaeiykaKIaeyypa0ZaaeWaaeaadaWcaaqaaGGaciab=b8aWnaaCaaaleqabaGaey4kaScaaOGaei4la8Iae8NUdS2aaWbaaSqabeaacqGHRaWkaaaakeaacqWFapaCdaahaaWcbeqaaiabgUcaRaaakiabc+caViab=P7aRnaaCaaaleqabaGaey4kaScaaOGaey4kaSIaeiikaGIaeGymaeJaeyOeI0Iae8hWda3aaWbaaSqabeaacqGHRaWkaaGccqGHsislcqWFapaCdaahaaWcbeqaaiabgkHiTaaakiabcMcaPiabdAgaMnaaBaaaleaacqaIWaamaeqaaaaakiabcYcaSiabicdaWaGaayjkaiaawMcaaaaa@5344@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-364-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mo>+</m:mo>
                           </m:msup>
                           <m:mo>,</m:mo>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mo>&#8722;</m:mo>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mn>0</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                       <m:mo>/</m:mo>
                                       <m:msup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                       <m:mo>/</m:mo>
                                       <m:msup>
                                          <m:mi>&#954;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                       <m:mo>+</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>+</m:mo>
                                       </m:msup>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>&#960;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                       </m:msup>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:msub>
                                          <m:mi>f</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGGOaakcqWGWbaCdaahaaWcbeqaaiabgUcaRaaakiabcYcaSiabdchaWnaaCaaaleqabaGaeyOeI0caaOGaeiykaKIaeyypa0ZaaeWaaeaacqaIWaamcqGGSaaldaWcaaqaaGGaciab=b8aWnaaCaaaleqabaGaeyOeI0caaOGaei4la8Iae8NUdS2aaWbaaSqabeaacqGHsislaaaakeaacqWFapaCdaahaaWcbeqaaiabgkHiTaaakiabc+caViab=P7aRnaaCaaaleqabaGaeyOeI0caaOGaey4kaSIaeiikaGIaeGymaeJaeyOeI0Iae8hWda3aaWbaaSqabeaacqGHRaWkaaGccqGHsislcqWFapaCdaahaaWcbeqaaiabgkHiTaaakiabcMcaPiabdAgaMnaaBaaaleaacqaIWaamaeqaaaaaaOGaayjkaiaawMcaaiabc6caUaaa@5454@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>We then construct the following expression measurement: <inline-formula><m:math name="1471-2105-8-364-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>d</m:mi></m:msubsup><m:mo>=</m:mo><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>+</m:mo></m:msubsup><m:mo>&#8722;</m:mo><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaemizaqgaaOGaeyypa0JaemiCaa3aa0baaSqaaiabdMgaPjabdQgaQbqaaiabgUcaRaaakiabgkHiTiabdchaWnaaDaaaleaacqWGPbqAcqWGQbGAaeaacqGHsislaaaaaa@3EBD@</m:annotation></m:semantics></m:math></inline-formula>, ranging from -1 to 1. This is the probability of expression (POE); it can be interpreted as the signed conditional probability of differential expression of gene <it>i </it>in sample <it>j </it>in an individual study. At first glance, our formula differs from that in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> in that their POE measure, corresponding to <inline-formula><m:math name="1471-2105-8-364-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>d</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaemizaqgaaaaa@324B@</m:annotation></m:semantics></m:math></inline-formula>, involves addition, while ours involves subtraction. The two are equivalent, however, since their second probability is constrained to be negative, while <inline-formula><m:math name="1471-2105-8-364-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKMaemOAaOgabaGaeyOeI0caaaaa@31E7@</m:annotation></m:semantics></m:math></inline-formula> is constrained to be positive.</p>
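            <p>As a minimal sketch of how the POE could be evaluated for one gene once parameter values are available, the following Python function computes the difference of the two probabilities defined above. The function name, argument names and numerical values are hypothetical and are not taken from the authors' software; the argument center stands for <it>&#956;</it><sub><it>i </it></sub>+ <it>&#945;</it><sub><it>j</it></sub>, and the shared end point of the two uniform supports is assigned to the over-expression component, which is an arbitrary choice made here.</p>

```python
import numpy as np


def poe(x, center, sigma, pi_plus, pi_minus, kappa_plus, kappa_minus):
    """Signed probability of expression for one gene (illustrative sketch).

    center plays the role of mu_i + alpha_j. All parameters are treated as
    known here; in the text they are sampled by MCMC.
    """
    x = np.asarray(x, dtype=float)
    # Normal density f0 of the non-differential component.
    f0 = np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Uniform densities: 1/kappa on the support, 0 elsewhere.
    # Assumption: the shared end point x == center goes to the "up" component.
    in_up = np.logical_and(x >= center, center + kappa_plus >= x)
    in_down = np.logical_and(x >= center - kappa_minus, center > x)
    f_up = np.where(in_up, 1.0 / kappa_plus, 0.0)
    f_down = np.where(in_down, 1.0 / kappa_minus, 0.0)
    # Common denominator of the two posterior probabilities.
    denom = pi_plus * f_up + pi_minus * f_down + (1.0 - pi_plus - pi_minus) * f0
    p_plus = pi_plus * f_up / denom
    p_minus = pi_minus * f_down / denom
    return p_plus - p_minus


# Hypothetical parameter values, for illustration only.
example = poe(np.array([-2.0, 0.5, 2.0]), center=0.0, sigma=1.0,
              pi_plus=0.2, pi_minus=0.2, kappa_plus=4.0, kappa_minus=4.0)
print(example)
```

            <p>Because the uniform supports do not overlap, at most one of the two probabilities is nonzero for each observation, so the returned value always lies between -1 and 1.</p>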
            <p>To sample from the posterior distributions of the parameters, we implemented a Gibbs sampling algorithm (with a Metropolis-Hastings step for the mixture proportion parameters) in which the gene-specific parameters are repeatedly sampled from their full conditional distributions [See Additional File <supplr sid="S4">4</supplr>]. We fit the Bayesian model to each microarray dataset separately. Note that the mixture has one normal component, while the other two components are uniform distributions. We prefer this to a three-component normal mixture model because the probabilities of expression are then monotonic functions of absolute gene expression. It can be shown that under a three-component normal mixture model, the POE is no longer a monotonic function of gene expression. Monotonicity is desirable because we would like larger differences in gene expression relative to a baseline group to be associated with greater statistical evidence of differential expression. Note that this is a standard assumption in meta-analysis methodologies.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Derivation of conditional distributions for the MCMC-based POE algorithm. This file contains the details of the full conditional distributions from which samples from the posterior distribution are drawn.</p>
               </text>
               <file name="1471-2105-8-364-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
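            <p>The loss of monotonicity under a three-component normal mixture, mentioned above, can be illustrated numerically. The sketch below is not the authors' derivation; it assumes one hypothetical configuration in which the null component has a larger standard deviation than the two differential components, so that the null density dominates far out in the tail. All function names and parameter values are illustrative assumptions.</p>

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def poe_three_normals(x, pi_plus, pi_minus, shift, sd_null, sd_alt):
    """Signed POE when all three mixture components are normal (sketch)."""
    x = np.asarray(x, dtype=float)
    up = pi_plus * normal_pdf(x, shift, sd_alt)
    down = pi_minus * normal_pdf(x, -shift, sd_alt)
    null = (1.0 - pi_plus - pi_minus) * normal_pdf(x, 0.0, sd_null)
    return (up - down) / (up + down + null)


# Hypothetical values: the null component is wider than the alternatives.
grid = np.linspace(0.0, 10.0, 201)
values = poe_three_normals(grid, pi_plus=0.2, pi_minus=0.2,
                           shift=2.0, sd_null=2.0, sd_alt=0.5)
peak = int(np.argmax(values))
print("POE is largest at x =", grid[peak])
print("POE at the largest x on the grid:", values[-1])
```

            <p>Under these assumed values the POE rises to a maximum at a moderate expression value and then falls back toward zero, so the most extreme observations receive the least evidence of differential expression. The uniform components avoid this behaviour.</p>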
            <p>There are several advantages to the mixture model-based transformation. First, the method estimates the posterior distribution for the latent variables <it>e</it><sub><it>ij</it></sub>, which can then be combined across multiple studies. Second, the transformed values carry meaningful interpretations as signed probabilities of differential expression of a gene in a particular sample. Third, the underlying mixture of normal and uniform distributions gives equal density in the tails and is effective in reducing the influence of extreme expression values. Finally, the Bayesian hierarchical modeling approach borrows strength across genes, resulting in shrinkage-type estimators for the gene-specific parameters. Consequently, the high-dimensional gene expression data are denoised. However, the algorithm for inferring the posterior distribution of the latent variables is fairly computationally intensive. In the next section, we discuss an alternative mixture model specification that leads to a more computationally efficient algorithm.</p>
         </sec>
         <sec>
            <st>
               <p>Maximum Likelihood Approach using EM algorithm</p>
            </st>
            <p>Maximum likelihood estimation (MLE) using the EM algorithm can substantially increase computational speed for mixture models. Such an approach is useful because what we are ultimately interested in is estimates of POE that we can integrate across studies. There is, however, a difficulty in implementing an EM algorithm for the three-component mixture model considered in the previous section. Recall the restriction that the uniform components must have the same heights. Since the MLEs of the end points of the uniform distributions are the most outlying observations, we have found in some examples that the EM algorithm with these MLEs provides unstable parameter estimates.</p>
            <p>As an alternative modelling approach, suppose we take the three values for the latent variables <it>e</it><sub><it>ij </it></sub>from the previous section (<it>e</it><sub><it>ij </it></sub>&#8712; {-1, 0, 1}) and collapse them into two possible values, <it>e</it><sub><it>ij </it></sub>= 1 and <it>e</it><sub><it>ij </it></sub>= 0. Note that for the <it>i</it>th gene and <it>j</it>th sample, <it>e</it><sub><it>ij </it></sub>= 1 corresponds to differential expression in either direction, while <it>e</it><sub><it>ij </it></sub>= 0 represents nondifferential expression. We now consider the following two-component mixture model for the <it>i</it>th gene:</p>
            <p>
               <display-formula id="M4">
                  <m:math name="1471-2105-8-364-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>x</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:munderover>
                              <m:mo>~</m:mo>
                              <m:mrow/>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>d</m:mi>
                              </m:mrow>
                           </m:munderover>
                           <m:msub>
                              <m:mi>&#960;</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mi mathvariant="script">U</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>a</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:msub>
                              <m:mi>b</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>+</m:mo>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>&#8722;</m:mo>
                           <m:msub>
                              <m:mi>&#960;</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mi mathvariant="script">N</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>&#956;</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:msubsup>
                              <m:mi>&#963;</m:mi>
                              <m:mi>i</m:mi>
                              <m:mn>2</m:mn>
                           </m:msubsup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG4baEdaWgaaWcbaGaemyAaKMaemOAaOgabeaakmaaxadabaGaeiOFa4haleaaaeaacqWGPbqAcqWGUbGBcqWGKbazaaacciGccqWFapaCdaWgaaWcbaGaemyAaKgabeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaakiab+rr8vjabcIcaOiabdggaHnaaBaaaleaacqWGPbqAaeqaaOGaeiilaWIaemOyai2aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkcqGHRaWkcqGGOaakcqaIXaqmcqGHsislcqWFapaCdaWgaaWcbaGaemyAaKgabeaakiabcMcaPiab+1q8ojabcIcaOiab=X7aTnaaBaaaleaacqWGPbqAaeqaaOGaeiilaWIae83Wdm3aa0baaSqaaiabdMgaPbqaaiabikdaYaaakiabcMcaPiabcYcaSaaa@62C6@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#956;</it><sub><it>i </it></sub>is the mean of the normal distribution, <it>&#960;</it><sub><it>i </it></sub>is the mixing proportion and <it>a</it><sub><it>i</it></sub>, <it>b</it><sub><it>i </it></sub>are the two end points of the uniform distribution. Conceptually, there are only two populations of genes in model (4): a constitutively expressed population common to both tumor and normal samples (the normal component) and a differentially expressed population (the uniform component). Such a model was proposed for the case <it>K </it>= 1 in <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Their interest was in determining differentially expressed genes within one study, while ours is in combining results across multiple studies.</p>
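            <p>Model (4) can be made concrete with a short sketch that evaluates the two-component mixture density for one gene and sums its logarithm over samples. This is an illustration under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, the parameter values are invented, and the uniform density is set to zero outside its end points.</p>

```python
import numpy as np


def mixture_log_likelihood(x, mix_prop, a, b, mu, sigma):
    """Log-likelihood of a uniform-normal mixture for one gene (sketch).

    mix_prop is the weight of the uniform component on [a, b]; the normal
    component has mean mu and standard deviation sigma.
    """
    x = np.asarray(x, dtype=float)
    # Uniform density: 1/(b - a) between the end points, 0 elsewhere.
    inside = np.logical_and(x >= a, b >= x)
    uniform = np.where(inside, 1.0 / (b - a), 0.0)
    # Normal density of the non-differential component.
    normal = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Sum of the log mixture density over samples.
    return float(np.sum(np.log(mix_prop * uniform + (1.0 - mix_prop) * normal)))


# Hypothetical expression values and parameters, for illustration only.
samples = np.array([-0.3, 0.1, 0.4, 2.8, -3.1])
print(mixture_log_likelihood(samples, mix_prop=0.3, a=-4.0, b=4.0, mu=0.0, sigma=1.0))
```

            <p>A function of this kind could serve as the objective that an EM iteration increases at each step, or as a check that successive parameter updates do not decrease the likelihood.</p>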
            <p>For a fixed gene <it>i</it>, the likelihood to be maximized is the following:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-364-i19" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>n</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>[</m:mo>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>&#960;</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>b</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>a</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo>+</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#960;</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>2</m:mn>
                                                   <m:mi>&#960;</m:mi>
                                                   <m:msubsup>
                                                      <m:mi>&#963;</m:mi>
                                                      <m:mi>i</m:mi>
                                                      <m:mn>2</m:mn>
                                                   </m:msubsup>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>/</m:mo>
                                                   <m:mn>2</m:mn>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mi>exp</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                       <m:mrow>
                                          <m:mo>{</m:mo>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msup>
                                                      <m:mrow>
                                                         <m:mo stretchy="false">(</m:mo>
                                                         <m:msub>
        