Abstract
Background
With the explosion of microarray studies, an enormous amount of data is being produced. Systematic integration of gene expression data from different sources increases the statistical power of detecting differentially expressed genes and allows assessment of heterogeneity. The challenge, however, is in designing and implementing efficient analytic methodologies for combining data generated by different research groups.
Results
We extended traditional effect size models to combine information from different microarray datasets by incorporating a quality measure for each gene in each study into the effect size estimation. We illustrated our method by integrating two datasets generated using different Affymetrix oligonucleotide array types. Our results indicate that the proposed quality-adjusted weighting strategy for modelling inter-study variation of gene expression profiles not only increases consistency and decreases heterogeneity between these two datasets, but also identifies many more differentially expressed genes than previously proposed methods.
Conclusion
Data integration and synthesis is becoming increasingly important. We live in a high-throughput era where technologies constantly change, leaving behind a trail of data with different forms, shapes and sizes. Statistical and computational methodologies are therefore critical for extracting the most out of these related but not identical sources of data.
Background
The introduction of DNA microarray technology has enabled investigators to screen thousands of genes simultaneously. One of the main goals of these studies is to identify differentially expressed genes between two biological conditions. For example, many studies [1-4] have been performed in prostate cancer research to find candidate markers. Since laboratory protocols, microarray platforms and analysis techniques used in these studies were not identical, it is difficult to make a comparison among the results obtained from them. However, systematic integration of gene expression data from different sources increases statistical power of detecting differentially expressed genes and allows assessment of heterogeneity. Meta-analysis is a classical statistical methodology for combining results from different studies addressing the same scientific questions, and it is becoming particularly popular in the area of medical and epidemiological research [5]. Meta-analysis methods have recently been applied to the analysis of microarray data [6-11].
Rhodes et al. [6] focused on combining p-values for each gene from the individual studies to estimate an overall p-value for each gene across all studies. Their method has been applied to four prostate cancer microarray datasets, two of which are cDNA microarray data and the remainder Affymetrix microarray data. Samples in each dataset were taken from prostate cancer cases, but were analyzed with different platforms. Differential expression was assessed independently for each gene in each dataset. Since the method chosen to combine results across studies was based on a statistical confidence measure, the p-value, rather than on the expression level, this strategy avoids direct comparisons of datasets and related cross-platform normalization issues. Choi et al. [7] focused on integrating effect size estimates to obtain an overall estimate of the average effect size. Effect size is used to measure the magnitude of treatment effect in a given study. Using the same datasets as those used by Rhodes et al. [6], they demonstrated that their method can lead to the discovery of small but consistent expression changes with increased sensitivity and reliability.
Parmigiani et al. [9] developed a correlation-based method for assessing the reproducibility of gene expression studies, with application to lung cancer. They demonstrated that their method can improve correlation across the various studies. Jiang et al. [10] used a distribution transformation method to integrate two lung cancer studies and proposed a gene shaving-based classification approach to identify a small list of differentially expressed genes between lung cancer patients and normal subjects. They noted that many of the selected genes have been experimentally validated.
Although some of the above studies (for example Rhodes et al. [6], Choi et al. [7], Parmigiani et al. [9]) demonstrated the utility of integrating cDNA and Affymetrix microarray data, other investigators argued against this approach. Kuo et al. [12] compared Affymetrix and spotted cDNA gene expression measurements based on 60 cell lines from the National Cancer Institute. They found low correlation between the actual gene measurements from the two technologies, and concluded that "data from spotted cDNA microarrays could not be directly combined with data from synthesized oligonucleotide arrays." Moreover, they concluded that it was unlikely that the two types of data could be transformed or normalized into a common standardized index. Jarvinen et al. [13] determined the level of concordance between microarray platforms by analyzing breast cancer cell lines with in situ synthesized oligonucleotide arrays, commercial cDNA microarrays and custom-made cDNA microarrays. Their results demonstrated that data from different microarray platforms are variable to the extent that direct integration of data from different platforms may be complicated and unreliable.
In classical meta-analysis, quality measures have often been used when combining results across studies. It has been argued that studies of higher quality will give more accurate estimates of the true parameter of interest, and therefore studies of high quality should receive a higher weight in the analysis summarizing across studies [14]. The Affymetrix microarray technology is used worldwide, and its success has been demonstrated by numerous publications in scientific journals. However, it is well-known that only part (approximately 40-50%) of the whole genome is expressed in any specific tissue type, so there are many genes showing low expression and random variability across samples. Furthermore, some genes will be measured less precisely by some technologies, or studies, than others.
Therefore, our ability to develop powerful statistical methods for efficiently integrating and weighting information from related genomic experiments will be critical to the success of the massive investment made in genomic studies. The focus of this paper is to design and implement a quality measure appropriate for Affymetrix microarray data. Using this quality measure, we weighted the importance of each gene in each experiment and incorporated the weight into the effect size model proposed by Choi et al. [7] to model inter-study variation of gene expression profiles. We believe that applying this approach can lead to a more accurate description of expression patterns than Choi et al.'s method [7].
Results
We used two datasets consisting of gene expression profiles in lung cancer and normal subjects. These datasets were generated using different Affymetrix oligonucleotide chip types by two research groups, one from Harvard and the other from Michigan (see the Methods section for details). A list of 6124 common probe sets found in the two datasets was used for data analysis in this study [10]. We developed a quality weight for each gene in each study by modeling the log of the detection p-values with an exponential distribution, and then summarizing across arrays and groups within each study (see Methods). In order to visualize the effect of the proposed quality weighting, we calculated the mean expression value of each probe set across all normal samples, where the expression variation presumably is less heterogeneous than among the cancer samples. The mean expression value of a probe set in a study is the estimated average of the probe set's intensity values across all normal samples in the study. Figure 1 shows the scatter plot of the average expression values of the probe sets in the Harvard dataset plotted against those of the Michigan dataset: (a) weighted by the quality score (see the Methods section for our definition of the quality score), and (b) unweighted. This plot is intended to be illustrative only: our algorithm weights the test statistics, rather than the gene expression measures. In Figure 1(a), it can be seen, as expected, that many of the genes with low levels of expression are associated with low quality weights.
Figure 1. Scatter plots of (a) quality-weighted and (b) quality-unweighted mean expression values for the 6124 common probe sets in the Harvard and Michigan datasets based on normal (healthy) samples.
We show examples of quality scores for selected probe sets in Table 1. The two datasets may have very different detection p-value distributions, which are reflected in the quality scores. Figure 2 shows a box plot of the detection p-values for one probe set from Table 1. When the two datasets give small p-values (e.g., the last line in Table 1), the minimum p-value may be much smaller in one dataset than in the other. Both, however, will give high quality scores with an appropriate choice of the sensitivity parameter s, which adjusts how the quality measure interprets the detection p-values.
Table 1. Quality scores for selected probe sets at a sensitivity parameter s = 0.2
Figure 2. Detection p-values for a sample probe set (38249_at). H1 and H2 denote the detection p-values in the normal and lung cancer groups, respectively, for the Harvard study; whereas M1 and M2 denote the detection p-values in the normal and lung cancer groups, respectively, for the Michigan study.
Figure 3 shows the adjusted and unadjusted quantile-quantile (QQ) plots of the observed vs. expected Q values. Q is the test statistic we used for assessing heterogeneity, and is described in detail later in the Methods section. In the adjusted QQ plot, the quality score was used as a weight in the computation of Q, while it was not considered in the unadjusted QQ plot. From these graphs, we can see that the quantiles of the observed Q values are far from the expected quantiles of a chi-square distribution, suggesting that these two datasets generated heterogeneous results beyond random sampling error. Therefore, we applied the random-effects model in this study. The quantiles of the Q statistic were closer to the quantiles of the expected chi-square distribution when quality adjustment was considered (Figure 3(a)) than when it was not (Figure 3(b)). The variance of the unadjusted Q values was 9.45, but it was reduced to 3.31 when quality adjustment was used. This result suggests that the incorporation of the quality measure into effect size estimation can increase consistency and decrease heterogeneity between these two datasets.
Figure 3. Quantile-quantile plots of the observed versus the expected Q statistic: (a) with quality adjustment, and (b) without quality adjustment.
To identify a list of potentially "significant" genes, we adapted the false discovery rate (FDR) algorithm implemented in [30]. We first calculated the adjusted z statistics for all genes based on the random-effects model (REM). Genes were then ranked by the magnitude of their z statistic values. A permutation-based approach was used to obtain the corresponding expected ordered z statistics. The potentially "significant" genes are those for which the distance between the ordered z statistic from the observed data and that of the permuted data exceeds a given threshold (delta). Figure 4 shows the relationship between the number of significantly differentially expressed genes and different delta levels. As we see in this figure, the quality-adjusted REM can identify many more significant genes than the quality-unadjusted REM at any fixed level of delta.
Figure 4. Relationship between the number of significantly differentially expressed genes and different delta levels, obtained from fitting the random-effects model.
We calculated the estimated FDR for each given delta. As expected, the number of genes called significant increased as the value of delta decreased, but at the cost of an increasing FDR. The estimated FDR was zero down to a delta of 0.6, where 228 genes were called significant in the quality-adjusted model and 153 genes in the quality-unadjusted model. In order to obtain a manageable gene list, we arbitrarily set delta at 1.1. At this delta level, we selected 29 differentially expressed genes (representing 32 probe sets) at an FDR of 0% when the quality weight was incorporated into the estimation of the effect size. However, when the quality measure was not used (Choi et al.'s method [7]), we selected only 20 differentially expressed genes (representing 21 probe sets) at an FDR of 0%. All 20 genes were also at the top of the list of the 29 genes identified with the quality weights. Tables 2 and 3 show the selected differentially expressed genes between normal and lung adenocarcinoma patient samples using the quality-adjusted and quality-unadjusted models, respectively, with genes ordered by their z statistic values [see Additional Files 1 and 2]. As can be seen in Tables 2 and 3, 4 of the 9 genes that were selected by our method, but not by Choi et al.'s method, have also been identified by several other groups, including Jiang et al. [10], Beer et al. [15] and Bhattacharjee et al. [16]. In particular, some of these 4 genes, such as TEK and TGFBR2, have been experimentally validated (shown in Table 3 of Jiang et al. [10]). For a cutoff of an adjusted z value of 1.96 (corresponding to a 5% level of significance), the quality-adjusted model identified 9 significantly differentially expressed genes, while the quality-unadjusted model identified only 2 of these 9. All these results suggest that our proposed method may have increased sensitivity to detect more differentially expressed and biologically relevant genes than Choi et al.'s method [7].
Additional File 1. Selected differentially expressed genes between normal and lung adenocarcinoma patient samples (genes selected based on the quality-adjusted model with delta = 1.1)
Additional File 2. Selected differentially expressed genes between normal and lung adenocarcinoma patient samples (genes selected based on the quality-unadjusted model with delta = 1.1)
We compared the genes identified by our method with those identified by Jiang et al. [10], Beer et al. [15] and Bhattacharjee et al. [16]; sixteen of the 29 genes identified by our proposed model were also detected in at least one of these studies. In particular, we observed that 6 of the 29 genes were consistently identified in all three of these studies. Thirteen of the 29 genes were uniquely identified in our study. Some of these are plausible candidates for lung adenocarcinoma. For example, Walker et al. [17] found that G protein-coupled receptor kinase 5 (GRK5) is a key gene regulating airway response that may have implications in obstructive airway diseases.
Discussion
In this study we proposed a measure to quantify Affymetrix GeneChip data quality for each gene in each study. The quality index measures the performance of each probe set in detecting its intended target. Furthermore, we extended a traditional effect size model by using the quality index as a weight for combining information from different Affymetrix chip types, and incorporating this weight into a random-effects meta-analysis model. We illustrated the advantages of our proposed methods using the Harvard and Michigan gene expression datasets used in [18]. This approach of using the detection p-values to weight the gene expression estimates can be applied in a more general context and to other microarray meta-analysis methods, such as that of Rhodes et al. [6].
The assumption of a group-specific exponential distribution for the negative log detection p-values is a very rough approximation. The true distribution is probably closer to a two-parameter log-beta distribution. However, because there are only 16-20 probe pairs per probe set used in the Wilcoxon test statistic for the detection p-values, the p-values follow a somewhat discrete distribution. In particular, for highly expressed genes, the p-values for all samples in a group may have the same near-zero value. Estimation of two parameters is therefore impossible. It would be worth investigating the sensitivity of the performance of this quality-weighting approach to the distributional assumption.
The performance of the weighting also depends on the sensitivity parameter s. In this study, a high s was needed, since a lower cutoff gave almost all genes extremely low quality scores. This is a consequence of the observed detection p-value distributions. These data were generated in 2001, and since that time the quality of the oligonucleotide array technology has improved, with more specific probe selection and optimized experimental conditions. For expression data obtained today, smaller values of s would be more appropriate. For example, in an analysis of a dataset generated more recently using the improved oligonucleotide technology (not shown), s = 0.05 gave good performance.
The discrepancy between the calculated and expected values for the Q test of heterogeneity was smaller when our quality measure was incorporated than when it was not, as suggested by the quantile-quantile plots. Overall, however, some variability remains that has not been fully captured by the weight function, emphasizing the need for a more elaborate weighting strategy and sensitivity analyses.
From a biological point of view, lung adenocarcinoma may be heterogeneous, originating from different causes [19], and methods such as cluster analysis, in which subtypes forming relatively homogeneous groups can be identified, might be useful. The focus of our paper was to introduce a methodology that can be used to integrate data; as such, the grouping of the samples as "diseased" versus "normal" was primarily used to identify genes that discriminate the two groups in a broad sense, rather than at the higher resolution at which subtypes could be identified.
Choi et al. [7] have also provided a brief Bayesian interpretation of their effect size model to integrate information from multiple microarray studies. They argued that a Bayesian approach could offer a more flexible and robust modeling strategy. We did not consider this issue in this study. It will be interesting to investigate how our proposed quality measure could be incorporated into their Bayesian framework in the future.
Methods
Data source and preprocessing
We selected two Affymetrix microarray datasets from the 4 lung cancer datasets provided by the organizers of the Critical Assessment of Microarray Data Analysis (CAMDA) conference [18]. These datasets were generated using different versions of the Affymetrix oligonucleotide microarrays by two research groups, one from Harvard and the other from Michigan. The Michigan study [15] used the HuGeneFL Affymetrix chip, containing 7,129 probe sets, each with 20 probe pairs. This study included 86 lung adenocarcinoma patient samples and 10 normal samples. The Harvard study [16] used the HG_U95Av2 chip with 12,625 probe sets, each with 16 probe pairs. This study included 17 normal and 127 lung adenocarcinoma patient samples. We use the word "sample" interchangeably with "array". The main objectives addressed in these two studies were to identify differentially expressed genes related to lung adenocarcinoma, and genes whose expression was related to patient survival. Our interest is in the former. We developed a new method for identifying genes that are differentially expressed between the cancer and normal samples, by modelling the effect size and integrating information from the Harvard and Michigan studies.
We converted the probe level data to a single expression measure for each probe set using the robust multiarray average (RMA) algorithm [20], which provides higher specificity and sensitivity in detecting differentially expressed genes. We used 6124 common probe sets in these two studies for the data analysis in our study. These probe sets were selected based on a sequencebased probe matching method, which is believed to produce more consistent results when comparing similar biological data sets obtained by different microarray platforms [10,21]. The detailed procedure used to select these probe sets can be found in the Data processing Section of [10].
Quality measure for Affymetrix Genechip data
Affymetrix GeneChip microarrays are used to monitor gene expression for thousands of transcripts. Each transcript is represented by a probe set, and a probe set is made up of probe pairs comprised of Perfect Match (PM) and Mismatch (MM) probe cells. The level of expression of a gene product is estimated using the intensities of each probe pair in a probe set. Therefore, the probe-specific variability in a probe set can be used as a measure of the performance of that probe set. The detection algorithm proposed by [22] generates a detection p-value, which represents the probability that the probe set (gene) expression is above zero (i.e., turned on), and measured reliably and consistently. A lower p-value is considered a useful indicator that the measured gene expression is valid and reliable [22]. Specifically, the p-value is based on testing whether the probe-specific differences (PM - MM) are almost always positive. We used the detection p-values to define quality measures for probe sets, summarizing across the arrays and experiments. We realize that some genes may be truly "off" under some experimental conditions and hence a large detection p-value may be providing useful information. However, if a gene is "off" under all experimental conditions, we argue that analysis of this gene contributes little to understanding of the experiment.
For any gene, let p-value_{lg} denote its detection p-value and x_{lg} denote -log(p-value_{lg}) for sample l in group g. We assume that each study compares G groups, where there are n_{g} samples in group g, and g = 1, 2, ..., G. For example, in the lung cancer data, G = 2, since adenocarcinoma samples are compared to controls. It is well known that p-values follow a uniform distribution when there is no signal, and therefore we expect x_{lg} to follow an exponential distribution with mean λ = 1 if the gene is not expressed. In order to develop a single quality measure for each gene across all samples in one study, we use this relationship with the exponential distribution to motivate a quality measure. We assumed that the detection p-value for a single gene of sample l in group g follows the distribution

x_{lg} = -log(p-value_{lg}) ~ Exponential(λ_{g}),
where different distributions of expression can be expected in each group g. It should be noted that we are modeling the p-value of one gene, across the samples in a group. This is different from the approaches of Allison et al. [23] and Pounds et al. [24], who modeled the distribution of p-values across genes. Although the true distribution of the x_{lg} may not be exponential, this assumption leads to a simple model where the one parameter can be estimated by a closed-form expression. Hence, the parameter λ_{g} for each gene, study and group g can be estimated by:

λ̂_{g} = x̄_{g} = (1/n_{g}) Σ_{l=1}^{n_g} x_{lg},

where x̄_{g} is the usual sample mean. This is a maximum likelihood estimator (MLE) with well-known asymptotic optimality properties [25].
To combine across the groups, we assumed a sensitivity parameter s. It is defined as the probability that a representative probe set in a particular treatment group shows a detectable signal, assuming that the relevant distribution (exponential or beta) holds. It can be thought of as the equivalent of a detection p-value defined for a whole group rather than for one array, and an appropriate value for s should be chosen with this interpretation in mind. For example, the default settings in the Affymetrix software for the detection calls are p = 0.06 for a "Marginal" call, and p = 0.04 for a "Present" call, although these can be altered by the user. In practice, the appropriate choice of s may depend on the signal-detection capability of particular technologies. We recommend plotting the distribution of quality scores for different choices of s, and choosing a value that clearly distinguishes genes of low quality (scores near zero) from high quality genes (scores near 1).
The sensitivity parameter s is a chosen cutoff, so that genes that are "off" or poorly measured across all experimental conditions will have p-value ≥ s, or in other words, x_{lg} ≤ -log(s). Therefore, we can define a quality measure across the groups, for each gene and each study, as:

q̂ = max_{g} P(x > -log(s); λ̂_{g}) = max_{g} exp(log(s)/λ̂_{g}),

the largest estimated probability, across groups, that a detection p-value falls below s.
The choice of the maximum gives more weight to genes measured with high quality in at least one group, thereby allowing a gene that is "off" in one condition and "on" under another condition to provide useful information in the analysis. We treat this quality score as a weight for each gene in the subsequent analysis.
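The quality-score computation above can be sketched in a few lines. This is a sketch under the exponential model just described, with the score taken as the largest across-group estimate of P(x > -log s); the function and variable names are ours, not from the original implementation.

```python
import numpy as np

def quality_score(detection_pvals_by_group, s=0.2):
    """Quality score for one probe set in one study (illustrative sketch).

    detection_pvals_by_group: list of 1-D arrays, one per sample group,
    holding the Affymetrix detection p-values for this probe set.
    s: sensitivity parameter (a group-level detection cut-off).

    Within group g, x = -log(p) is modeled as Exponential with mean
    lambda_g, estimated by the sample mean (the MLE).  The group-level
    quality is the estimated tail probability P(x > -log s) =
    exp(log(s) / lambda_hat_g); the probe set's score is the maximum
    over groups, so a gene "on" in any one condition keeps a high score.
    """
    scores = []
    for pvals in detection_pvals_by_group:
        x = -np.log(np.asarray(pvals, dtype=float))
        lam_hat = x.mean()                      # MLE of the exponential mean
        scores.append(float(np.exp(np.log(s) / lam_hat)))
    return max(scores)
```

As the text recommends, one can evaluate this score over a grid of s values and plot the resulting distributions to pick an s that separates scores near zero from scores near one.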
Modelling effect size with qualityadjusted weights
In order to simplify the discussion, we consider two groups, treatment (t) and control (c), in study i = 1, 2, ..., k. Let n_{it} and n_{ic} denote the number of treatment and control samples in study i, respectively. For each gene, let μ denote its overall mean effect size, a measure of the average differential expression for that gene, and let y_{i} denote the observed effect size for study i. We modeled effect size using the hierarchical model:

y_{i} = θ_{i} + ε_{i}, ε_{i} ~ N(0, σ_{i}^2), θ_{i} = μ + δ_{i}, δ_{i} ~ N(0, τ^2),

where τ^2 is the between-study variability and σ_{i}^2 is the within-study variance, measuring the sampling error for the i-th study. Choi et al. [7] used the standardized mean difference as a measure of the observed treatment difference y_{i}. This well-known estimator of treatment difference found in Hedges and Olkin's [26] work is

d_{i} = (x̄_{it} - x̄_{ic}) / s_{i},

where x̄_{it} and x̄_{ic} are the average gene expression values in the treatment and control groups of study i, respectively, and s_{i} is the pooled standard deviation. For a study with n = n_{it} + n_{ic} samples, an approximately unbiased estimator of d_{i} is given by y_{i} = d_{i}(1 - 3/(4n - 9)) [26]. The estimated variance of the unbiased effect size is given by [27]

σ̂_{i}^2 = 1/n_{it} + 1/n_{ic} + y_{i}^2 / (2(n_{it} + n_{ic})).
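As a concrete illustration, the bias-corrected standardized mean difference and its sampling variance for one gene in one study can be computed as below. This is a sketch of the standard Hedges-Olkin estimator the text cites; the function name is ours.

```python
import numpy as np

def effect_size(treat, ctrl):
    """Bias-corrected standardized mean difference and its variance.

    treat, ctrl: 1-D sequences of expression values (one gene, one study).
    Returns (y, var): the small-sample-corrected effect size and its
    estimated sampling variance.
    """
    nt, nc = len(treat), len(ctrl)
    # pooled standard deviation across the two groups
    sp = np.sqrt(((nt - 1) * np.var(treat, ddof=1)
                  + (nc - 1) * np.var(ctrl, ddof=1)) / (nt + nc - 2))
    d = (np.mean(treat) - np.mean(ctrl)) / sp       # standardized mean difference
    n = nt + nc
    y = d * (1 - 3.0 / (4 * n - 9))                 # small-sample bias correction
    var = 1.0 / nt + 1.0 / nc + y ** 2 / (2 * n)    # estimated sampling variance
    return y, var
```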
In a fixed-effects model (FEM), the error of the observed effect sizes is fully assigned to sampling error only, ignoring the between-study variance, so τ^2 = 0 and y_{i} ~ N(μ, σ_{i}^2). On the other hand, a random-effects model (REM) considers that each study estimates a different treatment effect θ_{i}. These parameters are drawn from a normal distribution θ_{i} ~ N(μ, τ^2).
To assess whether the FEM or REM is most appropriate, we tested the hypothesis τ = 0 using the following test statistic, which is a modification of Cochran's test statistic [28] that incorporates our quality measure for each study:

Q = Σ_{i=1}^{k} w_{i} (y_{i} - μ̂)^2,

where w_{i} = q̂_{i}/σ̂_{i}^2 is the quality-adjusted weight, combining the quality score q̂_{i} for study i with the inverse sampling variance, and

μ̂ = Σ_{i} w_{i} y_{i} / Σ_{i} w_{i}

is the weighted least squares estimator that ignores between-study variation. Under the null hypothesis of τ = 0, this statistic follows a χ^2 distribution with k - 1 degrees of freedom. We followed Choi's approach [7] to draw quantile-quantile plots of Q to assess whether a FEM or REM model is appropriate. If the null hypothesis of τ = 0 is rejected, we estimate τ^2 based on the method developed by DerSimonian and Laird [29]:

τ̂^2 = max{0, [Q - (k - 1)] / [Σ_{i} w_{i} - Σ_{i} w_{i}^2 / Σ_{i} w_{i}]}.
Therefore, we can estimate μ under the random-effects model by

μ̂(τ̂) = Σ_{i} w_{i}(τ̂) y_{i} / Σ_{i} w_{i}(τ̂),

where w_{i}(τ̂) = q̂_{i} / (σ̂_{i}^2 + τ̂^2). Under the REM,

μ̂(τ̂) ~ N(μ, 1 / Σ_{i} w_{i}(τ̂)).
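Putting the pieces together, a quality-weighted random-effects fit for a single gene might look like the following sketch. The multiplicative placement of the quality score inside the weights (w_i = q_i / var_i) is our reading of the model described in the text, and the function name is illustrative.

```python
import numpy as np

def rem_estimate(y, var, q):
    """Quality-weighted random-effects meta-analysis for one gene (sketch).

    y, var: per-study effect sizes and their sampling variances.
    q: per-study quality scores, assumed here to scale the usual
       inverse-variance weights.
    Returns (Q, tau2, mu, se): modified Cochran heterogeneity statistic,
    DerSimonian-Laird between-study variance, pooled effect, standard error.
    """
    y, var, q = map(np.asarray, (y, var, q))
    w = q / var                                   # quality-adjusted FEM weights
    mu_fem = np.sum(w * y) / np.sum(w)            # weighted least squares estimate
    Q = np.sum(w * (y - mu_fem) ** 2)             # modified Cochran's Q
    k = len(y)
    # DerSimonian-Laird moment estimator, truncated at zero
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = q / (var + tau2)                     # REM weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return Q, tau2, mu, se
```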
The z statistic to test for treatment effect under the REM is

z = μ̂(τ̂) / SE(μ̂(τ̂)), where SE(μ̂(τ̂)) = [Σ_{i} w_{i}(τ̂)]^{-1/2}.
However, when there are only a small number of arrays in each group, the estimates of the standard errors for each gene can be very variable. Some genes might by chance have very small standard errors, and therefore appear highly significant. To address this problem, different approaches have been developed for "smoothing" the variance estimates by borrowing information from the ensemble of genes. This can assist in inference about each gene individually. For example, Tusher et al. [30], Efron et al. [31] and Broberg [32] used t-statistics where an offset was added to the standard deviation, while Smyth [33] proposed a t-statistic with a Bayesian adjustment to the denominator. We took the offset s_{0} as the quantile of the gene-wise standard errors that minimizes the coefficient of variation of the z statistics [30]. Therefore, we can calculate the adjusted z statistic (used in this study) to test for treatment effect under the REM as

z_{adj} = μ̂(τ̂) / [SE(μ̂(τ̂)) + s_{0}].
The adjusted z statistic for the FEM is the same as that for the REM except that τ^2 = 0. Note that all these expressions refer to a single gene. This adjustment for computing the z statistic was also used by Garrett-Mayer et al. [34].
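The choice of the offset s_0 can be sketched as a simple grid search over quantiles of the gene-wise standard errors. This follows the text's description (minimizing the coefficient of variation of the resulting z statistics) rather than the exact SAM implementation; the function name and the quantile grid are our choices.

```python
import numpy as np

def adjusted_z(mu, se, quantiles=np.linspace(0.0, 0.95, 20)):
    """Offset-adjusted z statistics across all genes (illustrative sketch).

    mu, se: arrays of pooled effects and standard errors, one per gene.
    For each candidate quantile of the standard errors, form
    z = mu / (se + s0) and keep the s0 giving the smallest coefficient
    of variation of the z statistics.
    """
    mu, se = np.asarray(mu, float), np.asarray(se, float)
    best_z, best_cv = mu / se, np.inf
    for qt in quantiles:
        s0 = np.quantile(se, qt)                 # candidate offset
        z = mu / (se + s0)
        m = np.abs(z.mean())
        cv = z.std() / m if m > 0 else np.inf    # coefficient of variation
        if cv < best_cv:
            best_z, best_cv = z, cv
    return best_z
```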
Assessment of differentially expressed genes
We performed a multiple testing procedure, as described by Dudoit et al. [35], to evaluate statistical significance for differentially expressed genes in the combined studies. The false discovery rate (FDR) [36] has become a popular error measure for this purpose. Tusher et al. [30] developed a permutationbased method to calculate FDR for evaluating differentially expressed genes. We adapt their approach to our metaanalytic framework as follows:
1. For each gene j, j = 1, 2, ..., J, in the original data, compute the adjusted z statistics Z_{1}, ..., Z_{J} based on the meta-analysis procedure described in the previous section
2. Order these z statistic values to obtain Z_{(1)} ≤ ... ≤ Z_{(J)}
3. Create B random permutations of the group labels within both studies. For each permutation b = 1, 2, ..., B, compute the adjusted z statistics Z_{1,b}, ..., Z_{J,b} for genes j = 1, 2, ..., J
4. Compute the expected order z statistics by averaging over the permutations:

Z̄_{(j)} = (1/B) Σ_{b=1}^{B} Z_{(j),b}, j = 1, 2, ..., J
5. The potentially "significant" genes are those that have a distance between Z_{(j)} and Z̄_{(j)} greater than a given threshold delta (Δ). Therefore, we can find the smallest positive Z_{(j)} such that Z_{(j)} - Z̄_{(j)} > Δ, say t_{1}. Similarly, we can find the largest negative Z_{(j)} such that Z̄_{(j)} - Z_{(j)} > Δ, say t_{2}
6. The estimated FDR for the selected significant genes at the given delta (Δ) is given by the median, over permutations, of the number of permuted statistics Z_{j,b} falling beyond the cutoffs (Z_{j,b} ≥ t_{1} or Z_{j,b} ≤ t_{2}), divided by the number of genes called significant in the observed data
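The selection and FDR steps above can be sketched as follows. This is a simplified reading of the SAM-style procedure (a symmetric deviation rule and a median false-call count); the function name and argument layout are ours.

```python
import numpy as np

def sam_select(z_obs, z_perm, delta):
    """SAM-style gene selection and FDR estimate (illustrative sketch).

    z_obs: (J,) observed adjusted z statistics.
    z_perm: (B, J) adjusted z statistics from B label permutations.
    A gene is called significant when its ordered observed statistic
    deviates from the permutation-averaged order statistic by more than
    delta; the FDR is the median count of permuted statistics beyond
    the implied cut-offs, divided by the number of genes called.
    """
    order = np.argsort(z_obs)
    z_sorted = z_obs[order]
    z_expected = np.sort(z_perm, axis=1).mean(axis=0)   # expected order statistics
    exceed = np.abs(z_sorted - z_expected) > delta
    called = order[exceed]                              # original gene indices
    if called.size == 0:
        return called, 0.0
    pos = z_sorted[exceed & (z_sorted > 0)]
    neg = z_sorted[exceed & (z_sorted < 0)]
    t1 = pos.min() if pos.size else np.inf              # smallest positive cut-off
    t2 = neg.max() if neg.size else -np.inf             # largest negative cut-off
    false_calls = np.median(((z_perm >= t1) | (z_perm <= t2)).sum(axis=1))
    return called, false_calls / called.size
```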
Authors' contributions
JB initiated, designed and managed the study. CG proposed the quality measure method and participated in designing and managing the study. PH conducted data analysis and drafted the manuscript. JB and CG revised the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We would like to thank Dr. Hongying Jiang who provided us with the list of common probe set IDs for the two data sets used in our study. We also acknowledge helpful suggestions from two anonymous reviewers that greatly improved the quality of the manuscript.
References

Welsh JB, Sapinoso LM, Su AI, Kern SG, Wangodriguez J, Moskaluk CA, Frierson HF, Hampton GM: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer.
Cancer Research 2001, 61:59745978. PubMed Abstract  Publisher Full Text

Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta KJ, Rubin MA, Chinnaiyan AM: Delineation of prognostic biomarkers in prostate cancer.
Nature 2001, 412:822826. PubMed Abstract  Publisher Full Text

Luo J, Duggan DJ, Chen Y, Sauvageot J, Ewing CM, Bittner ML, Trent JM, Issacs WB: Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling.
Cancer Research 2001, 61:46834688. PubMed Abstract  Publisher Full Text

Magee JA, Araki T, Patil S, Ehrig T, True L, Humphrey PA, Catalona WJ, Watson MA, Milbrandt J: Expression profiling reveals hepsin overexpression in prostate cancer.
Cancer Research 2001, 61:56925696. PubMed Abstract  Publisher Full Text

Olkin I: MetaAnalysis: methods for combining independent studies. Editor's introduction.

Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Metaanalysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer.
Cancer Research 2002, 62:44274433. PubMed Abstract  Publisher Full Text

Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation.
Bioinformatics 2003, 19(Suppl 1):i84-i90.

Ghosh D, Barrette TR, Rhodes D, Chinnaiyan AM: Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer.
Functional & Integrative Genomics 2003, 3:180-188.

Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E: A cross-study comparison of gene expression studies for the molecular classification of lung cancer.
Clinical Cancer Research 2004, 10:2922-2927.

Jiang H, Deng Y, Chen H, Tao L, Sha Q, Chen J, Tsai C, Zhang S: Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes.
BMC Bioinformatics 2004, 5:81.

Shen R, Ghosh D, Chinnaiyan AM: Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data.
BMC Genomics 2004, 5:94.

Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies.
Bioinformatics 2002, 18:405-412.

Jarvinen AK, Hautaniemi S, Edgren H, Auvinen P, Saarela J, Kallioniemi OP, Monni O: Are data from different gene expression microarray platforms comparable?
Genomics 2004, 83:1164-1168.

Tritchler D: Modelling study quality in meta-analysis.
Statistics in Medicine 1999, 18:2135-2145.

Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma.
Nature Medicine 2002, 8:816-824.

Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses.
Proceedings of the National Academy of Sciences USA 2001, 98:13790-13795.

Walker JKL, Gainetdinov RR, Feldman DS, McFawn PK, Caron MG, Lefkowitz RJ, Premont RT, Fisher JT: G protein-coupled receptor kinase 5 regulates airway response induced by muscarinic receptor activation.
American Journal of Physiology – Lung Cellular and Molecular Physiology 2004, 286:L312-L319.

CAMDA 2003 [http://www.camda.duke.edu/camda03]

Shigematsu H, Lin L, Takahashi T, Nomura M, Suzuki M, Wistuba II, Fong KM, Lee H, Toyooka S, Shimizu N, Fujisawa T, Feng Z, Roth JA, Herz J, Minna JD, Gazdar AF: Clinical and biological features associated with epidermal growth factor receptor gene mutations in lung cancers.
Journal of the National Cancer Institute 2005, 97:339-346.

Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data.
Nucleic Acids Research 2003, 31:e15.

Mecham BH, Klus GT, Strovel J, Augustus M, Byrne D, Bozso P, Wetmore DZ, Mariani TJ, Kohane IS, Szallasi Z: Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements.
Nucleic Acids Research 2004, 32:e74.

Affymetrix [http://www.affymetrix.com/support/technical/manuals.affx]

Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee CK, Prolla TA, Weindruch R: A mixture model approach for the analysis of microarray gene expression data.
Computational Statistics & Data Analysis 2002, 39:1-20.

Pounds S, Morris SW: Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values.
Bioinformatics 2003, 19:1236-1242.

Knight K: Mathematical statistics. Chapman & Hall/CRC Press; 2000.

Hedges LV, Olkin I: Statistical methods for meta-analysis. Orlando, FL: Academic Press; 1985:81.

Cooper H, Hedges LV: The handbook of research synthesis. New York: Russell Sage; 1994:268.

Cochran WG: The combination of estimates from different experiments.
Biometrics 1954, 10:101-129.

DerSimonian R, Laird NM: Meta-analysis in clinical trials.
Controlled Clinical Trials 1986, 7:177-188.

Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
Proceedings of the National Academy of Sciences USA 2001, 98:5116-5121.

Efron B, Tibshirani R, Tusher V: Empirical Bayes analysis of a microarray experiment.
Journal of the American Statistical Association 2001, 96:1151-1160.

Broberg P: Statistical methods for ranking differentially expressed genes.
Genome Biology 2003, 4:R41.

Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
Statistical Applications in Genetics and Molecular Biology 2004, 3(1):Article 3.

Garrett-Mayer E, Parmigiani G, Zhong X, Cope L, Gabrielson E: Cross-study validation and combined analysis of gene expression microarray data. [http://www.bepress.com/jhubiostat/paper65]
Technical Report, Johns Hopkins University, Department of Biostatistics 2004.

Dudoit S, Shaffer JP, Boldrick JC: Multiple hypothesis testing in microarray experiments.
Statistical Science 2003, 18:71-103.

Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society: Series B 1995, 57:289-300.