Abstract
Background
With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the tstatistic based, mixedeffects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a nonparametric metaanalysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft.
Results
The simulation study comparing the nonparametric metaanalysis approach to a commonly used tstatistic based approach shows that the nonparametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be overrepresented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels.
Conclusion
We present a new approach for combining data from multiple independent microarray studies. This approach is nonparametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the identified genes and pathways may lead to better understanding of CAN at the molecular level.
Background
DNA microarray technology was launched in the early 90's. With its development and commercialization, it has been a popular tool for researchers to perform genomewide analysis of gene expression profiles[1]. One major application of the technology is to compare, with simultaneous measurements on the expression of thousands of genes, the gene expression patterns under two or more different biological conditions, and to identify differentially expressed genes and their biological functions. One direct result of the popularity of the DNA microarray technology is the explosion of data generated from independent experiments that were designed to study similar biological conditions. Metaanalysis can thus be performed to integrate the results from these various DNA microarray experiments. Ordinarily, given a random sample, we assume that we can generalize the results and conclusions drawn from the sample to the population; however, the sample size for microarray experiments is usually small. By performing metaanalysis on data from multiple experiments, we can take advantage of the larger number of hybridized samples and make the findings more applicable to the full population.
Metaanalysis is the quantitative synthesis of a number of study results. A few groups of researchers have developed metaanalytic methods for combining results from multiple DNA microarray experiments [26]. The biological conditions to which such methods have been applied include prostate cancer [2,4,6], liver cancer [7], leukemia [8] breast cancer [3], pancreatic cancer [9], the common transcriptional profiles of neoplastic transformation and progression of multiple cancer types [5], and others [10].
A statistical challenge in performing a metaanalysis of microarray studies is that often samples are hybridized to different microarray platforms, and the technical differences among platforms lead to fundamental differences in the nature of the gene expression measurements produced. For example, the data for a custom spotted cDNA array are usually expressed as ratios of the intensity values corresponding to an experimental sample to the intensity values of a cohybridized reference sample; while the data from an Affymetrix high density oligonucleotide microarray are absolute intensity values for the single channel. Therefore, data from different microarray platforms are not directly comparable, and it is essential to use the original values of gene expression from each platform to derive an "effect size" estimate that is independent of platform, thus rendering the different platforms comparable. The term "effect size" commonly used in metaanalysis refers to a standardized index measuring the effect associated with a treatment or covariate, or the magnitude of difference in gene expression in microarray studies [6]. Another challenge is how to integrate the effect size measurements from the individual studies.
To address these challenges, different effect size measurements and models for integrating microarray data have been proposed. In an early study [2], the authors performed a metaanalysis on two cDNA microarray studies and two Affymetrix oligonucleotide arrays, all of which compared the gene expression profiles between clinically localized prostate cancer and benign prostate tissue specimens. For each gene g (g = 1,⋯, G) and study i (i = 1,.., I = 4), they fit a simple linear regression model with tissue type as the covariate and expression measurement as the response. Then the ordinary least square estimate of the covariate coefficient divided by its standard error was used as the effect size measurement, which is equivalent to a tstatistic. Thus these genelevel effect size measurements are comparable among all the individual studies. To integrate these tstatistics across studies, a weighted average of tstatistics was calculated to obtain a global statistic for differential expression of each gene g, whereas the proportions of the sample sizes in study i to the overall sample size were used as the weights. To identify genes that were truly differentially expressed between the cancer and healthy tissues, the authors used permutation method by fitting linear models with permuted tissue labels to calculate the false discovery rate (FDR). Genes were declared differentially expressed corresponding to a specified FDR.
The same four prostate cancer microarray datasets were analyzed using an alternative method by Rhodes et al. [4]. They used the pvalue p_{g, i }calculated from the permutation ttest for each gene g in each study i to serve as the effect size measurement. To integrate across all the studies, Fisher's method for combining pvalues, , was calculated for each gene. Under the null hypothesis that gene g did not have differential expression between the two groups, S_{g }is chisquare distributed with degrees of freedom 2·I. Then the pvalue for gene g based on the integral analysis of all the datasets can be calculated using the χ^{2}_{df = 2I}distribution. Controlling the FDR at a certain level, differentially expressed genes could be identified as those with pvalues less than a threshold determined by the FDR level.
Choi et al. [6] used a mixedeffects model approach to estimate the standardized mean expression difference for each gene g in each study i, and the study effect was treated as a random effect. A zstatistic z_{g, i}, the effect size measurement, was calculated to be the ratio of the estimated mean expression differences to its standard error. To integrate across studies, the average zstatistic was taken to be the summary z score for each gene. Then they used permutation technique to control the FDR and determine the cutoff value of the z scores. Genes with absolute z scores larger than the cutoff value were declared as significantly differentially expressed. These authors also incorporated a Bayesian method into the mixedeffects model to estimate the mean expression difference. They assumed different prior distributions for the standardized mean difference and the variance of the random study effect. The estimates of the effect size measurements were obtained from the corresponding posterior distributions.
Shen et al. [3] also used Bayesian framework to perform metaanalysis on microarray experiments studying breast cancer. However, the effect size measurement estimated using Bayesian hierarchical model was a selfdefined probability: probability of expression, which was calculated based on a few distributional assumptions and ranged in [1, 1]. After the probability of expression was obtained for each gene g in each study i (i = 1,.., I), they simply pooled the data from all I studies into one dataset and used univariate logistic regression technique to quantify genes relevance to breast cancer.
Nevertheless, to some degree, all these methods rely on the adherence of the data to a specified parametric distribution, such as the Gaussian distribution for the tstatistic methods and the mixedeffects model, or the different forms of prior distributions in the Bayesian context. However, commonly within each individual microarray study, only a small sample size is available, thus the normality assumption may not hold well. Further, it may be even more difficult to test the validity of the assumptions on the prior distributions and their parameters for Bayesian models. Some of the aforementioned methods are also somewhat computationally complicated and might be difficult to be understood by clinical researchers not having advanced training in statistics.
We obtained the data from two independent microarray experiments that compared the gene expression profiles between chronic allograft nephropathy (CAN) and normal functioning kidney allograft. Chronic Allograft Nephropathy is a major cause of graft loss and patient morbidity after kidney transplantation. The histopathology features of CAN are nonspecific and this makes it difficult to detect CAN before the occurrence of clinical manifestations. However, it is now well known that CAN may already be present in protocol biopsies before its clinical appearance [11]. It may be promising to characterize the gene expression pattern of transplant kidneys with CAN to use for prognosis of new kidney transplant recipients. A few studies used DNA microarray technology to compare, among patients who had kidney transplant, gene expression of biopsy samples from kidneys with CAN to those from the normal functioning kidneys. We obtained the gene expression data from two studies, which we refer to herein as the Hotchkiss study [12] and the Mas study [11]. Both used Affymetrix high density oligonucleotide microarrays to compare the gene expression profiles between CAN and normal functioning allograft, and identified lists of genes whose differential expression profiles could describe the molecular difference between CAN and the normal allograft. However, both of the studies suffered from the small sample size problem.
In this study, we sought to integrate the results from these two independent DNA microarray experiments. Herein we present a nonparametric approach for combining microarray data from various studies which does not suffer from the aforementioned limitations. This new method does not require any distributional assumption on the gene expression measurements, is logically intuitive, and is also easy to implement using statistical software. We will also report some of the biological findings from its application to integrate the two microarray studies.
Results
The Hotchkiss study used Affymetrix HGU133A human GeneChip arrays to measure the gene expression of 16 biopsies from patients with CAN and 6 biopsies from patients with normal functioning allograft [12]. The dataset was obtained upon request and the probe level data were already normalized and summarized using RMA (Robust Multichip Average) [13] method. The Mas study [11] was performed at Virginia Commonwealth University and the investigators used Affymetrix HGU133A 2.0 human GeneChip arrays to measure the gene expression of 10 biopsies from patients with CAN and 4 with normal functioning allograft. For consistency, we also used RMA method to obtain probe set expression summaries from the original *CEL files of this study.
Genes identified to be predictive of CAN
Using our proposed metaanalysis method on the two microarray datasets, we identified 330 probe sets representing 309 genes that were significantly relevant to CAN. Associated with each gene was a score that measured this gene's ability to discriminate between CAN and normal allograft. The lower the score value, the better discriminative ability the gene had. The definition of the score and the procedure to obtain it are elaborated in the Methods section. Table 1 lists the first 10 genes that have the most desirable score values. Figure 1 is the heatmaps of the top 50 identified genes in the two studies. The complete list of the identified genes can be found in Additional File 1.
Additional file 1. The complete list of the identified 330 probe sets that were significantly relevant to chronic allograft nephropathy.
Format: HTML Size: 461KB Download file
Figure 1. Heatmaps of the top 50 identified genes (a) Hotchkiss study. (b) Mas study.
Table 1. 10 identified probe sets with the lowest ranked scores for discriminating CAN vs. Normal allograft
The gene PVALB (parvalbumin) has the most outstanding discriminative ability, or in other words, it has very distinct expression patterns between CAN and normal allograft, as shown in Figure 2a. Although the expression measurements in the two studies are of different ranges, 5.76–8.93 in the Hotchkiss study and 4.91–9.26 in the Mas study, the relative pattern is the same: the expression in CAN is lower than that in normal allograft. This gene encodes a high affinity calcium ionbinding protein and has been reported by multiple groups of researchers to be related to renal cell cancers (RCC) (eg: [14,15]). Wiesel et al. [16] found that "aggressive tumor growth of RCC requires close follow up in patients who received a renal allograft". The finding from this metaanalysis suggests that parvalbumin might be also relevant to the progression to CAN and deserves further study.
Figure 2. The differential expression pattern of gene (a) PVALB (Affy ID "205336_at") and gene (b) ITGAE (Affy ID "205055_at").
Comparison between the genes identified by metaanalysis and SAM analysis in each individual study
The Significant Analysis of Microarray (SAM) method [17] was applied in both original studies by Hotchkiss and Mas to detect significantly differentially expressed genes at a false discovery rate (FDR) of 0.05. We applied SAM to the two datasets respectively. Controlling the FDR at 0.05, 26 probe sets are declared significant in Hotchkiss study, 10 of which are among the 330 probe sets identified by our metaanalysis; 2190 probe sets are declared significant in Mas study, 214 of which are among the 330 probe sets identified by our metaanalysis. Comparing the SAM analysis results of the two studies, only 2 probe sets are found to be in common.
The discordance among these results demonstrates that independent microarray experiments, even they use the same platform and study the same biological condition, may not be reproducible [18,19]. It also demonstrates the advantages of metaanalysis: since the overall sample size is larger than each individual study, it can find "small but consistent" [20] effect sizes which cannot be detected by analysis on individual studies. Also, some effect sizes might be significant in one study but not the other; metaanalysis can discard these inconsistent effect sizes which may be caused by the uniqueness of samples in that study.
A unique advantage of our metaanalysis method is that it can discern the differential gene expression patterns between CAN and normal allograft that cannot easily be described by some measurement of standardized mean difference in expression values, such as the tstatistic, which is the basis of the SAM method. An example is illustrated using the expression values of probe set "205055_at" in both studies. This is gene ITGAE, the fifty third in the identified gene list. In Figure 2b, plots of the distributions of ITGAE expression values in the two studies are presented. It is clear in both plots that the expression patterns are different between the two groups, with the expression of normal allograft samples generally being lower than that of CAN samples. However, because of the presence of several normal samples that have relatively high expression, the standardized means are not significantly different between the two groups. As in reality, even if a gene has been biologically validated to be upregulated (or downregulated) in a disease, relatively low (or high) expression may still be observed in a few samples. The twosided ttest for ITGAE in the Hotchkiss study yields a pvalue 0.20, and 0.14 in the Mas study. Therefore, models based on tstatistic will not recognize this gene as having differential patterns between the two groups, although the expression patterns are indeed different enough to differentiate the samples.
Comparison between the genes identified by our nonparametric metaanalysis and a tstatistic based metaanalysis
We also performed a metaanalysis on the two CAN studies using a commonly used parametric approach that is based on tstatistic, and compared the result to that from our nonparametric metaanalysis approach. The tstatistic based method is derived from [4] and is described as in the Background section. To control for the FDR, Benjamini and Yekutieli's method [21] of adjusting the pvalues was used. At an FDR level of α = 0.01, 2026 probe sets are identified to have significantly differential expression between the normal allograft and CAN samples. The majority of the 330 probe sets identified by our nonparametric metaanalysis are also identified by this parametric metaanalysis; however 73 out of the 330 probe sets are not identified by the parametric analysis, including the above mentioned probe set ''205055_at'', which has an adjusted pvalue 0.41. Most of the 73 probe sets indeed have the similar differential expression pattern as seen in Figure 2b, which cannot be depicted by the tstatistic based model, yet is a reflection of what might be observed in reality.
Biological pathways associated with the identified genes
The metaanalysis was performed on the individual gene level and assigned a score for each gene that quantified its difference in expression between CAN and normal allograft. However, genes do not express independently, especially when they are components of the same biological pathway. Therefore, we also examined the identified genes at the pathway level.
Among the 330 identified genes using our nonparametric approach, 129 have been annotated in KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database, and they are components of 114 pathways. For each pathway we used Fisher's exact test with α = 0.01 to test the null hypothesis of no difference against the alternative that significantly more genes were represented in the list of the identified genes of that pathway than would be expected by chance. Six pathways were found to be significantly overrepresented. Table 2 lists the pathway names and the number of identified genes in the pathways.
Table 2. Overrepresented KEGG pathways by Fisher exact test, at significance level 0.01
As an illustration, we examined the most significantly overrepresented pathway in the literature. Oxidative phosphorylation is a process of cellular respiration and consisted of five complexes located in the inner mitochondrial membrane. ATP synthesis is the final protein complex in the metabolic pathway. It has been reported that mitochondrial disorders can sometimes give rise to kidney dysfunction [22]. The finding from this metaanalysis indicates that the abnormal activities in the process of oxidative phosphorylation might be related to the development of interstitial fibrosis and tubular atrophy, which are specific features of CAN.
Allograft status prediction on 6 unpublished samples
Predictions were performed on 6 additional transplant kidney biopsies that were procured and hybridized to Affymetrix HGU133A 2.0 GeneChip arrays from Mas et al. after their previous study. These 6 samples were all diagnosed as CAN by experienced pathologists based on the histological observations. Each identified gene was used as the single predictor to predict the class labels of the 6 samples, using classifiers derived from the Hotchkiss data and Mas data respectively. The weighted average of the error rates associated with each identified gene is recorded in the last column of Additional File 1, where the weight corresponding to each study was taken to be the proportion of its sample size in the total combined sample size.
When using the classifiers derived from Hotchkiss data and Mas data respectively with all the 330 identified genes expression measurements as predictors, the predicted classes all conformed to their true diagnostic classes.
Simulation Study
We conducted a simulation study to investigate the generalizability of our nonparametric metaanalysis approach. Two datasets were generated independently to simulate the RMA expression summary data observed from the two Affymetrix GeneChip experiments in which CAN was studied. We simulated expression values for G = 2000 genes in each of two studies, with a percent p = 10% of genes to be truly differentially expressed between the disease group and normal group. The sample sizes were n_{1, d }= 4 or 8, n_{1, norm }= 6 or 10 in study I for the disease and normal groups respectively; similarly, the samples sizes were n_{2, d }= 6 or 10, n_{2, norm }= 15 or 15 in study II for the two groups respectively. These small samples sizes were used to mimic the CAN metaanalysis. To identify differentially expressed genes, the nonparametric metaanalysis approach and permutation ttest based approach were then applied on the simulated datasets. We note that because of the small sample sizes in the disease group, mixtureeffects model based metaanalysis approach is not likely to be appropriate to use. The performance of the applied two approaches was evaluated by two statistics: sensitivity (percentage of correctly identified differentially expressed genes in the pool of truly differentially expressed genes); and specificity (percentage of identified nondifferentially expressed genes in the pool of truly nondifferentially expressed genes).
Simulation model and algorithm
The intensity of a gene was generated using the following model:
where y_{g, i, j }is the intensity for gene g(g = 1,..., G = 2000) in study i = 1, 2 and group j = 1 (normal) and 2 (disease); I() is the indicator function; μ_{g }is the overall gene g effect; β_{(i)g }is the study effect; γ_{(j)g }is the group effect; and ε_{i(j)g }is the random error term nested within group, allowing different variability in different groups. The procedure used in the generation of the datasets is outlined in the following steps:
1). Generate the gene effects vector
 μ
 μ
 μ
2). For samples in Study II, generate the study effect for each gene: β_{(i)g}~N(μ_{β}, σ_{β}^{2}), where μ_{β}~Uniform (0,2) and σ_{β}^{2}~Uniform(0,0.5).
3). Randomly select p = 10% of the genes to be truly differentially expressed genes. Additionally randomly select some samples to represent the diseased group. Generate the group effect for each of these genes in these disease samples: γ_{(j)g}~N(μ_{γ}, σ_{γ}^{2}), where μ_{γ}~Uniform(0.2,1) with probability 0.4, Uniform(1, 2.2) with probability 0.1, Uniform(1,0.2) with probability 0.4, and Uniform(2.3,1) with probability 0.1; σ_{γ}^{2}~Uniform(0.4,0.5).
4). Generate the random error for each gene in each sample: ε_{i(j)g}~N(0,0.1).
Simulation result
We ran 30 simulations and the means and standard deviations (SD) of the sensitivity and specificity statistics using the two approaches are reported in Table 3. The small SDs indicate that both our KNN based nonparametric approach and the tstatistic based parametric approach have stable performance over the 30 simulation runs. Using either approach, the sensitivity is increased in scenario II where a larger sample size is available compared to scenario I. Under both scenarios, our approach outperforms the tstatistic based approach, either in terms of sensitivity or specificity. The specificity statistic is always high (>95%), as is expected since 90% of the G = 2000 genes truly have nondifferential expression.
Table 3. Mean sensitivity and specificity using the nonparametric approach and tstatistic based approach from the 30 simulations, under two scenarios of different sample sizes. (SD: standard deviation)
Discussion
The two microarray experiments analyzed in this metaanalysis were both performed using Affymetrix platform, whereas the Hotchkiss study utilized the HGU133A arrays and Mas et al. used HGU133A 2.0 (version 2) arrays. Other than some differences in the control probe sets, the probe set IDs are identical between the two versions of HGU133A arrays. Therefore, gene mapping between datasets was not a challenging step.
Hu et al. [23] developed a gene hybridization quality measure for Affymetrix DNA microarray platform and incorporated it as a quality weighing strategy into the effect size estimation in Choi et al.'s mixtureeffect model. We also considered using a hybridization quality measurement as a weighing system when deriving for each gene the score that described the gene's ability in discerning CAN vs. normal allograft. However, the dataset obtained from the Hotchkiss study is already probe set level data summarized by RMA method; therefore, we were unable to implement a quality measure in the analysis.
The metaanalysis method we proposed is applicable to situations where multiple microarry platforms are involved. However, if the data are from the same platform, the same normalization method should be used. The Tumor Analysis Best Practices Working Group compared different probe set expression summary algorithms for Affymetrix GeneChip arrays and claimed "different probe set interpretation algorithms lead to different results" [24]. They often observed only "~50% concordance in general data output in their own work between comparisons of two different algorithms". Therefore, a good expression summary algorithm is essential for performing downstream analysis. Shippy et al. [25] used RNA sample titrations to assess microarray platform performance and normalization techniques [2630]. It is not the research focus in this paper, and we suggest applying the same algorithm on datasets from the same platform.
It is well known that genes, especially genes in a common pathway, are correlated. We considered starting the metaanalysis from a pathway level, i.e. first identifying pathways that might be relevant to the progress of CAN; and then focusing on the individual genes in those pathways and finding out genes whose expression patterns were differential between CAN and normal allograft. However, since only less than 30% of the genes measured on the Affymetrix HGU133A chips have been annotated with known KEGG pathway information, we decided to perform the analysis at the gene level. This may help researchers understand gene functions that are still unknown and avoid throwing away 70% of the available data.
Chronic allograft nephropathy is a complex entity at both histological and molecular level. We identified 330 sequences whose differential expression patterns could distinguish between CAN and normal allograft. The functions of most of the identified genes are not well understood yet. More studies on these genes, especially on those at the top of the list, may lead to a better understanding of the progression of CAN at the molecular level. Furthermore, each gene is associated with a score that measures its degree of differential expression between the two groups. All the identified genes have scores below the predetermined threshold 0.1737. By adjusting the threshold based on prior expertise knowledge about CAN, more or less genes can be identified for further study. To utilize a smaller set of genes for prognosis on kidney transplant recipients, the genes with the lowest score values can be selected, such as the 10 listed in Table 1. Further study on the expression of these identified genes in the kidney transplant recipients might be very informative in terms of prognosticating the development of CAN.
Conclusion
In this paper, we present a new metaanalysis technique for combining DNA microarray studies by analyzing two independent microarray studies comparing the gene expression of CAN and normal allograft. This is a nonparametric approach that is statistically easy to understand, and can discern differential expression pattern that may not be detected by tstatistic based models and mixtureeffects model. Although the new method is applied to combine two microarray studies of the same platform, its' use is by no means limited to a single platform and can be used to different platforms without difficulty.
Methods
The probe level data from the two independent Affymetrix microarray studies were normalized and summarized using RMA method. To assess the sample quality, we calculated the 3' :5' ratios for three Affymetrix control probe sets corresponding to human genes GAPDH, ISGF and βactin. For both studies, all the ratios were less than 3, the threshold recommended by Affymetrix. Therefore, sample degradation did not seem to be a problem in both studies, and thus we regarded all samples as useful. Before combining the data across the two independent studies, all Affymetrix control probe sets were excluded, leaving 22,215 probe sets that were common to both studies.
Definition and calculation of effect size measurement in individual studies
We considered finding in each individual study for each gene an effect size measure that could quantify the gene's ability in discriminating CAN from normal allograft. This was essentially determined by the degree of difference in the gene's expression between the two groups, and reflected as how well its expression measurements could classify the samples. Conceptually, if a gene is irrelevant to CAN, using its expression to determine class membership would appear as random guessing. On the other hand, if a gene is an important predictor in distinguishing class, i.e. it has different expression patterns between CAN and normal allograft, we will expect to correctly classify most of the samples using its expression, and the misclassification error rate will be close to 0.
Therefore, within study i (i = 1, 2) we used the expression for each gene g (g = 1,..., 22215) as the single predictor variable and applied the KNearestNeighbor (KNN) classification method to develop a classifier for the n_{i }samples in study i. Thereafter, the KNN classifier was used to predict class label for each sample. The predicted labels were compared with their corresponding true class labels and an unbiased estimate of the misclassification error rate, denoted as err_{g, i}, was calculated. This error rate estimate measured this gene's discriminative ability and is defined as our effect size statistic.
Using KNN, we estimate directly at each observation the posterior probability of each class, given the observed predictor (gene expression), as the proportion of that class among the k nearest "neighbors" of the target observation. Then the classification for the target observation is the class which had the largest estimated posterior probability. The advantages of KNN include that it does not require any distributional assumptions, and it has reasonable performance comparing to other classification methods. A property of the largesample behavior of KNN is described in the following theorem [31]:
THEOREM: Let E* denote the error rate of the Bayes rule in a Cclass problem, i.e. the best possible error rate for the classification problem. Then the error rate of KNN converges in L_{1 }as the size of the training set increases to a value E_{1 }bounded below by E* and above by .
For a twoclass problem, E* = E(min(P(class I
 x
 x
 x
We used an odd value of k (the number of neighbors) to avoid ties, and chose k = 3 considering in the Mas study, only 4 samples were available in the normal allograft group. For microarray studies with larger sample sizes, k can be determined using crossvalidation and can be different for the individual studies. Since the data from both studies were summarized using RMA method and thus the scales of the data are similar, the Euclidean distance was used to quantify the similarity in the predictor gene's expression profile between samples and determine the ''nearest neighbors'' for each sample.
The apparent misclassification error rate, which is the number of misclassified observations in the training dataset divided by the total number of samples in the training dataset, tends to underestimate the true misclassification error rate [32]. Therefore, we used the refined bootstrap estimate to obtain an unbiased estimate for the misclassification error rate [33]. The refined bootstrap estimate corrects the apparent error rate estimate by adding the optimism due to estimating the error rate using the same observations that are also used in deriving the classifier. For each gene g in each individual study i, we generated B = 100 bootstrap resamples. For each bootstrap resample, we used the bootstrap sampled gene's expression measurements as predictor values and applied the KNN to develop the classifier. The classifier was used to predict the class labels for the bootstrap samples, as well as the original samples, respectively. If R_{boot,(b), g, i }denotes the misclassification error rate in the b^{th }bootstrap samples, and R_{ori,(b), g, i }denotes the misclassification rate in the original samples using the classifier built from the b^{th }bootstrap samples; then is the optimism estimate. Therefore, the unbiased estimate of misclassification error rate err_{g, i }is the apparent error rate in the original samples, which uses the classifier built from the original samples themselves, plus the optimism estimate from bootstrap samplings, i.e., err_{g, i }= R_{g, i }+ R_{opt, g, i}, where R_{g, i }denotes the apparent error rate and g = 1, 2,..., 22215; i = 1, 2.
It is noteworthy to notice that this step can be carried out on microarray datasets from any platforms. Although different platforms may yield distinct scales of numerical measurements of gene expression, as long as there exists a relative different expression pattern between the two classes, the discrimination method can be applied to quantify each gene's association with the class label.
Integration of the effect sizes across studies
After we obtained the studyspecific effect size measurements (err_{g, i}) for each gene g, we calculated the weighted average across studies as the combined effect size (or score) of this gene. The weight corresponding to each study was taken to be the proportion of its sample size in the total combined sample size. The combined effect size for gene g across studies is .
Identification of genes relevant to CAN
To identify genes capable of distinguishing between CAN and normal allograft, we wanted to identify the estimates that are "equivalent" to 0. To do so, we determined a threshold T such that if the probability of a score being less than the threshold was less than α = 0.01, the score was considered to be equivalent to 0, i.e., P(  0 <T  g) ≤ α. The QQ plot of for the 22,215 probe sets (not shown here) demonstrates that are approximately normally distributed. Therefore, assuming ~ N(μ, σ^{2}), the threshold T is the quantile such that P(x  0 <Tx ~N(μ, σ^{2})) = α, and estimating T by plugging in the momentbased estimates for μ and σ^{2}. This is illustrated graphically in Figure 3. The genes whose scores are less than the threshold are identified as being relevant to CAN. The analyst can adjust α to identify either a larger or smaller number of genes.
Figure 3. Normal density plot to illustrate the selection of the threshold for the scores.
The entire metaanalysis method is summarized in Figure 4.
Figure 4. Flowchart illustrating our nonparametric metaanalysis approach.
The analysis was conducted in the R 2.4.0 environment on a PC with Intel Core Duo CPU@2.0G × 2 and 2.0G RAM. The average calculation time needed for each gene is about 0.9 second. More efficient and faster performance of the algorithm can be realized when the implementation in the C language is available. Our R code for analyzing the two CAN studies can be found in Additional File 2.
Additional file 2. The R code to perform our nonparametric metaanalysis on the two CAN microarray studies.
Format: TXT Size: 13KB Download file
Identification of overrepresented KEGG pathways
To identify the overrepresented pathways associated with the identified genes, we first filtered the whole gene list by excluding probe sets that had not been annotated in KEGG pathway database, and denoted the number of remained probe sets as G_{0}. Among the remaining probe sets, we let G_{ID }denote the number of probe sets that were identified to be relevant to CAN, and let G_{0, p }denote the number of probe sets that were components of a pathway p(p = 1, 2,..., P). Among the G_{ID }identified probe sets, G_{ID, p }probe sets were in the pathway p. Then we performed a Fisher's exact test [34] on the 2 × 2 table as in Table 4 to test whether this pathway p was overrepresented in the identified genes at a significant level 0.01.
Table 4. 2 × 2 Table for testing whether pathway j was overrepresented in the identified genes (To test H_{0}: the number of genes in pathway p is independent of the number of identified genes relevant to CAN. Vs. H_{1}: the number of genes in pathway p is overrepresented in the identified genes relevant to CAN, since all the marginal values are given, Fisher's exact test is used. The pvalue from the test is , where g_{ID, p}denotes the observed value of G_{ID, p})
Prediction on 6 unpublished samples
To predict the class labels of the 6 additional samples, we first normalized and summarized the *.CEL files using RMA method. It was noticed that although a gene's differential expression pattern might be similar in both studies, the numerical values between the studies took on different ranges. As can be seen in Figure 1, the gene PVALB was significantly differentially expressed between CAN and normal allograft in both studies; however, the measurements on CAN samples in the Hotchkiss study were in the range of (5.76, 7.61), while in the Mas study they were in the range of (4.90,5.50). Therefore, a global shift on the RMA normalized measurements existed between the two independent studies. Because the six unpublished samples may have been processed by different technicians, the shift on the measurements may also be present between these samples and the earlier two studies.
In order to diminish the impact of the global shift and make the classifier derived from the previous two studies applicable to these unpublished samples, we centered the measurements of each identified gene by subtracting its median in each individual dataset. Next the KNN algorithm was run to build classifiers respectively with the centered data in the Hotchkiss study and Mas study. Then the two classifiers were applied respectively to classify the 6 samples and the predicted results were compared to their true classes.
Authors' contributions
XK developed the method, performed the statistical analysis, wrote programming code, and drafted the manuscript. VM provided data, contributed to the scientific understanding of the clinical implications of CAN in renal transplant recipients, and contributed to revising the manuscript. KJA conceived the study and provided extensive and detailed comments on the statistical analysis and on the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We would like to thank Dr. Tearina Chu and Dr. Enver Akalin (Mount Sinai School of Medicine) for providing us the data from their microarray study. We also thank our college Dr.V. Ramakrishnan for his comment on determining the threshold.
References

Watson JD: Molecular biology of the gene. 5th edition. San Francisco, Pearson/Benjamin Cummings; 2004:xxvix, 732.

Ghosh D, Barette TR, Rhodes D, Chinnaiyan AM: Statistical issues and methods for metaanalysis of microarray data: a case study in prostate cancer.
Funct Integr Genomics 2003, 3:180188. PubMed Abstract  Publisher Full Text

Shen R, Ghosh D, Chinnaiyan AM: Prognostic metasignature of breast cancer developed by twostage mixture modeling of microarray data.
BMC Genomics 2004, 5:94. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Metaanalysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer.
Cancer Res 2002, 62:44274433. PubMed Abstract  Publisher Full Text

Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM: Largescale metaanalysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression.
Proc Natl Acad Sci U S A 2004, 101:93099314. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation.
Bioinformatics 2003, 19 Suppl 1:i8490. PubMed Abstract  Publisher Full Text

Choi JK, Choi JY, Kim DG, Choi DW, Kim BY, Lee KH, Yeom YI, Yoo HS, Yoo OJ, Kim S: Integrative analysis of multiple gene expression profiles applied to liver cancer study.
FEBS Lett 2004, 565:93100. PubMed Abstract  Publisher Full Text

Wang J, Coombes KR, Highsmith WE, Keating MJ, Abruzzo LV: Differences in gene expression between Bcell chronic lymphocytic leukemia and normal B cells: a metaanalysis of three microarray studies.
Bioinformatics 2004, 20:31663178. PubMed Abstract  Publisher Full Text

Grutzmann R, Boriss H, Ammerpohl O, Luttges J, Kalthoff H, Schackert HK, Kloppel G, Saeger HD, Pilarsky C: Metaanalysis of microarray data on pancreatic cancer defines a set of commonly dysregulated genes.
Oncogene 2005, 24:50795088. PubMed Abstract  Publisher Full Text

Stevens JR, Doerge RW: Combining Affymetrix microarray results.
BMC Bioinformatics 2005, 6:57. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Mas V, Maluf D, Archer K, Yanek K, Mas L, King A, Gibney E, Massey D, Cotterell A, Fisher R, Posner M: Establishing the molecular pathways involved in chronic allograft nephropathy for testing new noninvasive diagnostic markers.
Transplantation 2007, 83:448457. PubMed Abstract  Publisher Full Text

Hotchkiss H, Chu TT, Hancock WW, Schroppel B, Kretzler M, Schmid H, Liu Y, Dikman S, Akalin E: Differential expression of profibrotic and growth factors in chronic allograft nephropathy.
Transplantation 2006, 81:342349. PubMed Abstract  Publisher Full Text

Irizarry RA, Hobbs B, Collin F, BeazerBarclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Biostatistics 2003, 4:249264. PubMed Abstract  Publisher Full Text

Adley BP, Papavero V, Sugimura J, Teh BT, Yang XJ: Diagnostic value of cytokeratin 7 and parvalbumin in differentiating chromophobe renal cell carcinoma from renal oncocytoma.
Anal Quant Cytol Histol 2006, 28:228236. PubMed Abstract

Martignoni G, Pea M, Chilosi M, Brunelli M, Scarpa A, Colato C, Tardanico R, Zamboni G, Bonetti F: Parvalbumin is constantly expressed in chromophobe renal carcinoma.
Mod Pathol 2001, 14:760767. PubMed Abstract  Publisher Full Text

Wiesel M, Carl S, Drehmer I, Hofmann WJ, Zeier M, Staehler G: [The clinical significance of renal cell carcinoma in dialysis dependent patients in comparison with kidney transplant recipients].
Urologe A 1997, 36:126129. PubMed Abstract  Publisher Full Text

Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci U S A 2001, 98:51165121. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Shi L, Tong W, Goodsaid F, Frueh FW, Fang H, Han T, Fuscoe JC, Casciano DA: QA/QC: challenges and pitfalls facing the microarray community and regulatory agencies.
Expert Rev Mol Diagn 2004, 4:761777. PubMed Abstract  Publisher Full Text

Shi L, Tong W, Fang H, Scherf U, Han J, Puri RK, Frueh FW, Goodsaid FM, Guo L, Su Z, Han T, Fuscoe JC, Xu ZA, Patterson TA, Hong H, Xie Q, Perkins RG, Chen JJ, Casciano DA: Crossplatform comparability of microarray technology: intraplatform consistency and appropriate data analysis procedures are essential.
BMC Bioinformatics 2005, 6 Suppl 2:S12. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Cooper HM, Hedges LV: The Handbook of research synthesis. New York, Russell Sage Foundation; 1994:xvi, 573 p..

Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency.

Rotig A: Renal disease and mitochondrial genetics.
J Nephrol 2003, 16:286292. PubMed Abstract  Publisher Full Text

Hu P, Greenwood CM, Beyene J: Integrative analysis of multiple gene expression profiles with qualityadjusted effect size models.
BMC Bioinformatics 2005, 6:128. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Expression profilingbest practices for data generation and interpretation in clinical trials
Nat Rev Genet 2004, 5:229237. PubMed Abstract  Publisher Full Text

Shippy R, FulmerSmentek S, Jensen RV, Jones WD, Wolber PK, Johnson CD, Pine PS, Boysen C, Guo X, Chudin E, Sun YA, Willey JC, ThierryMieg J, ThierryMieg D, Setterquist RA, Wilson M, Lucas AB, Novoradovskaya N, Papallo A, Turpaz Y, Baker SC, Warrington JA, Shi L, Herman D: Using RNA sample titrations to assess microarray platform performance and normalization techniques.
Nat Biotechnol 2006, 24:11231131. PubMed Abstract  Publisher Full Text

Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang W, Zhang L, Goodsaid FM: Evaluation of DNA microarray results with quantitative gene expression platforms.
Nat Biotechnol 2006, 24:11151122. PubMed Abstract  Publisher Full Text

Patterson TA, Lobenhofer EK, FulmerSmentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD: Performance comparison of onecolor and twocolor platforms within the MicroArray Quality Control (MAQC) project.
Nat Biotechnol 2006, 24:11401150. PubMed Abstract  Publisher Full Text

Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Schrf U, ThierryMieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, FulmerSmentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, ThierryMieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhong S, Zong Y, Slikker W Jr.: The MicroArray Quality Control (MAQC) project shows inter and intraplatform reproducibility of gene expression measurements.
Nat Biotechnol 2006, 24:11511161. PubMed Abstract  Publisher Full Text

Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms.
Nat Biotechnol 2006, 24:11621169. PubMed Abstract  Publisher Full Text

Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu TM, Guo X, Collins PJ, Sun YA, Wang SJ, Bao W, Wolfinger RD, Shchegrova S, Guo L, Warrington JA, Shi L: Evaluation of external RNA controls for the assessment of microarray performance.
Nat Biotechnol 2006, 24:11321139. PubMed Abstract  Publisher Full Text

Cover TM, Heart PE: Nearest Neighbor Pattern Classification.
IEEE Transactions on Information Theory 1967, IT13:2127. Publisher Full Text

Ripley BD: Pattern recognition and neural networks. Cambridge ; New York, Cambridge University Press; 1996:xi, 403 p..

Efron B, Tibshirani R: An introduction to the bootstrap. In Monographs on statistics and applied probability ; 57. New York, Chapman & Hall; 1993:xvi, 436 p..

Agresti A: Categorical data analysis. In Wiley series in probability and statistics. 2nd edition. New York, WileyInterscience; 2002:xv, 710 p..