Abstract
Background
Highthroughput sequencing is now regularly used for studies of the transcriptome (RNAseq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As their cost continues to decrease, it is likely that additional followup studies will be conducted to readdress the same biological question.
Results
We demonstrate how pvalue combination techniques previously used for microarray metaanalyses can be used for the differential analysis of RNAseq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect on simulated data and real data on human melanoma cell lines. The GLM with fixed study effect performed well for low interstudy variation and small numbers of studies, but was outperformed by the metaanalysis methods for moderate to large interstudy variability and larger numbers of studies.
Conclusions
The pvalue combination techniques illustrated here are a valuable tool to perform differential metaanalyses of RNAseq data by appropriately accounting for biological and technical variability within studies as well as additional studyspecific effects. An R package metaRNASeq is available on the CRAN (http://cran.rproject.org/web/packages/metaRNASeq webcite).
Keywords:
Metaanalysis; RNAseq; Differential expression; pvalue combinationBackground
Studies of gene expression have increasingly come to rely on the use of highthroughput sequencing (HTS) techniques to directly sequence libraries of reads (i.e., nucleotide sequences) arising from the transcriptome (RNAseq), yielding counts of the number of reads arising from each gene. Due to the cost of HTS experiments, for the time being RNAseq experiments are typically performed on very few biological replicates, and therefore analyses to detect differential expression between two experimental conditions tend to lack detection power. However, as costs continue to decrease, it is likely that additional followup experiments will be conducted to readdress some biological questions, suggesting a future need for methods able to jointly analyze data from multiple studies. In particular, such methods must be able to appropriately account for the biological and technical variability among samples within a given study as well as for the additional variability due to studyspecific effects. Such interstudy variability may arise due to technical differences among studies (e.g., sample preparation, library protocols, batch effects) as well as additional biological variability.
In recent years, several methods have been proposed to analyze microarray data arising from multiple independent but related studies; these metaanalysis techniques have the advantage of increasing the available sample size by integrating related datasets, subsequently increasing the power to detect differential expression. Such metaanalyses include, for example, methods to combine pvalues [1], estimate and combine effect sizes [2], and rank genes within each study [3]; Hu et al.[4] and Hong and Breitling [5] provide a review and comparison of such methods, and Tseng et al.[6] present a recent literature review and discussion of statistical considerations for microarray metaanalysis. In particular, Marot et al.[1] showed that the inverse normal pvalue combination technique outperformed effect size combination methods or moderated ttests [7] obtained from a linear model with a fixed study effect on several criteria, including sensitivity, area under the Receiver Operating Characteristic (ROC) curve, and gene ranking.
In many cases the metaanalysis techniques previously used for microarray data are not directly applicable for RNAseq data. In particular, differential analyses of microarray data, whether for one or multiple studies, typically make use of a standard or moderated ttest [7,8], as such data are continuous and may be roughly approximated by a Gaussian distribution after logtransformation. On the other hand, the growing body of work concerning the differential analysis of RNAseq data has primarily focused on the use of overdispersed Poisson [9] or negative binomial models [10,11] in order to account for their highly heterogeneous and discrete nature. Under these models, the calculation and interpretation of effect sizes is not straightforward. Kulinskaya et al.[12] recently proposed an effect size combination method for Poissondistributed data, based on an Anscombe transformation, but this method is not welladapted to RNAseq data due to the presence of overdispersion among biological replicates as well as zeroinflation. To our knowledge, no other transformation has been proposed to obtain effect sizes for overdispersed Poisson or negative binomial data.
In this paper, we consider several methods for the integrated analysis of RNAseq data arising from multiple related studies, including two pvalue combination methods as well as a model fitted over the full data with a fixed study effect. We first demonstrate how the inverse normal and Fisher pvalue combination methods can be adapted to the differential metaanalysis of RNAseq data. Then we compare these two methods to the results of independent perstudy analyses and a negative binomial generalized linear model (GLM) with a fixed study effect as implemented in the DESeq Bioconductor package [10]. All methods are compared on real data from two related studies on human melanoma cell lines, as well as in an extensive set of simulations varying the interstudy variability, number of studies, and biological replicates per study.
Finally, we note that our focus is on RNAseq data arising from two or more studies in which all experimental conditions under consideration are included in every study (with potentially different numbers of biological replicates); differential analyses among conditions that are not studied in the same experiment are typically limited, or even compromised, due to the confounding of condition and study effects.
Methods
Let y_{gcrs} be the observed count for gene g (g = 1, …, G), condition c (c = 1, 2), biological replicate r (r = 1, … R_{cs}), and study s (s = 1, …, S). Note that the number of biological replicates R_{cs} may vary between conditions and among studies. We use dot notation to indicate summations in various directions, e.g., , , and so on. Let μ_{gcs} be the mean expression level for gene g in condition c and study s. For an integrated differential analysis of gene expression across all studies, two approaches can be envisaged: the combination of pvalues from perstudy differential analyses, and a global differential analysis. We illustrate both using the default methods and parameters of the DESeq (v1.10.1) analysis pipeline [10], although other popular methods, e.g., edgeR[11], could also be used; we note that the recent extensive comparison of Soneson and Delorenzi [13] provides a helpful guide to choosing an appropriate method and software package to use in practice.
Pvalue combination from independent analyses
For the differential analysis of gene expression within a given study s, we assume that gene counts y_{gcrs} follow a negative binomial distribution parameterized by its mean η_{gcrs} = ℓ_{crs}μ_{gcs} and dispersion ϕ_{gs}, where ℓ_{crs} is a normalization factor to correct for differences in library size. A comparison of different methods to estimate ℓ_{crs} may be found in Dillies et al.[14].
After obtaining pergene mean and dispersion parameter estimates in each study independently, a parametric gamma regression is used to obtain fitted dispersion estimates by pooling information from genes with similar expression strengths. Subsequently, for each gene in each study, the null hypothesis to be tested is that there is no difference in the relative proportion of read counts attributed to each condition, or in other words, that the gene is nondifferentially expressed. Pergene and perstudy pvalues p_{gs} are computed using a conditioned test analogous to Fisher’s exact test, where the pvalue of a pair of observed count sums (y_{g1·s},y_{g2·s}) is calculated as the sum of all probabilities less than p(y_{g1·s}, y_{g2·s}) given the overall sum y_{g··s}:
where it is assumed that p(a, b) = p(a)p(b), and p(a) and p(b) represent the probability of a and b counts in the first and second conditions, respectively. These probabilities are calculated using the negative binomial distributions parameterized by the corresponding estimated mean and dispersion parameters, μ_{gcs} and ϕ_{gs}.
Additional details are described by Anders and Huber [10] and in the DESeq package vignette. Once these vectors of raw pvalues have been obtained for each study, we consider two possible approaches to combine them: the inverse normal and the Fisher combination methods. We note that both of these approaches assume that under the null hypothesis, each vector of pvalues is assumed to be uniformly distributed.
Inverse normal method
For each gene g, we define
where p_{gs} corresponds to the raw pvalue obtained for gene g in a differential analysis for study s, Φ the cumulative distribution function of the standard normal distribution, and w_{s} a set of weights [15,16]. We propose here to define the studyspecific weights w_{s}, as described by Marot and Mayer [17]:
where is the total number of biological replicates in study s. This allows studies with large numbers of biological replicates to be attributed a larger weight than smaller studies. We note that other weights may also be defined by the user depending on the quality of the data in each study, if this information is available.
Under the null hypothesis, the test statistic N_{g} in Equation (1) follows a distribution. A unilateral test on the righthand tail of the distribution may then be performed, and classical procedures for the correction of multiple testing such as the approach of Benjamini and Hochberg [18] may subsequently be applied to the obtained pvalues to control the false discovery rate at a desired level α.
Fisher combination method
For the Fisher combination method [19], the test statistic for each gene g may be defined as
where as before p_{gs} corresponds to the raw pvalue obtained for gene g in a differential analysis for study s. Under the null hypothesis, the test statistic F_{g} in Equation (2) follows a χ^{2} distribution with 2S degrees of freedom. As with the inverse normal pvalue combination method, classical procedures for the correction of multiple testing [18] may be applied to the obtained pvalues to control the false discovery rate at a desired level α.
Additional considerations for pvalue combination
We note that the implementation of the previously described pvalue combination techniques requires two additional considerations to be taken into account when dealing with RNAseq data.
First, a crucial underlying assumption for the statistics defined in Equations (1) and (2) is that pvalues for all genes arising from the perstudy differential analyses are uniformly distributed under the null hypothesis. This assumption is, however, not always satisfied for RNAseq data; in particular, a peak is often observed for pvalues close to 1 due to the discretization of pvalues for very low counts. To circumvent this first difficulty, as is commonly done for differential analyses in practice, we propose to filter the weakly expressed genes in each study, using the HTSFilter Bioconductor package [20] as described in the Additional file 1. We note that in so doing, it is possible for a gene to be filtered from one study and not from another. As will be seen in the following, this approach appears to effectively filter those genes contributing to a peak of large pvalues, resulting in pvalues that are roughly uniformly distributed under the null hypothesis.
Additional file 1. Supplementary materials. This document contains a discussion concerning the filtering of RNAseq data, supplementary information about the characteristics of the data and the Ingenuity Pathways Analysis discussed in the real data analysis, and supplementary figures.
Format: PDF Size: 1.8MB Download file
This file can be viewed with: Adobe Acrobat Reader
Second, for the two pvalue combination methods described above, unlike microarray data, under and overexpressed genes are analyzed together for RNAseq data. As such, some care must be taken to identify genes exhibiting conflicting expression patterns (i.e., underexpression when comparing one condition to another in one study, and overexpression for the same comparison in another study). In the case of microarray data, Marot et al.[1] suggested the use of onetailed pvalues for each study to avoid directional conflicts; as the inverse normal combination method was used in their work, the combined statistic thus follows a normal distribution, which is symmetric. Because under and overexpressed genes may be found in the left and right tail, respectively, of the corresponding normal distribution, it is thus possible to use a twotailed test to simultaneously study over and underexpressed genes. Note that Pearson [21] and Owen [22] proposed another alternative to handle conflicting differential expression if the Fisher combination method is used instead. However, in the case of RNAseq data, the use of the conditioned test described above does not enable the separation of over and underexpressed genes in distribution tails; this implies that it is not possible to use the approaches proposed by Marot et al.[1] or Owen [22]. We thus suggest that genes exhibiting differential expression conflicts among studies be identified post hoc, and removed from the list of differentially expressed genes; this step to remove genes with conflicting differential expression from the final list of differentially expressed genes may be performed automatically within the associated R package metaRNASeq.
Global differential analysis
For a global analysis of RNAseq data arising from multiple studies, we assume that gene counts y_{gcrs} follow a negative binomial distribution parameterized by mean η_{gcrs} = ℓ_{crs}μ_{gcs} and dispersion ϕ_{g}, where ℓ_{crs} is the library size normalization factor. In order to estimate a possible effect due to study, a full and reduced model are fitted for each gene using negative binomial generalized linear models (GLM); the full model regresses gene expression on fixed effects for the experimental condition and study, while the reduced model regresses gene expression only on a fixed effect for the study.
Specifically, the full model is log (η_{gcrs}) = β_{g} + λ_{gc} + δ_{gs} + log (ℓ_{crs}), where β_{g} is an intercept, λ_{gc} is a fixed condition effect, δ_{gs} a fixed study effect, and λ_{g1} = δ_{g1} = 0, with the choice of the condition and study to be used as references being arbitrary. The reduced model is log(η_{gcrs}) = β_{g} + δ_{gs} + log(ℓ_{crs}). Pergene dispersion parameters are estimated as before, where a parametric gamma regression is used to obtain fitted dispersion estimates by pooling information from genes with similar expression strengths across all studies.
We are now interested in testing the global pergene null hypothesis
Following parameter estimation, the two models are compared using a χ^{2} likelihood ratio test (with degrees of freedom equal to the number of conditions minus one) to determine whether including the experimental condition significantly improves the model fit. Note that for the global differential analysis we use the HTSFilter Bioconductor package [20] to filter the full set of data across studies prior to calculating pvalues, resulting in a single vector of raw filtered pergene pvalues that may be corrected for multiple testing using classical procedures [18] to control the false discovery rate at a desired level α. Additional details may be found in the DESeq package vignette.
Results and discussion
Application to real data
Presentation of the data
The negative binomial GLM and pvalue combination methods were applied to a pair of real RNAseq studies performed to compare two human melanoma cell lines [23]. Each study compares gene expression in a melanoma cell line expressing the Microphtalmia Transcription Factor (MiTF) to one in which small interfering RNAs (siRNAs) were used to repress MiTF, with three biological replicates per cell line in the first (hereafter referred to as Study A) and two per cell line in the second (Study B). The raw read counts and phenotype tables for Study A are available in the Supplementary Materials of Dillies et al. [14], and the data from Study B from Strub et al. [23].
The characteristics of the data from these two studies are summarized in Additional file 1: Table S1. In particular, we note that the data from Study A tend to have larger total library sizes and a smaller number of unique reads (i.e. reads that appear once in the reference genome) than those from Study B; in addition, Study A appears to exhibit larger overall pergene variability than does Study B (Additional file 1: Figure S8). These two points indicate that in this pair of studies, a considerable amount of interstudy heterogeneity appears to be present (Additional file 1: Figure S9).
Results
After performing individual differential analysis for each study using the negative binomial model and exact test as described in the previous section, we obtained pergene pvalues for each study (Figure 1, histograms in background). As previously stated, an important underlying assumption of the pvalue combination methods is that the pvalues are uniformly distributed under the null hypothesis; we note that this is not the case here, especially for the second study, due to a large peak of values close to 1 resulting from the discretization of pvalues. In order to remove the weakly expressed genes contributing to this peak in each study, we filtered the data from each study as proposed in Rau et al.[20], resulting in a distribution of raw pvalues from each study that appears to satisfy the uniformity assumption under the null hypothesis (Figure 1, histograms in foreground).
Figure 1. Raw pvalue histograms from perstudy differential analyses of real data. Histograms of raw pvalues obtained from perstudy differential analyses in the real data from Study A (left) and Study B (right): unfiltered (in grey) and filtered (in blue) using the method of [20]. Figure made using the ggplot2 package [24].
The perstudy filtered pvalues were combined using the test statistics defined in Equations (1) and (2), and the corresponding results were compared to those of the intersection of independent perstudy analyses and the global analysis using a negative binomial GLM with a fixed study effect as previously described. We note that for the independent perstudy differential analyses, a gene is declared to be differentially expressed if identified in both studies with no differential expression conflict. For the Inverse normal and Fisher methods, a total of 310 (6.8% of differentially expressed genes) and 439 (9.0% of differentially expressed genes) genes were respectively found to exhibit conflicting expression between the two studies, and were subsequently removed from the final list of differentially expressed genes. Unsurprisingly, these genes tended to be those with relatively large pvalues in both studies (Additional file 1: Figure S11).
In addition, we also investigated whether genes identified as differentially expressed by the Inverse normal and Fisher methods tended to be disproportionately dominated by one study over the other, i.e. very small pvalues in only one study (Additional file 1: Figure S12). Although Study B appears to have slightly more genes with very small pvalues, for the most part, pvalues for differentially expressed genes tend to be wellbalanced between the two studies.
The Venn diagram presented in Figure 2 compares the lists of differentially expressed genes found for all methods considered. It may immediately be noticed that the independent perstudy analysis approach is very conservative, and both of the pvalue combination approaches (Fisher and inverse normal) considerably increase the detection power. In addition, a large number of genes are found in common among the pvalue combination methods and the global analysis (3578 compared to only 1583 from the intersection of individual studies). In order to determine whether the genes uniquely identified by a particular method appear to be biologically pertinent, an Ingenuity Pathways Analysis (Ingenuity^{®;} Systems, http://www.ingenuity.com webcite) was performed to identify functional annotation for the genes uniquely identified by the Fisher pvalue combination method with respect to the global analysis, and vice versa. We note that the sets of genes uniquely identified by the Fisher method or the global analysis (Additional file 1: Tables S2 and S3), as well as the set of genes found in common (Additional file 1: Table S4), all appear to include genes of potential interest related to cancer or melanoma, which was the main focus of this set of studies. As such, for this pair of studies it appears that the union of genes identified by the two approaches may be of biological interest; to further study the effect of number of studies and interstudy variability on the performance of each method, we investigate an extensive set of simulated data in the following section.
Figure 2. Comparison of results from differential analyses of real data. Venn diagram presenting the results of the differential analysis for the real data for the two metaanalysis methods (Fisher and inverse normal), the global analysis (DESeq (study)), and the intersection of individual perstudy analyses (Individual). Figure made using the VennDiagram package [25].
Simulation study
Data were simulated according to a negative binomial distribution,
where μ_{gcs} and ϕ_{gs} represent the mean and dispersion, respectively, for gene g, condition c and study s, and the meanvariance relationship is defined by
In order to incorporate interstudy variability, we consider the following situation for the mean parameter μ_{gcs}:
where θ_{gc} represents the mean for gene g in condition c, ε_{gcs} the variability around these means due to a study and conditionspecific random effect, and σ^{2} the size of the interstudy variability. Note that as ε_{gcs} affects μ_{gcs} through a log link, the value of exp(ε_{gcs}) has a multiplicative effect on the mean.
Parameters for simulations
To fix realistic values for the parameters {θ_{gc}, ϕ_{gs}, σ}, we first performed individual perstudy differential analyses by fitting a negative binomial model with the default methods and parameters of the DESeq package on the unfiltered human data presented above. The perstudy false discovery rate was subsequently controlled at the α=0.05 level [18]. For the genes identified as differentially expressed in both studies, θ_{g1} and θ_{g2} were fixed to be the values of the empirical means (after normalization for library size differences) for each condition across studies. For the remaining genes, we set θ_{g1} = θ_{g2} = θ_{g} to be the overall empirical mean (after normalization for library size differences) for gene g across both conditions and studies. Using the gammafamily GLM fitted to the pergene mean and dispersion parameter estimates for each study (Additional file 1: Figure S8), we fixed the dispersion parameter ϕ_{gs} to be equal to the fitted values
where and are the estimated coefficients from the gammafamily GLM for study s, and θ_{g} is the overall empirical mean for gene g. For weakly expressed genes, it has been observed that little overdispersion is present as biological variation is dominated by shot noise (i.e., the variation inherent to a counting process); for genes with θ_{g} < 10, the dispersion parameter is therefore fixed to be ϕ_{gs} = 10^{10}, which corresponds to nearly zero overdispersion (i.e., mean nearly equal to the variance).
Finally, the parameter σ is chosen to represent a range of values for the amount of interstudy variability. The observed human data exhibit a considerable amount of interstudy variability, corresponding to a value of roughly σ = 0.5 (see Additional file 1: Figure S9). In the following simulations, four values are considered for the parameter σ: {0,0.15,0.3,0.5}, representing zero, small, medium, and large interstudy variability, respectively. Finally, we note that for genes simulated to be nondifferentially expressed, we set .
The simulation settings used for the number of studies and number of replicates per condition in each study are presented in Table 1 and were chosen to reflect the size of real RNAseq experiments. When more than two studies were simulated, the same simulation parameters were used as for the first two, as determined from the real data. For simplicity, the same number of replicates was simulated in each condition for all studies.
Table 1. Parameter settings for the simulations, including the number of studies and the number of replicates per condition in each study
Methods and criteria for comparison
In addition to the intersection of independent perstudy analyses (where genes were declared to be differentially expressed if identified in more than half of the studies with no differential conflict), the Fisher and inverse normal pvalue combination techniques, and the global analysis with fixed study effect, we also considered a global analysis with no study effect. For each simulation setting and level of interstudy variability σ, 300 independent datasets were simulated, and the filtering method of Rau et al.[20] was applied, either independently to each study (for the independent perstudy analyses and pvalue combination techniques) or to the full set of data (for the global analysis).
For each method, performance was assessed using the sensitivity, false discovery rate (FDR) and area under the receiver operating characteristic (ROC) curve (AUC). In addition, we also considered a criterion to assess the “value added” for the pvalue combination methods with respect to the global analysis, and vice versa: the proportion of true positives among those uniquely identified by a given method (e.g., the Fisher approach) as compared to another (e.g., the global analysis).
Results
The different methods were first compared with ROC curves, presented in Figure 3 for low and high interstudy variability (results for zero and moderate interstudy variability are shown in Additional file 1: Figure S5). We note that for clarity, the inverse normal method is not represented on these plots as its performance was found to be equivalent to the Fisher method. It can first be noted that for no or small interstudy variability (σ = 0 or σ = 0.15), no practical difference may be observed among the methods. On the other hand, for moderate to large interstudy variability (σ = 0.3 or σ = 0.5) differences among the methods become more apparent; this pattern is observed for any number of studies. As expected, including a study effect in the global analysis improves the performance over a naive global analysis without such an effect. We note that the two proposed metaanalysis methods (inverse normal and Fisher pvalue combination) were found to perform very similarly and were able, in the case of large interstudy variability, to outperform the global analysis in terms of AUC (Additional file 1: Figure S1). In particular, in the presence of large interstudy variability, the naive global analysis without a study effect unsurprisingly has the lowest AUC, and the two metaanalysis methods yield a larger AUC than the global analysis with a study effect.
Figure 3. Receiver operating characteristic curves for low and high interstudy variability. Receiver Operating Characteristic (ROC) curves, averaged over 300 datasets. Each plot represents the results of a particular setting, with columns corresponding (from left to right) to simulations including 2 studies, 3 studies, and 5 studies, and rows corresponding (from top to bottom) to simulations with interstudy variability set to σ=0.15 and σ=0.50 (small to large interstudy variability). Within each plot: Fisher (red lines), global analysis with no fixed study effect (DESeq (no study), blue lines), and global analysis with a fixed study effect (DESeq (study), orange lines). The dotted black line represents the diagonal.
Considering the sensitivity (Figure 4 and Additional file 1: Figure S6), the metaanalyses appear to lead to similar, and in some settings considerably higher, detection power compared to the other methods. We note that in all settings, using the intersection of independent analyses leads to much lower sensitivity, even for low or zero interstudy variability. As for the AUC, the sensitivity was found to be considerably improved for the global analysis when including a study effect in the GLM model, particularly for medium to large interstudy variability. The two metaanalysis methods were found to lead to significant improvements in sensitivity as compared to the global analysis in the presence of moderate to large interstudy variability when three or more studies were considered. However, for the setting that most resembles our real data analysis (2 studies, σ=0.50), the global analysis with study effect and metaanalyses appear to have similar detection power. Finally, we also note that for all methods the FDR was well controlled below 5% (Additional file 1: Figure S2).
Figure 4. Sensitivity for low and high interstudy variability. Sensitivity, for the simulation settings corresponding to σ=0.15 and σ=0.50. Each barplot represents the results of a particular setting, with columns corresponding (from left to right) to simulations including 2 studies, 3 studies, and 5 studies, and rows corresponding (from top to bottom) to simulations with interstudy variability set to σ=0.15 and σ=0.50 (low interstudy variability to large interstudy variability). Within each barplot, from left to right: Individual perstudy analyses (green bars), inverse normal (purple bars), Fisher (red bars), global DESeq with no study effect (blue bars), and global DESeq with a fixed study effect (orange bars).
Based on these criteria, the two proposed metaanalysis methods (inverse normal and Fisher) seem to perform very similarly. In order to more thoroughly investigate the differences between pvalue combination methods and the global analysis including a study effect, we calculated the proportion of true positives uniquely detected by the Fisher method as compared to the global analysis with study effect, and vice versa (Figure 5 and Additional file 1: Figure S7). In the setting closest to the real data analysis presented above (two studies and large interstudy variability), the proportion of true positives found uniquely by either the Fisher approach or the global analysis with fixed study effect are very large (around 80% for both methods). This seems to suggest that the additional genes uniquely found either by the global analysis or Fisher pvalue combination method in the real data application may indeed be of great biological interest. For more than two studies, however, as the interstudy variability increases the proportion of truly differentially expressed genes uniquely found by the Fisher method increases compared to the global analysis. For example, for three studies with large interstudy variability (σ = 0.5), the proportion of truly DE genes uniquely found with the Fisher method was equal to more than 80%, whereas it was only around 40% for the global analysis with a study effect.
Figure 5. Proportion of true positives among unique discoveries. Proportion of true positives among unique discoveries for DESeq with a fixed study effect (orange bars) and Fisher (red bars). Each barplot represents the results of a particular setting, with columns corresponding (from left to right) to simulations including 2 studies, 3 studies, and 5 studies, and rows corresponding (from top to bottom) to simulations with interstudy variability set to σ = 0.15 and σ = 0.50 (low interstudy variability to large interstudy variability). Error bars represent one standard deviation, and numbers in parentheses represent the mean total number of unique discoveries for DESeq with study effect as compared to Fisher and vice versa, respectively.
Conclusions
The aim of this paper was to present and compare different strategies for the differential metaanalysis of RNAseq data arising from multiple, related studies. As expected, naive analyses such as the overlap of lists of differentially expressed genes found by individual studies or a global analysis not accounting for a study effect perform very poorly. On the other hand, the two proposed metaanalysis methods seem to have very similar performances. For low interstudy variability, the results are very close to those of a global GLM analysis including a study effect. When the interstudy variability increases, however, the gains in performance in terms of AUC, sensitivity, and proportion of true positives among uniquely identified genes for the metaanalysis techniques are significant as compared to the global analysis, particularly for the analysis of data from more than two studies. We note that both of the proposed pvalue combination methods are implemented in an R package called metaRNASeq, available on the CRAN; a package vignette describing the use of metaRNASeq may be found in Additional file 2 as well as by calling vignette(~metaRNASeq~) after loading the package in R.
Additional file 2. metaRNASeq vignette. This document contains a vignette describing in greater detail the metaRNASeq R package. It can also be obtained by running the following commands in the R console:
> library(metaRNASeq)
> vignette(“metaRNASeq”)
Format: PDF Size: 248KB Download file
This file can be viewed with: Adobe Acrobat Reader
Our focus in this work is on differential analyses between two experimental conditions, but can readily be extended to multigroup comparisons. However, as previously noted, the methods presented here are intended for the analysis of data in which all experimental conditions under consideration are included in every study, thus avoiding problems due to the confounding of condition and study effects. As with all metaanalyses, the pvalue combination techniques presented here must overcome differences in experimental objectives, design, and populations of interest, as well as differences in sequencing technology, library preparation, and laboratoryspecific effects.
The differential metaanalyses presented here concern expression studies based on RNAseq data. However, other genomic data are generated by highthroughput sequencing techniques, including chromatin immunoprecipitation sequencing (CHIPseq) and DNA methylation sequencing (methylseq), and the proposed techniques could potentially be extended to these other kinds of data. However, in order to be biologically relevant, the pvalue combination methods rely on the fact that the same test statistics, or in the case of RNAseq data conditioned tests, are used to obtain pvalues for each study. An important challenge for the future will be to propose methods able to jointly analyze related heterogeneous data, such as microarray and RNAseq data, or other kinds of genomic data. This is not straightforward in a metaanalysis framework and remains an open research question.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
AR participated in the design of the study, performed simulations and data analyses, and helped draft the manuscript. GM designed the study, wrote the associated R package, and helped draft the manuscript. FJ conceived of the study, participated in its design, and drafted the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank Thomas Strub, Irwin Davidson, Céline Keime and the IGBMC sequencing platform for providing the RNAseq data, as well as two anonymous reviewers for their helpful comments and suggestions.
References

Marot G, Foulley JL, Mayer CD, Jaffrézic F: Moderated effect size and Pvalue combinations for microarray metaanalyses.
Bioinformatics 2009, 25(20):26922699.
doi:10.1093/bioinformatics/btp444
PubMed Abstract  Publisher Full Text 
Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and model interstudy variation.
Bioinformatics 2003, 19(Suppl 1):8490. Publisher Full Text

Breitling R, Armengaud P, Amtmann A, Herzyk P: Rank products: a simple yet powerful new method to detect differential regulated genes in replicated microarray experiments.
FEBS Lett 2004, 573:8392. PubMed Abstract  Publisher Full Text

Hu P, Greenwood CM, Beyene J: Statistical methods for metaanalysis of microarray data: a comparative study.
Inf Syst Front 2006, 8:920. Publisher Full Text

Hong F, Breitling R: A comparison of metaanalysis methods for detecting differentially expressed genes in microarray experiments.
Bioinformatics 2008, 24(3):374382. PubMed Abstract  Publisher Full Text

Tseng GC, Ghosh D, Feingold E: Comprehensive literature review and statistical considerations for microarray metaanalysis.
Nucleic Acids Res 2012, 40(9):37853799. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
Stat Appl Genet Mol Biol 2004., 3(Article 3)
http://dx.doi.org/10.2202/15446115.1027 webcite

Jaffrézic F, Marot G, Degrelle S, Hue I, Foulley JL: A structural mixed model for variances in differential gene expression studies.
Genet Res 2007, 89:1925. PubMed Abstract  Publisher Full Text

Auer P, Doerge R: A twostage Poisson model for testing RNAseq data.

Anders S, Huber W: Differential expression analysis for sequence count data.
Genome Biol 2010., 11(R106)
doi:10.1186/gb20101110r106

Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics 2010, 26:139140. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kulinskaya E, Morgenthaler S, Staudte RG: Meta Analysis: a guide to calibrating and combining statistical evidence, Volume Volume 756 of Wiley Series in Probability and Statistics. West Sussex, England: John Wiley & Sons; 2008.

Soneson C, Delorenzi M: A comparison of methods for differential expression analysis of RNAseq data.
BMC Bioinformatics 2013, 14:91. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Dillies MA, Rau A, Aubert J, HennequetAntier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloë D, Le Gall C, Schaëffer B, Le Crom S, Jaffrézic F: A comprehensive evaluation of normalization methods for Illumina highthroughput RNA sequencing data analysis.
Brief Bioinform 2012.
[doi:10.1093/bib/bbs046]

Stouffer S, Suchman E, DeVinney L, Star S, Williams JRM: The American soldier. Adjustment during Army life. Princeton, NJ: Princeton University Press; 1949.

Liptak T: On the combination of independent tests.
Magyar Tudomanyos. Akademia Matematikai Kutato Intezetenek Kozlemenyei 1958, 3:171197.

Marot G, Mayer CD: Sequential analysis for microarray data based on sensitivity and metaanalysis.
Stat Appl Genet Mol Biol 2009., 8(Article 3)
[doi:10.2202/15446115.1368]

Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing.

Fisher RA: Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; 1932.

Rau A, Gallopin M, Celeux G, Jaffrézic F: Databased filtering for replicated highthroughput transcriptome sequencing experiments.
Bioinformatics 2013, 29(17):214652. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Pearson K: On a new method of determining “goodness of fit”.

Owen AB: Karl Pearson’s metaanalysis revisited.
Annals of Statistics 2009, 37(6B):38673892. Publisher Full Text

Strub T, Giuliano S, Ye T, Bonet C, Keime C, Kobi D, Gras SL, Cormont M, Ballotti R, Bertolotto C, Davidson I: Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma.
Oncogene 2011, 30:23192332. PubMed Abstract  Publisher Full Text

Wickham H: ggplot2: Elegant Graphics for Data Analysis. New York: Springer; 2009.

Chen H, Boutros PC: VennDiagram: a package for the generation of highlycustomizable Venn and Euler diagrams in R.
BMC Bioinformatics 2011, 12:35. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text