Abstract
Background
When analyzing microarray data a primary objective is often to find differentially expressed genes. With empirical Bayes and penalized ttests the sample variances are adjusted towards a global estimate, producing more stable results compared to ordinary ttests. However, for Affymetrix type data a clear dependency between variability and intensitylevel generally exists, even for logged intensities, most clearly for data at the probe level but also for probeset summarizes such as the MAS5 expression index. As a consequence, adjustment towards a global estimate results in an intensitylevel dependent false positive rate.
Results
We propose two new methods for finding differentially expressed genes, Probe level Locally moderated Weighted mediant (PLW) and Locally Moderated Weightedt (LMW). Both methods use an empirical Bayes model taking the dependency between variability and intensitylevel into account. A global covariance matrix is also used allowing for differing variances between arrays as well as arraytoarray correlations. PLW is specially designed for Affymetrix type arrays (or other multipleprobe arrays). Instead of making inference on probeset summaries, comparisons are made separately for each perfectmatch probe and are then summarized into one score for the probeset.
Conclusion
The proposed methods are compared to 14 existing methods using five spikein data sets. For RMA and GCRMA processed data, PLW has the most accurate ranking of regulated genes in four out of the five data sets, and LMW consistently performs better than all examined moderated ttests when used on RMA, GCRMA, and MAS5 expression indexes.
Background
Microarrays are widely used for measuring gene expression in biomedical research. For the purpose of finding differentially expressed genes there exist numerous methods. In early studies genes where often ranked with respect to foldchange. Genes showing foldchange above 2 (or 3) were regarded as potentially regulated and were selected for further investigation. The obvious drawback with such an approach, as pointed out by many authors, is that genes with high foldchange may also be highly variable and thus with low significance of the regulation. On the other hand, since the number of replicates in many studies is small, variance estimators computed solely within genes are not reliable in that very small values can occur just by chance. As a consequence the ordinary ttest suffers from low power and is not a better option for filtering out regulated genes.
Many methods have been proposed to improve on the variance estimator in order to find more powerful statistical tests for differential expression. In empirical Bayes methods [18] and the penalized ttest suggested in [9], the genespecific variance estimator is modified in order to produce more stable results. With proportions determined by the accuracy of the genespecific variance estimators, a mixture of the genespecific variance estimator and a global variance estimate is used in place of the genespecific variance estimator in the denominator of the ttest. Similarly, in the Significance Analysis of Microarrays (SAM) method [10] and the method suggested in [11], a constant is added to the genespecific sample standard deviation.
Another approach is to pool variance estimators for genes having similar expression level, thus modeling the variance as a function of intensitylevel. For example Eaves et al. [12] use a weighted average of the genespecific variance estimator and a pooled estimate based on the 500 genes with most similar mean expression level, and Jain et al. [13] suggest the localpoolederror method (LPE) where a variance function fitted to estimated variances and mean intensities is used. Comander et al. [14] pool genes with respect to minimum intensity rather than mean intensity, and Hu et al. [15] use a hierarchical model with a linear relationship between variance and intensitylevel. Of these four methods, only the one suggested in [15] takes the accuracy of the genespecific variance estimators into account when setting the weights for the genespecific estimator and the pooled estimator, respectively. On the other hand Hu et al. [15] only deal with a linear relationship between variance and intensitylevel. A variance to intensitylevel dependency is also utilized in the moderated ttest suggested in [6]. The method proposed builds on the moderated ttest suggested in [2,3] with the addition of fitting a loess curve in the scatter plot of logged variance estimators against mean intensity when estimating the model parameters.
The type of arrays considered in this paper is the Affymetrix GeneChip arrays. These arrays are one color arrays and each gene is represented by a set of probes, the probeset, consisting of 10–16 probepairs. Each probepair consists of one perfect match (PM) probe and one mismatch (MM) probe. The probes are 25 bases long and the PM and MM probes have identical sequences of bases except for the middle probe which in the MM probe is set to the complementary base of that in the PM probe. The MM probes are thus designed to measure the background intensity for the corresponding PM probe. The standard way of dealing with the multipleprobes is to derive a probe set summary, an expression index, for each probeset (gene) and array (sample), for example using the RMA method [16] or the Affymetrix MAS5 algorithm. The expression indexes are then used in downstream analysis by only considering the expression index itself, the precision of the expression index is ignored. However, in the fully Bayesian probelevel BGX model [17] information about the accuracy of the expression index is obtained as a complete distribution which is subsequently used when computing the posterior distribution of differential expression. Also, the probelevel measurement error from the probabilistic probelevel model multimgMOS [18] is used when computing the probability of positive logratio in the PPLR method [19].
For Affymetrix type arrays a dependency between variability and intensitylevel generally exists, even for logtransformed data. Figure 1 shows scatter plots of sample variance versus sample mean calculated on logged PM intensities (background corrected and normalized using the default methods of RMA) and three different expression indexes: RMA, GCRMA and MAS5. Except for the RMA expression index a clear dependency between variability and intensitylevel exists, with a unique signature for each type of preprocessing of the raw CELfile data. The GCRMA expression index shows increasing variability with intensitylevel while MAS5 shows the opposite relationship. As a consequence, methods assuming constant variance as well as methods adjusting the genespecific variance (or standard deviation) estimators towards a global estimate suffer from intensitylevel dependent false positive rates. Figure 2 shows an example where the moderated ttest in the Rpackage LIMMA [3] was used on MAS5 expression indexes computed on a set of replicated arrays. The false positive rate obtained with LIMMA follows the same pattern as in the right lower panel in Figure 1 where the same data set is used. Almost identical result was obtained using data set A in a similar simulation (data not shown).
Figure 1. Probe, or probeset, wise sample variances against sample means. Scatter plots of sample variance s^{2 }(logged with base 2) against mean intensity for logged PM intensities and three expression indexes. Left and right panels show data set A and B, respectively (see Section Data sets).
Figure 2. False positive rate against mean intensity. False positive rate (α) calculated on resampled data and plotted against mean intensity. 100 data sets of size 6 were sampled from the complete data set B (see Section Data sets) of 18 replicated arrays and then analyzed using the Affymetrix MAS5 algorithm followed by a two group analysis of 3+3 arrays using the moderated ttest in the Rpackage LIMMA [3], on logged MAS5 indexes and indexes transformed using the variance stabilizing transformation in the Rpackage vsn [21], and the proposed method LMW using logged MAS5 indexes. false positive rate were obtained by averaging over the sampled data sets using loesscurves fitted to mean intensity and indicator of significance (1 if the probeset is among the 5% probesets with highest absolute statistic, 0 otherwise). The mean intensities of each data set are shifted to the range [0,15].
The aim of variance stabilizing transformations is to reduce or eliminate the problem of dependency between variability and intensitylevel. A family of transformations, the generalizedlog family (glog), was introduced in [2022] and further used in [23,24]. A comparison of the glog family with the started logarithm transformation [25] and the loglinear hybrid transformation [26] is presented in [27]. It is concluded that the glog family is "probably the best choice when it is convenient to use it", but it is also noted that the direct interpretation of differences as logged ratios for microarray data when using the ordinary logtransformation, does not hold when using such variance stabilizing transformations.
Generally, the glog family effectively stabilizes the variance when applied to raw Affymetrix probelevel data, for example using the parameter estimation procedure described in [21]. However, the transformations implicitly defines a background correction, and when applied to data already having been subject to another background correction (or further processed data), the glog transformations may not be able to capture the structure of the dependency between variability and intensitylevel. This applies, for example, to probelevel data background corrected using the RMA default background method, and MAS5 expression indexes, see Figure 2. Thus, there is a need for more flexible solutions, and in short, Figures 1 and 2 may be seen as one motivation for the methods proposed in this paper.
The hierarchical Bayesian model WAME proposed and developed in [4,5,7,8] is in the present paper extended to incorporate the variability to intensitylevel dependency. The Probe level Locally moderated Weighted mediant method (PLW) applies the extended model to logged PM intensities resulting in moderated and weighted tstatistics for all PM probes. In the final step of PLW the median tstatistic of all PM probes building up each probeset is computed, and this median is the value used for ranking the probesets with respect to differential expression.
The Locally Moderated Weightedt method (LMW) is a more general method intended for single probe type of arrays or summary measures of multiple probe type arrays, such as RMA and MAS5. LMW use the same model as PLW but since only one tstatistic is obtained for each probeset no median is calculated. The proposed methods are compared with existing methods on five publicly available spikein data sets.
Results and Discussion
Model and methodology
Given a set of n arrays let y_{ip }be the background corrected and normalized logintensity on array i for PM probe p and put y_{p }= (y_{1p},...,y_{np})^{T}. The PM probes are divided into G (disjoint) probesets and thus there are a total of probes. For p = 1,...,P assume
where μ_{p }is the logintensity profile for probe p across the n arrays with mean logintensity level , ∑ is an n × n covariance matrix, m is a realvalued parameter, and ν(·) is a smooth realvalued function. N_{n }denotes an ndimensional normal distribution, and Γ^{1}(a, b) denotes the inversegamma distribution with shape parameter a and scale parameter b. A cubic spline is used to parameterize the function ν(·). Given set of K interior splineknots
where β is a parameter vector of length 2K  1 and H : ℝ → ℝ^{2K1 }is a set of Bspline basis functions, see chapter 5 of [28].
As in the model suggested in [4] the model in Equ. 1 makes use of a global covariance matrix, thus allowing differing variances as well as correlations between arrays. To account for the dependency between variability and intensitylevel the scaleparameter of the Γ^{1}distribution depends on the mean logintensity level for the probe through the smooth function ν.
We assume that the vector μ_{p }is determined by a full rank n × k design matrix D and a parameter vector γ_{p }of length k. The aim is to estimate and test hypothesis for δ_{p}, a linear combination of γ_{p }specified by a 1 × k matrix C. In summary,
For the special case of comparing two conditions, with n_{1 }and n_{2 }arrays from conditions 1 and 2, respectively, the design matrix D is an (n_{1 }+ n_{2}) × 2 matrix. For example, with n_{1 }= 3 and n_{2 }= 4 we can use
and thus μ_{p }= (γ_{p1}, γ_{p1}, γ _{p1}, γ_{p2}, γ_{p2}, γ_{p2}, γ_{p2})^{T}. With C = [1 1] we have δ_{p }= γ_{p2 } γ_{p1}, thus δ_{p }is the logged fold change between conditions 2 and 1.
However, instead of estimating the parameters of the model in Equ. 1 we use a reduced model derived from Equ. 1 through a linear transformation of the vector y_{p}. Define the n × n and n × 1 matrices
Since A_{0 }is of rank n  k only we let A be an n × (n  k) matrix whose column space equals that of A_{0}.
With q = n  k + 1 form the n × q transformation matrix M and the vector z_{p }of length q
giving the reduced model
where ∑_{z }= M^{T}∑M.
The reduced model is fitted using the EM algorithm [29] as described in Section Parameter estimation.
The c_{p}'s are treated as missing data and we replace the unknown intensitylevel for probe p, , with the observed mean intensity across arrays, . Given estimators of the parameters ∑_{z}, m, and β we proceed as if these parameters are known, and weighted moderated ttests are computed for each probe p. The unbiased minimum variance estimator of δ_{p }is
where λ is the vector (0,...,0, 1)^{T }of length q. The weighted moderated tstatistic is defined as
and under H_{0}: δ_{p }= 0 it can be shown that is tdistributed with q + m  1 degrees of freedom. Here
is the weighted residual sum of squares. See [5] for details. The PLW statistic for the probeset is then defined as
The LMW and PLW methods are implemented in the R package plw [30] available at the authors' web page and at the Bioconductor projects web page [31] (at the time of writing only among develpackages, bioconductor 2.2).
Parameter estimation
The q × q covariance matrix ∑_{z }of the reduced model in Equ. 3 is divided according to
where ∑_{A }is the covariance matrix for all but the last dimension of z_{p }and is the variance of the last dimension (indexes A and B refer to the corresponding submatrices of the transformation matrix M in Equ. 2). The reduced model is fitted in two steps. First the parameters m, β and the submatrix ∑_{A }are estimated by dropping the last dimension of the vectors z_{p}. Since the reduced model is not identifiable without a restriction on the function ν or the covariance matrices ∑_{z }we use the restriction trace(∑_{A}) = q  1. Secondly, the parameters m and β are held fixed and ∑_{z }is estimated using the complete z_{p }vectors. Temporarily the assumption of no regulated genes is used (δ_{p }= 0 for all probes) and ∑_{z }is estimated under the restriction that the trace of the ∑_{A }part should be equal to q  1.
In step 1, we let x_{p }denote the subvector of z_{p }obtained by dropping the last element. Under the reduced model x_{p }is distributed according to the model in Equ. 1 with ∑ = ∑_{A}, μ_{p }= 0, n = q  1, and using the EMalgorithm an iterative procedure for estimating m, β and ∑_{A }is obtained. Given estimates of the previous iteration, m_{0}, β_{0 }and ∑_{A0}, updated estimates are found as follows. Let
The updated estimate of ∑_{A }is
and the updated estimate of β is found by numerical maximization of the function
With equal to the updated estimate of β let
where is the digamma function. The updated estimate of m is then found using numerical maximization of the function
In step 2 a similar iterative procedure is used to estimate ∑_{z}. With ∑_{z0 }denoting the estimate of ∑_{z }from the previous iteration and with w_{p }redefined as
where and are the estimates obtained in step 1, an updated estimate of ∑_{z }is computed according to Equ. 8 with z_{p }replacing x_{p}. In order for the estimators of ∑_{A }and ∑_{z}, in step 1 and 2, respectively, to comply with the trace restriction the updated estimates are scaled at the end of each iteration. [For more details see Additional file 1]
Additional file 1. Parameter estimation details. A detailed description of the parameter estimation procedure.
Format: PDF Size: 72KB Download file
This file can be viewed with: Adobe Acrobat Reader
Data sets
The two data sets used in Figures 1 and 2 are publicly available at the Gene Expression Omnibus repository [32] with series or sample reference number indicated below. Data set A consists of the 18 arrays from the severe group of the COPD data set [33] (series reference number GSE1650), where Affymetrix arrays of type HG U133A were used. In data set B the 18 arrays with normal tissue where selected from a lung tumor data set [34] (sample reference numbers GSM47958GSM47976, excluding GSM47967). Here the HGU95A arrays were used.
Five spikein data sets were used to evaluate the proposed methods. In the Affymetrix U95 and 133A Latin Square data sets [35] arrays of type HGU95A and HGU133A, respectively, were used. The Affymetrix U95 data set consists of data from 59 arrays divided into 19 groups of size 3, and one group of size 2.
From the 20 groups there are 178 possible pairwise group comparisons each with 16 [36] known differentially expressed genes among the 12626 genes present on the arrays. The Affymetrix 133A data set comprise data from 42 arrays with a total of 22300 probesets of which 42 were spiked in at known concentration. The 42 arrays are divided into 14 groups of size 3 and thus there are 91 possible pairwise group comparisons. As done in the Affycomp II assessments [36] we exclude 271 probesets which are likely to crosshybridize to spikein probesets. The sequence of each spikein clone was blasted against all HGU133A target sequences (~600 bp regions from which probes are selected). A threshold of 100 bp identified 271 probesets which are available in the Affycomp Rpackage.
From the Gene Logic Tonsil and AML data sets [37] all groups with 3 replicated arrays were used, giving a total of 12 and 10 groups, respectively. For these data there are 11 genes spiked in at known concentration, which can be studied in 66 and 45 pairwise group comparisons, respectively. Both data sets were obtained using the Affymetrix HGU95A arrays having 12626 genes.
The Golden Spike data set [38] consists of 6 arrays of type Drosgenome1 divided into 2 groups of equal size. The samples used in this experiment consist of mRNA from 3866 genes, of which 1331 are differentially expressed between the groups. The Drosgenome1 array has a total of 14010 genes, thus 10144 of these should not be expressed, 2535 should be expressed but not regulated, and 1331 should be expressed and regulated.
Since all 1331 genes are upregulated in the spikein group it is necessary to take special care in the normalization when analyzing this data set. Generally this means performing normalizations based on a subset of genes, either only the 2535 genes spiked in at identical concentration in both groups [39], or the 2535 genes together with the 10144 absent genes [38]. Thus, knowledge about which genes are regulated, which of course is not available for a real situation, is used in the normalization.
For PPLR, BGX, and the analysis based on MAS5 expression indexes we used a loesssubset normalization of probe set summaries as done in [17,19,38]. For the analysis based on RMA (and GCRMA) preprocessed data we used a loesssubset normalization of PM probe intensities similar to the one performed in [40]. PM probe intensities were corrected for background using the default background method of RMA (or GCRMA) and then loessnormalized using the same subset as used in [38], thus the 2535 genes spiked in at identical concentration in both groups together with the 10144 absent genes. At this point PLW, mediant, and combinedp were applied to logged PM intensities. Probe set summaries using median polish were then computed and all 10 methods using expression indexes as input were used to rank genes with respect to differential expression.
Comparison with existing methods
Using the spikein data sets listed above the proposed methods, PLW and LMW were compared with 14 existing methods for ranking genes. The 14 methods include ranking with respect to: observed fold change (FC), ordinary ttest, the moderated ttest in the Rpackage LIMMA [3], the weighted moderated ttest in the Rpackage WAME.EM [8], Efron's penalized ttest [11] and the Shrinkt method [9] in the Rpackage st, the SAM method [10] in the Rpackage samr, the Localpoolederror test [13] in the Rpackage LPE, the IntensityBased Moderated Tstatistic (IBMT) [6] using the Rcode available at http://eh3.uc.edu/r/ibmtR.R webcite, and the two probelevel methods mediant and combinedp suggested by Hess and Iyer [40].
All methods using expression indexes as input (including PLW) were applied to RMA, GCRMA and MAS5 expression indexes obtained using the Rpackage affy, while PLW, mediant, and combinedp were applied to logged PM intensities, background corrected and normalized using the default methods of RMA and GCRMA. (The empirical Bayes approach of GCRMA was used to calculate background corrected intensities, thus the fast option was set to FALSE). With LMW 4–6 splineknots (depending on the number of probesets) were used for the function ν, whereas 12 knots were used in PLW (the splineknots are set using an internal function in the Rpackage plw). Note that RMA and GCRMA were applied only to the arrays involved in each group comparison, as opposed to running RMA and GCRMA using all arrays of each data set.
We also compared with the PPLR method [19] applied to the expression index and probelevel measurement error of the multimgMOS model [18] available in the Rpackage puma, the logitt procedure implemented in the Rpackage plw according to the description in [41], and the BGX method [17] as implemented in the Rpackage bgx. The Rcode used for each of the 14 methods is available as supplementary material. [See Additional file 2]
Additional file 2. Rcode and package information. The Rcode used for each of the methods compared together with information about used Rpackages (names and versions).
Format: TXT Size: 14KB Download file
Due to long computer run times the comparison with the BGX method is restricted to the Golden Spike data set, and a subset of genes from the Gene Logic AML data set (the run time for one single analysis of 6 arrays with all 12626 probesets is more than 24 hours). The subset of size 1011 of the Gene Logic AML data set consists of probesets number 6000–7002 (excluding 6030, 6367, and 6463) together with the 11 spiked probesets and the same subset was used in [17]. The probeset numbering is as obtained when loading data into R using the Rpackage affy. Also, to avoid introducing yet another normalization of the Golden Spike data set, logitT was not applied to this data set.
For each spikein data set and combination of method and expression index/preprocessing ROCcurves were calculated. Also, for the analysis using a complete set of probesets, the area (AUC) under the ROC curve up to 25, 50, 100 and 200 false positives was computed. In the comparison with BGX using only 1011 probesets, AUC was computed up to 2, 4, 8 and 16 false positives in order to cover the same false positive range as for the complete probeset comparisons.
For the analysis based on RMA preprocessing, ROC curves for a subset of the compared methods are found in Figure 3. AUC values up to 100 false positives from the complete probeset analysis are found in Tables 1 and 2. (ROC curves for all methods and AUC up to 25, 50, 200 false positives are available as supplementary material. [See Additional file 3 and Additional file 4])
Additional file 3. ROC curves for all methods. Figures with ROC curves for all methods compared for each of the five data sets.
Format: PDF Size: 1.6MB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 4. ROC curve AUC for all methods. Tables with ROC curve AUC up to 25, 50, 100, 200 false positives for all methods compared for each of the five data sets.
Format: PDF Size: 28KB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 3. ROC curves. ROC curves for a subset of the methods compared when applied to RMA preprocessed data. The horizontal axis shows the number of false positives (FP) and the vertical axis the proportion of true positives found (TP).
Table 1. Area under ROC curves up to 100 false positives, RMA and GCRMA
Table 2. Area under ROC curves up to 100 false positives, MAS5, PPLR, BGX, and LogitT
Results for RMA preprocessing are found in the upper part of Table 1. Three of the four methods taking the variabilitytointensitylevel dependency into account (PLW, LMW and LPE) performed overall better than the other methods, with the proposed method PLW having the highest AUC on four of the five data sets. The fourth method taking the variabilitytointensitylevel dependency into account (IBMT) performed comparably well on the Affymetrix and Golden Spike data sets but less so on the two Gene Logic data sets.
Ranking genes with respect to FC performs quite well on the Affymetrix U133A and the two Gene Logic data sets but not on the other two data sets. Among the penalized and moderated ttest methods, WAME and Efront consistently perform better than the other ones. However, the difference between these methods for the two Affymetrix Latin Square and the Golden Spike data sets are small, compared to the difference in AUC obtained using the two Gene Logic data sets. Thus, the two Gene Logic data sets appear slightly different from the other three.
The results obtained using GCRMA in the lower part of Table 1 are very similar to the results with RMA shown in the upper part. For the Golden Spike data set, replacing GCRMA with RMA improves the performance of all methods but the ordering of the methods is fairly unchanged. Overall, IBMT, Efront and mediant, performs slightly better when applied to GCRMA expression indexes whereas WAME and LPE performs slightly worse. The overall ordering of the toptwo methods is unchanged.
Since MAS5 expression indexes show a very clear dependency between variability and intensity level in Figure 1, and since the variability decreases with intensity it comes as no surprise that all three methods taking this dependency into account consistently performs better than all other methods as shown in the upper part of Table 2. The LMW method has the most accurate ranking of genes in 4 out of the 5 datasets, and performs better than the IBMT method on all 5 datasets. Since the main difference between LMW and IBMT is that LMW performs a weighted analysis, and since WAME overall performs better than LIMMA, it appears as if weighted analysis should be used in preference to analysis using unweighted analysis.
The lower part of Table 2 shows results for PPLR, BGX, and logitt. For the Golden spike data set both PPLR and BGX perform comparably well, only LMW and IBMT applied on MAS5 expression indexes have higher ROC curve AUC. In the analysis of 1011 probe sets from the Gene Logic AML data set, PLW shows consistently higher true positive rate compared with BGX (Figure 3) and the AUC up to 8 false positives (scaled so that optimum is 100) is 84 and 75 for PLW and BGX, respectively. For the remaining data sets PPLR performed comparably well with other methods using MAS5 expression indexes, but less so when comparing with RMA and GCRMA preprocessed data.
The second proposed method LMW differs from existing moderated and penalized ttest in that the global variance estimator (which genespecific estimators are adjusted towards) varies with intensitylevel.
Actually this is the only difference between LMW and the WAME method. The LPE method also uses a global variance estimator that varies with intensitylevel. But opposed to using a weighted mean of the global and genespecific estimator, only the global estimator is used in the denominator of the LPE statistic. Thus for genes with similar intensitylevel, LPE is basically identical to ranking using fold change. Hence, since LMW consistently performs better than WAME, and LPE has higher AUC than fold change in four of the five data test, modeling the global variance estimator as a function of intensity is worthwhile doing.
Figure 2 shows that the false positive rate obtained by adjusting towards a global estimate that varies with intensitylevel results in a much more stable false positive rates compared to using a (truly) global estimate. It should be mentioned that as the number of arrays increases, the variability of the false positive rate across intensitylevel decreases, when adjusting towards a truly global estimate as well as when adjusting towards an intensitylevel dependent global estimate. Figure 2 is based on 3+3 arrays, and the estimated false positive rate for LIMMA varies between 1.3% and 8.3%. When repeating the same analysis using 5+5 arrays the estimated false positive rate varied between 1.8% and 7%.
With RMA (and GCRMA) preprocessed data, we do comparisons between 1) Compute probe set summaries from probe intensities and then do inference, and 2) Do inference for each PM probe and then summarize into one score. Thus, the summarization (here done using medianpolish) is considered as a part of the first approach. PLW and mediant use the second approach, whereas LMW and the ordinary ttest are the corresponding methods using the first approach. From the results presented here the second approach appears a better option. This was also shown by Hess and Iyer in [40] where they propose the mediant and combinedp method.
Approach 1 could also include a second normalization of probe set summaries. However, neither of the two approaches can be given information about which genes that are regulated without making the comparison biased. Thus, in the analysis of the Golden Spike data set, a second subsetloess normalization of probe set summaries as done in [38] can not be used when comparing approach 1 and 2. We therefore used a subsetloess normalization of PM probe intensities in a similar way as done in [40].
We have also computed results for the Golden Spike data set using subsetloess normalized MAS5 expression indexes as done by Choe et al. in [38]. They show that, for a large set of different preprocessing methods, a second loesssubset normalization of probeset summaries has a large effect (Figure 7a). They give no direct answer to whether a subsetbased probe normalization to the same extent improves the performance of the corresponding normalization using all probes. Therefore, to present comparable results only, we have separated results from these two types of subsetnormalizations using knowledge about which genes are regulated. Thus, for the Golden Spike data set, we mainly compare BGX and PPLR with the analysis based on MAS5 expression indexes, and the results from RMA and GCRMA preprocessed data is mainly compared separately.
More complicated models often come with the prize of longer computer run times. Of the methods evaluated the BGX model and the PPLR method together with the multimgMOS model are the most computer intense ones. The computer run time for one single two group analysis of 3+3 HGU95A arrays with data from 12626 genes is more than 24 hours with BGX and 1.5 hours for PPLR+multimgMOS (using the recommended EM method of PPLR) on a 2.2 GHz AMD Opteron machine. The corresponding time (including preprocessing of PM and MM data) is 2–3 minutes for PLW and 9 seconds for the moderated ttest in LIMMA.
Conclusion
We have presented two new methods for ranking genes with respect to differential expression: Probe level Locally moderated Weighted mediant (PLW) and Locally Moderated Weightedt (LMW). Both methods perform very well compared to existing methods with PLW having the most accurate ranking of regulated genes in four out of five examined spikein data sets with RMA and GCRMA processed data. With LMW we show that introducing an intensitylevel dependent scale parameter for the prior distribution of the genespecific variances improves the performance of the moderated ttest. Also, compared to the moderated tstatistic, LMW shows a much more stable false positive rate across intensitylevels when used on MAS5 expression indexes. In the PLW method inference is performed directly on logged PM intensities and the median of the resulting moderated tstatistics for each probeset is used to find differentially expressed genes. Overall the PLW method performs better than all compared methods and thus probelevel inference appears to be preferable over the standard approach using gene expression indexes for inference.
Authors' contributions
MÅ provided the initial idea, formulated the model, derived and programmed the estimation procedure, erformed the analysis of real and simulated data, and drafted the manuscript. All authors finalized and approved the final version of the manuscript.
Acknowledgements
The research was supported by the Swedish foundation for Strategic Research through GMMC and the Swedish Research Council through Stochastic Centre.
References

Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t test and statistical inferences of gene changes.
Bioinformatics 2001, 17(6):509519. PubMed Abstract  Publisher Full Text

Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
Stat Appl Genet Mol Biol 2004., 3(article 3) PubMed Abstract

Kristiansson E, Sjögren A, Rudemo M, Nerman O: Weighted Analysis of Paired Microarray Experiments.
Stat Appl Genet Mol Biol 2005., 4(article 30) PubMed Abstract

Kristiansson E, Sjögren A, Rudemo M, Nerman O: Quality optimised analysis of general paired microarray experiments.
Stat Appl Genet Mol Biol 2006, 5:article 10. PubMed Abstract

Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD, Medvedovic M: Intensitybased hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments.
BMC Genomics 2006, 7:article 538. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sjögren A, Kristiansson E, Rudemo M, Nerman O: Weighted analysis of general microarray experiments.
BMC Bioinformatics 2007, 8:387. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Åstrand M, Mostad P, Rudemo M: Improved covariance matrix estimators for weighted analysis of microarray data.
J Comput Biol 2007, 14(10):95102. Publisher Full Text

OpgenRhein R, Strimmer K: Accurate Ranking of Differentially Expressed Genes by a DistributionFree Shrinkage Approach.

Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
PNAS 2001, 98(9):51165121. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes Analysis of a Microarray Experiment.
J Amer Statist Assoc 2001, 96:11511160. Publisher Full Text

Eaves IA, Wicker LS, Ghandour G, Lyons PA, Peterson LB, Todd JA, Glynne RJ: Combining Mouse Congenic Strains and Microarray Gene Expression Analyses to Study a Complex Trait: The NOD Model of Type 1 Diabetes.
Genome Res 2002, 12(2):232243. PubMed Abstract  Publisher Full Text

Jain N, Thatte J, Braciale T, Ley K, O'Connell M, Lee JK: Localpoolederror test for identifying differentially expressed genes with a small number of replicated microarrays.
Bioinformatics 2003, 19(15):19451951. PubMed Abstract  Publisher Full Text

Comander J, Natarajan S, Gimbrone M, GarciaCardena G: Improving the statistical detection of regulated genes from microarray data using intensitybased variance estimation.
BMC Genomics 2004, 5:17. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hu J, Wright FA: Assessing Differential Gene Expression with Small Sample Sizes in Oligonucleotide Arrays Using a MeanVariance Model.
Biometrics 2007, 63:4149. PubMed Abstract  Publisher Full Text

Irizarry RA, Hobbs B, Collin F, BeazerBarclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Biostatistics 2003, 4(2):249264. PubMed Abstract  Publisher Full Text

Hein AM, Richardson S, Causton HC, Ambler GK, Green PJ: BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data.
Biostatistics 2005, 6(3):349373. PubMed Abstract  Publisher Full Text

Liu X, Milo M, Lawrence ND, Rattray M: A tractable probabilistic model for Affymetrix probelevel analysis across multiple chips.
Bioinformatics 2005, 21(18):36373644. PubMed Abstract  Publisher Full Text

Liu X, Milo M, Lawrence ND, Rattray M: Probelevel measurement error improves accuracy in detecting differential gene expression.
Bioinformatics 2006, 22(17):21072113. PubMed Abstract  Publisher Full Text

Munson P: A 'consistency' test for determining the significance of gene expression changes on replicate samples and two convenient variancestabilizing transformations.
Gene Logic Workshop of Low Level Analysis of Affymetrix GeneChip Data 2001.

Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression.
Bioinformatics 2002, 18:S96104. PubMed Abstract  Publisher Full Text

Durbin BP, Hardin JS, Hawkins DM, Rocke DM: A variancestabilizing transformation for geneexpression microarray data.
Bioinformatics 2002, 18:S105S110. PubMed Abstract  Publisher Full Text

Durbin BP, Rocke DM: Estimation of transformation parameters for microarray data.
Bioinformatics 2003, 19(11):13601367. PubMed Abstract  Publisher Full Text

Geller SC, Gregg JP, Hagerman P, Rocke DM: Transformation and normalization of oligonucleotide microarray data.
Bioinformatics 2003, 19(14):18171823. PubMed Abstract  Publisher Full Text

Holder D, Raubertas RF, Pikounis VB, Svetnik V, Soper K: Statistical analysis of high density oligonucleotide arrars: a SAFER approach.
Gene Logic Workshop of Low Level Analysis of Affymetrix GeneChip Data 2001.

Rocke DM, Durbin B: Approximate variancestabilizing transformations for geneexpression microarray data.
Bioinformatics 2003, 19(8):966972. PubMed Abstract  Publisher Full Text

Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. Volume 1. first edition. Springer; 2001.

Dempster AP, Laird NM, Rubin DB: Maximum Likelihood from Incomplete Data via the EM Algorithm.

Åstrand M: plw: An R implementation of Probe level Locally moderated Weighted mediant (PLW) and Locally Moderated Weightedt (LMW). [http://www.math.chalmers.se/~astrandm] webcite

Bioconductor Project [http://www.bioconductor.org] webcite

Gene Expression Omnibus repository [http://www.ncbi.nlm.nih.gov/geo/] webcite

Spira A, Beane J, PintoPlata V, Kadar A, Liu G, Shah V, Celli B, Brody JS: Gene Expression Profiling of Human Lung Tissue from Smokers with Severe Emphysema.
Am J Respir Cell Mol Biol 2004, 31(6):601610. PubMed Abstract  Publisher Full Text

Stearman RS, DwyerNield L, Zerbe L, Blaine SA, Chan Z, Bunn PAJ, Johnson GL, Hirsch FR, Merrick DT, Franklin WA, Baron AE, Keith RL, Nemenoff RA, Malkinson AM, Geraci MW: Analysis of Orthologous Gene Expression between Human Pulmonary Adenocarcinoma and a CarcinogenInduced Murine Model.
Am J Pathol 2005, 167(6):17631775. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Affymetrix U95 and 133A Latin Square spikein data sets [http://www.Affymetrix.com/support/datasets.affx] webcite

Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP: A benchmark for Affymetrix GeneChip expression measures.
Bioinformatics 2004, 20(3):323331. PubMed Abstract  Publisher Full Text

Gene Logic spikein data sets [http://www.genelogic.com/newsroom/studies/] webcite

Choe S, Boutros M, Michelson A, Church G, Halfon M: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset.
Genome Biology 2005, 6(2):R16. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Gaile D, Miecznikowski J: Putative null distributions corresponding to tests of differential expression in the Golden Spike dataset are intensity dependent.
BMC Genomics 2007, 8:105. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hess A, Iyer H: Fisher's combined pvalue for detecting differentially expressed genes using Affymetrix expression arrays.
BMC Genomics 2007, 8:96. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Lemon W, Liyanarachchi S, You M: A high performance test of differential gene expression for oligonucleotide arrays.
Genome Biology 2003, 4(10):R67. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text