Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically.
A two-sample Bayesian t-test is proposed for use in determining whether a gene is differentially expressed in two different samples. The test method is an extension of earlier work that made use of point estimates for the variance. The method proposed here explicitly calculates in analytic form the marginal distribution for the difference in the mean expression of two samples, obviating the need for point estimates of the variance without recourse to posterior simulation. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and prior variance.
The test is easy to understand and implement and application to both real and simulated data shows that the method has equal or greater power compared to the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available and is not limited to cases where estimates of the variance are based on many similar observations.
Determining whether a gene is differentially expressed under different conditions is an important statistical problem [1-3]. Consider, for example, a common microarray experiment: for a particular gene Y, one has an unordered set of measurements of log-expression levels in a control situation and in a treatment situation (or any non-control situation) (notation similar to ). The question of interest is whether any expression difference is a significant difference, or if it would be expected under the null hypothesis of no actual difference in expression.
Though multifactorial experimental designs are becoming increasingly popular [5-7], there continue to be experimentalists interested in analyzing two-sample designs. There are many approaches to determining whether genes are differentially expressed in such designs and there are a number of excellent reviews of the various methods [2,3]. Many of the approaches include the use of a t-test [8-14], which is a common frequentist statistical approach to comparing a difference in sample means. Some have pointed out that the normality assumption used in t-tests may not always hold and other statistical techniques may be warranted [15,16]. Even for those measurements where normality holds the t-test is difficult to apply because the number of replicates n1 and n2 are often quite small (typically, n1 = n2 = 1 to 3), leading to uncertainty in the sample variance and relatively low power. An early approach to dealing with this problem was based on the addition of a constant value to the estimate of the standard deviation [16,17]. While the approach does tend to regularize the variance estimates, it is ad hoc and is not expected to exhibit statistically consistent behavior (unknown or incorrect Type I error rate under the null).
More rigorous Bayesian methods have been developed that incorporate prior information about the variance but require Markov chain sampling of the posterior density or numerical integration of the cumulative distribution function [18-21]. A very popular work by Baldi and Long  avoids such calculations while using a statistically justified regularization technique via construction of a probabilistic Bayesian framework that applies a prior probability to parameters of interest such as the expression means and variances. An analytically tractable solution can then be obtained by taking a point estimate for the variance. The chief advantages of this Bayesian framework are twofold: it allows the use of a regularized t-test to determine whether a difference in expression is significant, and it provides a natural method for incorporating information from putatively related measurements. It is this basic framework that we seek to extend in the work presented here.
As noted by Baldi and Long , the use of a point estimate for the variance is a compromise between a rigorous Bayesian approach and tractable calculation. As demonstrated below, this compromise can lead to bias in the specificity of the test and improper behavior when there are few prior degrees of freedom. Our hope here is to correct this and other issues (both theoretical and practical) towards the goal of improving the performance and understanding of microarray and general laboratory data analysis. To address these issues, we extend the basic framework put forth earlier, taking the model to a fully Bayesian approach by explicitly calculating the marginal posterior distribution for the difference in mean expression levels. This analytically tractable solution does not require simulation from the posterior distribution and obviates the need to use point estimates of the prior variance while overcoming the undesirable correlation between specificity and prior degrees of freedom. Moreover, a clear and statistically rigorous connection between the prior variance and the prior degrees of freedom is established, reducing the number of hyperparameters from two to one while yielding consistent Type I error rates in the process. It should be stressed that achieving consistent Type I error rates is not just a conceptual luxury. Indeed proper determination of important quantities such as the false discovery rate are of great economical significance and rely on having correct Type I error rates .
Bayesian probability model
The original model proposed by Baldi and Long  is briefly reviewed here. The likelihood of the observed data, y, given the true expression level, μ, and variance, σ2, for a single gene is assumed to follow a normal distribution:
where n is the number of measurements in one sample. For the priors on μ and σ2 the authors attempt to capture the a priori dependence between μ and σ2 by factoring the joint prior as p(μ,σ2) = p(μ|σ2)p(σ2) and taking each factor as
p(μ|σ2) = N(μ;μ0,σ2/λ0) (2)
where the prior probability of μ follows a normal distribution and σ2 follows a scaled inverse gamma distribution. The hyperparameters μ0 and λ0 represent the prior beliefs regarding the value of μ and the confidence associated with that belief, while the hyperparameters and ν0 represent the prior beliefs regarding the value of σ2 and the degrees of freedom or confidence associated with that belief. The authors subsequently let λ0 → 0 later in their derivation, leading to a diffuse, noninformative prior on μ:
λn = λ0 + n (7)
νn = ν0 + n (8)
This estimate is then used in the hypothesis testing procedures implemented in their Cyber-T software. Though this model is developed using a Bayesian framework to make a point estimate of the variance, the hypothesis testing is carried out using a classical frequentist t-test.
In the standard t-test, the variance is still defined when there are no prior degrees of freedom, but inspection of Eq. (10) shows that when the number of prior degrees of freedom ν0 < 2 - n, the variance is undefined. This limiting behavior stems from the use of a point estimate for the variance. The total number of prior degrees of freedom used in the Cyber-T software is actually 2ν0 as it counts ν0 prior degrees of freedom for each sample in the two-sample test. However, for consistency and clarity we hereafter refer to the total number of prior degrees of freedom as ν0. Therefore the total number of degrees of freedom used in the Cyber-T software for the two-sample hypothesis test then appears to be νt-test = ν0 + n1 + n2 - 4, where n1 is the number of experimental replicates of the gene in the control regime and n2 is the number of replicates of the gene in the treatment regime. This equation for the total degrees of freedom also demonstrates undesired limiting behavior when ν0 = 0, since positive degrees of freedom exist only when n1 + n2 > 4. This result contradicts what we would expect to see from a classic two-sample t-test with no prior estimate of the variance, i.e. a classical two-sample t-test has degrees of freedom n1 + n2 - 2 and positive degrees of freedom exist when n1 + n2 > 2.
Full two-sample model
Instead of using a point estimate for the variance to conduct a frequentist t-test, we can recast the problem to directly infer the posterior distribution for a two-sample case:
p(μ,Δμ,σ2|y) ∝ p(μ,Δμ,σ2)p(y|μ,Δμ,σ2) (11)
where μ is the mean expression level of the gene in the control regime, μ +Δμ the mean of the gene in the alternate experimental regime, and σ2 the variance associated with the measurements. In contrast to the previous, point estimate model, in this formulation there is no explicit prior dependence between σ2 and μ. While this is admittedly a simplification, this independence is not only a practical requirement for the full Bayesian integration, we also believe it is justified on several other grounds. First, while it has been widely observed that genes with different expression levels often have different variances , it is not clear how the prior used by Baldi and Long  to derive the point estimate of variance could capture the dependencies observed with actual microarray data. Their figures show that the logarithm of the mean expression level tends to decrease with increasing variance, yet the prior given by Eq. (2) asserts that the prior probability of the mean should be more diffuse – not necessarily smaller – with increasing variance. The observed trend between mean and variance cannot be captured by such a prior. Second, their method does include an implicit dependence between mean and variance by using other gene expression levels to determine the priors; indeed, this is what lends the method its power. Because the Bayesian formulation proposed here determines the prior hyperparameters in a similar fashion, it retains this implicit dependence. One may also consider a number of variance stabilizing transforms that may be applied in order to achieve constant variance across the expression spectrum [23,24]. Finally, as shown below, simulations reveal that the method still retains power even when the true variance is allowed to vary over a substantial range.
Returning to the posterior distribution, we assume that the samples from each experimental regime follow a normal distribution, each with equal variances and (possibly) different means.
yi ~ N(μ,σ2) (12)
yj ~ N(μ + Δμ,σ2) (13)
This leads to the following posterior distributionp
where n1 and n2 are the number of measurements in each sample. We elect not to make use of the prior on μ used in the Baldi work, Eq. (2), or its limiting form, Eq. (4). Instead, we assume a completely flat prior on both μ and Δμ while retaining the prior for σ2 given by Eq. (3). Similar to the earlier work, the observed dependence between μ and σ2 is established by calculating based on the variance of similarly expressed genes. However, in this formulation, instead of taking the average standard deviation of similarly expressed genes we choose to estimate the prior variance by totaling the sum-squared differences for each similar gene and dividing by the total prior degrees of freedom. This is a more statistically rigorous way of incorporating prior information and leads to more a more consistent test as will be discussed below.
Marginal posterior for Δμ
With the above definitions and assumptions in place, it can be shown (see Appendix) that the marginal posterior distribution follows a t distribution:
νn = n1 + n2 + ν0 - 2 (16)
The sufficient statistics are , and the sum squared differences (n1 - 1) and (n2 - 1), where and . This distribution has the feature that the standard t-test is obtained even when there are no prior degrees of freedom, ν0 = 0, in which case the observed variance for each gene dominates the posterior density. A hypothesis test is performed by asserting a null hypothesis that the true difference in expression levels is zero, i.e. Δμ = 0 (though other values for the true difference, Δμ, can be used when attempting to identify genes that are differentially expressed by some threshold degree). When the posterior probability of no differential expression approaches zero, Pr(Δμ = 0|y) → 0, the null is rejected. It is worth noting that while the posterior probability distribution for no differential expression, Pr(Δμ = 0|y), follows the same frequentist distribution for the data, Pr(y|Δμ = 0) , the resulting probabilities have different interpretations. In the Bayesian approach the posterior probability is interpreted as the probability that there is no difference in expression levels. In the frequentist approach the probability is interpreted as the probability of observing such a difference from the null distribution of no differential expression given chance alone. When Δμ = 0 the Bayesian and frequentist methods give the same numerical results and one can ignore the interpretational differences when discussing such things as the false positive rate under the null. However, power analysis calculations (Δμ ≠ 0) are best conducted under a frequentist framework using a non-central t-distribution .
Estimating the prior variance
For m similarly expressed genes each having n replicates, the prior degrees of freedom for the variance can be calculated as
ν0 = m(n - 1) (18)
The prior variance is then calculated as the total sum-square differences for each sample of similarly expressed genes divided by the prior degrees of freedom, namely
where is the mean response of gene k and yk,i is the response i of gene k. In general the prior degrees of freedom and variance should be determined using these equations and not chosen separately from one another. While it is tempting to think of the prior variance as separate from the precision or credibility associated with that estimate (as represented by ν0), a consistent hypothesis test requires that they be considered together based on the actual prior data collected.
To assess the performance of the new test, a series of simulations were done to measure the false positive rate and statistical power under various conditions. The simulations consisted of the following steps:
1) Draw random samples from two normal distributions of size n1 and n2 distributed as N(μ1,) and N(μ2,) respectively. This step represents the selection of two different experimental regimes for a gene of interest in order to test for differential expression. For power testing, the standardized effect (μ1 - μ2)/σtest = 2 is used to generate the data by setting μ1 = 2, μ2 = 0, and σtest = 1. For false positive rate testing μ1 = 0.
2) Draw an estimate of the prior variance from the following Chi-square distribution: . This step simulates the task of estimating the prior variance from similarly expressed genes. It is mathematically equivalent to calculating the sum-squared differences of similarly expressed genes and dividing by their prior degrees of freedom, but it is computationally more efficient.
3) Perform hypothesis tests using a) the method used by Baldi and Long  based on the mean posterior point estimate for the variance, PE(σ2) or just PE, and b) the new test proposed here based on the marginal posterior for the difference in expression level, MP(Δμ) or just MP. The hypothesis tests using each method are done at the α = 0.05 level using a two-sided, two-sample t-test. Test p-values below this level are counted as significant. To make valid comparisons, both methods used the same number of prior degrees of freedom.
4) Steps 1 to 3 above are repeated 10,000 times.
Results of the power simulations for a number of sample sizes and prior degrees of freedom are shown in Table 1. Because the PE test cannot be used when ν0 + n1 + n2 ≤ 4, the test has no power and no false positive rate when n1 = n2 = 2 and ν0 = 0. Conversely, the MP test results in a classical t-test in this case and has a power of 0.2169 and a false positive rate of 0.0509, which is near the expected value given the α = 0.05 level of the test. Holding n1 = n2 = 2 but increasing the number of prior degrees of freedom to ν0 = 2 (equal to 2 other similarly expressed genes used to determine the prior variance) the PE test can now be used but has a very low power of 0.0450, compared to the MP power of 0.3343. Overall, when the estimate of the variance is balanced between the prior and observed variances (i.e. the prior number of degrees of freedom is not large compared to the number of replicates in the test samples) the MP test is significantly more powerful. The difference in power between the two methods is less pronounced when ν0 = 16 (the default value used in the Cyber-T software when n1 = n2 = 2). However, one should note that such a relatively high number for the prior degrees of freedom represents a strong degree of belief in the prior estimate of the variance relative to the variance in the genes of interest.
Table 1. Comparison of false positive rate and statistical power for fixed α. For a given significance level (α) the observed false positive rate (FP) and power were determined by simulation using either a point estimate for the variance (PE) or the marginal posterior distribution for Δμ (MP). Values are given for different sample sizes (n1 = n2 = n) and prior degrees of freedom (ν0).
These simulations reveal that the observed false positive rate for the PE test is substantially out of line with the nominal value of 0.05 for small sample replicates (n1 = n2 ≤ 3) and few prior degrees of freedom (ν0 ≤ 16). Conversely, the MP test demonstrates consistent behavior for the observed false positive rate. However, it is possible to obtain the desired false positive rate for the PE method by iterating through values of α. Table 2 shows that, by adjusting the critical α value for the PE method until the observed false positive rate matches that of the MP method, both tests have similar power. The disadvantage is that α no longer represents the expected Type I error rate, and simulation or calibration is required whenever a new test is performed. Conversely, with the MP test, the critical value α represents an unbiased estimate of the Type I error rate.
Table 2. Comparison of α and statistical power for iterated false positive rate. The values are the same as those given in Table 1 except the significance level (α) was iterated for the PE method until its power matched that of the MP method where α was held constant.
It is interesting to look for a simple correction to the degrees of freedom in order to make the two methods match each other. Unfortunately, there does not appear to be any simple correction available. The degrees of freedom used in the t-test itself would be easy to modify – though ad hoc, one could change the last term in the expression for the degrees of freedom used in the PE test (νt-test = ν0 + n1 + n2 - 4) from -4 to -2 by simply adding two to ν0. Unfortunately, the point estimate of the variance given by Eq. (10) is more problematic as it appears to give an increased estimate of the variance compared to the one used in the MP test, resulting in a decrease in both the t-statistic and the statistical power.
In the Cyber-T software  the prior variance is estimated by averaging the standard deviation over 100 similarly expressed genes. This has two problems associated with it. First, the measured standard deviation is a biased estimate and tends to be smaller than the true value for small sample sizes like n = 2 or 3, the average of these measured standard deviations will also be biased downward and in general , where E(·) represents the expected value of a random variable. The correct way to estimate the variance based on prior observations is to calculate the total sum square differences for all the prior samples and divide by the total degrees of freedom, as shown in Eq. (19). Second, if similarly expressed genes for each of the two samples are used to estimate the prior variance and they have the same number of replicates, n, as the samples themselves, then estimating the standard deviation from m = 100 similarly expressed genes is tantamount to asserting ν0 = 100·(n - 1) prior degrees of freedom. Setting ν0 to less than the actual number of prior degrees of freedom used to calculate the prior variance results in a lower actual false positive rate than the rejection level α. For consistency if we fix the number of prior degrees of freedom that are used in the test, then we must choose a corresponding number of similarly expressed genes that together have the same number of prior degrees of freedom. This dependence between ν0 and m is more consistent with the formulation of the prior itself and leads to consistent Type I error rates. When we calculate the prior variance in the statistically rigorous manner (by taking the total sum square differences of prior observations and dividing by the total prior degrees of freedom) the PE method becomes overly conservative, generating false positives with a rate of 0.0044 at n1 = n2 = 2, α = 0.05, and ν0 = 2, this appears to be the result of using a point estimate for the prior variance that is larger than the expected value given similar genes, see Eq. (10).
As mentioned previously, to perform the full Bayesian integration required by the MP test, the assumption of a constant variance is required. To test the effect of violating this assumption, simulations were performed as described above, but in step 2 the variances for each sample were drawn from a uniform distribution in the range [0.05,1]. The same uniform distribution was used to generate prior variances by calculating the total sum-squared differences of m = ν0/(n - 1) different draws (corresponding to m different genes) for the variance and dividing by the prior degrees of freedom, ν0. The results for both the PE and MP methods are shown in Table 3, where the observed false positive rate for the MP method is used to calibrate the α value for the PE test so as to achieve nearly the same false positive rate for each method. Though both methods are seen to have nearly the same power for a given false positive rate, the PE method again shows large deviations between α and the observed false positive rate. However, the MP method shows more consistency between the observed and theoretical rates and is fairly robust to violations of the constant variance assumption.
Table 3. Comparison of α and statistical power for iterated false positive rate and variable variance. The values are the same as those given in Table 1 and Table 2 except the variance for each sample was drawn from a uniform distribution between 0.05 and 1 during the simulation. The significance level (α) for the PE method was iterated so that the observed false positive rate matched that of the MP method where α was held constant.
To test whether these statistical improvements in the method translate to improvements in experimental data analysis, both methods were evaluated with the same microarray data  used by Baldi and Long . The Arfin data matrix is composed of 8 columns by 1973 rows that contain gene expression levels used to study IHF regulated genes in Escherichia coli. Columns 1 to 4 correspond to 4 IHF+ strains while columns 5 to 8 correspond to 4 IHF- strains. The 1973 rows correspond to the measured gene expression levels for each strain. There are actually more rows in the original data set but to be consistent with the earlier work, we only chose rows that had measurable expression levels for all 8 runs. As was done in the Long paper the 4 control and 4 treatment replicates were partitioned into 12 subsets that differ by at least two replicates so that quasi-independent pairwise comparisons could be made (i.e. 12v56, 12v78, 13v57, 13v68, 14v58, 14v67, 23v58, 23v67, 24v57, 24v68, 34v56, and 34v78). For the experimental evaluation we adopted the same strategy for pooling variances used by Baldi and Long. We sorted all the genes within each sample by the mean expression level and computed the prior variance by considering a window of the m/2 most similarly expressed genes around each gene of interest. For the two samples this corresponds to a total of m similarly expressed genes and Eq. (18) gives the total prior degrees of freedom, ν0 = m.
To measure the consistency of a given testing method the number of common genes discovered between two quasi-independent subsets can be recorded. Genes that are discovered to be in common between subsets are more likely to be truly differentially expressed, thus higher a commonality percentage (genes discovered in common divided by the total number of genes discovered) is an indication that the testing method is more reliable. The number of common genes discovered can be recorded for the 66 possible ways of comparing the 12 subsets. Based on p-value rankings, the MP method achieved similar results (within sampling error) to those published using the Cyber-T software. The single hyperparameter (ν0) was optimized for the MP method and a 95% confidence interval of [62.4, 67.8] common genes were identified out of the top 120. This compares well with the published results using Cyber-T where optimizing two parameters independently (ν0 and m) achieved an average common gene count of 67. These results represent a substantial improvement over the 33 common genes identified using a zero parameter classical t-test.
Because the p-value rankings by themselves are not actually statistical tests, a more stringent approach to assessing the performance of each method was accomplished by identifying genes that are differentially expressed at some confidence level α. For the following tests we used a p-value of 0.01 for the null hypothesis cutoff (uncorrected for multiple tests). The results are presented in Table 4. In general, for a given number of prior degrees of freedom, the MP method consistently detects more genes as differentially expressed than does the PE method and more of these genes are commonly discovered when comparing them with quasi-independent subsets. In addition the fraction of common genes within those detected is generally higher for the MP method. The difference between the methods tends to decrease as the number of prior degrees of freedom is increased. The optimal performance (in terms of % commonality) is achieved at around ν0 = 400, which corresponds to basing the local variance on about 10% of the 1972 similarly expressed genes within each sample (there are 3944 similarly expressed genes when considering both samples). As ν0 increases past 400 the commonality percentage drops somewhat. Basing the variance on the combined estimated of all 3944 similar genes results in a difference-in-means test, and the drop in performance is similar to that seen by Long and Baldi when a simple difference in means was used to rank the expression differences .
Table 4. Experimental comparison of hypothesis testing methods. The microarray data consists of 1973 genes partitioned into 12 quasi independent n1 = 2 and n2 = 2 subsets using replicate experiments of 4 control and 4 treatment runs [26, 27]. The number of genes detected as differentially expressed using each method is listed for an α = 0.01 significance level (there was no correction for multiple testing) and different prior degrees of freedom (ν0). The average number of common genes for the 66 unique comparisons of the 12 subsets is also listed along with the % of commonality (common/detected) for each test.
It is important to stress that the above results for the PE method are not what one would expect using the implementation found in the Cyber-T software . As mentioned before, this is because the existing software incorrectly calculates the pooled variance by averaging the standard deviation of similarly expressed genes. This results in a substantial error for example, when n1 = n2 = 2 and ν0 = 400, the Type I error rate is approximately 0.125 (2.5 times higher than that predicted by the nominal rate α = 0.05). The authors of Cyber-T recommend smaller values of ν0 where the overestimate of variance given by their point estimate, Eq. (10), is balanced by the underestimate of variance owing to incorrect pooling, thereby achieving (approximately) correct false positive rates by offsetting two competing biases.
Discussion and conclusion
The new marginal posterior (MP) formulation proposed here is uniformly at least as powerful as the previous point estimate (PE) method which it is based on and more powerful when the number of similarly expressed genes (and therefore prior degrees of freedom ν0) is small. Moreover, unlike the existing regularization methods, the new formulation consistently maintains the proper false positive rates under the null hypothesis across the entire range of pooling. The MP formulation also makes clear the dependence between the prior degrees of freedom and prior variance and thus methods to estimate the single hyperparameter ν0 are expected to be more robust and interpretable.
As mentioned previously, the key to the power of these Bayesian methods is the use of similarly-expressed genes to estimate the expected variance. Naturally, the more similar the variance in expression of the other genes is to the gene of interest, the greater the statistical consistency of the method. In generating the simulated data we have attempted to remain agnostic on the definition of "similarly-expressed", using variances that are either constant or drawn from a uniform distribution. Another Bayesian method  utilizes a clustering technique based on a series of t-statistics to group genes into prior categories in order to pool their prior variances. Because the Bayesian posterior probabilities used by that method are not interpretable as frequentist Type I error rates, it is difficult to set the desired false positive rate using such a method without resorting to simulation and iteration. Other clustering strategies include those based on non-parametric regression to obtain local variance estimates  and mixture models that group together similar variances  using maximum likelihood. These clustering strategies assume the variance is known based on the large number of observations in each group. This allows normal distributions to be used in deriving test statistics. Unfortunately, when the number of similarly expressed genes is small the use of a normal distribution is not justified as it assumes there is no uncertainty in the variance. The advantage of the MP formulation is that the normal approximation for the test statistic is not required – it is statistically robust across the entire range of pooling or clustering strategies (assuming the strategies work properly to group together genes of similar variance). In our experimental application of the method we have adopted a pooling strategy based on grouping together genes with similar expression levels. Thus the MP formulation can be used in situations where two or more control replicates can be compared with only single observations under treatment conditions; this is not possible with clustering strategies that require at least two replicates in each sample .
Though the MP test was developed in a Bayesian framework, there is a strong analogy with frequentist statistics. One person's prior is another person's data; if we treat the prior estimates of variance as just additional data collection then our hypothesis test using the t distribution is the same one as that used in any introductory statistics course. In either situation, one considers the estimate of variance as the total sum-of-squared differences for all observations of the variance divided by the total degrees of freedom. By "all observations" we mean both the two samples involved in the hypothesis test plus some number of replicates from other genes we believe to be similarly expressed and are good representatives of the variance. This is just like estimating the method standard deviation and conducting a normal t-test with that estimate along with the additional degrees of freedom that come with it. All of the power analysis calculations can then be done without recourse to simulations; for example, built-in functions in the desktop statistical package R (using the non-central t distribution) gave the same results we present by simulation for the MP method. It should be stressed that this Bayesian hypothesis test is not limited in any way to the analysis of microarray data. It can be applied to any experimental situation where prior beliefs about the variance associated with small sample sizes are used to achieve higher statistical power while maintaining consistent Type I error rates. This is particularly important when only a few similar observations of the variance are available, for example when only a handful of genes are under study across a range of treatment conditions.
The method is implemented in R and is freely available for download  or by contacting the corresponding author.
RJF conceived of and derived the statistical formulation, applied the method to various data sets and drafted the manuscript. MWD contributed major revisions to the manuscript and suggested ways of improving the study. Both authors contributed to the final version of the manuscript and approved it.
Determining the marginal posterior distribution of Δμ
The posterior is the product of the prior and the likelihood:
p(μ,Δμ,σ2|y) ∝ p(μ,Δμ,σ2)p(y|μ,Δμ,σ2) (20)
The likelihood consists of the product of two normal distributions:
yi ~ N(μ,σ2) (21)
yj ~ N(μ + Δμ,σ2) (22)
Combining the joint likelihood into a common exponential gives:
Next we can define a variable that that only contains terms for the expression levels:
Expanding the square leads to the following:
Next, we look to complete the square by adding and subtracting the same value k:
Now collapsing back to a single squared term in the exponent and defining a new variable, Cσ, we have
Cμ = (n1 + n2)(μ - k)2 + Cσ (33)
The posterior density can then be written as
We can now look to integrate out the true expression level, μ:
We have flat priors on μ and Δμ, allowing us to complete the integration over all μ.
p(μ,Δμ,σ2) ∝ p(σ2) (37)
The posterior for the joint density of Δμ and σ2 is then
We next look to integrate out σ2 by defining the sufficient statistics associated with the variance:
The last term in Eq. (43) CΔμ, now contains all references to the null difference in means, Δμ. The term can be expanded by multiplying out k2 and then simplified:
This can be inserted back into Eq. (43) to give
The prior for variance is said to follow a scaled inverse Chi-square distribution:
The posterior density is then
But now we can integrate out σ2
With a change of variables
The posterior density then becomes the well known t distribution.
νn = n1 + n2 + ν0 - 2 (55)
We would like to thank Magda Bartilson and Stephen DelCardayre for giving us the inspiration to develop the method. We also thank John Grate, Lori Giver and John Whitman for their generous support and feedback and for reviewing various versions of the manuscript.
J Biomed Optics 1997, 2(4):364-374. Publisher Full Text
Genome Biology 2003, 4(4):210.0-210.1. BioMed Central Full Text
Vinciotti V, Khanin R, xD'Alimonte R, Liu X, Cattini N, Hotchkiss G, G. B, de Jesus O, Rasaiyaah J, Smith CP, Kellam P, Wit E: An experimental evaluation of a loop versus a reference design for two-channel microarrays.
Orian A, van Steensel B, Delrow J, Bussemaker HJ, Li L, Sawado T, Williams E, Loo LW, Cowley SM, Yost C, Pierce S, B.A. E, Parkhurst SM, Eisenman RN: Genomic binding by the Drosophila Myc, Max, Mad/Mnt transcription factor network.
J Comp Biol 2001, 8(1):37-52. Publisher Full Text
Proc Natl Acad Sci U S A 2000, 98(9):5116-5121. Publisher Full Text
J Comp Biol 2001, 8(6):585-614. Publisher Full Text