Abstract
Background
Using covariance or mean estimates from previous data introduces randomness into each power value in a power curve. Creating confidence intervals about the power estimates improves study planning by allowing scientists to account for the uncertainty in the power estimates. Driving examples arise in many imaging applications.
Methods
We use both analytical and Monte Carlo simulation methods. Our analytical derivations apply to power for tests with the univariate approach to repeated measures (UNIREP). Approximate confidence intervals and regions for power based on an estimated covariance matrix and fixed means are described. Extensive simulations are used to examine the properties of the approximations.
Results
Closedform expressions are given for approximate power and confidence intervals and regions. Monte Carlo simulations support the accuracy of the approximations for practical ranges of sample size, rank of the design matrix, error degrees of freedom, and the amount of deviation from sphericity. The new methods provide accurate coverage probabilities for all four UNIREP tests, even for small sample sizes. Accuracy is higher for higher power values than for lower power values, making the methods especially useful in practical research conditions. The new techniques allow the plotting of power confidence regions around an estimated power curve, an approach that has been well received by researchers. Free software makes the new methods readily available.
Conclusions
The new techniques allow a convenient way to account for the uncertainty of using an estimated covariance matrix in choosing a sample size for a repeated measures ANOVA design. Medical imaging and many other types of healthcare research often use repeated measures ANOVA.
Keywords:
Sample size; Replication study; Study planning; Univariate approach; UNIREPBackground
Motivation
Computing power for a linear model involving repeated measures requires specifying a set of means and a covariance matrix. Scientists usually feel comfortable specifying a pattern of means that corresponds to a difference of clinical or scientific importance. However, specifying plausible variance and covariance values usually requires estimates from a previous study.
Using data from a previous study to estimate the covariance matrix makes the power value a random variable. Kraemer, et al. [1] noted that if the estimated variance is too small, i.e., when the pilot study is overly favorable, power will be overestimated. Conversely, if the estimated variance is too large (pilot study overly conservative), power will be underestimated. Maxwell [2] conducted simulations to illustrate the amount of bias that can occur. Taylor and Muller [3] and Muller and Pasour [4] derived exact distributions of noncentrality and power in univariate linear models based on all combinations of estimated variance and means. The results account for power computed conditional on a previous result being significant, or conditional on a previous result being nonsignificant. The former creates optimistic bias, while the latter creates pessimistic bias.
Providing confidence intervals to account for the uncertainty inherent in the random power values would be useful for study planning. For example, a lower bound for power would allow stating that a test has power of at least “P” to detect an effect, with a specified confidence. A confidence region for a power curve would be even more informative.
Medical imaging research motivated the work here because it often generates the type of complete data that can be handled with the univariate approach to repeated measures (UNIREP). Muller, et al. [5] reviewed the advantages gained by being able to use the UNIREP model, a special case of the general linear mixed model. The same authors described accurate and convenient power approximations for UNIREP analysis. The four UNIREP tests, Box conservative, GeisserGreenhouse, HuynhFeldt, and the uncorrected, all use the same test statistic. For data analysis, UNIREP tests differ only by their respective degrees of freedom due to different degrees of freedom multipliers, which measure sphericity in the error covariance for the hypothesis variables. Muller and Stewart [6] provided detailed discussion of the basic theory for both the null and nonnull cases. Earlier work detailed basic UNIREP theory. Box [7,8], Geisser and Greenhouse [9,10] and Huynh and Feldt [11] gave null results. Davies [12] and Muller and Barton [13,14] treated the nonnull case.
Browne [15] evaluated the impact of using a pilot study to estimate the variance for a ttest. More generally, Taylor and Muller [16] demonstrated how to construct exact power confidence intervals for the general linear univariate model for a dataestimated variance and fixed means. The same authors also generalized the result to provide an exact confidence region around a power curve. Parallel results for the UNIREP setting would be equally useful. We generalize the methods in Taylor and Muller [16] to UNIREP tests for repeated measures. We use analytic and simulation results to demonstrate that the techniques allow computing approximate confidence intervals and regions for power with good accuracy for the UNIREP tests, based on an estimated covariance matrix and fixed means.
Existing results
A vector z, (n×1), is lower case bold. A matrix, z, is upper case bold with transpose Z^{′}, inverse Z^{−1} and generalized inverse Z^{−}. Also, 1_{n} is an (n×1) vector of 1’s and I_{n} is an (n×n) identity matrix. A diagonal matrix with (i,i) element z_{i} is written Dg (z). The expected value, variance, and trace are E(Z), , and tr(z), respectively. Throughout, Z∼χ^{2}(ν,ω) indicates that Z has a noncentral chisquare distribution with ν degrees of freedom and noncentrality ω, while Z∼χ^{2}(ν) indicates a central distribution. Similarly, Z∼F(ν_{1},ν_{2},ω) indicates X has a noncentral F distribution with ν_{1} numerator and ν_{2} denominator degrees of freedom, and noncentrality ω with cumulative distribution function F_{F}(ν_{1},ν_{2},ω). A central F is written Z∼F(ν_{1},ν_{2}) with quantile q indicated . Writing indicates z (p×1) is Gaussian with mean μ and covariance Σ (p×p). If z (N×p) has independent rows and , then indicates S follows a Wishart distribution with N degrees of freedom, covariance Σ, and noncentrality Ω=E(Z^{′})E(Z)Σ^{−1}.
The general linear multivariate model,
assumes N independent rows and . In the model, X is the fixed, known design matrix with 1≤rank(X)≤q, and B contains fixed, unknown regression coefficients. For repeated measures ANOVA, onegroup designs have rank (X)=1, and twogroup comparisons have rank (X)=2. The associated general linear hypothesis is
suchthat Cdefinesthebetweensubject effects(rank a) while U defines the withinsubject effects (rank b). Requiring estimable Θ and full rank {C,U} ensure a testable hypothesis. Appropriate selections of the contrast matrices (C and U) and null matrix (Θ_{0}) allows testing important onedegreeoffreedom parameters, such as the difference between two means, or a comparison of two trends.
For M = C(X^{′}X)^{−}C^{′}, unscaled noncentrality is Δ = (Θ−Θ_{0})^{′}M^{−1}(Θ−Θ_{0}), scaled noncentrality is . Here Σ_{∗}=U^{′}ΣU=ΥDg (λ)Υ^{′} is the covariance matrix among the hypothesis variables, with ΥΥ^{′}=Υ^{′}Υ=I_{b}, and λ={λ_{k}} the eigenvalues. Estimates are and , with ν_{e}=N−rank(X), the error degrees of freedom. Furthermore, , and , with . The sum of squares hypothesis matrix is and the sum of squares error matrix is , which are independent of one another. The notation follows that in Muller and Stewart [6]. Additional notation is in Appendix A: Additional notation.
The univariate approach to repeated measures can be expressed in terms of the general linear multivariate model. The Box conservative (Box), the GeisserGreenhouse (GG), the HuynhFeldt (HF), and the uncorrected (Un) UNIREP tests use the same test statistic,
and a central F distribution to approximate the null distribution of T_{u},
The sphericity parameter, ∊=tr^{2}(Σ_{∗})/[btr(Σ∗2)], quantifies the spread of population eigenvalues and is used to discount the degrees of freedom. The term sphericity reflects the fact that uncorrelated Gaussian variables with equal variances in three dimensions have a spherical scattergram. The eigenvalues of Σ_{∗} are the variances of the (uncorrelated) principal components of the hypothesis response variables. Perfect sphericity requires ∊ = 1, which occurs with all eigenvalues equal. Minimal sphericity has ∊=1/b, which occurs with one nonzero eigenvalue. Other patterns of Σ_{∗} have 1/b<∊<1.
The Box conservative test uses the fixed, lower bound of ∊, while the uncorrected test uses the fixed, upper bound of ∊. With sphericity (∊=1), the uncorrected test is exact and uniformly most powerful (among similarly invariant tests). The GeisserGreenhouse and HuynhFeldt tests use the observed data to estimate ∊. The GeisserGreenhouse estimator, , is the maximum likelihood (ML) estimator. The HuynhFeldt estimator, was proposed as the ratio of two unbiased estimators. Their claim holds only for the special case of rank (X)=1. Lecoutre [17] provided a more general form. In turn, Gribbin [18] and Chi et al. [19] described a rankadjusted approximately unbiased estimator, , which applies to any general linear multivariate model. The rankadjusted power approximation was shown through simulations to approximate observed mean power values as well as, or better than, the HuynhFeldt power approximation (Chi et al. [19]). Only the rankadjusted HuynhFeldt estimator will be considered in the remainder of the paper.
Although the four UNIREP tests all use the same test statistic, they each use a different measure of sphericity, here indicated e. For data analysis, all four tests use a critical value . Here ν_{1}=ab and ν_{2}=bν_{e}. The Box test uses e=1/b, the GG test uses , HF uses , and the uncorrected test uses e=1. The pvalue is then computed, for observed test statistic t, as p=1−F(t,ν_{1}e,ν_{2}e). In all cases . In turn, the pvalues always have the reverse order, with the Box pvalue being largest, and the uncorrected being smallest.
Muller et al. [5] showed that the distribution function of the UNIREP test statistic can be expressed exactly in terms of the distribution function of the sum of b positively and b negatively weighted independent chisquares, namely y_{kh}∼χ^{2}(a,ω_{k}) and y_{ke}∼χ^{2}(ν_{e}),
Muller et al. [5] also reported accurate F approximations of the form
Here, y_{∗1}∼χ^{2}(ν_{∗1},ω_{∗}), y_{∗2}∼χ^{2}(ν_{∗2}), and . Parameters λ_{∗1}, ν_{∗1}, ω_{∗}, λ_{∗2}, and ν_{∗2} are defined in Appendix A: Additional notation. Power analysis involves {λ_{k}} and {ω_{k}}, with
the diagonal elements of the scaled noncentrality, Ω_{∗}=Υ^{′}ΔΥDg (λ)^{−1}=Δ_{∗}Dg (λ)^{−1}.
Methods
Estimating approximate UNIREP power with estimated covariance and fixed means
By extending results in Muller et al. [5], the following lemma helps simplify the F approximations. Appendix B: Supporting lemmas and proofs contains all proofs.
Lemma 1
The constant in the critical value of the UNIREP test statistic approximation introduced by Muller et al. [5] is equal to 1,
Thus,
For known covariance and means, the power approximations for the Box, GeisserGreenhouse, rankadjusted HuynhFeldt, and uncorrected tests are all of the form
Here, is equal to tr(Σ_{∗})/b with b equal to the rank of Σ_{∗}, and . Table 1 contains values for e_{1} through e_{5} for the four UNIREP tests when , and . The expressions for ∊_{d} and ∊_{n} were derived using the properties described in Lemma B.1.
Table 1. Sphericity multipliers for UNIREP power approximations for fixed means
In practice, some elements of may be estimated and hence random. The random elements imply random power values, as with estimated covariance and fixed means, , for , the unbiased restricted maximum likelihood (REML) estimator. A distinction must be carefully maintained between the estimation study and target study. The estimation study provides the covariance estimate and has sample size N_{est}, design matrix rank of rank (X_{est}), and ν_{est}=N_{est}−rank(X_{est}) degrees of freedom. The target study for which power is desired has sample size N, rank (X) and ν_{e}=N−rank(X) degrees of freedom.
The ML estimator from the GeisserGreenhouse test, , is an obvious estimator for the target study’s ∊. For power analysis, a parallel estimator is available for ∊_{n}:
A better choice, given in the following lemma, uses a ratio of unbiased estimators. The result generalizes the rankadjusted HuynhFeldt estimator for data analysis. Appendix B: Supporting lemmas and proofs has derivations of moments as well as all proofs.
Lemma 2
For the nonnull case, a ratio estimating ∊_{n} in terms of correlated, but unbiased, estimators is [b]
The corresponding estimator for the null case is , the rankadjusted HuynhFeldt sphericity estimator.
For estimated covariance and fixed means, approximate estimated UNIREP power is
with , and e_{1} through e_{5} estimated if unknown (Table 1). Nearly every combination of , , , , 1 and 1/b was examined for each UNIREP test for the wide range of simulations discussed in Muller et al. [5]. The values chosen provided the most accurate results. In retrospect, they are natural choices as well.
Approximate UNIREP power confidence intervals
The solution to the UNIREP problem parallels the solution to the univariate problem in Taylor and Muller [16]. The methods apply to any general linear hypothesis, including one degreeoffreedom contrasts, such as pairwise group comparisons and differences in linear trend between two groups. Tests giving scalar secondary parameters are also common for onegroup designs and twogroup comparisons. For known covariance and means, e_{5} is defined to be ∊_{n} (Table 1), and the noncentrality in equation 9 is with and . For . Therefore, it follows that
For estimated covariance and fixed means, a ratio involving one biased and two unbiased estimators (Lemma B.2) for estimating λ_{∗1} may be written as
In parallel to the univariate setting, the distribution of can be approximated with a Satterthwaite approximation: with . Lower and upper tail probabilities, α_{L} and α_{U}, respectively, define the confidence coefficient, p_{CL}=1−α_{L}−α_{U}. Also, and . Approximate confidence limits for the noncentrality may be calculated using the following:
Approximate lower and upper bounds are therefore and . The strict monotone dependence of the noncentral F function on the noncentrality ensures an approximate confidence interval for power. Lower and upper bounds on power are, with e_{1} through e_{4} defined in Table 1 for ,
and
Taylor and Muller [16] recommended onesided power confidence intervals by noting that “the change from a onesided to a twosided confidence interval has little effect on the upper bound, but a large effect on the lower bound”. Muller and Fetterman [20] provided examples of a onesided power confidence interval in the univariate case.
Approximate UNRIEP power confidence regions for power curves
The new methods allow calculating a confidence interval for a single power value. The logic of a proof in Taylor and Muller ([16] equations 2.12.13 and surrounding text) guarantees that accurate confidence regions are provided by the pointwise calculations. The proof may be sketched for the present setting as follows. Equations 1421 establish the validity of the approximate confidence interval for a particular alternative hypothesis, as specified by the scalar constant tr(Δ). The randomness in the noncentrality arises from a scalar random variable, , analogous to a variance. Equation 19 describes a single event with a specified probability. The inequality defining the event, and the associated probability, do not change for different values of the scalar constant tr(Δ). The smooth and strictly monotone dependence of power on the noncentrality ensures the validity of equations 2021. The proof is completed by noting that the monotonicity extends the simultaneity property to the power confidence region.
Figure 1 gives an example plot of approximate power confidence regions surrounding the predicted power curve for the rankadjusted HuynhFeldt test for ∊=0.720. Graphical representations such as Figure 1 help researchers accurately recognize the amount of uncertainty in their power calculation, and lead to better decisions about design.
Figure 1. Approximate 95% confidence region for predicted power of the rankadjusted HuynhFeldt test of interaction over tr(Δ) with N = 10 and population ∊=0.720 for conditions described in Section ‘Simulation 1 with rank (X) = 1 (onegroup repeated measures ANOVA)’.
In some cases scientists prefer to consider sample size as a function of the pattern of mean differences. The theory already presented allows plotting sample size as a function of mean difference, albeit with a shift in algorithm. The power function must be numerically inverted to solve for the sample size desired. Taylor and Muller [16] outlined the steps of algorithm needed for the univariate case. Details are not presented here for the sake of brevity.
Results
Simulation overview
The accuracy of the new approximate confidence intervals isevaluatedfor a widerange of conditions. Appendix C: Simulation details contains more details of the simulations and examples. All simulations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003) using a version of LINMOD 3.4 modified to include the rankadjusted HuynhFeldt estimator and test. Predicted power values and approximate power confidence intervals were computed using a similarly modified version of POWERLIB 2.03. The modified versions of LINMOD and POWERLIB are available at http://www.healthoutcomespolicy.ufl.edu/muller/ webcite.
Simulation 1 with rank (X)=1 (onegroup repeated measures ANOVA)
The accuracy of the new approximate confidence intervals were evaluated for a completely withinsubject design with p=9 repeated measures, N∈{10,20,40}, and q=rank(X)=1. Values for B, contrast matrices C, U, and Θ_{0} were chosen to test a withinsubject interaction for α=0.05. The model was chosen to ensure predicted power values for the GeisserGreenhouse test of 0.20, 0.50, and 0.80, using the power approximation in Muller et al. [5]. Population covariance matrices were chosen to provide ∊∈{0.282,0.505,0.720,1.00}. The sphericity values were selected to cover a range of eigenvalue patterns (i.e., patterns of the principal component variances) arising from the structure of Σ_{∗}. For example, if b=3, then gives ∊≈0.50. In turn, gives ∊=1/3≈0.33. Pseudorandom realizations of the error matrix, E, were generated and tests were calculated. The observed mean power values for the four UNIREP tests were calculated and tabulated for 500,000 replications per condition.
For the conditions described above, additional pseudorandom realizations of the error matrix were generated using an estimating study with sample size, N_{est}, of 10 and rank of X, rank (X_{est}), of 1 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power. Both one and twosided confidence intervals were evaluated with target coverages of 90% and 95%. The number of replications gave a standard error of estimated coverage probability less than or equal to 0.0003 for 1−α=0.95, and 0.0004 for 1−α=0.90, nearly guaranteeing 3 digits of accuracy. Only coverage of observed mean power values, and not predicted, was tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both.
Only the worst case results for twosided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study. Table 2 contains results for the Box conservative test with a target sample size of 10. For a wide range of sphericity values and target power values, the target 95% estimated coverage is consistently reached. The two cases in which the target coverage is not reached occur with large population sphericity and low power. Under these conditions, the Box conservative test would not be used in practice.
Table 2. Target 95% CI (twosided) estimated coverage (×100) of simulated population power for N = 10
Table 2 also contains coverage results for the GeisserGreenhouse and the rankadjusted HuynhFeldt tests. The target 95% estimated coverage is consistently reached for extreme sphericity values for both tests. For midrange sphericity values, the coverage fell below the target coverage from 0.8% to 7.3% for the GeisserGreenhouse, and 1.4% to 12.1% for the rankadjusted HuynhFeldt. Coverage accuracy improved as the estimated power increased. In practice, lower power values are of little concern. For target power of 0.80 for the GeisserGreenhouse test, the largest deviation from the target 95% estimated coverage was 2.6% for the GeisserGreenhouse test and 4.1% for the rankadjusted HuynhFeldt test. Both occurred for the population sphericity value of 0.505.
Only a spherical case is appropriate to consider for the uncorrected test because otherwise the test will have inflated test size. Simulation results in Table 3 show that the approximation for the uncorrected test (with sphericity) always reached the target estimated coverage for the uncorrected test. The conservative bias could be eliminated by using optimal maximum likelihood estimates for the common variance and covariance (Morrison [21]), rather than the unstructured covariance estimate. Additional small changes are needed, associated with degrees of freedom, and corresponding to making all choices of e_{1} through e_{5} equal to 1.
Table 3. Target 95% CI (Twosided) estimated coverage (×100) of simulated population power for the uncorrected test (∊ = 1.00)
Although not presented here, in general, the accuracy of the coverage improved directly with increasing sample size, for all tests and conditions. The accuracy of the approximate confidence bounds for all four UNIREP tests also improved as the population sphericity increased.
Simulation 2 with rank (X)>1
All of the simulations in the second example considered the condition of rank of X greater than 1. The cases used p=5 repeated measures, N∈{16,32,48}, and q=rank(X)∈{2,4,8,16}, corresponding to a three, five, nine, and seventeengroup comparison, respectively. Appropriate fixed matrices of regression coefficients, B, contrast matrices, C and U, and Θ_{0} were chosen to test a withinsubject interaction for a test size, α, of 0.05. The matrices were also chosen to ensure approximate target predicted power values for the rankadjusted HuynhFeldt test of 0.20, 0.50, and 0.80. Specific design matrices, X, were defined. Population covariance matrices were chosen to provide specific population sphericity values, ∊∈{0.282,0.505,0.720,1.00}. Observed mean power values were simulated and tabulated in a similar manner to that described in section ‘Simulation 1 with rank (X)=1 (onegroup repeated measures ANOVA)’.
Pseudorandom realizations of the error matrix were generated using an estimating study with sample size, N_{est}, of 16 and rank of X, rank (X_{est}), of 4 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power using the methods presented in section ‘Approximate UNIREP power confidence intervals UNIREP power confidence intervals’. Approximate confidence interval coverage was defined as the proportion of the 500,000 simulated bound realizations that successfully covered the observed mean power values for each condition described above. Only coverage of observed mean power values, and not predicted, were tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both. Both one and twosided confidence intervals were evaluated with target coverages of 90% and 95%.
In practical biomedical research, low power values are of little concern. Rarely will one have a power targeted below 0.70. Therefore, only the results for target power values of 0.80 will be presented and discussed. Power confidence interval coverage converged to the target coverage as sample size increased. Only the worst case results for twosided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study, for a variety of population sphericity values and estimated population powers.
In Table 4, the observed mean population powers are presented for the four UNIREP tests for the population sphericity values and ranks of X considered for target rankadjusted HuynhFeldt power of 0.80 and sample size of 16 or 48. In general, as the population sphericity increased and rank of X increased, the observed mean power values for the Box conservative, the GeisserGreenhouse, and the rankadjusted HuynhFeldt tests decreased. Only the Box conservative had severely biased power values as the population sphericity increased.
Table 4. Simulated population power for target power = 0.80, N = 16 and rank (X ) = q
In Table 5, the proportion of simulations in which the estimated confidence interval successfully covered the observed mean population power values for each test is shown. The results are based on using an estimating study with sample size, N_{est}, of 16 and rank of X, rank (X_{est}), of 4. In general, the approximate power confidence intervals nearly always reached the target 95% coverage for the Box conservative test. The coverage became more conservative as rank of X decreased. Similarly, the coverage became more conservative for the GeisserGreenhouse and rankadjusted HuynhFeldt tests as rank of X decreased. The GeisserGreenhouse and rankadjusted HuynhFeldt tests performed adequately in all cases except for the midrange population sphericity value, ∊=0.505. The largest deviation from the target 95% estimated coverage was 13.6% and 16.0% for the GeisserGreenhouse and rankadjusted HuynhFeldt tests, respectively, which occurred for ∊=0.505 and rank of X equal to 8. The approximate power confidence intervals for the uncorrected test reached the target coverage value for every case considered in which the uncorrected test would be used.
Table 5. Target 95% CI (twosided) estimated coverage (×100) of simulated population power for target power = 0.80, N = 16, rank (X) = q, N_{est }= 16, and rank (X_{est}) = 4
Although not presented here, in general, as sample size increased the conservative coverage values observed for the Box conservative and the uncorrected tests slowly converged to the target coverage value. This trend was observed for the conservative coverage values with the extreme population sphericity values for the GeisserGreenhouse and the rankadjusted HuynhFeldt tests as well. The same is true of the liberal coverage values observed for the midrange population sphericity values for the GeisserGreenhouse and the rankadjusted HuynhFeldt tests. Similar results were obtained for the target 90% twosided confidence interval coverage as well as the 95% and 90% onesided confidence intervals coverage.
The estimated coverages of these tabulated observed mean power values for each test were simulated for population sphericity values of 0.282, 0.505, 0.720, and 1.00. In Table 6, the worst case results from these simulations, which occurred for population sphericity 0.505, are presented. Approximate confidence intervals were simulated for 500,000 replications per condition (standard error of estimated coverage probability less than or equal to 0.0003 for 1−α=0.95, and 0.0004 for 1−α=0.90). The estimating studies use sample sizes, N_{est}, of 16, 32, and 48, and ranks of X_{est} of 2, 4, and 8.
Table 6. Target 95% CI (twosided) estimated coverage (×100) of simulated population power for ∊=0.505 , target power = 0.80, N= 48 and rank (X) = q
In general, for population sphericity values of 0.282 and 0.505, the approximate power confidence interval coverage for the Box conservative test converged to the target coverage value as rank of X_{est} increased, and thus ν_{est} decreased. Coverage decreased as rank of X from the target study increased. For larger rank of X, the approximate power confidence interval coverage fell short of the target coverage in several instances. No clear trend was apparent as N_{est} increased. The Box conservative test would not be used for larger population sphericity values. However, the realization that the target coverage was reached in nearly every case considered for the larger population sphericity values is worth mentioning.
The approximate power confidence interval coverages for both the GeisserGreenhouse and rankadjusted HuynhFeldt tests seem to have converged to the target coverage value as rank of X_{est} increased, and thus ν_{est} decreased, except in cases of sphericity. Such cases have little practical importance since exact results are available if sphericity is valid. Coverage decreased as rank of X from the target study increased. As observed in previous simulations, the approximate power confidence interval coverages for both the GeisserGreenhouse and rankadjusted HuynhFeldt tests fell short of the target coverage to varying degrees in nearly every case considered for midrange population sphericity values. This outcome was also observed for larger rank of X from the target study for population sphericity of 0.282. The approximate power confidence interval coverage for the uncorrected tests reached the target coverage value in every case except for large ν_{est} and small rank of X from the target study. The approximate power confidence interval coverage increased as the ranks of X for both the target and estimating studies increased and as N_{est} decreased.
The slow convergence of the approximate power confidence interval coverage to the target coverage may be due, in part, to use of and in the approximate power confidence interval equation. These estimators of the sphericity parameter are ratios of unbiased estimators for the nonnull and null cases, respectively. The variances of these estimators are much larger than the variances for and . The larger variances may account for the slow convergence to the population power as the target and estimating study sample sizes and degrees of freedom increase. Further simulations may be needed to confirm this reasoning.
Alternate approximations
In attempts to develop even better confidence bound estimates, additional approximations were developed and evaluated. One approach approximated the distribution of with an F. Using the methods presented in Kim et al. [22], the numerator of was approximated with a weighted noncentral chisquare, while the denominator was approximated with a weighted central chisquare. Two concerns arose. First, the denominator is not necessarily a central quadratic. The 2tr(Δ/a) component makes the denominator more of a shifted central quadratic. Second, the Kim et al. [22] result requires that the components of the numerator and denominator be mutually independent, which does not hold. Simulations demonstrated that the approximation was inaccurate in small samples.
Alternative approximations were developed and evaluated. The alternatives matched only the numerator to a weighted noncentral chisquare or to a weighted central chisquare with the denominator a constant equal to E. All were less accurate than the approximation presented here.
Discussion
Even for small sample sizes, the proposed power confidence intervals attain very accurate coverage probabilities for the Box conservative test in all cases and for the uncorrected test with ∊=1 (the only case for which it should be used). The result is also true for the extreme population sphericity values for the GeisserGreenhouse and rankadjusted HuynhFeldt tests. For midrange population sphericity values, the coverage probabilities of the approximate power confidence intervals for the latter two tests often fell somewhat short of the various target coverage values considered. Coverage probabilities improve as sample size increases. Accuracy is better for higher target power values than for lower, which makes the results useful in practice. Onesided confidence intervals are recommended for lower bounds on power.
The techniques also allow plotting power confidence regions around an estimated power curve (Figure 1). The resulting plots have been well received by researchers.
Conclusions
Good statistical practice requires associating a credible measure of uncertainty with any parameter estimate. We described and evaluated new methods to meet the need for UNIREP power estimates based on an estimated covariance with fixed means. Across a large range of conditions, the methods provide reasonably accurate coverage for all four UNIREP tests.
Appendix
Appendix A: Additional notation
The additional notation comes from Muller et al. [5], who showed that if , , and , then
They used the parameters (assumed known) to approximate the UNIREP test statistic with a noncentral F distribution, as presented in equation 6.
Appendix B: Supporting lemmas and proofs
Lemma B.1
Proof of lemma B.1
Lemma B.2
The first moments of , , , and are known.
Proof of lemma B.2
Following Wishart [23], , such that ν_{e}=N−rank(X). For Wishart while here 〈Σ〉_{jj}=σ_{jj} and ρ_{jk}=σ_{jk}/(σ_{jj}σ_{kk})^{1/2}. Wishart [23] said, with emphasis not in the original, “ …moment coefficients are in all cases except the first calculated about the mean of the sample …”. Here μ(n) indicates the expression in equation n at the end of Wishart [23], . Thus
Hence
Finally, has E (S)=ν_{e}Σ_{∗} (Muller and Stewart [6], Theorem 10.10). Hence E [tr(ΔS)]=tr[E(ΔS)]=tr[ΔE(S)]=tr[Δ(ν_{e}Σ_{∗})]=ν_{e}tr(ΔΣ_{∗}) and
Proof of lemma 1
Substituting equivalent terms from equations A.1A.5 into (λ_{∗2}abν_{∗2})/(λ_{∗1}ν_{∗1}bν_{e}) allows combining like terms and simplifying to give
If , then and with y_{∗1}∼χ^{2}(ν_{∗1},ω_{∗}) and y_{∗2}∼χ^{2}(ν_{∗2}) as described in Muller et al. [5]. In turn,
Pr{(y_{∗1}/ν_{∗1})/(y_{∗2}/ν_{∗2})≤tλ_{∗2}abν_{∗2}/(λ_{∗1}ν_{∗1}bν_{e})}=F_{F}(t;ν_{∗1},ν_{∗2},ω_{∗}).
Proof of lemma 2
Using Lemma B.2, unbiased estimators for tr^{2}(Σ_{∗}) and are and , as introduced in Gribbin [18] and Chi et al. [19]. As shown in Lemma B.2, and are unbiased estimators for and , respectively. Thus,
In the null case Δ=0 and reduces to the rankadjusted sphericity estimator, .
Appendix C: Simulation details
Covariance conditions
Covariance conditions 58 from Table III of Coffey and Muller [24] were used for each example described below: Σ_{∗}=Dg(λ_{j}) for j∈{1,2,3,4}, with
Thus, ∊∈{0.28,0.51,0.72,1.00}. Given Σ_{∗}=Dg(λ_{j}), it follows that Σ=UΣ_{∗}U^{′}.
CLAHE mammography example
Computer scientists developed the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm to improve contrast in medical images. Independent observers considered 3×3=9 Clip ×Region combinations. Region denotes the size of the image (pixels^{2}) at which contrasts are controlled and Clip level limits the maximum contrast adjustment. In the multivariate model X=1_{N}, while withinperson factors Clip and Region gave Y, (N×9). Also B, (1×9), contained mean log10(contrast) for the unprocessed condition minus the mean for each of the nine combinations of Clip and Region (β_{cr}=μ_{unprocessed}−μ_{cr}). If T_{c} contains orthonormal linear and quadratic trends for log2(Clip) ∈ {1,2,4}, and T_{r} does the same for log2(Region) ∈ {1,3,5}, then the 9 × 4 withinpersons contrast matrix, U_{cr} is
With L the linear and Q the quadratic trends for interaction components being tested, .
All four covariance patterns were factorially combined with N∈{10,20,40}. The multivariate test considered with α=0.05, and β_{P} the scaling factor for B corresponding to approximate target power P∈{0.20,0.50,0.80} for the GeisserGreenhouse approximation using methods in Muller et al. [5]. The conditions in the example were used in section ‘Simulation 1 with rank (X)=1 (onegroup repeated measures ANOVA)’. More details of the example are in Muller et al. [5].
Test of interaction with rank (X)>1 Example
All cases used 5 repeated measures, N∈{16,32,48}, and rank (X)∈{2,4,8,16}. For obvious reasons, a rank of X equal to 16 was not considered for the smallest sample size. All four covariance patterns were factorially combined with the sample sizes and ranks X. In the multivariate model, X=I_{q}⊗1_{repn}, such that repn = N/q, and ⊗ is a Kronecker product. If
then B=β_{P}·D_{a}, such that β_{P} was the scaling factor giving approximate target power P∈{0.20,0.50,0.80}, for the rankadjusted HuynhFeldt power approximation. The withinsubject contrast, U, (5×4), contained orthonormal linear, quadratic, cubic and quartic trends:
Each row of the betweensubject contrast, C, a (q−1×q) orthonormal matrix, contained one of the (q−1) trends. The contrasts define a test of interaction of between and withinsubject trends. Without loss of generality, we assumed Θ_{0}=0. A test size, α, of 0.05 was used.
Computational methods
All power computations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003). Free software LINMOD 3.4 was used for all data analysis and includes new methods. Free software POWERLIB 2.1 in Johnson et al. [25] was used for all power analysis and includes the new methods. Both are available at http://healthoutcomespolicy.ufl.edu/facultydirectory/mullerkeith/listofsoftware/ webcite. UNIREP power is also available in GLIMMPSE, a free webbrowser based program with a graphical user interface aimed at health scientists (http://www.SampleSizeShop.org webcite). The next version of GLIMMPSE is expected to implement the confidence interval methods.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The majority of the work comes directly from the dissertation of MJG at UNC Chapel Hill, in partial fulfillment of the requirements of the DrPH, under the direction of KEM. Dr. KEM and Dr. YYC assisted in revising and compressing the notation, text and tables. Dr. PWS was a member of the dissertation committee and assisted in formulating the proof of simultaneous coverage of the confidence regions and in final editing. All authors read and approved the final manuscript.
Acknowledgements
We thank the reviewers Drs. Warren Comulada, Peter Lachenbruch, and Russell Lenth for their timely review and beneficial comments. This work was supported in part by a UF CTSI core grant to Chi and Muller (NCATS UL1TR000064). Chi’s support included NIDCR R01DE020832, NIDDK R01DK072398, NIDA R01DA031017, and NICHD P01HD065647. Muller’s support included NIDCR R01DE02083201A1, NIDDK R01DK072398, NIDCR U54DE019261, NCRR K30RR022258, NHLBI R01HL091005, NIAAA R01AA01345801, and NIDA R01DA031017.
References

Kraemer HC, Mintz J, Noda A, Tinklenberg J, Jerome A, Yesavage: Caution regarding the use of pilot studies to guide power calculations for study proposals.
Arch Gen Psychiatry 2006, 63:484489. PubMed Abstract  Publisher Full Text

Maxwell SE: The persistence of underpowered studies in psychological research:causes, consequences, and remedies.
Psychol Methods 2004, 9:147163. PubMed Abstract  Publisher Full Text

Taylor DJ, Muller KE: Bias in linear model power and sample size calculation due to estimating noncentrality.
Commun Stat Theory Methods 1996, 25:15951610. Publisher Full Text

Muller KE, Pasour VB: Bias in linear model power and sample size due to estimating variance.
Commun Stat Theory Methods 1997, 26:839851. Publisher Full Text

Muller KE, Edwards LJ, Simpson SL, Taylor DJ: Statistical tests with accurate size and power for balance linear mixed models.
Stat Med 2007, 26:36393660. PubMed Abstract  Publisher Full Text

Muller KE, Stewart PW: Linear Model, Theory: Univariate, Multivariate and Mixed Models. New York: Wiley; 2006.

Box GEP: Some theorems on quadratic forms applied in the study of analysis of variance problems: I. Effects of inequality of variance in a oneway classification.
Ann Math Stat 1954, 25:290302. Publisher Full Text

Box GEP: Some theorems on quadratic forms applied in the study of analysis of variance problems: II. Effects of inequality of variance and of correlation between errors in the twoway classification.
Ann Math Stat 1954, 25:484498. Publisher Full Text

Geisser S, Greenhouse SW: An extension of Box’s results on the use of the F distribution in multivariate analysis.
Ann Math Stat 1958, 29:885891. Publisher Full Text

Greenhouse SW, Geisser S: On methods in the analysis of profile data.
Psychometrika 1959, 24:95112. Publisher Full Text

Huynh H, Feldt LS: Estimation of the Box corrections for degrees of freedom from sample data in randomized block and splitplot designs.

Davies RB: Distribution of a linear combination of chisquare random variables.
Appl Stat 1980, 29:323333. Publisher Full Text

Muller KE, Barton CN: Approximate power for repeated measures ANOVA lacking sphericity.
J Am Stat Assoc 1989, 84:549555. Publisher Full Text

Muller KE, Barton CN: Correction to Approximate power for repeatedmeasures ANOVA lacking sphericity.

Browne RH: On the use of a pilot sample for sample size determination.
Stat Med 1995, 14:19331940. PubMed Abstract  Publisher Full Text

Taylor DJ, Muller KE: Computing confidence bounds for power and sample size of the general linear univariate model.

Lecoutre B: A correction for the , approximate test with repeated measures design with two or more independent groups.

Gribbin MJ: Better power methods for the univariate approach to repeated measures.
PhD Dissertation 2007.
University of North Carolina at Chapel Hill, Department of Biostatistics

Chi YY, Gribbin MJ, Lamers Y, Gregory JF, Muller KE: Global hypothesis testing for highdimensional repeated measures outcomes.
Stat Med 2012, 31:724742. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Muller KE, Fetterman BA: Regression and, ANOVA: An Integrated Approach Using SAS®; Software Chapter 17. Cary: SAS Institute; 2002.

Morrison DF: Multivariate Statistical, Methods. New York: McGrawHill; 1990.

Kim HY, Gribbin MJ, Muller KE, Taylor DJ: Analytic, computational and approximate forms for ratios of noncentral and central Gaussian quadratic forms.
J Comput Graphical Stat 2006, 15:443459. Publisher Full Text

Wishart J: The generalized product moment distribution in samples from a normal multivariate population.

Coffey CS, Muller KE: Properties of internal pilots with the univariate approach to repeated measures.
Stat Med 2003, 22:24692485. PubMed Abstract  Publisher Full Text

Johnson JL, Muller KE, Slaughter JC, Gurka MJ, Gribbin MJ, and Simpson SL: POWERLIB: SAS/IML software for computing power in multivariate linear models.
J Stat Softw 2009, 30:127.
[http://www.jstatsoft.org/v30/i05/paper webcite]
PubMed Abstract  PubMed Central Full Text
Prepublication history
The prepublication history for this paper can be accessed here: