Center for Medical Statistics, Informatics, and Complex Systems, Medical University of Vienna, Vienna, Austria

European Medicines Agency (EMA), London, UK

Abstract

Background

For gene expression or gene association studies with a large number of hypotheses, the number of measurements per marker in a conventional single-stage design is often low due to limited resources. Two-stage designs have been proposed where in a first stage promising hypotheses are identified and further investigated in the second stage with larger sample sizes. For two types of two-stage designs proposed in the literature we derive multiple testing procedures controlling the False Discovery Rate (FDR) and demonstrate FDR control by simulations: designs where a fixed number of top-ranked hypotheses are selected and designs where the selection in the interim analysis is based on an FDR threshold. In contrast to earlier approaches which use only the second-stage data in the hypothesis tests (pilot approach), the proposed testing procedures are based on the pooled data from both stages (integrated approach).

Results

For both selection rules the multiple testing procedures control the FDR in the considered simulation scenarios. This holds for the case of independent observations across hypotheses as well as for certain correlation structures. Additionally, we show that in scenarios with small effect sizes the testing procedures based on the pooled data from both stages can give a considerable improvement in power compared to tests based on the second-stage data only.

Conclusion

The proposed hypothesis tests provide a tool for FDR control for the considered two-stage designs. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many simulation scenarios.

Background

Modern experimental techniques in genetic research such as microarray experiments or gene association studies produce high-dimensional data, and often thousands of hypotheses are tested simultaneously to identify genetic markers. Due to limited resources, the number of measurements per marker in a conventional single-stage design is often low. Two-stage designs have been proposed where in a first stage promising markers are identified from the set of all markers considered initially. Thus, hypotheses corresponding to unpromising markers can be dropped in the interim analysis such that the second stage is performed with the reduced set of selected hypotheses. Given limited total resources or budgets, this allows the allocation of a larger number of observations to more promising hypotheses. It has been shown that such sequential procedures are typically considerably more powerful than single-stage designs.

An important problem when drawing inference from data produced by such designs is the construction of hypothesis tests that control the False Discovery Rate (FDR). While the construction of such test procedures is straightforward if only the second-stage data is used for testing, tests that make use of the data from both stages need to account for the specific selection rule used to select hypotheses for the second stage.

For two-stage procedures where in an interim analysis all hypotheses with an unadjusted first-stage p-value below a threshold $\gamma_1$ are selected for the second stage, hypothesis tests based on the pooled data from both stages have been proposed that control the FDR or the familywise error rate.

In this work we propose statistical tests to control the FDR in two-stage designs with selection rules that are not based on a fixed threshold for the first-stage p-values: designs where a fixed number of top-ranked hypotheses is selected for the second stage (FNS design) and designs where the selection is based on an FDR threshold $\alpha_1$ in the interim analysis (FDRS design). In the FDRS design, all hypotheses that can be rejected with a test controlling the FDR at level $\alpha_1$ are selected for the second stage. If no hypothesis can be rejected with the interim test at FDR level $\alpha_1$, the trial is stopped for futility.

A simple approach to construct hypothesis tests controlling the FDR for two-stage designs is to consider tests based on the second-stage data only. Standard multiple testing procedures applied to the second-stage data will control the FDR. For the FDRS design Benjamini and Yekutieli

The paper is structured as follows: in the next section the testing problem and the selection rules are introduced. Then the results of a simulation study which investigates the actual FDR and compares the mean number of rejected alternatives of the integrated approach to the pilot approach are presented. Finally, a real data example and a short discussion are given.

Methods

The test problem

We consider an experiment to test the hypotheses $H_{0i}: \mu_i = 0$ versus $H_{1i}: \mu_i \neq 0$, $i = 1, \dots, m$.

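To adjust for multiple testing we aim to control the FDR of the experiment. The FDR is the expected proportion of erroneously rejected null hypotheses among all rejected hypotheses. It can be controlled at level $\alpha$ with the Benjamini-Hochberg (BH) procedure: let $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ denote the ordered p-values and let $d = \max\{i : p_{(i)} \le i\alpha/m\}$. The procedure rejects all hypotheses $H_{0i}$ with p-values smaller than or equal to $p_{(d)}$; if no $i$ with $p_{(i)} \le i\alpha/m$ exists, no hypothesis is rejected. For independent or positively regression dependent test statistics the BH-procedure controls the FDR at level $\pi_0 \alpha$, where $\pi_0$ denotes the (unknown) proportion of true null hypotheses among the $m$ hypotheses.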
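The BH step-up rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (their computations were done in R); the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the (sorted) indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    d = 0  # largest rank i with p_(i) <= i * alpha / m (0 if none exists)
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            d = rank
    # Step-up: reject the d smallest p-values, even if some of them
    # exceed their own critical value.
    return sorted(order[:d])


# Hypothetical p-values for illustration.
example = [0.02, 0.03, 0.035, 0.9]
print(benjamini_hochberg(example, alpha=0.05))
```

In the example the two smallest p-values exceed their own critical values (0.0125 and 0.025), but the third one lies below 0.0375, so the step-up rule rejects all three.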

The two-stage procedure

In the first stage, $n_1$ observations are collected for each hypothesis. Then an interim analysis is performed and for each hypothesis a two-sided first-stage p-value is calculated. The $m_2$ hypotheses with the smallest first-stage p-values are selected for the second stage, where $m_2$ can be either a pre-fixed number or may depend on the first-stage results. Below we consider several choices for $m_2$. In the second stage, $n_2$ observations are collected for each selected hypothesis; $n_2$ is assumed to be fixed and does not depend on the number of selected hypotheses. We consider two different approaches to arrive at the final test decision: the “integrated approach”, where the test decision is based on the combined data of both stages, and the “pilot approach”, where the test decision is based on the second-stage data only.

In the following we introduce several rules to determine the number of selected hypotheses $m_2$.
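As a rough illustration of the stage-wise quantities, the sketch below computes standardized means and two-sided p-values for one hypothesis. It assumes normally distributed observations with a known standard deviation (`SIGMA`, an assumption of this sketch) and uses made-up data; the helper names are hypothetical. It also checks that the standardized mean of the pooled data is a weighted combination of the stage-wise statistics. Note that a naive p-value from the pooled statistic ignores the selection at the interim analysis and is therefore not computed here.

```python
from math import sqrt
from statistics import NormalDist, fmean

SIGMA = 1.0  # assumed known standard deviation (illustrative)


def z_statistic(observations, sigma=SIGMA):
    """Standardized mean sqrt(n) * mean / sigma for testing mu = 0."""
    return sqrt(len(observations)) * fmean(observations) / sigma


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Made-up observations for a single hypothesis.
stage1 = [0.4, 1.1, -0.2, 0.9, 0.6, 0.8]                                  # n1 = 6
stage2 = [0.7, 0.1, 1.3, 0.5, -0.4, 0.9, 1.0, 0.2, 0.6, 1.4, 0.3, 0.8]    # n2 = 12
n1, n2 = len(stage1), len(stage2)

z1 = z_statistic(stage1)                 # first-stage statistic
z2 = z_statistic(stage2)                 # second-stage statistic (pilot approach)
z_pooled = z_statistic(stage1 + stage2)  # pooled statistic (integrated approach)

# The pooled statistic is a weighted combination of the stage-wise ones.
combined = sqrt(n1 / (n1 + n2)) * z1 + sqrt(n2 / (n1 + n2)) * z2

print("first-stage p-value: ", two_sided_p(z1))
print("second-stage p-value:", two_sided_p(z2))
print("pooled equals weighted combination:", abs(z_pooled - combined) < 1e-9)
```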

Selection rules for two-stage designs

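Selection according to a prefixed selection boundary $\gamma_1$

Two-stage designs have been proposed where a selection boundary $\gamma_1$ is pre-specified and in the interim analysis all hypotheses with a first-stage p-value not exceeding $\gamma_1$ are selected for the second stage. Then

$$m_2 = \sum_{i=1}^{m} \mathbf{1}\left(p_i^{(1)} \le \gamma_1\right),$$

where $p_i^{(1)}$ denotes the first-stage p-value of hypothesis $i$ and **1** is the indicator function which equals 1 if the condition in the parentheses is satisfied and 0 otherwise. Thus, $m_2$ is a random variable.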

Pre-fixed number of hypotheses selected for the second stage - FNS design

With this rule the value of $m_2$ is fixed a priori and the $m_2$ hypotheses with the smallest first-stage p-values are selected for the second stage.

Selection based on an FDR threshold - FDRS design

In this approach all hypotheses which are significant according to the BH-procedure at a prefixed level $\alpha_1$ are selected for the second stage. Thus, $m_2$ is a random variable which depends on the first-stage results. If no hypothesis can be rejected at level $\alpha_1$ in the interim analysis and thus be carried over to the second stage, $m_2$ is set to zero. In this case the whole experiment is stopped. Note that under the global null hypothesis, i.e. in the setting where all null hypotheses are true and thus $\pi_0 = 1$, this occurs with a probability of $1 - \alpha_1$.
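The three selection rules can be compared side by side in a short Python sketch. The function names, the example p-values, and the use of "less than or equal" at the boundary are assumptions of this sketch, not taken from the paper.

```python
def select_fixed_threshold(p1, gamma1):
    """Prefixed boundary: select all hypotheses with first-stage p <= gamma1."""
    return [i for i, p in enumerate(p1) if p <= gamma1]


def select_fns(p1, m2):
    """FNS rule: select the m2 hypotheses with the smallest first-stage p-values."""
    order = sorted(range(len(p1)), key=lambda i: p1[i])
    return sorted(order[:m2])


def select_fdrs(p1, alpha1):
    """FDRS rule: select all hypotheses rejected by BH at level alpha1."""
    m = len(p1)
    order = sorted(range(m), key=lambda i: p1[i])
    d = 0
    for rank, idx in enumerate(order, start=1):
        if p1[idx] <= rank * alpha1 / m:
            d = rank
    return sorted(order[:d])  # empty list: m2 = 0, the experiment stops


# Hypothetical first-stage p-values.
p1 = [0.001, 0.30, 0.012, 0.75, 0.04, 0.52, 0.09, 0.66]
print(select_fixed_threshold(p1, gamma1=0.05))  # m2 is random
print(select_fns(p1, m2=2))                     # m2 is fixed in advance
print(select_fdrs(p1, alpha1=0.2))              # m2 is random, may be zero
```

With the prefixed boundary and the FDRS rule the number of selected hypotheses depends on the data, whereas the FNS rule always returns exactly `m2` indices.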

FDR control

In the subsections below we review the FDR controlling test procedure for two-stage designs where hypotheses are selected based on a prefixed selection boundary applied to the first-stage p-values.

Selection according to a prefixed selection boundary $\gamma_1$

Pilot approach

In the pilot approach the final test statistics are based on data from the second stage only. The first-stage data are used for selecting promising hypotheses only. To control the FDR of the experiment, at the end of the trial for each hypothesis a two-sided _{2}, is calculated, where

Integrated approach

If the data from both stages are to be used in the final test decision, one can account for the selection in the interim analysis by calculating sequential _{
i
},

and if

where _{
i
} denotes the standardized overall mean of the observations from both stages and
_{1}/2)-quantile of the standard normal distribution. If the stopping criterion is satisfied the sequential _{1}, …, _{
m
}.

In a two-stage procedure with fixed per hypothesis sample sizes _{1}, _{2} and a fixed selection boundary _{1} the sequential

If for the subset of true null hypotheses the observations are independent across hypotheses such that the sequential

Next we extend the test procedure to the FNS and FDRS design.

FNS design

Pilot approach

For the FNS selection rule the FDR control with the pilot approach is straightforward: the BH-procedure can be applied to the second-stage p-values of the $m_2$ selected hypotheses. Because the first-stage data do not enter the final test statistics, FDR control is guaranteed under the assumption of positive regression dependency.
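A minimal sketch of the pilot approach for the FNS rule is given below: the first-stage p-values are used for selection only, and the BH-procedure is then applied to the second-stage p-values of the selected hypotheses. The function names, the dictionary `p2_of` holding second-stage p-values, and all numbers are hypothetical.

```python
def bh_reject(pvalues, alpha):
    """Positions (within pvalues) rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    d = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            d = rank
    return sorted(order[:d])


def pilot_fns(p1, p2_of, m2, alpha):
    """Select the m2 smallest first-stage p-values, then test with stage-2 data only.

    p2_of maps a selected hypothesis index to its second-stage p-value.
    Returns (selected indices, rejected indices).
    """
    order = sorted(range(len(p1)), key=lambda i: p1[i])
    selected = sorted(order[:m2])
    # BH is applied to the m2 second-stage p-values only.
    p2 = [p2_of[i] for i in selected]
    rejected = [selected[j] for j in bh_reject(p2, alpha)]
    return selected, rejected


p1 = [0.001, 0.30, 0.012, 0.75, 0.04, 0.52]   # hypothetical first-stage p-values
p2_of = {0: 0.004, 2: 0.20, 4: 0.03}          # hypothetical second-stage p-values
print(pilot_fns(p1, p2_of, m2=3, alpha=0.05))
```

Hypotheses that are not selected never receive a second-stage p-value in this sketch and therefore cannot be rejected, which is one way the pilot approach differs from the integrated approach.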

Integrated approach

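To utilize the data from both stages for the final test decision, we propose to compute sequential p-values as in the design with a prefixed selection boundary, replacing the threshold $\gamma_1$ (which is not defined for the FNS design) by the value of the largest first-stage p-value among the selected hypotheses. Thus, $\gamma_1$ is now data dependent. Because this is not accounted for in the calculation of the sequential p-values, FDR control does not follow directly from the results for a prefixed boundary. However, under suitable assumptions (see the Appendix), $\gamma_1$ converges almost surely to a fixed number as the number of hypotheses increases. Thus, asymptotically $\gamma_1$ is deterministic and for large $m_2$ the procedure becomes similar to the method with a prefixed selection boundary.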

Note that with the integrated two-stage testing procedure, hypotheses that have not been selected in the interim analysis can in principle be rejected in the final test. In particular, if $m_2$ is small compared to the number of hypotheses for which the alternative holds and the effect sizes are large, hypotheses that were not selected at the interim analysis can be rejected at the end because a sequential p-value is computed for every hypothesis.

FDRS design

Pilot approach

As for the FNS selection rule, if the BH-procedure is applied at nominal level $\alpha$ to the p-values of the $m_2$ selected hypotheses (computed from the observations of the second stage only), FDR control is guaranteed. However, as Benjamini and Yekutieli showed, if in a first stage hypotheses are selected at FDR level $\alpha_1$, and, in a second stage, the selected hypotheses are tested at nominal level $\alpha_2$, the FDR of the second-stage test is actually controlled at level $\alpha_1 \alpha_2 \pi_0$, given the test statistics at each stage are positively regression dependent. Thus, if in the second stage the nominal level $\alpha_2 = \alpha/\alpha_1$ is applied, the FDR is still controlled at level $\pi_0 \alpha$.

Integrated approach

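Similar to the FNS rule we propose to compute sequential p-values.

Again, the resulting threshold $\gamma_1$ is data dependent: we set $\gamma_1 = m_2 \alpha_1 / m$, where $m_2$ is a random variable. Then $\gamma_1$ is approximately equal to the largest first-stage p-value of the selected hypotheses. If $\pi_0 < 1$, $\gamma_1$ converges almost surely as the number of hypotheses increases (see the Appendix). Hence, in these settings $\gamma_1$ is asymptotically deterministic.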

Under the global null hypothesis $\gamma_1$ does not converge and simulations (see the Results section) show that the FDR is actually inflated. Therefore, we suggest the following modification of the test procedure. Let $m_s > 0$ denote a positive constant. In cases where fewer than $m_s$ hypotheses are selected by the FDRS selection rule, the threshold $\gamma_1$ used in (2) is set to the $m_s$-smallest first-stage p-value.
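The data-dependent threshold of the FDRS design and its modification can be sketched as follows. The sketch assumes that the unmodified threshold is the BH critical value $m_2 \alpha_1 / m$ and that the modified threshold is the $m_s$-smallest first-stage p-value whenever fewer than $m_s$ hypotheses are selected; both are readings of the text above, and the function name and example values are hypothetical.

```python
def fdrs_threshold(p1, alpha1, m_s):
    """Return (m2, gamma1) for the FDRS rule with the m_s modification.

    Assumes 1 <= m_s <= len(p1).
    m2     : number of hypotheses selected by BH at level alpha1
    gamma1 : threshold used when computing sequential p-values
    """
    m = len(p1)
    ordered = sorted(p1)
    m2 = 0
    for rank, p in enumerate(ordered, start=1):
        if p <= rank * alpha1 / m:
            m2 = rank
    if m2 < m_s:
        # Modification: fall back to the m_s-smallest first-stage p-value.
        return m2, ordered[m_s - 1]
    # Unmodified case: BH critical value at the number of selected hypotheses.
    return m2, m2 * alpha1 / m


# Hypothetical first-stage p-values.
p1 = [0.001, 0.30, 0.012, 0.75, 0.04, 0.52, 0.09, 0.66]
print(fdrs_threshold(p1, alpha1=0.2, m_s=3))  # enough hypotheses selected
print(fdrs_threshold(p1, alpha1=0.2, m_s=6))  # fewer than m_s selected
```

The modification only changes the threshold when few hypotheses pass the interim BH test, which is the situation (in particular the global null hypothesis) where the unmodified threshold is most variable.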

Generalizations to other testing problems

The procedure can be directly generalized to two group comparisons, replacing the standardized means by the standardized mean between group differences. More generally, the sequential _{1} / (_{1} + _{2}) (resp. _{2} / (_{1} + _{2})) is then replaced by the correlation

Results

First we investigate the actual FDR of the proposed testing procedures for the FNS and FDRS selection rules. Additionally, to quantify the advantage in power of the integrated approach compared to the pilot approach, we report the mean number of rejected alternatives under different scenarios. We consider the one-sample test of $H_{0i}: \mu_i = 0$ versus $H_{1i}: \mu_i \neq 0$, $i = 1, \dots, m$.

**We report the simulation scenarios and results of the simulation study assessing the FDR of the FNS and FDRS design (modified procedure with m _{s}=6) for the case of independent test statistics as described in the results section of the manuscript.** For each scenario at least 1000 simulation runs were performed. For scenarios with lower


In the following we assume independence of test statistics across hypotheses. However, because this assumption is often not satisfied in genetic data, we also report simulations assuming several correlation structures.

All computations were performed using the statistical language R.
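To give an idea of how such a simulation can be organized, the following Python sketch estimates the FDR and the mean number of rejected alternatives for the pilot approach of the FNS design with independent test statistics. It is not the authors' R code: the function names, the default parameter values, the use of z-statistics with unit variance, and the small number of runs are assumptions chosen for illustration, and the sequential p-values of the integrated approach are not implemented.

```python
import random
from math import sqrt
from statistics import NormalDist


def bh_reject(pvalues, alpha):
    """Positions (within pvalues) rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    d = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            d = rank
    return order[:d]


def simulate_pilot_fns(m=200, pi0=0.9, delta=1.0, n1=6, n2=12,
                       m2=20, alpha=0.05, runs=200, seed=1):
    """Monte Carlo estimate of (FDR, mean number of rejected alternatives)."""
    rng = random.Random(seed)
    norm = NormalDist()
    m1 = round(m * (1.0 - pi0))           # hypotheses 0..m1-1 are true alternatives
    mu = [delta] * m1 + [0.0] * (m - m1)
    fdp_sum, true_rej_sum = 0.0, 0
    for _ in range(runs):
        # Stage 1: z-statistics for all hypotheses, two-sided p-values.
        z1 = [rng.gauss(sqrt(n1) * mu[i], 1.0) for i in range(m)]
        p1 = [2.0 * (1.0 - norm.cdf(abs(z))) for z in z1]
        selected = sorted(range(m), key=lambda i: p1[i])[:m2]
        # Stage 2: new, independent data for the selected hypotheses only.
        z2 = [rng.gauss(sqrt(n2) * mu[i], 1.0) for i in selected]
        p2 = [2.0 * (1.0 - norm.cdf(abs(z))) for z in z2]
        rejected = [selected[j] for j in bh_reject(p2, alpha)]
        false_rej = sum(1 for i in rejected if i >= m1)
        # False discovery proportion, defined as 0 if nothing is rejected.
        fdp_sum += false_rej / max(len(rejected), 1)
        true_rej_sum += len(rejected) - false_rej
    return fdp_sum / runs, true_rej_sum / runs


fdr, mean_true_rejections = simulate_pilot_fns()
print("estimated FDR:", fdr)
print("mean number of rejected alternatives:", mean_true_rejections)
```

Replacing the second-stage p-values by sequential p-values computed from the pooled data would turn this into a simulation of the integrated approach.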

Simulation results for the FNS procedure

Control of the error rate

Integrated approach: In all simulated scenarios the FDR is well controlled if $m_2 > 5$ (see the additional file). If a smaller $m_2$ is chosen, the FDR may be inflated up to 0.11.

A heuristic explanation for this inflation is that for very small _{2} the _{2} for two particular scenarios.

**Power values and error rates. (A)** and **(C)** show the actual FDR for the FDRS and the FNS design, respectively, **(B)** and **(D)** the corresponding mean number of rejected alternatives for $n_1 = 6$, $n_2 = 12$, $\pi_0 = 0.99$, $m_s = 6$. The effect sizes are

Pilot approach: For the pilot approach the FDR is controlled in all scenarios.

Mean number of rejected hypotheses

The following table reports the mean number of rejected alternatives for the FNS design. While the FDR simulations cover $\pi_0$ values from 0.5 to 1, for the investigation of the power we consider settings where alternative hypotheses are sparse. These are settings where the advantage of two-stage designs that select promising hypotheses at interim analysis is expected to be largest.

The mean number of rejected alternatives for the integrated design and the improvement in percent compared to the pilot design (in parentheses) for independent scenarios with $n_1 = 6$, $n_2 = 12$ (20000 simulation runs per scenario for …). The three column pairs correspond to the three values of $m$.

| $m_2/m$ | $\pi_0$ | 1st $m$, Δ = 1 | 1st $m$, Δ = 1.6 | 2nd $m$, Δ = 1 | 2nd $m$, Δ = 1.6 | 3rd $m$, Δ = 1 | 3rd $m$, Δ = 1.6 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.95 | 6.1 (17%) | 15.4 (58%) | 58.7 (19%) | 141.9 (46%) | 583.7 (19%) | 1406.8 (45%) |
| 0.01 | 0.99 | 1.8 (13%) | 5.0 (2%) | 13.3 (19%) | 41.9 (3%) | 126.8 (20%) | 407.4 (3%) |
| 0.05 | 0.95 | 12.5 (21%) | 26.7 (5%) | 117.6 (23%) | 260.3 (5%) | 1166.9 (23%) | 2590.3 (5%) |
| 0.05 | 0.99 | 2.8 (25%) | 6.2 (4%) | 19.8 (35%) | 53.8 (6%) | 188.4 (36%) | 523.7 (6%) |
| 0.1 | 0.95 | 14.9 (27%) | 30.1 (6%) | 139.9 (29%) | 293.9 (7%) | 1388.2 (29%) | 2926.5 (7%) |
| 0.1 | 0.99 | 3.1 (33%) | 6.5 (6%) | 22.0 (47%) | 57.0 (9%) | 209.9 (49%) | 554.7 (9%) |

In all scenarios the integrated approach rejects at least as many alternative hypotheses as the pilot approach. The increase in rejections is up to 59%. Panel (D) of the figure shows the impact of $m_2$ on the mean number of rejected alternatives for the integrated (black lines) and the pilot approach (grey lines): For Δ = 1 (solid lines) and very small $m_2$, the number of rejected alternatives is very small but it clearly increases with $m_2$. Here the difference to the pilot design is more distinct. For Δ = 1.6 the advantage of the integrated approach is only moderate.

Simulation results for the FDRS procedure

Control of the error rate

Integrated approach: For the original procedure (without the modified critical value) and $\pi_0 < 0.8$ the FDR is controlled for all considered values of $\alpha_1$. For larger $\pi_0$ the FDR may be inflated, especially if the effect size under the alternative is low such that the expected number of selected hypotheses for the second stage is very small. The inflation is, however, moderate and the maximal FDR over all simulation scenarios is 0.073 instead of the nominal 0.05.

The simulations for the modified procedure show that across all scenarios the FDR is controlled for $m_s = 6$ (see panel (A) of the figure). Note that for some of the parameter values the modified procedure is strictly conservative.

For the pilot approach FDR control follows by theoretical arguments in

Mean number of rejected hypotheses

The following table shows that the mean number of rejected alternatives increases with $\alpha_1$, as expected. For small values of $\alpha_1$, the pilot and the integrated approach have similar power values. In some settings for lower

The mean number of rejected alternatives for the integrated design and the improvement in percent compared to the pilot design (in parentheses) for independent scenarios with $n_1 = 6$, $n_2 = 12$, $m_s = 6$ (20000 simulation runs per scenario for …). The three column pairs correspond to the three values of $m$.

| $\alpha_1$ | $\pi_0$ | 1st $m$, Δ = 1 | 1st $m$, Δ = 1.6 | 2nd $m$, Δ = 1 | 2nd $m$, Δ = 1.6 | 3rd $m$, Δ = 1 | 3rd $m$, Δ = 1.6 |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.95 | 2.5 (-1%) | 18.3 (0%) | 17.7 (1%) | 171.7 (0%) | 166.2 (0%) | 1703.0 (0%) |
| 0.1 | 0.99 | 0.4 (-4%) | 3.3 (0%) | 1.3 (-3%) | 23.1 (0%) | 7.4 (0%) | 219.7 (0%) |
| 0.2 | 0.95 | 4.3 (1%) | 21.7 (1%) | 34.1 (2%) | 206.4 (0%) | 328.6 (2%) | 2047.1 (0%) |
| 0.2 | 0.99 | 0.6 (-3%) | 3.9 (0%) | 2.3 (0%) | 28.9 (0%) | 15.8 (1%) | 275.7 (0%) |
| 0.5 | 0.95 | 9.3 (8%) | 27.4 (2%) | 81.3 (7%) | 264.1 (2%) | 799.7 (7%) | 2625.7 (2%) |
| 0.5 | 0.99 | 1.3 (4%) | 5.1 (1%) | 6.2 (5%) | 39.8 (1%) | 51.2 (5%) | 382.0 (1%) |

If the first-stage sample size is increased, the advantage of the integrated approach increases: e.g., for $n_1 = n_2 = 9$ and $\pi_0 = 0.95$, Δ = 1, $\alpha_1 = 0.5$, the mean number of rejected hypotheses is 22% larger for the integrated approach than for the pilot approach.

Correlated test statistics

Test statistics from genetic data are often stochastically dependent across hypotheses. In this section we study the impact of correlation between test statistics on the FDR and consider auto-correlation, block-correlation and equi-correlation.

For auto-correlation we consider an order among hypotheses and assume an autoregressive correlation structure. Here the correlation between the test statistics for hypotheses $i$ and $j$ is given by $\rho^{|i-j|}$. For block-correlation we assume that the test statistics are correlated in blocks of 20 hypotheses where the correlation between the test statistics within one block is $\rho$. The scenarios cover $m_2/m \in \{0.01, \dots\}$ and $\pi_0 \in \{0.95, 0.99, 1\}$ with correlation coefficient
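The correlation structures can be written down explicitly as matrices. The sketch below builds them as plain nested lists; the function names and the example values of the correlation coefficient are illustrative, and the equi-correlation matrix assumes the usual meaning of a common correlation between all pairs of test statistics.

```python
def autoregressive_corr(m, rho):
    """Auto-correlation: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


def block_corr(m, rho, block_size=20):
    """Block-correlation: correlation rho within blocks, 0 between blocks."""
    return [
        [
            1.0 if i == j else (rho if i // block_size == j // block_size else 0.0)
            for j in range(m)
        ]
        for i in range(m)
    ]


def equi_corr(m, rho):
    """Equi-correlation: the same correlation rho between all pairs."""
    return [[1.0 if i == j else rho for j in range(m)] for i in range(m)]


ar = autoregressive_corr(4, 0.5)
blocks = block_corr(40, 0.3)      # two blocks of 20 hypotheses
print(ar[0])                      # correlations of hypothesis 0 with 0..3
print(blocks[0][19], blocks[19][20])
```

In the autoregressive case the correlation decays geometrically with the distance between hypotheses, whereas in the block case it drops to zero as soon as two hypotheses fall into different blocks.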

For block-correlation and auto-correlation the results are very similar to the independent case concerning the actual FDR. The mean number of rejected alternatives for the pilot and the integrated design are nearly identical (data not shown). For equi-correlated data the error rates of both selection procedures are maintained in all scenarios, even under the global null hypothesis. However, for most scenarios the procedure appears to be more conservative compared to the independent case. For scenarios with small

The mean number of rejected alternatives for the integrated design and the improvement in percent compared to the pilot design (in parentheses) with $n_1 = 6$, $n_2 = 12$, … The three column pairs correspond to the three values of $m$.

| $\alpha_1$ | $\pi_0$ | 1st $m$, Δ = 1 | 1st $m$, Δ = 1.6 | 2nd $m$, Δ = 1 | 2nd $m$, Δ = 1.6 | 3rd $m$, Δ = 1 | 3rd $m$, Δ = 1.6 |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.95 | 2.7 (3%) | 18.1 (0%) | 20.6 (5%) | 169.9 (0%) | 180.2 (5%) | 1682.2 (0%) |
| 0.1 | 0.99 | 0.4 (0%) | 3.2 (0%) | 2.0 (12%) | 22.7 (0%) | 16.4 (19%) | 214.4 (1%) |
| 0.2 | 0.95 | 3.9 (9%) | 21.5 (1%) | 30.6 (12%) | 203.4 (1%) | 300.6 (14%) | 2015.9 (1%) |
| 0.2 | 0.99 | 0.6 (7%) | 3.9 (1%) | 2.9 (26%) | 28.3 (1%) | 26.0 (33%) | 269.5 (2%) |
| 0.5 | 0.95 | 7.3 (22%) | 26.8 (3%) | 60.5 (28%) | 257.3 (4%) | 576.0 (29%) | 2554.7 (4%) |
| 0.5 | 0.99 | 1.1 (25%) | 4.9 (3%) | 5.7 (6%) | 37.9 (4%) | 48.8 (78%) | 363.3 (4%) |

The mean number of rejected alternatives for the integrated design and the improvement in percent compared to the pilot design (in parentheses) with $n_1 = 6$, $n_2 = 12$, $m_s = 6$ (20000 simulation runs per scenario for …). The three column pairs correspond to the three values of $m$.

| $m_2/m$ | $\pi_0$ | 1st $m$, Δ = 1 | 1st $m$, Δ = 1.6 | 2nd $m$, Δ = 1 | 2nd $m$, Δ = 1.6 | 3rd $m$, Δ = 1 | 3rd $m$, Δ = 1.6 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.95 | 8.0 (13%) | 15.2 (53%) | 77.3 (13%) | 141.2 (43%) | 768.0 (13%) | 1392.8 (41%) |
| 0.01 | 0.99 | 2.4 (0%) | 5.6 (1%) | 17.7 (0%) | 48.6 (1%) | 168.5 (0%) | 471.5 (1%) |
| 0.05 | 0.95 | 14.4 (12%) | 28.5 (4%) | 135.4 (13%) | 278.9 (4%) | 1342.9 (13%) | 2783.1 (4%) |
| 0.05 | 0.99 | 3.0 (15%) | 6.3 (3%) | 21.6 (21%) | 55.3 (4%) | 207.9 (21%) | 537.4 (5%) |
| 0.1 | 0.95 | 15.8 (21%) | 30.6 (5%) | 149.0 (22%) | 299.7 (5%) | 1473.5 (22%) | 2990.8 (5%) |
| 0.1 | 0.99 | 3.2 (28%) | 6.4 (5%) | 22.8 (39%) | 57.1 (8%) | 216.4 (41%) | 556.2 (8%) |

Real data application

We reanalysed the microarray data set by Tian

To obtain balanced group sizes for the re-analysis we arbitrarily selected 36 patients from the bone lytic lesions group. The samples were arbitrarily allocated to the two stages, and the pilot and the integrated approach were applied for the FNS and the FDRS procedure with different parameters: $n_1 \in \{6, 12\}$ ($n_2 = 36 - n_1$), $m_2 \in \{10, 50, 100, 200\}$, $\alpha_1 \in \{0.1, 0.2, 0.5, 0.8\}$, $m_s = 10$. In the first stage for all procedures a two-sided test was applied.

The results table shows that the integrated approach rejects at least as many hypotheses as the pilot approach for all considered values of $m_2$ or $\alpha_1$, respectively. Only for small $\alpha_1$ and $n_1 = 6$ the integrated and the pilot approach of the FDRS procedure reject approximately the same number of hypotheses. Note that no hypothesis was significant at the final test decision which was not considered in the second stage. Setting $m_s = 0$, the results for the integrated FDRS procedure did not change.

Number of rejected hypotheses for the real data application for the integrated and the pilot design (the latter given in parentheses). In addition the number of hypotheses selected for the second stage for the FDRS design, $m_2^{FDRS}$, is given.

| $n_1$ / $n_2$ | FNS: $m_2$ | FNS: Rejections | FDRS: $\alpha_1$ | FDRS: Rejections | $m_2^{FDRS}$ |
|---|---|---|---|---|---|
| 6 / 30 | 10 | 6 (1) | 0.1 | 0 (0) | 1 |
| 6 / 30 | 50 | 15 (10) | 0.2 | 1 (1) | 2 |
| 6 / 30 | 100 | 30 (12) | 0.5 | 28 (21) | 85 |
| 6 / 30 | 200 | 68 (30) | 0.8 | 345 (132) | 2291 |
| 12 / 24 | 10 | 8 (4) | 0.1 | 3 (3) | 3 |
| 12 / 24 | 50 | 33 (8) | 0.2 | 51 (38) | 84 |
| 12 / 24 | 100 | 60 (17) | 0.5 | 398 (150) | 1745 |
| 12 / 24 | 200 | 109 (37) | 0.8 | 573 (99) | 5887 |

Discussion and conclusion

In this paper we discussed several selection rules for two-stage designs, where after an interim analysis only promising hypotheses are considered in the second stage.

For the choice of the selection rule, different criteria may apply. With the FNS design, the total number of observations is known in advance, which facilitates the planning of resources. However, this design does not adapt to the number of hypotheses that show an effect in the interim analysis. The latter can be achieved with the FDRS design, where, on the other hand, the total number of observations is random and the planning of resources becomes more difficult. As an extension one can consider an FDRS design where the overall number of observations (across all hypotheses and both stages) is fixed and the observations allocated to the second stage are equally distributed among the selected hypotheses. This comes at the cost of a decreasing per hypothesis power if for a larger number of hypotheses the alternative holds.
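The resource planning argument can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers and hypothetical function names: for the FNS design the total number of observations is known in advance, while for an FDRS design with a fixed overall budget the second-stage sample size per hypothesis shrinks as more hypotheses are selected.

```python
def total_observations_fns(m, n1, m2, n2):
    """Total observations of an FNS design: all m hypotheses in stage 1,
    the m2 selected hypotheses in stage 2."""
    return m * n1 + m2 * n2


def second_stage_size(total, m, n1, m2):
    """Per-hypothesis second-stage sample size when a fixed total budget is
    spread equally over the m2 selected hypotheses (0 if nothing is selected)."""
    if m2 == 0:
        return 0
    return (total - m * n1) // m2


# Hypothetical planning numbers.
budget = total_observations_fns(m=1000, n1=6, m2=100, n2=12)
print(budget)                                         # fixed in advance for FNS
print(second_stage_size(budget, m=1000, n1=6, m2=100))
print(second_stage_size(budget, m=1000, n1=6, m2=300))
```

With the same budget, tripling the number of selected hypotheses cuts the second-stage sample size per hypothesis to a third, which illustrates the loss in per-hypothesis power mentioned above.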

For the FNS design the testing procedures provided sound FDR control in the considered scenarios where more than 5 hypotheses are selected for the second stage, for independent as well as for correlated data. Also for the modified FDRS procedure FDR control is given in all scenarios for $m_s > 5$. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many scenarios. This holds particularly for the FNS design but in many scenarios also for the FDRS design. The advantage of the integrated design increases with the proportion of observations allocated to the first stage. This is in line with earlier findings.

On the other hand, using only the second-stage data for testing has the advantage of increased flexibility and simplicity. For example, the pilot FNS procedure controls the FDR even if the hypotheses for the second stage are selected in an arbitrary way. Furthermore, standard (non-sequential) tests can be applied and FDR control can be shown analytically under suitable assumptions.

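In the simulations the BH-procedure was applied to the sequential p-values. This procedure is conservative if $\pi_0 < 1$, as it actually controls the FDR at level $\pi_0 \alpha$. Adaptive test procedures based on an estimator of $\pi_0$ are considered in an additional simulation study (see the additional file described below).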

**Results of a simulation study for two-stage designs where an adaptive test procedure is applied based on an estimator for the proportion of true null hypotheses.**


It is well known that two-stage designs may lead to a considerable improvement in efficiency compared to single-stage designs.

**Two single-stage designs are compared with the two-stage results: for the first single-stage design the sample size for each hypothesis is $n_1$, for the second design the sample size is $n_1 + n_2$.** For the first design we compare the gain in power of the integrated design, and for the second design the focus lies on the reduction in costs.


Appendix

Asymptotic considerations

In this section we argue that asymptotically, for an increasing number of hypotheses, the FNS and the FDRS selection rule are equivalent to a selection rule where hypotheses are selected based on a fixed threshold $\gamma_1$. Let $m_0$ and $m_1$ denote the number of true null and alternative hypotheses, respectively. Consider the following assumptions:

1. The empirical distribution functions of the first-stage _{1} is a continuous strictly increasing function.

2.

For the FNS procedure assume that
_{0}
_{0})_{1}(_{1} is the _{2}/_{2}/_{2}/_{2}/^{2})

For the FDRS procedure _{1} = _{2}/_{0} < 1, it follows as in

Computation of the two-sided sequential

If the hypothesis _{
i
}, _{1}), the two-sided sequential

with
_{1}/2)-quantile of the standard normal distribution, respectively. If the first-stage _{1},

Competing interests

Both authors have no competing interests.

Authors’ contributions

Both authors contributed equally to the development of the methods, the design of the simulations, and to writing the paper. SZ conducted the simulations and data analyses. All authors read and approved the final manuscript.

Authors’ information

The views expressed are those of the author (MP) and should not be understood or quoted as being made on behalf of the European Medicines Agency or its scientific Committees.

Acknowledgements

We would like to thank the two Reviewers for helpful suggestions and Julia Saperia for many helpful comments.

This work was supported by the Austrian Science Fund FWF (grant numbers T 401-B12 and P23167).