TNO Earth, Environmental and Life Sciences, PO Box 360, 3700 AJ Zeist, Netherlands

Department of Environment, Technology and Technology Management, University of Antwerp, Prinsstraat 13, 2000 Antwerp, Belgium

Abstract

Background

Controlling the False Discovery Rate (FDR) limits the expected proportion of false positives among the positive test results. It is not straightforward to conduct an FDR-controlling procedure in experiments with a factorial structure that include both between-subjects and within-subjects factors. This is because there are

Findings

We propose a procedure resulting in a single

Conclusions

The proposed procedure is very easy to apply and is recommended for all designs with factors applied at different levels of the randomization, such as cross-over designs with added between-subjects factors.

Trial registration

Findings

The control of false positive test results has enjoyed considerable attention in the statistical literature. For an overview of methods for cases with many comparisons among treatments, we refer to

Motivating example

Recently, a study involving human volunteers was conducted at TNO (Zeist, the Netherlands). The study was carried out in compliance with the Helsinki Declaration. It was approved by METOPP (Tilburg, the Netherlands), an independent centralized ethics committee, and it was registered at Clinicaltrials.gov under number NCT00959790. The subjects were healthy, non-smoking males aged 18–45 years. All study participants signed an informed consent form. Subjects received financial compensation for their participation.

In the study, subjects from two body mass index (BMI) categories were recruited. Here, we work with the results of 14 obese subjects and 14 lean subjects. The BMI categories define a between-subjects factor at two levels.

Each of the subjects participated in the study during two consecutive periods. Two different diets were given to each subject, one in each period, according to a cross-over design. The diet defines a within-subjects factor, and its effect is to be evaluated against a random error within subjects.

On the last day of each period, subjects completed a physical exercise test. At three time points, blood samples were taken. This defines a within-period factor ‘time’, which is a repeated measurement factor.

Levels of 21 oxylipids were determined in the blood samples; the 168 samples were processed in a completely randomized order.
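The layout of the design can be made concrete with a short enumeration. The following is a minimal sketch, not the study's actual data handling: the subject numbering and the `bmi_group` helper are hypothetical, and the diet order within each subject is left out because it is not given here.

```python
from itertools import product

N_SUBJECTS = 28          # 14 lean and 14 obese subjects
PERIODS = (1, 2)         # two consecutive cross-over periods
TIME_POINTS = (0, 1, 2)  # three blood samples per exercise test


def bmi_group(subject):
    """Hypothetical labelling: subjects 1-14 are lean, 15-28 are obese."""
    return "lean" if subject <= 14 else "obese"


# One record per blood sample: subject x period x time point.
observations = [
    {"subject": p, "bmi": bmi_group(p), "period": q, "time": r}
    for p, q, r in product(range(1, N_SUBJECTS + 1), PERIODS, TIME_POINTS)
]

print(len(observations))  # 28 * 2 * 3 samples
```

The count of 28 × 2 × 3 records agrees with the 168 samples mentioned above.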

Statistical model

The data were studied using the following statistical model.

with

In formula (1), _{pqr} is the level of an oxylipid from subject _{pqr} and random contributions modeled with the terms _{2p}, _{1pq}, and _{0pqr}.

The expected value of the measurement _{ijk} is detailed in formula (2). We make a distinction between parameters, which are to be estimated from the data, and experimental variables, which indicate the BMI group, the diet, and the time point relevant to the observation. There are 11 parameters, denoted by Greek letters, and four experimental variables, denoted by Latin letters. First, the average difference between the lean and obese groups is modeled with parameter _{p}. This variable takes the value 1 if subject

The average difference between diet 1 and diet 2 is modeled with parameter _{pq}. This variable takes the value 1 if subject

Next, the parameter

The parameters that model the average change over time are _{1} and _{2}, respectively (_{0} is taken to be zero). The corresponding experimental variables are _{1} and _{2}. The first of these takes the value of 1 at time point 1 and 0 otherwise; the second experimental variable takes the value of 1 at time point 2 and 0 otherwise. So the time changes are modeled relative to time point 0.

Further, the parameters _{r}, _{r} and _{r} model the interaction between BMI group and time, the interaction between diet and time and the three-factor interaction between BMI group, diet and time, respectively.

The three random terms in formula (1) model the random error between subjects, the random error within subjects and the random error within periods, respectively. We assume that the three random terms are independent of each other and normally distributed with variances
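For orientation, the model described in this section can be written out along the following lines. This is a sketch in assumed notation: the symbols $y$, $\mu$, $\varepsilon$, $\beta$, $\delta$, $\gamma$, $\tau$, $\theta$, $\phi$, $\psi$, $b$, $d$, $t$ and $\sigma$ are illustrative choices and need not match those of formulas (1) and (2). The intercept $\mu$ is also an assumption and is not counted among the 11 effect parameters. The indices are read as $p$ for subject, $q$ for period and $r$ for time point.

```latex
\begin{align}
y_{pqr}   &= \mu_{pqr} + \varepsilon_{2p} + \varepsilon_{1pq} + \varepsilon_{0pqr}, \\
\mu_{pqr} &= \mu + \beta\, b_{p} + \delta\, d_{pq} + \gamma\, b_{p} d_{pq}
             + \sum_{s=1}^{2} \bigl( \tau_{s} + \theta_{s}\, b_{p}
             + \phi_{s}\, d_{pq} + \psi_{s}\, b_{p} d_{pq} \bigr)\, t_{s}, \\
\varepsilon_{2p} &\sim N(0, \sigma_{2}^{2}), \quad
\varepsilon_{1pq} \sim N(0, \sigma_{1}^{2}), \quad
\varepsilon_{0pqr} \sim N(0, \sigma_{0}^{2}).
\end{align}
```

Here $b_{p}$ indicates the BMI group, $d_{pq}$ indicates the diet, and $t_{1}$ and $t_{2}$ take the value 1 at time points 1 and 2, respectively, so that all time-related parameters are zero at time point 0. Counting $\beta$, $\delta$, $\gamma$ and the four pairs $\tau_{s}$, $\theta_{s}$, $\phi_{s}$, $\psi_{s}$ gives the 11 parameters and four experimental variables mentioned in the text.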

The subjects can be considered as random samples from two specific populations. Therefore, the 28 _{2p} are independent and we can validly carry out an

Further, the subjects were randomly allocated to a treatment order. Therefore, the 28 differences _{1p1}−_{1p2} are independent and we can validly carry out

There could not be a random allocation of the time points to the blood samples. For this reason, the correlations between _{pq0} and _{pq1}, between _{pq0} and _{pq2}, and between _{pq1} and _{pq2} might not be equal. This would invalidate the analysis of variance

Sometimes, other assumptions on the random terms are reasonable, which may lead to other denominators of the

Analysis of variance

An analysis of variance for one of the oxylipids, namely arachidonic acid, is given in Table

| Error stratum | Source of variation | df | MS | F | P |
|---|---|---|---|---|---|
| Between subjects | BMI | 1 | 5.4860 | 9.98 | 0.004 |
| | error | 26 | 0.5501 | | |
| Within subjects | diet | 1 | 0.0091 | 0.05 | 0.8277 |
| | BMI × diet | 1 | 0.6465 | 3.43 | 0.0756 |
| | error | 26 | 0.1887 | | |
| Within periods | time | 2 | 4.7359 | 80.77 | <0.001 |
| | BMI × time | 2 | 0.0508 | 0.88 | 0.3999 |
| | diet × time | 2 | 0.0448 | 0.76 | 0.4453 |
| | BMI × diet × time | 2 | 0.1538 | 2.62 | 0.08963 |
| | error | 104 | 0.0586 | | |

NOTE: Greenhouse-Geisser ε = 0.8103.

The first two columns of the table list the three error strata and the 10 sources of variation present in the data. An error stratum collects all effects that are tested against the same variance; see

All the effects that are measured by contrasting subjects are in the between-subjects stratum. The difference between the groups, which constitutes the BMI main effect modeled with

Each of the two diets was given to each of the subjects. For this reason, the main effect of diet (modeled with

Finally, the three time points at which blood samples were taken define a third factor, time, whose main effect (modeled with _{1} and _{2}) is to be tested against a random error within periods. The interactions between BMI category and time (modeled with _{1} and _{2}), and between diet and time (modeled with _{1} and _{2}) are also tested against this random error. The same is the case for the three-factor interaction (_{1} and _{2}). All these effects are in the within-periods stratum.

Further columns in the table give the degrees of freedom (df) for each source of variation, the corresponding mean square (MS), the value of the individual _{ij}), and the _{ij}). The index

The four

Under an individual false positive error rate of 0.05, the main effects of BMI and time are highly significant. There is no evidence for a main effect of diet or for any of the interaction effects.
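The F ratios in the analysis of variance table can be checked by dividing each mean square by the error mean square of its own stratum. The sketch below does this with the published, rounded mean squares; the dictionary layout and the function name `f_ratios` are illustrative choices.

```python
# Mean squares from the arachidonic acid table, grouped by error stratum.
anova = {
    "between subjects": {
        "error": 0.5501,
        "effects": {"BMI": 5.4860},
    },
    "within subjects": {
        "error": 0.1887,
        "effects": {"diet": 0.0091, "BMI x diet": 0.6465},
    },
    "within periods": {
        "error": 0.0586,
        "effects": {
            "time": 4.7359,
            "BMI x time": 0.0508,
            "diet x time": 0.0448,
            "BMI x diet x time": 0.1538,
        },
    },
}


def f_ratios(table):
    """Divide every effect mean square by the error MS of its stratum."""
    return {
        effect: ms / stratum["error"]
        for stratum in table.values()
        for effect, ms in stratum["effects"].items()
    }


ratios = f_ratios(anova)
for effect, f in ratios.items():
    print(f"{effect:18s} F = {f:.2f}")
```

Because the table reports mean squares rounded to four decimals, the recomputed ratios can differ from the tabulated F values in the second decimal.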

FDR in factorial experiments with a single stratum

A factorial structure of the study design permits the evaluation of main effects and interactions. For two factors and _{1}, say, are removed from further consideration, and we are left with _{1} variables not having a proven interaction among the factors. We could then consider applying the FDR procedure on 2(_{1}) main effect tests. However, it is unclear what the performance criteria of the joint first and second step are.

To circumvent the above problem, we propose to replace the three tests with one omnibus

The proposed replacement of individual statistical tests can be carried out easily if all the comparisons between the experimental groups are tested against one and the same error. This is the case if there is just one error stratum, but also if there are several strata while the effect tests involve only one stratum. However, the proposed replacement is not straightforward to apply when effects are tested in several strata. For example, in the motivating study, the error used to test the contrast between lean and obese is different from the error used to test the contrast between the diets. This issue is discussed next.

FDR in factorial experiments with several strata

We propose calculating a combined

1. Denote the number of error strata with

2. Let _{i} be the number of _{ij} denote the _{i} denote the degrees of freedom of the denominator, and let _{ij} denote the degrees of freedom of the numerator. Calculate _{i} degrees of freedom for the denominator.

3. Suppose that the combined _{i}. So _{i}∼

4. Combine the

5. The overall

6. Apply an FDR control method to the list of overall

7. For variables selected in step (6), study all _{ij} to see which factors or interactions contributed to the significance of _{E}.
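Step 6 leaves the choice of FDR control method open. As one possible instance, the sketch below implements the Benjamini-Hochberg step-up procedure on a list of overall P-values, one per response variable. The function name and the example P-values are made up for illustration and are not taken from the oxylipid data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg procedure.

    The P-values are sorted, the largest rank k with p_(k) <= k * q / m
    is located, and the k smallest P-values are rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical overall P-values for five response variables.
example_p = [0.001, 0.008, 0.039, 0.035, 0.60]
rejected = benjamini_hochberg(example_p, q=0.05)
print(rejected)
```

In this made-up example the third smallest P-value (0.035) exceeds its own threshold of 0.03, but it is still rejected because the fourth smallest (0.039) falls below its threshold of 0.04. This is the step-up behaviour of the procedure.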

The procedure to combine

Step 6 in our procedure results in a set of variables with an expected fraction of at most _{ij} studied in Step 7. In this aspect, our procedure is analogous to Fisher’s protected least significance difference procedure

Application

We apply the proposed procedure to the arachidonic acid response of the motivating example. In the between-subjects stratum, there is nothing to combine, because there is just a single test carried out in this stratum. Recall that the

The two

For the within-periods stratum, we multiply each of the mean squares for time, BMI × time, diet × time, and the three-factor interaction BMI × diet × time by its 2 degrees of freedom, add up the products and divide by the total of 8 degrees of freedom. This results in a combined mean square of 1.2463. This mean square is tested against the error mean square, giving an ^{−16}.
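The arithmetic of the combined mean square can be sketched as follows. The function `combined_f` is an illustrative helper, not code from the study. The closed-form P-value at the end is exact only for an F distribution with 2 numerator degrees of freedom, so it is applied to the within-subjects stratum alone. The within-periods P-value, which may also involve the Greenhouse-Geisser adjustment, is not computed here.

```python
def combined_f(mean_squares, dfs, error_ms):
    """Pool effect mean squares within one stratum, weighting by df.

    Returns the combined mean square, the combined F ratio and the
    pooled numerator degrees of freedom.
    """
    total_df = sum(dfs)
    combined_ms = sum(ms * df for ms, df in zip(mean_squares, dfs)) / total_df
    return combined_ms, combined_ms / error_ms, total_df


# Within-periods stratum: time, BMI x time, diet x time, BMI x diet x time.
ms_wp, f_wp, df_wp = combined_f(
    [4.7359, 0.0508, 0.0448, 0.1538], [2, 2, 2, 2], error_ms=0.0586
)

# Within-subjects stratum: diet and BMI x diet, one df each.
ms_ws, f_ws, df_ws = combined_f([0.0091, 0.6465], [1, 1], error_ms=0.1887)

# For an F(2, n) distribution the upper tail is (1 + 2F/n) ** (-n/2).
error_df_ws = 26
p_ws = (1 + 2 * f_ws / error_df_ws) ** (-error_df_ws / 2)

print(f"within periods : MS = {ms_wp:.4f}, F = {f_wp:.2f}, df = {df_wp}")
print(f"within subjects: MS = {ms_ws:.4f}, F = {f_ws:.2f}, P = {p_ws:.3f}")
```

The within-periods mean square reproduces the value of 1.2463 quoted above.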

Finally, the three ^{2}=87.945. The reference distribution for this statistic is the ^{−17}.
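The final combination step can be checked numerically under one assumption: that the three stratum P-values are combined with Fisher's method, so that the statistic −2 Σ ln P is referred to a chi-square distribution with 2 × 3 = 6 degrees of freedom. The helper names below are illustrative. For even degrees of freedom the chi-square upper tail has a closed form, so only the standard library is needed.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)**k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Fisher's method: statistic -2 * sum(log p) on 2 * len(p) df."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even(stat, 2 * len(p_values))


# Statistic reported in the text, assumed here to have 6 df.
overall_p = chi2_sf_even(87.945, 6)
print(f"overall P = {overall_p:.2e}")
```

Under this assumption the overall P-value is of order 10^{−17}, in line with the exponent quoted in the text.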

All the overall

**P-values for 21 oxylipids.** Each circle represents an overall

**Rejections for two FDR procedures.** P-values below lower line: rejected by the Benjamini-Hochberg procedure

Some authors would favor error-control methods that are more conservative than FDR. For example, the well-known Bonferroni correction would compare all 21 combined

We would like to point out that both FDR-controlling procedures are sensitive to strong negative correlations between the

As a final issue, we had an equal interest in all the oxylipids and all the model parameters. In case of variables or parameters of primary interest, one option is to include only these variables or parameters. This will make the procedure more powerful, because non-significant values of the

Availability of supporting data

The data set supporting the results of this article is included within the article and its additional file called _{ij} values arranged in seven rows and 21 columns. The columns correspond to the oxylipids and the rows correspond to the seven statistical tests for each individual oxylipid. Next, the 21 values for the Greenhouse-Geisser epsilon statistic are given. Then we give the _{E}, as calculated in step 4 of the proposed procedure, and the corresponding overall

Abbreviations

BMI: Body mass index; FDR: False discovery rate.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

EDS formulated the proposed procedure and conducted a detailed analysis of the arachidonic acid response. CR wrote computer code to apply the proposed procedure. SW and MvE designed and conducted the oxylipid study. All authors read and approved the final manuscript.

Acknowledgements

We are grateful to two anonymous referees, whose comments prompted us to be more explicit in our statistical analysis.