Abstract
Background
In a timecourse microarray experiment, the expression level for each gene is observed across a number of timepoints in order to characterize the temporal trajectories of the geneexpression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the timecourse trajectories using parametric or nonparametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data.
Results
In this paper, we propose a robust testing method for identifying genes whose expression time profiles depend on a factor. Furthermore, we propose a multiple testing procedure to adjust for multiplicity.
Conclusions
Through an extensive simulation study, we will illustrate the performance of our method. Finally, we will report the results from applying our method to a case study and discussing potential extensions.
Background
The objective of a timecourse microarray experiment is to study the temporal dynamics of the expression profile for each gene. For many of these experiments, the primary objective is to identify genes for which these temporal profiles depend on a phenotypic, experimental or environmental factor. We mention three examples next. Wang and Kim [1] identify genes in Caenorhabditis elegans for which the expression level depends on the dauer state. Graham et al. [2] obtain RNA expressions from kidney tissue from patients ranging between 27 and 92 years old, to identify genes whose expression profiles are age dependent while adjusting for other phenotypic factors. Sekiguchi et al. [3] study the mRNA expression profiles of peripheral blood cells in patients with rheumatoid arthritis receiving a TNFα inhibitor drug. Henceforth, for notational brevity we refer to these temporal expression profiles as gene timetrajectories or simply timetrajectories whenever understood from the context.
There is an extensive recent body of literature on statistical methodology for identifying genes whose timetrajectories depend on a factor. We provide a brief summary of representative works. Park et al. [4] propose a permutationbased twoway ANOVA model. Luna and Li [5] propose a statistical framework based on a shapeinvariant additive error model utilizing periodically expressed guide genes. Storey et al. [6] estimate gene expression timetrajectories using splines and then approximate the null sampling distribution of the goodness of fit statistic using a bootstrap method. Sohn et al. [7] extend this method to carry out the inference using permutation resampling. Hong and Li [8] discuss a functional hierarchical model for detecting temporally differentially expressed genes between two experimental conditions for cross sectional designs, where the gene expression profiles are treated as functional data and modelled by basis function expansions. Finally, Angelini et al. [9] use a Bayesian hierarchical model along with Bayes factors for the inference.
One common concern with using the methods described in the above papers, which are based on estimating mean functions, is their sensitivity to outliers which is a common issue in most microarray, including timecourse, experiments. Such outliers issue can be common in timecourse microarray experiments. For example, our recent research [7] identified genes with potential outlier presence in the Caenorhabditis elegans dauer developmental data [1]. Figure 1 shows the observed and the fitted trajectory based on a natural spline basis of dimension four for nine genes with potential outlier presence. In this paper, we propose a robust testing method for identifying genes for which the geneexpression timetrajectories are different over time among K ≥ 2 groups. The timetrajectories will be estimated using a quantile regression method. The discrepancy between the timetrajectories under the null hypothesis, where the timetrajectories for all K groups coincide, and the alternative, where the timetrajectory for some of the groups differ from the others, is quantified using an Ftype goodness of fit statistic. Given that we are testing for a large number of genes, we will also propose a permutationbased multiple testing procedure to accurately control the familywise error rate (FWER).
Figure 1. The observations from control and experimental arms are represented by 'x' and 'o' respectively. The fitted trajectory based on a natural spline basis of dimension four is superimposed for each group (blue for control group and red for experimental group).
The remainder of this paper is organized as follows. In the next section, we provide a technical overview of the proposed method. Thereafter, we evaluate the procedure empirically using an extensive simulation study and by applying it to a published case study. We finalize the paper by a discussion of the empirical results and by considering extensions.
Methods
We propose a robust method for identifying among m genes those whose expression timetrajectories depend on an experimental or phenotypic factor with say K ≥ 2 groups or levels. We will assume that the expressions are observed at L distinct timepoints, say 0 ≤ t_{1 }< ... <t_{L }and denote the number of observations for level k ∈ {1, ..., K} of the factor at time l ∈ {1, ..., L} by n_{kl}. These timepoints are assumed to be in common among the genes. Furthermore, we will assume that at least one observation is observed for each group at each timepoint (i.e., n_{kl }≥ 1). Let (y_{kli1}, ..., y_{klim}) denote the expression measurements for m genes at time t_{l}(l = 1, ...,L) from subject i(= 1, ..., n_{kl}) belonging to group k(= 1, ...,K). For a given gene j, we will assume that the expression profile for group k will follow a distribution whose conditional median at time t is g_{kj}(t). For each gene j, following the discussions in [10] and [11], we will estimate each of the K conditional median functions by considering a class of quantile smoothing splines as solutions to
Where . Here denotes the second derivative if g_{kj }evaluated at t > 0. The parameter λ is the smoothing parameter. Here denotes a class of "smooth" functions. Each function g_{kj }is estimated based on the expressions belonging to group k for gene j. For gene j the hypotheses of interest are formulated as testing
Versus
Note that under the null, the function g will no longer depend on k. As such, under the null we will consider estimating a single timetrajectory for each gene as a solution to
The function g_{j }is estimated based on all observations of gene j regardless of group membership by pooling within each timepoint. Following [11] (appendix A.9), we will employ a prespecified pdimensional linear spline basis, say W(t_{l}) = [W_{1}(t_{l}), ..., Wp(t_{l})], common to all m genes. The unknown parameters for gene j are denoted by β_{kj }= [β_{0,kj}, β_{1,kj}, ..., β_{p,kj }]. As discussed in [11] (see chapters 6 and 7) the estimation can be carried out efficiently using linear programming methods. Additional technical details for this optimization problem, including details about the family , are found in section 7.2.1 of [11]. The corresponding residual error sum for group k is then given by
where are the corresponding estimators of the parameters. Under the null, the corresponding residual error is given by
We reject H_{j }in favor of for large values of the Ftype statistic
We note that the principal objective is not to test the marginal hypotheses H_{j }versus . Rather, we primarily aim to test the global null hypothesis versus . To this end we will generate the null joint distribution of the test statistic using a permutation resampling method. Under the ℍ_{0 }, for all the genes, the observations within each timepoints are exchangeable. As such, a permutation sample under the null can be obtained by permuting the observation within each timepoint. Let n_{.l }denote the number of observations at time l. The number of all possible permutations
is all but small samplesize cases prohibitively large. Therefore, we will approximate the exact sampling distribution using B random permutations. The FWER is defined by the probability of rejecting any null hypothesis H_{j }under ℍ_{0 }. We will employ a singlestep multiple testing procedure controlling the FWER as described in [12] and [13]. The algorithm is summarized as follows.
1. Compute the the Ftest statistics ) from the observed (nonpermuted) data.
2. From the bth (b = 1, ..., B) permutation sample compute a permutation replicate .
3. Singlestep procedure to control the FWER:
(a) From the bth permutation data, calculate .
(b) For gene j, calculate the adjusted pvalue as .
(c) For a specified FWER level α, consider gene j significant if .
Results
We investigate the performance of our method by conducting an extensive simulation study. This will be followed by a discussion of the application of our method to the Caenorhabditis elegans dauer developmental data discussed in [1]. These discussions will be limited to the twosample case (i.e., K = 2). For notation brevity, we will refer to genes whose timetrajectory depends on the factor as prognostic and nonprognostic otherwise. Similarly, we will refer to the genes whose corresponding FWER Pvalue is less than the given nominal level as significant or nonsignificant otherwise. For all of these illustrations, as in [6], we will employ a Bspline basis with knots placed at the 0, 1/(p  1),2/(p  1), ..., (p  2)/(p  1), 1 quantiles of the observed time points pooled across both samples. For these illustration we set p = 4. Our method is developed within the framework of regression for estimation, and permutation resampling for the inference. For additional notational brevity we will adopt the acronym PERM to refer to our method. To put the discussions in comparative perspective, alongside our results, we will provide those obtained by the permutation method of [7] and the bootstrap method by [6]. Given that these are regression methods, we will denote them by PERM and BOOT respectively. All of these analyses are carried out using the R statistical environment [14]. The function rq from the quantreg package [11] is used to estimate the quantile regression functions.
Simulation Study
For the simulation study, the expressions will be generated from an outlier contaminated additive error model of the form
The first term, μ_{kj}(t_{l}), denotes the timetrajectory function at time t_{l }for group k. For nonprognostic genes, we will set μ_{1j}(t) = μ_{2j}(t) = 0 while for prognostic genes, we will specify μ_{1j}(t) = 0 and μ_{2j}(t) = 1.5e^{t }respectively. The error terms are mutually independent and identically distributed according to a standard normal law. The second term in the model, a_{klij }, represents the random outlier which will assume a value of 4, 0, or 4 with probabilities π/2, 1  π and π/2 respectively. In the case of a normal law, the mean and median coincide. As such, in the absence of the outlier effect (i.e., a_{klij }= 0 almost surely, or equivalent π = 0) the quantile function g_{kj}(t) = μ_{kj}(t) for all t > 0.
For these simulations, we will adopt a design similar to the Caenorhabditis elegans dauer developmental data, by choosing 11 time points t = 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12. We will generate four replicates at each timepoint from each group (i.e., n_{1l}= n_{2l }= 4 for all l).
To evaluate the FWER, we generate m = 200 nonprognostic genes. A block exchangeable correlation structure with correlation coefficient ρ(= 0, 0.3 or 0.6) and block sizes of 10, is imposed on the measurement errors. The null distribution of the test statistic is approximated from B = 200 resampling (permutation or bootstrap) replicates. Empirical FWER is computed as the proportion of samples rejecting ℍ_{0 }by our testing procedure at a twosided FWER level of 0.05 among N = 200 simulations. Simulation results are reported in Table 1.
Table 1. Illustration of the empirical FWER based on a nominal twosided 0.05 level, with m = 200 genes, N = 200 simulation samples and B = 200 resampling replications per sample. The outlier law realizes values 4,0 or 4 with probabilities π/2, 1  π and π/2 respectively. Under simulation model 1, the outlier effect is added to all timepoints. Under simulation model 2, the outlier effect is only added to the first and last timepoints.
As shown in Table 1, under the simulation model 1, where the outlier effects are identically distributed over the timepoints, all three methods control the type I error rate. Under simulation model 2, where only the first and last timepoints are contaminated by the outlier effect, the BOOT method however fails to control the type I error rate. In this case, the type I error rate based on the BOOT method is seemingly inflated by a factor of three when π = 0.05 or by a factor of six when π = 0.1. This can be explained by the fact that the parametric regression bootstrap is based on the assumption of homoscedasticity of the error terms.
Under simulation model 1, the error terms as the sum of the outlier and measurement error components, although no longer normally distributed, are identically distributed within and among all timepoints and groups. Under simulation model 2, the error terms are no longer homoscedastic. As such, it is not surprising that the BOOT method may not be adequate in this situation.
To investigate the global power (i.e., the probability of rejecting at least one null hypothesis) of this procedure, we generate 10 prognostic genes and 190 nonprognostic genes. A correlation structure similar to that of the FWER case is specified. The corresponding results are reported in Table 2.
As illustrated in Table 2, in the absence of an outlier effect, the power for our method is slightly lower than the other two methods. As this effect becomes more pronounced, our method gains an advantage with respect to power. A similar trend is observed under simulation model 2. One should note that the power listed for BOOT under this scenario is not the power at the twosided FWER level of 0.05, but rather the power at the inflated type I error rate as observed in Table 1. As such, it is erroneous to conclude that that the BOOT method is more powerful.
Table 2. Illustration of the empirical Power based on a nominal twosided 0.05 FWER level, with m = 200 genes, N = 200 simulation samples and B = 200 resampling replications per sample. The outlier law realizes values 4,0 or 4 with probabilities π/2, 1  π and π/2 respectively. Under simulation model 1, the outlier effect is added to all timepoints. Under simulation model 2, the outlier effect is only added to the first and last timepoints.
we evaluated the empirical power and FWER under the simulation model 1 and 2 for π = 2. The result of π = 2 have a similar trend the results of π = 0.05 and π = 0.1 under simulation model 1 and 2. We also evaluated the different proportions for 20 prognostic genes (10%) and 180 nonprognostic genes (90%) and 5 prognostic genes (2.5%) and 195 nonprognostic genes (97.5%). These results have a similar trend the result of 10 prognostic genes (5%) and 190 nonprognostic genes (95%).
Case Study
In this section, we will summarize the results from applying our proposed method to the Caenorhabditis elegans dauer data. Wang and Kim [1] use cDNA microarrays to profile gene expression timetrajectories during the transition from the dauer state to the nondauer state (experimental group) and after feeding of starved first laval stage worms (control group). The cDNA microarray expressions were measured on m = 18, 556 genes. For the experimental group, the worms are harvested at 0, 1.5, 2, 3, 4, 5, 6, 7, 8, 10, and 12 hours after feeding with three to four replicates at each timepoint. For the control group, are harvested 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12 hours after feeding withe four replicates at each timepoint. For this illustration, we have set the t = 1.5 timepoint in the experimental group to t = 1. This data set is available for download from http://cmgm.stanford.edu/~kimlab/dauer/ webcite.
For the case study based on the Wang and Kim [1] data, the number of significant genes, identified at a given FWER ∈ (0, 0.2] based on each of the three methods, is shown in Figure 2. We note that for smaller FWER levels (less than 0.05), the BOOT method identifies the largest while the PERM method identifies the smallest number of significant genes. This is reversed for larger FWER levels. The number of significant genes identified by the PERM method is consistently between the numbers identified by the other two methods. The set of significant genes identified by PERM is, however, not a proper subset of genes identified by methods. In other words, the PERM method is identifying potentially novel genes missed by the other two methods.
Figure 2. The number of significant genes at a given FWER level (from 0 to 0.2) based on the PERM, BOOT and PERM adjusted Pvalues.
We provide a Venn diagram on the set of of genes identified as significant at the 0.05 FWER level in Figure 3. There are 1974 significant genes in common among the three methods. The PERM identifies 168 genes not identified by the other two methods. We rank these 168 genes according to the difference of their respective Pvalues from the PERM and PERM methods and show the top nine genes according to this ranking in Figure 4. As illustrated in the simulation study, the mean model by regression is not robust to the outliers. There are 379 genes identified as significant by the BOOT method only. The top nine genes, based on the magnitude of the differences of the corresponding 379 BOOT and PERM FWER Pvalues, are shown in Figure 5. As illustrated in the simulation study, BOOT may be severely anticonservative if the error terms are heterogeneous over time [7]. The supplementary material provides the biological properties of 40 genes (out of 168) that are identified only by PERM [see Additional file 1].
Figure 3. Venn diagrams for the number of significant genes at the 0.05 FWER level based on the PERM, BOOT and PERM adjusted Pvalues.
Figure 4. Observed expressions and estimated expression trajectories for the top four genes, from the [1]data, among the 168 genes identified by PERM but neither by PERM nor by BOOT, at the 0.05 FWER level, are shown here. For each of these 168 genes, we calculate the absolute difference between the PERM and PERM FWER Pvalues. The top nine genes ranked according to the magnitude of these differences are selected. The observations from the control and experimental arms are represented by 'x' (blue) and 'o' (red) respectively while the groupwise quantile and mean regression curves are denoted by dashed and solid lines respectively. The adjusted Pvalue by PERM is provided for each gene.
Figure 5. Observed expressions and estimated expression trajectories for the top four genes, from the [1]data, among the 379 genes identified by BOOT but neither by PERM nor by PERM, at the 0.05 FWER level, are shown here. For each of these 379 genes, we calculate the absolute difference between the PERM and BOOT FWER Pvalues. The top nine genes ranked according to the magnitude of these differences are selected. The observations from the control and experimental arms are represented by 'x' (blue) and 'o' (red) respectively while the groupwise quantile and mean regression curves are denoted by dashed and solid lines respectively. The adjusted Pvalue by PERM is provided for each gene.
Additional file 1. Properties of 40 genes that are discovered only by PERM. The data provided the biological properties of 40 genes that are discovered only by PERM.
Format: XLS Size: 1.4MB Download file
This file can be viewed with: Microsoft Excel Viewer
Discussion
In these discussions, we have assumed that any difference, including vertical shifts, among the time trajectories, are biologically relevant and of interest. In some applications, one may want to ignore vertical shifts, as these may be often caused by batch effects, and primarily focus on genes for which there are actual differences among the time trends. The procedures we have discussed can be easily modified to accommodate this situation. To this end, for gene , the K sample means for each group are computed. The algorithm is then applied to the centered versions of the observed expressions . Although the illustration presented in this paper have been limited to the twosample case (K = 2), as shown in the methods section, the method can be extended to the case where K > 2. The method can be easily extended to account for multiplicity by controlling a falsediscovery rate (FDR). The unadjusted permutation Pvalue for gene j, based on the notation in the algorithm presented in the methods section, is . FDR adjusted Pvalues can then be computed based on these unadjusted Pvalues. We finally note that the method can also be applied in the onesample cases. In this setting, one is interested in identifying genes whose timetrajectories are timedependent. The marginal hypotheses are formulated as testing H_{j }: g_{j}(t) = c_{j }for all t > 0 versus H_{j }: g_{j}(t) ≠ c_{j }for some t > 0 for a constant say c_{j }. As under the null, all of the expressions are exchangeable, the null sampling distribution is generated by permuting all observed expressions for a given gene. The corresponding null residual error is obtained as where is the sample mean for the n expressions observed for gene j. For many regressions problems, the target function to be estimated is the mean of the distribution of the outcome conditional on a set of covariables. In a timecourse microarray experiment, this would correspond to the mean of the expression profile over time. In this paper, we have proposed to estimate the conditional quantile, rather than the conditional mean, of the distribution of the outcome variable a as function of time. Specifically, we use the special case of the median. Consider the standard additive mean regression problem of the form Y_{i }= g(t_{i}) + ϵ_{i}, where g(t) is the conditional mean of Y at time t and ϵ is a meanzero error term. One criterion that is often used to find an estimate of g is to minimize Σ_{i}(Y_{i } g(t_{i}))^{2}. Restricting this optimization to the set of linear functions yields the standard leastsquares estimate. Optimizing over the set of all "smooth" functions yields an estimator that interpolates the observations. As a balancing act between these two extremes, one may consider optimizing the following criterion
where ∫(g''(t))^{2}dt is a so called penalty term. The amount of smoothing is determined by the parameter λ ∈ (0, ∞). The estimation procedure used in this paper is based on a similar regularization approach where the terms (y_{i } g(t_{i}))^{2 }are replaced by ρ(y_{i } g(t_{i})) and the penalty term ∫(g''(t))^{2}dt is replaced ∫g''(t)dt.
Conclusion
We proposed a robust method for identifying genes whose time trajectories depend on a phenotypic or experimental factor. Furthermore, we proposed a multiple testing procedure to adjust for multiplicity. Our method is based on regression type estimator. Through an extensive simulation study, we observed our method accurately control the FWER and the mean model by regression is not robust to the outliers.
Authors' contributions
IS proposed the research project. IS and KO performed statistical analysis and wrote the manuscript. SLG and SHJ contributd to the research and critically revised the manuscript. SK conducted the biological interpretation of the statistical analysis results. All authors read and approved the final manuscript.
Acknowledgements
The authors thank reviewers and the associated editor for insightful comments, which substantially improved the paper. Partial support for this research was provided by grant from the National Cancer Institute, CA142538 [KO, SHJ, SLG].
References

Wang J, Kim S: Global analysis of dauer gene expression in Caenorhabditis elegans.
Development 2003, 130:16211634. PubMed Abstract  Publisher Full Text

Rodwell GE, Sonu R, Zahn JM, Lund J, Wilhelmy J, Wang L, Xiao W, Mindrinos M, Crane E, Segal E, Meyers B, Brookes J, Davis R, Higgins J, Owen A, Kim S: A Transcriptional Profile of Aging in the Human Kidney.
PLoS Biol 2004, 2(12):e427. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sekiguchi N, Kawauchi S, Furuya T, Inaba N, Matsuda K, Ando S, Ogasawara M, Aburatani H, Kameda H, Amano K, Abe T, Ito S, Takeuchi T: Messenger ribonucleic acid expression profile in peripheral blood cells from RA patients following treatment with an antiTNFα monoclonal antibody, infliximab.
Rheumatology 2008, 47:780788. PubMed Abstract  Publisher Full Text

Park T, Yi S, Lee S, Lee SY, Yoo DH, Ahn JI, Lee YS: Statistical tests for identifying differentially expressed genes in timecourse microarray experiments.
Bioinformatics 2003, 19:694703. PubMed Abstract  Publisher Full Text

Luan Y, Li H: Modebased methods for identifying periodically regulated genes based on time course microarray gene expression data.
Bioinformatics 2004, 20:332339. PubMed Abstract  Publisher Full Text

Storey JD, Xiao W, LeeK JT, Tompkins RG, Davis RW: Significance analysis of time course microarray experiments.
PNAS 2005, 102:1283712842. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sohn I, Owzar K, George SL, Kim S, Jung SH: A Permutation Based Multiple Testing method for Time Course Microarray Experiment.
BMC bioinformatics 2009, 10:336. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hong F, Li H: Functional Hierarchical Models for Identifying Genes with Different TimeCourse Expression Profiles.
Biometrics 2006, 62:534544. PubMed Abstract  Publisher Full Text

Angelini C, De CD, Mutarelli M: A Bayesian Approach to Estimation and Testing in Timecourse Microarray Experiments.
Statistical Applications in Genetics and Molecular Biology 2007, 6(1):24. Publisher Full Text

Koenker R, Ng P, Portnoy S: Quantile smoothing splines.
Biometrika 1994, 81(4):673680. Publisher Full Text

Westfall PH, Young SS: Resamplingbased Multiple Testing: Examples and Methods for Pvalue Adjustment. Wiley: New York; 1993.

Jung SH, Bang H, Young S: Sample size calculation for multiple testing in microarray data analysis.
Biostatistics 2005, 6:157169. PubMed Abstract  Publisher Full Text

R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
2008.
ISBN 3900051070