For gene expression data obtained from a time-course microarray experiment, Liu et al.  developed a new algorithm for clustering genes with similar expression profiles over time. Performance of their proposal was compared with three other methods including the order-restricted inference based methodology of Peddada et al. [2,3]. In this note we point out several inaccuracies in Liu et al.  and conclude that the order-restricted inference based methodology of Peddada et al. (programmed in the software ORIOGEN) indeed operates at the desired nominal Type 1 error level, an important feature of a statistical decision rule, while being computationally substantially faster than indicated by Liu et al. .
Application of ORIOGEN to the well-known breast cancer cell line data of Lobenhofer et al.  revealed that ORIOGEN software took only 21 minutes to run (using 100,000 bootstraps with p = 0.0025), substantially faster than the 72 hours found by Liu et al.  using Matlab. Also, based on a data simulated according to the model and parameters of simulation 1 (σ2 = 1, M = 5) in  we found that ORIOGEN took less than 30 seconds to run in stark contrast to Liu et al. who reported that their implementation of the same algorithm in R took 2979.29 seconds. Furthermore, for the simulation studies reported in , unlike the claims made by Liu et al. , ORIOGEN always maintained the desired false positive rate. According to Figure three in Liu et al.  their algorithm had a false positive rate ranging approximately from 0.20 to 0.70 for the scenarios that they simulated.
Our comparisons of run times indicate that the implementations of ORIOGEN's algorithm in Matlab and R by Liu et al.  is inefficient compared to the publicly available JAVA implementation. Our results on the false positive rate of ORIOGEN suggest some error in Figure three of Liu et al. , perhaps due to a programming error.
A short-series time-course microarray experiment induces a natural constraint on the mean expression of a gene or a probe over time. Thus one may expect a systematic pattern to the mean expression of a gene as long as the time points are not too far apart to lose the biological relevance of a time-course experiment. For example, for some genes the mean expression may monotonically increase (or decrease) over time, whereas some others may display an (inverted) umbrella shaped pattern, etc. Although, one may consider using a parametric model to describe the pattern of expression across time points, a simple nonparametric approach can be used to express the pattern of expression across time using mathematical inequalities (known as order restrictions). This strategy was first exploited in  and subsequently software called ORIOGEN (Ordered Restricted Inference for Ordered Gene ExpressioN) was developed (). It has been publicly available at ORIOGEN - Order Restricted Inference for Ordered Gene ExpressioN http://www.niehs.nih.gov/research/resources/software/oriogen/index.cfm webcite since 2005, and an upgraded version (2.2.1) has been available since February 1, 2007.
Liu et al.  introduce an interesting alternative methodology for clustering genes using order-restricted inference methodology. Their strategy differs from that of [2,3] in that they avoid the bootstrap computation of p-value for identifying significant genes. Instead, they use an information theoretic model selection criterion to assign genes to temporal patterns (clusters). Once genes are clustered, they evaluate the reliability of each cluster using a bootstrap algorithm along the lines of Kerr and Churchill . Liu et al.  make several statements about our methodology [2,3] that are erroneous and require clarification.
The methodology of Peddada et al. , implemented in ORIOGEN , controls the Type I error rate (false positive rate) at any pre-specified level. The methodology of Liu et al. , implemented in ORICC, does not control Type I error rate explicitly. It thereby risks attributing differential expression through time to an excessive proportion of genes whose expression does not truly change. Given the large multiple testing problem inherent in microarray analyses (e.g., the Human Affymetrix chip has ~45,000 probes), use of procedures that control Type I error, or a related quantity like the false discovery rate (FDR), is crucial.
Throughout their paper, Liu et al.  claim that Peddada et al. methodology is excessively computationally intensive. The many bootstrap samples needed for precise estimation of small p values is indeed computationally demanding; but the run times mentioned by Liu et al. for our methodology seemed extremely long in our experience. On page 2 they assert that their implementation of "Peddada's method" required 72 hours to analyze the breast cancer cell-line data of . They do not state how many bootstrap samples they used and what p-value they used in their analysis. Although they cite our publicly available software ORIOGEN, they appear to have written their own code rather than using ORIOGEN. They state that they implemented our methodology (calling it "Peddada's methodology") in Matlab (page 2) and again in R (page 5). Their performance estimates were not based on the ORIOGEN software that is freely available from our website. We were surprised that the authors made the effort to re-code our algorithm in two different languages when our software is available without charge.
To examine their claims about run times, we implemented ORIOGEN on the breast cancer cell-line data of , using 100,000 bootstraps, with a p-value of 0.0025. We found that ORIOGEN took only 21 minutes to run, not 72 hours as stated in  for their Matlab implementation. We also used ORIOGEN to analyze data simulated exactly as described on pages 5-6 of . The authors claim on page 7 that their implementation of "Peddada's method" in R took 2979.29 seconds to analyze the simulated data when σ2 = 1 and M = 5; however, for the same simulation conditions, ORIOGEN took about 30 seconds to run. For these analyses, we employed a Dell desktop PC with an Intel Xeon CPU 2.33 GHz with 3.00 GB of RAM. ORIOGEN was developed by two professional computer programmers who tested it repeatedly before making it public. These exceptionally large discrepancies in run times between ORIOGEN and Liu et al.'s implementation of its algorithm lead us to conclude that either Liu et al. either misinterpreted details of our methodology or their coding of it is extremely inefficient.
False positive rates
In Figure three, Liu et al.  compare the false positive rate of our method with theirs. They claim to have run our method at a level of significance of 0.025 (for each simulated gene) using 200 bootstraps (page 6). In their simulation study they consider 200 "null" genes and 2000 "non-null" genes and define false positive rate to be the proportion of null genes that are declared significant. Based on their Figure three, they report that our method can have a false positive rate as high as 0.50 at the nominal rate of 0.025. This result is incorrect. We generated the 2200 genes according to the patterns described by the authors on page 9 which included 200 "null genes". Exactly as in , we implemented ORIOGEN with a level of significance of 0.025 and 200 bootstraps for the 6 patterns of σ2 and as many replicates as in . We found that ORIOGEN always performed at the desired nominal level of 0.025, as it was designed to do. It appears that Liu et al.  misinterpreted our methodology, made some programming errors in coding it, or miscalculated/misreported the false positive rates.
Most statisticians and bioinformaticians recognize that maintaining a pre-specified false positive rate is an important requirement for statistical testing procedures. In fact, scientists want to avoid reporting an excessive number of genes as differentially expressed when they are not. During the past decade, much research has been devoted to developing sound methods for controlling false discovery rates in gene expression microarray studies. In contrast to ORIOGEN, which maintains the nominal false positive rate and provides estimates of q-values often used to control false discovery, ORICC, the method proposed by Liu et al., does not control the false positive rate or the false discovery rate at a pre-specified level. Consequently, ORICC can sometimes have an unusually high false positive rate, as high as 0.70 according to Figure three of . In the bottom right panel of Figure three of , the lowest false positive rate that ORICC achieved for the simulated data is 0.20.
Other erroneous statements
(a) Liu et al. make an incorrect assertion regarding the universal domination property of order-restricted maximum likelihood estimator (MLE) (page 14 of ). They assert that the order-restricted MLE universally dominates the unrestricted MLE and wrongly attribute this theoretical property to Hwang and Peddada . Hwang and Peddada did not prove the result in the generality stated in ; they proved it only for independently normally distributed data when the means satisfy a monotone order. On the contrary, the main emphasis of  is to demonstrate that the order-restricted MLE may actually perform poorly under certain conditions. Hence in  Hwang and Peddada introduced an alternative to the order-restricted MLE for estimation of parameters satisfying constraints. ORIOGEN uses this new estimation procedure instead of the order-restricted MLE. When the order restriction is monotonic, the two procedures coincide.
(b) Liu et al. also state incorrectly on page 5 that "Peddada's method then carried out a bootstrap-based likelihood ratio test". Because ORIOGEN is not working with order-restricted MLE's to begin with, there is no explicit likelihood with which to construct a likelihood-based test. Consequently, its test statistic is more along the lines of a "Wald" type statistic and not a likelihood ratio. In the same sentence as stated above, on page 5, Liu et al. suggest that Peddada et al. use the bootstrap-likelihood ratio test to decide a gene's best matched profile. This statement is not correct. Peddada et al. used the bootstrap to select significant genes but assigned genes to profiles using a goodness-of-fit criterion.
A Simulation study
In their simulation study, Liu et al.  considered 200 "null" (or non-differentially expressed) genes and 2000 "non-null" (differentially expressed/true positive) genes. Thus at most 10% are non-differentially expressed whereas the overwhelming majority, about 90%, are differentially expressed. If this were the true nature of the data, then a biologist may want to skip any formal statistical methodology and take all 2200 genes - this selection rule will assure him/her 100% discovery of true genes at a small price of at most 10% false discovery rate (FDR). From our experience, it would be more realistic to expect that most genes in a microarray study would be non-differentially expressed.
In this simulation study we almost mimicked the simulation experiment of Liu et al.  with the major exception that we considered 12000 null (or non-differentially expressed) genes and 4000 non-null (true positives) genes. Thus, 25% are true positives and about 75% are true nulls.
We generated our data according to the model and the parameters used in Liu et al. except that we have more null genes than non-null genes as commonly observed in gene expression studies:
In the above model, for a gene g, g = 1,2,...,16000, is the observed expression of the jth replicate, j = 1,2,...,8, in the ith treatment group, i = 1,2,...,6, and is the true mean expression of gth gene in the ith treatment group. We considered 2 patterns of variance σ2(= 0.2,1.)' within each pattern of described below.
Pattern 1: (Null) μg =(0,0,0,0,0,0)' - 6000 samples corresponding to each variance pattern, and hence 12000 null genes.
Pattern 2: (Increasing) μ = (0,0.5,1,1.5,2,2.5)' - 200 samples corresponding to each variance pattern, and hence 400 increasing genes.
Pattern 3: (Decreasing) μ = (0,-0.5,-1,-1.5,-2,-2.5)' - 200 samples corresponding to each variance pattern, and hence 400 decreasing genes.
Pattern 4: (Umbrella Peak at 2) μ = (0,0.5,0,-0.5,-1,-1.5)' - 200 samples corresponding to each variance pattern, and hence 400 umbrella pattern genes.
Pattern 5: (Inverted Umbrella Min at 2) μ = (0,-0.5,0,0.5,1,1.5)' - 200 samples corresponding to each variance pattern, and hence 400 inverted umbrella pattern genes.
Pattern 6: (Umbrella Peak at 3) μ = (0,0.5,1,0.5,0,-0.5)' - 200 samples corresponding to each variance pattern, and hence 400 umbrella pattern genes.
Pattern 7: (Inverted Umbrella Min at 3) μ = (0,-0.5,-1,-0.5,0,0.5)' - 200 samples corresponding to each variance pattern, and hence 400 inverted umbrella pattern genes.
Pattern 8: (Umbrella Peak at 4) μ = (0,0.5,1,1.5,1,0.5)' - 200 samples corresponding to each variance pattern, and hence 400 umbrella pattern genes.
Pattern 9: (Inverted Umbrella Min at 4) μ = (0,-0.5,-1,-1.5,-1,-0.5)' - 200 samples corresponding to each variance pattern, and hence 400 inverted umbrella pattern genes.
Pattern 10: (Umbrella Peak at 5) μ = (0,0.5,1,1.5,2,1.5)' - 200 samples corresponding to each variance pattern, and hence 400 umbrella pattern genes.
Pattern 11: (Inverted Umbrella Min at 5) μ = (0,-0.5,-1,-1.5,-2,-1.5)' - 200 samples corresponding to each variance pattern, and hence 400 inverted umbrella pattern genes.
Thus the total number of genes considered in this simulation study is 16000 consisting of 12000 null and 4000 non-null.
We applied ORICC, by downloading the software from the website provided in , and ORIOGEN 2.2.1. We applied ORIOGEN using a p-value cut off (or level of significance) of 0.01 and the reclassification p-value of 0.90 for patterns. Since we are using a cut-off of 0.01, it is sufficient to run ORIOGEN using 10,000 bootstraps. ORICC does not have any such controls. Results of our simulation study are summarized in Table 1.
Table 1. Comparison of ORIOGEN and ORICC using a simulated data. (due to Peddada et al.)
As expected, ORIOGEN performed at the desired Type I error rate (0.008 ~0.01) and consequently provided a better control of FDR than ORICC did. ORICC had an unacceptably high Type I error and FDR (28% and 47%, respectively). Consequently, it is not surprising that ORICC had a higher power (almost 99%) and more correct cluster assignments than ORIOGEN. It is interesting to note, however, that among the correctly identified true positive genes, ORIOGEN did a better job of pattern assignment than ORICC (87.5% vs 84%). Also, the overall error rate (as defined in Liu et al. ) for ORICC was almost four times as large as ORIOGEN (26.5% versus 7.4%). We would like to point out we have performed several simulation studies assuming the majority of genes are null and considering various patterns of means and variances; we find qualitatively similar results as above for all of them. One main drawback with the simulation study reported in  is that it was too narrow. It did not span a sufficient range for the proportion of null genes in the underlying population and, thereby, led to an overly generous assessment of ORICC's total-error performance. The lower total error of ORICC compared to ORIOGEN as reported in  hinges on the proportion of nulls being small (in our view, unrealistically small) in the Liu et al. simulation scenarios.
In general we agree with the spirit of Liu et al.: there is a great opportunity to use order-restricted inference methodology for analyzing time-course and dose-response studies. It would certainly be important to improve existing methodology and to evaluate the reliability of cluster assignment of genes according to their time-course profile. Such research could prove useful for identifying genes that participate together in various biological processes. We however, conclude that the methodology proposed in  can potentially be subject to a very high (a) false positive rate, (b) false discovery rate and (c) overall error rate.
SDP, DMU and SFH contributed equally in writing of this manuscript. SDP and SFH performed the numerical computations. All authors read and approved the final manuscript
SDP and DMU are members of the Biostatistics Branch, National Institute of Environmental Health Sciences Research Triangle Park, NC 27709. They may be contacted at Peddada@niehs.nih.gov, Umbach@niehs.nih.gov, respectively.
SFH is a Software Engineer at SRA International Inc., Durham, NC 27713 and he may be contacted at Shawn_Harris@sra.com.
The research of SDP and DMU was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). The authors thank the editors for giving them the opportunity to respond to the article by Liu et al. .
Response to Peddada et al
Tianqing Liu, Nan Lin, Ningzhong Shi and Baoxue Zhang
Corresponding author: Baoxue Zhang
We are delighted to see a thorough correspondence to our paper from Dr. Peddada and his coauthors. In the correspondence, Peddada et al. made several comments about our paper on the comparison between their clustering approach, ORIOGEN and our ORICC algorithm in .
The first comment is that the computation time reported in about ORIOGEN is not accurate, and it could be we either misinterpreted details of ORIOGEN or our coding is extremely inefficient.
We first need to point out that, in our paper, we compare between ORICC and Peddada's method that refers to the algorithm in . ORIOGEN is based on the algorithm in but with some slight modification. Our experience, in both simulation and real data analysis, suggested that ORIOGEN and the original algorithm in give very similar clustering results. However, the early paper is written in more details and allows our own implementation of the algorithm in Matlab and R. We have carefully examined our coding and found no error.
We believe that the difference between our reported computation time and that in Peddada et al.'s correspondence is mainly due to the efficiency of different computer languages. This is because the bootstrap procedure in Peddada's method requires a very large number of iterations, and JAVA is much more efficient than Matlab or R in looping. A fair comparison of the computational efficiency between two methods should be made on the same platform. Therefore, we implemented both methods in R and reported the computation time in our paper. The computation time of Peddada's method reported on page 3 in is based on our early implementation of the method in Matlab and it used 100,000 bootstrap samples. We also tried ORIOGEN (with 100,000 bootstrap samples), implemented in JAVA by Peddada et al. , on our computer to analyze the breast cancer cell line data in , and the computation time was 2877 seconds, whereas it only took ORICC (implemented in R) 15.69 seconds. Our computer is a workstation with a 2.30 GHz AMD Athlon(tm) 64 × 2 Dual Core 4400+processor and a 2.00 GB memory. Note that this breast cancer cell line data has been processed and contains just about 1900 genes. Its size is relatively small compared to most current microarray studies. We further applied ORICC and ORIOGEN (with 100,000 bootstrap samples) to a simulated data set that contains 5500 genes. ORIOGEN was implemented using 100,000 bootstraps, with a p-value of 0.0025. The run time for ORICC and ORIOGEN is 74.03 seconds versus 21,104 seconds. We agree with Peddada et al. that the JAVA software ORIOGEN is a very efficient implementation of the algorithm. However, for most microarray studies nowadays involving more than 5,000 genes, without a super powerful computer, it can still take very long for the analyst to obtain the clustering result.
The second comment is that the false positive rates reported in Figure three in were incorrect. We thank Peddada et al. for carefully reading our paper and pointing out this mistake. In our paper, we mistakenly stated the p-value threshold used for Peddada's method. The threshold was 0.5 instead of 0.025. We repeated all simulations in our paper involving Peddada's method using a p-value threshold of 0.025 and the results are presented in Figures 1, 2, 3 and 4 at the end of this report. Figures 1, 2 and 3 are for Simulation 1 in and obtained by imposing the error rate for Peddada's method using threshold 0.025 on Figures 2, 3 and 4 in . Figure 4 is for Simulation 2 in and gives Rand's C statistics for the clusters given by Peddada's method using threshold 0.025. As pointed out by Peddada et al. in their correspondence, Peddada's method controls the false positive rate under the nominal level, i.e. the p-value threshold (See Figure 2). However, lower false positive rates are often at the price of increased false negative rates and also higher overall error rates (See Figures 1 and 3). Though a threshold of 0.5 seems unreasonable for p-values, it does offer an overall better clustering result than using 0.025 as the threshold. This is further confirmed by Rand's statistics in Figure 4. Except the comparison in false positive rates (Figure 2) is different from what we stated in our paper, other conclusions in our paper remain unchanged. It is worth noting that Peddada's method can achieve any false positive rate by using the corresponding p-value threshold.
Figure 1. Simulation 1: The overall error rate of Peddada's method and the one-stage ORICC algorithm. (due to Liu et al.)
Figure 2. Simulation 1: The false positive rate of Peddada's method and the one-stage ORICC algorithm. (due to Liu et al.)
Figure 3. Simulation 1: The false negative rate of Peddada's method and the one-stage ORICC algorithm. (due to Liu et al.)
Figure 4. Simulation 2: clustering precision of Peddada's method using a threshold of clustering 0.025.
In the situation of clustering microarray data, the clustering result often serves as a hypothesis generating tool, and gives the analysis a more exploratory flavor. Therefore, unlike in the classical hypothesis testing scenario, false negative and false positive are equally important in evaluating the clustering accuracy. Peddada et al. [8,9] treat the clustering problem more in a hypothesis testing way, therefore put more emphasis on controlling the false positive rate. Though our ORICC method is also based on order-restricted inference and has a similar structure to Peddada's method, we view the clustering problem as a model selection problem. By using a consistent model selection criterion, the false positive rate of our ORICC method approaches zero as the number of replicate arrays increases, whereas that of Peddada's method remains around the p-value threshold.
Peddada et al. also had two other comments on our inaccurate description of the optimality of the order-restricted MLE and the test used in Peddada's method. We appreciate their insight and agree that the description should be changed according to their suggestion.
Peddada et al. also reported a new simulation study based on a large number simulated null and non-null gene expressions, in which our one-stage ORICC algorithm was shown to have an inferior performance to the ORIOGEN algorithm. This motivated us to further explore the property of our ORICC algorithms. We repeated the same simulation with a varying number of null genes using ORIOGEN, one-stage ORICC and two-stage ORICC algorithms, and we found that, when null genes consist of the majority of all the genes, two-stage ORICC provides a much more satisfactory performance than one-stage ORICC, and has similar performance to ORIOGEN. See results in Table 2. It is worth noting that, in this simulation, the computational advantage of our algorithms remains, especially for the two-stage ORICC. For example, when there are 40000 null genes and 4000 non-null genes, the computational time is 128 minutes, 10.5 minutes and 3.0 minutes for ORIOGEN, one-stage ORICC and two-stage ORICC, respectively, on a workstation with a 2.30 GHz AMD Athlon(tm) 64 × 2 Dual Core 4400+processor and a 2.00 GB memory. Another issue worth mentioning is that ORIOGEN's performance depends on pre-specified p-value cutoff of 0.01 and reclassification p-value of 0.90. On a real data set, finding a proper choice of these cutoffs may be not easy.
Table 2. Comparison of ORIOGEN, one-stageORICC and two-stage ORICC. (due to Liu et al.)
To summarize, we thank Dr. Peddada and his colleagues for their insightful discussion, which motivated us to achieve a deeper understanding about the property of ORIOGEN, one-stage ORICC and two-stage ORICC. We hope that our work, their work and discussion, and this report will stimulate further work on clustering microarray data using order-restricted inference.
The authors are partly supported by the National Science Foundation of China (No.10871037).
Molec Endocrin 2002, 15:1215-1229. Publisher Full Text
Ann Statist 1994, 22:67-93. Publisher Full Text
Molec Endocrin 2002, 15:1215-1229. Publisher Full Text