Abstract
Background
A microarray study may select different differentially expressed gene sets because of different selection criteria. For example, the foldchange and pvalue are two commonly known criteria to select differentially expressed genes under two experimental conditions. These two selection criteria often result in incompatible selected gene sets. Also, in a twofactor, say, treatment by time experiment, the investigator may be interested in one gene list that responds to both treatment and time effects.
Results
We propose three layer ranking algorithms, pointadmissible, lineadmissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to the three univariate ranking criteria, foldchange, pvalue, and frequency of selections by the SVMRFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (less than 20 per group) and detecting a 4fold change or less, the twodimensional (pvalue and foldchange) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard pvalue ranking. Three applications are presented. The first application illustrates a use of the layer rankings to potentially improve predictive accuracy. The second application illustrates an application to a twofactor experiment involving two dose levels and two time points. The layer rankings are applied to selecting differentially expressed genes relating to the dose and time effects. In the third application, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations to provide a ranking system from a long list of differentially expressed genes generated from the three dilution concentrations.
Conclusion
The layer ranking algorithms are useful to help investigators in selecting the most promising genes from multiple gene lists generated by different filter, normalization, or analysis methods for various objectives.
Background
Recent advances in DNA microarray technology provide exciting tools for studying the expression levels of thousands of distinct genes simultaneously. A common data analysis approach is to identify a subset of key genes from the original gene set that express differentially under different experimental conditions with a goal to determine the underlying relationship between samples and genes or gene clusters. The relationship is used to identify biological functions or to predict specific biological or therapeutic outcomes from the subset of key genes. Selection of differentially expressed genes can be separated into two steps. The first step is to calculate a discriminatory score that will rank the genes in order of evidence of differential expressions. The second step is to determine a cutoff (threshold) from the ranked scores to divide the genes into two lists: the differentially expressed and the nondifferentially expressed genes. The genes above the threshold are selected as differential expressions. Criteria for determining the threshold cutoff should depend on the objective of the experiment. For instance, if the objective is to identify a small number of truly differentially expressed genes for further study, then a stringent criterion such as controlling either the familywise or the false discovery error rate may be appropriate. However, if the purpose is to determine functional relationships among genes that have been affected by treatments or to develop a genomic biomarker classifier, criteria that do not eliminate as many genes may be more appropriate since the omission of informative genes would have a much more serious consequence than the inclusion of noninformative genes. In all applications, the first step of gene ranking is the more important of the two. Foldchange and pvalue are two common approaches to selecting differentially expressed genes when the experiment consists of two conditions (normal versus tumor). In the foldchange approach, a gene is said to be differentially expressed if the ratio in absolute value of the expression levels between the two classes exceeds a certain threshold, e.g., a 2fold or 3fold change. These genes are selected as differential expressions. This approach is deficient in some aspects as it does not account for the variability of the expression levels among genes. For example, genes with larger variances have a good chance of exhibiting larger foldchanges even if they are not differentially expressed. The pvalue ranking is an alternative approach for gene ranking. The pvalue is the probability outcome from a statistical testing procedure that there is no difference between two conditions for an individual gene. A small pvalue is evidence of differential expressions. One common problem encountered in the use of the pvalue ranking is that a gene with small fold change can have a very small pvalue (below the pvalue threshold) because of a very small standard deviation. These two ranking criteria often result in selecting different lists of differentially expressed genes.
One important application in microarray experiments is to develop a prediction model to discriminate different biologic phenotypes or to predict the diagnostic category or prognostic stage of a patient. Because thousands of gene are involved, many genes are often noisy in nature and many are irrelevant for prediction; the use of all predictors can suppress or reduce the performance of a classification algorithm. The prediction model is often based on a selected gene set from a pvalue ranking criterion (e.g., [1]). Alternatively, genes can be ranked according to its predictive accuracy (discriminatory ability) by performing genebygene prediction. The wrapper approach is an alternative gene selection method; the wrapper approach finds a subset of genes and evaluates its relevance while building the prediction model. For example, the classification tree (CTree) constructs a binary hierarchical classifier [2] through recursively partitioning parent nodes into two child nodes. In each node, CTree searches all possible predictors and selects the predictors that minimize overall true impurity. Guyon et al. [3] proposed a recursive feature elimination (RFE) procedure for the support vector machine (SVM) classification algorithm. The SVMRFE method uses the magnitude as a ranking criterion to select feature predictors. A strategy of applying the wrapper approach to gene ranking is to examine the frequency of selections in the cross validation [4,5]. The most frequently selected genes are presumed to be most relevant to the sample distinction. The gene ranking criteria described above for prediction purposes would result in different lists; also, these ranking criteria are different from either the foldchange or pvalue ranking criterion. As discussed, there are two general objectives in gene selection. One objective is to develop a classifier or predictive model for class prediction, and the other is to identify differentially expressed genes for a followup study. These two objectives are not mutually exclusive. For example, the set of differentially expressed genes identified presumably for the second objective can be used to develop a classifier (e.g., [1,6]) for the first objective. Different gene selection procedures often result in different rankings, even for the same objective such as the pvalue and foldchange criteria to identify differentially expressed genes. Because of thousands of genes involved and difficulty in the validation, a gene ranking procedure that strikes a balance among several ordering criteria will be useful for microarray data analysis. Furthermore, many microarray experiments have involved two or more factors and/or more than two experimental conditions. In the Applications section below, we consider a twofactor microarray study to identify biological effects of radiation exposure on gene expression. The experiment consists a control and dose groups. The RNA samples are extracted from the control and exposed cells at 4 hours and 24 hours. Statistical analysis would consist of a comparison between control and dose groups to investigate the radiation effect on gene expression, and a comparison between the two time points to investigate the time effect. Each comparison will result in one gene list according to the pvalues from the respective statistical test. A gene list that accounts for both ranking criterions will be useful for investigating the most important genes that respond to the treatment effect as well as time effect.
Given the wide uses of microarray technology, selection of differential expressed genes is one of the most important objectives in microarray data analysis. In a simple experiment with two experimental conditions, different experimental objectives or different analysis methods can generate different lists of differentially expressed genes. For experiments with more than two experimental conditions, the analysis will generate different gene lists from different test hypotheses.
This paper proposes three layer ranking algorithms for gene ranking with multiple ranking criteria, where each individual criterion constitutes its ordering of preference for selection. The presentation is limited to two and three univariate ranking criteria.
Results
Example
The colon cancer data set [7] consists of 2000 human genes with highest minimal intensity across 40 tumor and 22 normal colon tissue samples. A goal of a data analysis is to select a set of genes that express differently between the normal samples and cancer samples. We consider three criteria to select the set of marker genes: foldchange, pvalue, and frequency of selections by the SVMRFE classifier. The foldchange and pvalue were computed for each gene. The pvalues were computed based on 100,000 permutations using the tstatistic with unequal variances. When a tie occurs, the tstatistic is used to break the tie. The SVMRFE method was used to rank the discriminatory power of a gene using a 10fold crossvalidation method. Briefly, the entire data set was divided into 10 subsets (6 or 7 samples per subset) of roughly equal size. The SVMRFE was trained with a selection of 'eight' optimal genes on the 9 (= 101) subsets (either 56 or 55 samples) together and then applied to the remaining subset as the test data set.
The classification rule is iterated 10 times to complete an analysis of entire data set. The entire process was repeated 250 times each time the 62 samples were randomly partitioned into 10 subsets. The frequency of selections for the 2000 genes over the 2500 replicates were calculated as the 3rd selection criterion.
The colon data set has been analyzed extensively by many researchers using various gene selection and/or classification procedures (e.g., [5,6,8,9]). The classification accuracy rates reported from various Vfold crossvalidations, where V = 2, 3, 5, 10, and 62, are between 70% to 89%. The average accuracy rate in our 250 10fold crossvalidations by the SVMRFE is 84.6% for selecting 8 genes. The accuracy rates are 85.3% and 87.6% for selecting 16 and 32 genes, respectively. These accuracy rates are comparable or better than most of reported results in the literature.
Table 1 shows the pvalues (Pval), foldchange (FC), frequency (FQ) and their univariate rankings for the top 50 pvalue ranked genes. Note that the frequency ranking only has 47 categories; 1926 genes have never been selected. It can be seen that the three ranking criteria give very different gene lists. Among those top 50 pvalue ranked genes, 29 and 21 are in the top 50 lists ranked by the foldchange and frequency, respectively. The foldchange and pvalue rankings have a better agreement because they both measure the foldchange except that pvalue ranking adjusts for the variability of the foldchange across arrays.
Table 1. OneDimensional Ranks for Colon Data set
TwoDimensional Layer Ranking: Foldchange and pvalue
The volcano plot (e.g., [10]) has been used to summarize the two ranking schemes. The volcano plot is a plot of the pvalue (log_{10}p) versus foldchange (FC). The genes on the upper left or upper right corners represent small pvales with large fold changes (Figure 1). On the other hand, the genes on the upper middle have small foldchanges, but, having small pvalues, these genes will not be on the top list by the foldchange ranking. Similarly, the gene on the lower left or lower right have large foldchanges, but, with larger pvalues, these genes will not be on the top list by the pvalue ranking. The layerranking algorithms take all these genes into account.
Figure 1. Volcano Plot for Colon Data set. Volcano plot for the 2000 genes from the colon cancer data. The xaxis is the foldchange value and y axis is  log_{10 }pvalue, based on 50,000 permutations. Using the absolute 2fold change and pvalue 0.001 as the threshold cutoff, forty five genes in the upper left and upper right are selected.
Applying layer ranking procedures, the numbers of layers obtained from the pointadmissible, convex, and Pareto algorithms are 1268, 403, and 206, respectively. Table 2 lists the top 49 genes according to the pointadmissible ranking (the next layer consists of 4 genes). The three layer rankings generally show good agreements except that some genes may rank one layer higher or lower. Note that the differences between two adjacent gene rankings are no more than 2. For the 49 genes, the numbers of layers for the pointadmissible, convex, and Pareto algorithms are 26, 13, and 10, respectively. The 49 genes include all top 29 ranked genes by the foldchange ranking, top 31 by the pvalue. Table 2 shows some differences between the layer rankings and the pvalue ranking or the foldchange ranking. Gene T47377, which is ranked 35th by the pvalue having a large foldchange 3.72 (rank 3rd), is ranked 8th by the pointadmissible (3rd layer) and Pareto (2nd layer) algorithms and is ranked 6th (2nd layer) by the convex algorithm. On the other hand, gene H08393 has a small pvalue (ranked 5th) with small foldchange 2.34 (ranked 35th), and is ranked in the 12, 11, and 10 by the three layer algorithms.
Table 2. Comparison of TwoDimensional Ranking for Colon Data set
ThreeDimensional Layer Ranking: Foldchange, pvalue, and frequency
The numbers of layers obtained from the 3dimensional pointadmissible, convex, and Pareto algorithms are 74, 394, and 11, respectively. The convex ranking produces more layers than the pointadmissible due to the discreetness of the frequency that 1926 genes (all ranked 47th) has 0 occurrence. In general, as the dimension increases, the number of layers decrease.
Table 3 lists the top 46 genes according to the pointadmissible ranking (the next layer consists of 6 genes). For the 46 genes, the numbers of layers for the pointadmissible, convex, and Pareto algorithms are 15, 9, and 5, respectively. The three layer rankings generally show good agreements. The differences between two adjacent gene rankings are no more than 2. Similar to the 2dimensional layer ranking, genes that are ranked first in the onedimension are ranked in the first layer in the 3dimension. The 46 genes include all top 25 ranked genes by the univariate pvalue ranking, top 25 by the foldchange, and top 17 by the frequency. Furthermore, 39 of the 46 genes are selected by the foldchange and pvalue criteria shown in Table 2. The other 7 genes are L11706, T51261, H55916, J03210, U22055, H64489 and R62549, they all have high pvalues and low foldchanges, but, higher frequencies.
Table 3. Comparison of ThreeDimensional Ranking for Colon Data set
Both gene T47377 (a larger pvalue) and gene H08393 (a smaller foldchange) discussed previously in Table 2 have high frequencies (Table 1), both are ranked in the top 10 in Table 3. Gene D31885 and gene T86749, which are ranked in layer #18 in Table 2, are not listed in Table 3 because of 0 frequency.
Simulation Experiment
We conducted a simulation experiment to compare the twodimensional (pvalue and foldchange) layer rankings to the univariate pvalue ranking for selection of differentially expressed genes. The topranked genes from the univariate pvalue ranking and the three layer rankings were evaluated based on the false discovery rate (FDR) of 5% [11]. The experiment considered to m = 1000 genes, in which m_{1 }= 50 or 100 genes were differentially expressed with a constant effect size (δ), ranging from 1.0 to 2.0 with an increment of 0.2, as well as 2.5 and 3.0. Note that the δ of 1 represents a 2fold change. The number of arrays in each group was 10 or 15. The data were sampled from a normal distribution under an independent model or a correlated model. For the correlated model, we considered a block compound symmetry (CS) correlation structure [12], in which there were 100 blocks and each block consists of 10 dependent genes with a pairwise correlation coefficient ρ. We assumed that the first m_{1}/10 blocks corresponded to the m_{1 }differentially expressed genes. Therefore, the m × m variancecovariance matrix Σ of a block CS structure for the simulation model consisted of 100 equal blocks, and each block Σ_{i }has a CS structure with variances of 1 and a common correlation ρ; that is,
where each Σ_{i }had an equicorrelated structure,
where ρ = 0.3 or 0.6 was the common correlation coefficient. In each array, expression data for the nondifferentially expressed genes in each block were generated from the multivariate normal distribution N(0, Σ_{i}), and expression data for the differentially expressed genes in each block were generated from the multivariate normal distribution N(δ1, Σ_{i}), where 1 denoted a 10dimensional unity vector. For each simulated data set, the pvalues and fold changes were computed. Three layerranked gene lists were then generated. The pvalues were computed based on tstatistic from 50,000 random permutations. The simulation was repeated 500 times for each combination of m_{1}, n, δ, and ρ.
We evaluated the same number of top ranked genes selected by the pvalue ranking and by the three layer rankings in terms of the control of the FDR and the ability (power) to detect differential expression. The four ranking procedures were evaluated as follows. First, the differentially expressed genes from the pvalue ranking were selected using the Benjamini and Hochberg (BH) FDRcontrolled procedure [11]. The same number of top ranked genes, then, was selected as differentially expressed genes for each layer ranking. For each selected gene set in each simulation, the numbers of false positives (V_{k}) and true positives (T_{k}) were counted, k = 1, ..., 500. The FDR and the power were estimated by
In practice, it is possible that multiple genes are ranked in the same layer selected for differential expression. In order to eliminate ties and select the same number of top ranked genes as the pvalue ranking, we calculated a secondary score based on the average of ranks of  log_{10}(pvalue) and foldchange for each gene in the last layer. We then selected genes with the largest scores. The secondary score is a measure that have incorporated the magnitudes of the pvalue and foldchange rankings.
Table 4 shows the estimated FDR's and powers from the univariate pvalue ranking and twodimensional layer rankings for the independent model (ρ = 0). The estimated FDR's from the pvalue ranking are generally approximately close to the 5% significance level with several exceptions when n = 10. In particular, the FDR is 5.4% when n = 10 for δ = 1.0 and 1.2. The FDR estimates from the layer rankings have the FDR of 5.1% or 5.2% in several cases when δ is greater than 2 (a 4 foldchange). The estimated powers from the three layer rankings are all higher than the powers from the pvalue ranking. The convex ranking appears to perform more consistent than either the pointadmissible or Pareto ranking in maintaining the FDR level. The FDR estimates are all below the significance level for δ ≤ 2. The simulation results for the correlated models (Tables 5 and 6) are similar to the results of the independent model (Table 4). The estimated FDR's from the pvalue ranking are close to the significance 5% level with few exceptions when n = 10 and δ's are small. The convex layer ranking performs well in terms of maintaining the FDR level.
Table 4. Simulation Results for the Independent Models
Table 5. Simulation Results for the Correlated Models with ρ = 0.3
Table 6. Simulation Results for the Correlated Models with ρ = 0.6
Tables 4, 5, 6 show that all three layer rankings exhibit higher power than the pvalue ranking. However, the FDR estimates from the Pareto ranking often exceed the significance level when δ is greater than 2. As discussed, the pointadmissible ranking generally produces the most layers, while the Pareto produces the fewest layers. Genes with high pvalue rankings or high foldchange rankings will likely be ranked higher by the Pareto than by either the pointadmissible or convex ranking. For example, a nondifferentially expressed gene may have a large foldchange because of a large variance. This gene is likely be ranked higher, and be selected as differential expression by the Pareto ranking than by the pointadmissible or convex ranking. Likewise, a differentially expressed gene is more likely to be selected by the Pareto ranking than by either the pointadmissible or convex ranking. With regard to the pointadmissible and convex rankings, their FDR estimates are all below the significance level when δ ≤ 2. The convex ranking gives a slightly higher power than the point admissible ranking.
As seen in Tables 4, 5, 6, the FDR estimates from the pvalue ranking can exceed the significance level for small effect sizes when the sample size is 10. On the other hand, the FDR estimates from the layer rankings increase as the effect size δ increases when the sample size is 15. In general, when the effect sizes are large (equivalently, the sample sizes are large), the power of the tstatistic approaches to 1. The BH procedure would select all truly differentially expressed genes from the pvalue ranking while maintaining the FDR at the significance level. That is, the pvalue ranking will outperform the layer rankings in terms of the control of the FDR, particularly, when the nondifferentially expressed genes have large variances. Finally, we do not consider the null model of no difference between two groups. Since the BH procedure controls the FDR, by selecting the same number of genes the layer ranking procedures will have the FDR controlled under the null model.
Three Applications
(1) Colon Data set
We further used the colon cancer data to illustrate a possible application of the layer ranking algorithms for improving predictive accuracy in classification. There were three univariate ranking criteria, three 2dimensional layer rankings, and one 3dimensional layer rankings. For each ranking criterion, the top 8, 16, and 32 ranked genes were used for prediction using the SVM classification (without gene selection). In order to select a prespecified number of genes (8, 16, or 32) the gene(s) from the last layer was randomly selected when there are ties. For each gene set, we used the 10fold cross validation to estimate predictive accuracy. Note that crossvalidation performed after gene selection process is known as internal crossvalidation (e.g., the SVM classifier), whereas crossvalidation prior to gene selection is known the external crossvalidation [8]. For a fixed number of genes, the internal crossvalidation should have higher accuracy rates than the external crossvalidation.
Table 7 shows the predictive accuracy rates for fifteen ordering criteria. It can be seen that (1) the more number of genes used the higher accuracy rates can be reached; (2) frequency criterion outperforms pvalue and foldchange criteria in general; (3) twodimensional rankings for pvalue and foldchange in general outperforms their corresponding univariate rankings; (4) twodimensional Pareto ranking improves prediction accuracy for frequency using 16 genes.
Table 7. Classification Results for Colon Data set
(2) Ionizing Radiation Data set
In this example, we used the layer ranking algorithms to identify genes that show most differentially expressed in two experimental factors. The experiment was conducted to study the effects of ionizing radiationexposed human lymphoblastoid TK6 cells on gene expression [13]. In this experiment, TK6 cells were exposed to 5, 10, and 20 Gy ionizing radiation and cultured for 4 and 24 hours after exposure. RNA was hybridized to the Phase1 Human350 microarray (Phase 1 Molecular Toxicology, Santa Fe, NM) spotted with 350 human cDNA probes. This twocolor array was designed for detection of differential expression profiles relative to toxicological pathways. The backgroundsubtracted intensities were normalized according to Lowess methodology in the log_{2 }scale, and a dyebias correction was applied to the resulting data, as described in [14].
For illustrative purpose, we only used the 5 and 10 Gy ionizing radiation data. Statistical analysis consisted of a comparison between the 5 and 10 Gy dose groups and a comparison between the 4 and 24 hours after exposure. In each comparison, the differentially expressed genes were ranked according to the pvalues using the permutation ttest. The layer ranking algorithms were applied to both pvalue rankings. Table 8 shows the top 10 layers, according to the pointadmissible algorithm, for the three layer rankings and the two pvalue rankings, where P_{1 }and P_{2 }represent the ranking for the dose and time effects, respectively. It can be seen that dose and time effects show two distinct rankings. Each pvalue ranking identifies a set of genes that show differences in expression on a single biological factor. The layer rankings provide a list of 'significant' genes that account for both dose and time effects simultaneously. Many of the top hits have been reported to be important in lymphoblastoid cell lines exposed to ionizing radiation provides confidence that the analysis produces meaningful results.
Table 8. Rankings on two pvalues for Ionizing Radiation Data set
(3) Dilution Data set
In this example, we applied the threedimensional layer ranking criteria to a subset of the dilution data set of Gene Logic and the data are available at [15]. This study used two sources of cRNA, human liver tissue and central nervous system (CNS) cell lines. Samples were hybridized to HGU95Av2 GeneChips arrays from Affymetrix at various dilution and mixture levels. We considered the data from the three concentrations 7.5, 10.0, and 20.0 μg. Five replicate arrays are available for each concentration with a total of 30 arrays. The data were extracted, normalized and summarized using the "Affy" package from Bioconductor. For preprocessing methods, we used MAS 5.0 for background correction and PM correction, the quantile normalization method for normalization of the probe level, and the RMA method for summarization of probe intensities, which was suggested by [16].
Since the two samples are biologically distinct, it is expected that many genes will show differential expressions between the two samples. The relative abundance of each gene is proportional to its dilution concentration. However, the expression ratios between the pure samples (fold changes) are relative and should not vary with the amount of cRNA. If a gene expresses differently between the liver and CNS samples at 20.0 μg concentration, then we expect the same gene would show a difference in expression at other concentrations. In this application, the layer ranking algorithm is used to provide a ranking system from a long list of differentially expressed genes generated from three dilution concentrations.
The liver and CNS samples were compared for each of the three concentrations using the SAM procedure [17]. The level of significance was set at the false discovery rate of 0.01. Let G_{1}, G_{2}, and G_{3 }be the sets of the differentially expressed genes identified from the concentrations 20.0, 10.0, and 7.5 μg, respectively. Figure 2 summaries the number of differentially expressed genes selected from the three comparisons. The overlapping regions denote the number of genes that were differentially expressed in two or all three concentration concentrations. There were 1034 genes which show consistently differential expressions in the two RNA samples in all three concentrations. We then applied the threedimensional convex layer ranking algorithm to three pvalue lists obtained from the SAM procedure. Ninetyfive layers were obtained. The layer rankings of 1034 differentially expressed genes are shown in Figure 3. The other two layer ranking algorithms yield similar results (data not shown).
Figure 2. Results of Dilution Data Set. Venn diagram of the dilution data set demonstrating the agreement in the number of differentially expressed genes selected from the three concentrations using SAM procedure at the 0.01 FDR level.
Figure 3. 3D convex layer ranking for Dilution Data Set. Scatter plot of three pvalue lists for the 1034 differentially expressed genes selected from Figure 2. Colors represent the layers of genes identified by the threedimensional convex layer ranking algorithm.
Table 9 provides the Kenall's τ correlation matrix of the pvalue rankings of 1034 genes for the three concentrations with the three proposed threedimensional layer rankings. Ideally, the pvalue rankings of the 1034 genes identified as changes in expression between the liver and CNS samples should have high agreement across the three RNA concentrations. It is apparent that there are many discordances among the 20 μg 10 μg, and 7.5 μg pvalue rankings (Figure 3 and Table 9). In contrast, the three layer ranking algorithms improve the comparability (Table 9); they provide compromising gene rankings by combining information across the three RNA concentrations. The layer ranking algorithms can help an investigator to pick a few of topranked genes with confidence across multiple listings for further investigation.
Table 9. Kendall's τ correlation among six rankings
Discussion and Conclusion
Recently, the MicroArray Quality Control consortium suggested: "Foldchange ranking plus a nonstringent Pvalue cutoff can be used as a baseline practice for generating more reproducible signature gene lists" [18]. Many researchers have questioned this approach [19]. The pvalue ranking ensures the control of significance level, while the foldchange ranking may provide a better ranking when sample size (or effect size) is small. The layerranking algorithms provide a gene list that reconciles the pvalue and foldchange rankings implied by the volcano plot. The simulation shows that for experiments with small or moderate sample sizes (less than 20 per group) and detecting a 4fold change or less, the layer ranking selects differentially expressed genes with generally lower FDR and higher power than the pvalue ranking. For large sample sizes or effect sizes, the pvalue ranking will outperform the layer rankings.
We illustrate three additional applications of the layer ranking algorithms. In the colon data example, we illustrate an application of using layer rankings for improving predictive accuracy. Because of a large number of genes involved, the gene selection becomes one of the most important steps in the development of a prediction model. An analysis by Michiels et al. [20] showed that the list of genes identified as predictors of cancer prognosis was highly unstable. The selected gene set strongly depended on the selected patients in the training set. In this example, we consider the three gene selection criteria: pvalue, foldchange, and frequency of selections. Table 8 indicates that the improvement of predictive accuracy of the layer rankings over the pvalue rankings appears marginal. The simulation indicates that when the sample size is large, the layer rankings can exceed the significant level. This may not be a problem for prediction purposes, since the omission of informative genes generally has a much more serious consequence on predictive accuracy than the inclusion of noninformative genes. We are currently investigating different univariate selection criteria in conjunction with layer ranking algorithms to improve predictive accuracy. In general, the pvalue can be calculated in many different ways such as the parametric ttest, permutation ttest, or SAM method. Frequency of selections can be calculated by other classification algorithms such as CTree [2] or Random Forest [21]. In addition, instead of selecting 8 optimal genes in each cross validation, we may select 64 genes or more so that each gene has higher probability been selected. Those genes that have never been selected are unlikely to be differentially expressed. The 2dimensional pvalue and frequency layer rankings may be useful to filter out a small number of nondifferentially expressed genes. The genes in the bottom layers may be the candidates for filtering out.
In the ionizing radiation example, we apply the layer ranking algorithms to a twofactor experiment. The algorithms can be used in the onefactor experiment with more than two conditions. Consider an experiment to study effect of p53 genotype on gene expression profiles. The experiment consists of three mouse genotypes: wildtype (+/+), knockout (/), and heterozygous (+/). Statistical analysis typically consists of a comparison among the three genotypes. A gene list ranked according the pvalues from the Fstatistic can be obtained using either permutation or parametric approach. An important followup analysis is the comparisons between the knoutout and wildtype mouse and between the heterozygous and wildtype mouse. The Dunnett's test is frequently used to generate the differentially expressed gene lists for the two comparisons. However, the investigator is often interested in the genes that show differences in both comparisons. (Note that the significant genes identified in the Ftest may be insignificant in both Dunnett's tests.) One approach is to select the genes that are significant in both gene lists at a given pvalue cutoff. However, when the number of common genes is large, the investigator must select a subset of genes from the two criteria (the dilution example). The layer ranking algorithm can be used to provide a list of the most "important" genes that account for both objectives simultaneously for followup investigation. In the dilution example, we illustrate the strength of the threedimensional layer ranking algorithms for combining discordant results derived from three concentration groups. The set of probes that are consistently identified at different RNA concentrations is ranked according to compatibility between differential expression profiles in three concentration groups.
In summary, a microarray experiment can generate different gene lists by different filter, normalization, or analysis methods for different study objectives. The layer ranking algorithm can be useful to help investigators to select the most promising genes from multiple gene lists.
Methods
Let S = {p_{i }= (x_{i}, y_{i})  i = 1,..., m} denote the set of points under consideration, where x_{i }> 0. For example, x is the fold change in absolute value and y is (log_{10 }p). BarndorffNielsen and Sobel [22] proposed a layer ranking criterion for ordering multivariate data. The layer ranking divides S into disjoint sets (layers) of different ranks, the points in the same layer have the same rank. In this paper, we present three layer ranking criteria based on the principle of the first quadrantadmissible [22]. A point, p_{i }= (x_{i}, y_{i}), is called first quadrantadmissible in S if there does not exist any point p = (x, y) such that x > x_{i }and y > y_{i}. Conversely, a point, p_{i }= (x_{i}, y_{i}), is dominated by another point p_{j }in S if (x_{i }<x_{j }and y_{i }≤ y_{j}) or (x_{i }≤ x_{j }and y_{i }<y_{j}). Three layer ranking algorithms are described below.
Pointadmissible layer
A point (x_{i}, y_{i}) is called rth layer (first quadrant) admissible (r = 1,2, ...) [22] if there are exactly (r  1) points (x, y) such that x > x_{i }and y > y_{i}. Let S_{r }denote the set of rth layer admissible (r = 1, 2,...). Each observation is either rth layer admissible (r = 1, 2, ...) or inadmissible; that is, S = S_{1 }∪ S_{2 }∪ ... . For each point p_{i }= (x_{i}, y_{i}) in S, let r denote the number of points p = (x, y)'s for which x > x_{i }and y > y_{i}. The point p_{i }is assigned to the (r + l)th layer (r = 0,1,...).
Lineadmissible (convex) layer
A line segment is called (first quadrant) admissible in S if every point on the line segment is first quadrantadmissible in S. The 1st lineadmissible layer is obtained by finding the admissible points that are connected by line(s) with nonpositive or infinite slopes (the minimum convex set). The rth layer is obtained similarly by stripping off the points on the (r  1)th layers (r = 2, 3...).
Pareto layer
The Pareto layer (front) was introduced by [23] for gene ranking. The 1st Pareto layer consists of all points not dominated by other points. The rth layer is obtained similarly by striping off the points on the (r  l)th layers (r = 2, 3...).
The three layer ranking algorithms are illustrated in Figure 4. Figure 4(a) consists of 15 points (ao). In the point admissible layer ranking, for a given point, say, l, the number of points with both coordinates greater than l (the number of points in the upper right) is 7. The point l is assigned to the 8th layer (Figure 4(b)). Note that the point admissible layer may have empty layers, for example, 4th and 5th layers. Figures 4(c) and 4(d) show the lineadmissible layers and Pareto layers, respectively. In Figure 4(c), all line segments have negative slopes. The point e is dominated by the line segment connected points c and d. Therefore, e is assigned to the next (3rd) layer. In Figure 4(d), e has a larger xcoordinate than c, and a larger ycoordinate than d. Therefore, e is not dominated by c or d; and c, d, and e are all assigned to the 2nd layer. The subscripts in Figure 4(a) represent layers for which the point is assigned by the three algorithms. For example, the point n has the subscript 865 indicating that n is assigned to the 8th, 6th and 5th layers by the pointadmissible, convex, and Pareto algorithms, respectively. Note that the empty layers for the rth layer algorithm are removed and the nonempty layers are renumbered. The numbers of layers for the three ranking algorithms are 9, 7, and 6 (point o). For continuous measurements such as foldchange or pvalue, the pointadmissible ranking generally produces the most layers, while the Pareto ranking produces the fewest layers.
Figure 4. Three Ranking Methods. Illustration of three layer ranking algorithms for 15 points. (a) Fifteen data points (ao) on a twodimensional space. The subscripts represent the layers for which the point is assigned by the three algorithms. For example, the subscript 865 indicates that point n is assigned to 8th, 6th and 5th layers for (b) Pointadmissible algorithm, (c) Convex algorithm, and (d) Pareto algorithm respectively.
Extending the previous pointadmissible layer and Pareto layer to three or more dimensional situations is straightforward. For threedimensional examples, a point p_{i }= (x_{i}, y_{i}, z_{i}) is assigned to the (r + 1)th layer if there are r points p = (x, y, z)'s such that x > x_{i}, y > y_{i}, and z > z_{i}. Similarly, a point p_{i }= (x_{i}, y_{i}, z_{i}) belongs to the 1st Pareto layer if no other points dominate it, i.e.,
{(x, y, z)x ≥ x_{i}, y ≥ y_{i}, z ≥ z_{i}}\{(x, y, z)x = x_{i}, y = y_{i}, z = z_{i}}
is the null set. Then points are assigned to the rth Pareto layer (r = 2,3,...) recursively in the same way by striping off the points on the 1st,...,(r1)th layers.
To find a higherdimensional convex layer, an algorithm for determining the convex polytope (convex hull in an arbitrary dimension) is needed and interested readers are referred to Chapter 11 of [24]. One popular implementation can also be found at [25]. The 1st convex layer is obtained as the intersection of the points lying on the convex polytope and on the 1st Pareto layer. The next step is to recursively strip off the points on the 1st,..,(r1)th layers, and points are assigned to the rth convex layer (r = 2,3,...) if they lie on the resultant convex polytope and on the resultant 1st Pareto layer.
Availability and requirements
A multiple ordering procedure for gene selection written in R with various options is freely available.
Project name: multiple ordering gene selection
Project home page: http://gap.stat.sinica.edu.tw/Software/mvo.R webcite
Operating systems: any OS that supports the R environment
Programming languages: R
License: free
Authors' contributions
JJC and CHC conceived the study, developed the methodology, and wrote the manuscript. CHC and ST developed and implemented the main algorithm. CAT implemented the SVM algorithm and conducted the simulation experiment. CAT and ST performed the analysis. All authors read and approved the final manuscript.
Acknowledgements
The authors are grateful to two anonymous reviewers for much helpful comments and guidance on improving this paper. Part of this work was done when James J. Chen visited the Institute of Statistical Science, Academia Sinica, Taiwan. Chunhouh Chen's research was partially supported by The Genomic and Proteomic Program, Academia Sinica, Taiwan (94B0021). The views presented do not necessarily represent the views of the U.S. Food and Drug Administration.
References

Liu H, Li J, Wong L: A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns.
Genome Informatics 2002, 13:5160. PubMed Abstract  Publisher Full Text

Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. New York: Chapman & Hall; 1984.

Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines.
Machine Learning 2002, 46:389422. Publisher Full Text

Li L, Weinberg C, Darden T, Pedersen L: Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method.
Bioinformatics 2001, 17:11311142. PubMed Abstract  Publisher Full Text

Cho JH, Lee D, Park JH, Lee IB: Gene selection and classification from microarray data using kernel machine.
FEBS Letters 2004, 571:9398. PubMed Abstract  Publisher Full Text

Tsai CA, Chen CH, Lee TC, Ho IC, Yang UC, Chen JJ: Gene selection for sample classifications in microarray experiments.
DNA and Cell Biology 2004, 23:607614. Publisher Full Text

Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.
Proceedings of National Academy of Sciences 1999, 96:67456750. Publisher Full Text

Ambroise C, McLachlan GJ: Selection Bias in Gene Extraction on the Basis of Microarray GeneExpression Data.
Proceedings of National Academy of Science 2002, 99:65626566. Publisher Full Text

Dettling M: BagBoosting for tumor classification with gene expression data.
Bioinformatics 2004, 20:35833593. PubMed Abstract  Publisher Full Text

Jin W, Riley RM, Wolfinger RD, White KP, PassadorGurgel G, Gibson G: The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster.
Nat Genet 2001, 29:389395. PubMed Abstract  Publisher Full Text

Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society B 1995, 57:289300.

Jung SH: Sample size for FDRcontrol in microarray data analysis.
Bioinformatics 2005, 21:30973104. PubMed Abstract  Publisher Full Text

Akerman GS, Rosenzweig BA, Domon OE, Tsai CA, McGarrity LJ, Bishop M, MacGregor JT, Sistare FD, Chen JJ, Morris SM: Alterations in the gene expression profiles and the DNA damage response in ionizing radiationexposed TK6 cells.
Environmental and Molecular Mutagenesis 2005, 45:188205. Publisher Full Text

Rosenzweig BA, Pine PS, Domon OE, Morris SM, Chen JJ, Sistare FD: Dyebias correction in duallabeled cDNA microarray gene expression measurements.
Environmental Health Perspectives 2004, 112:480487. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

The dilution data set [http://www.genelogic.com] webcite

Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset.
Genome Biology 2005, 6:R16. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
Proceedings of the National Academy of Sciences 2001, 98(9):51165121. Publisher Full Text

MAQC Consortium: The MicroArray Quality Control (MAQC) project shows inter and intraplatform reproducibility of gene expression measurements.
Nature Biotechnology 2006, 24:11511169. PubMed Abstract  Publisher Full Text

Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: A multiple random validation strategy.
Lancet 2005, 365:488492. PubMed Abstract  Publisher Full Text

Machine Learning 2001, 45(1):532. Publisher Full Text

BarndorffNielsen O, Sobel M: On the distribution of the number of admissible points in a vector random sample.
Theory of Probability and its Applications 1966, 11:249269. Publisher Full Text

Fleury G, Hero AO, Yoshida S, Carter T, Barlow C, Swaroop A: Pareto analysis for gene filtering in microarray experiments.
European Signal Processing Confersence (EUSIPSO), Toulouse, France 2002.

de Berg M, van Kreveld M, Overmars M, Schwarzkopf O: Computational Geometry: Algorithms and Applications. Berlin: SpringerVerlag; 2000.

The Geometry Center, Minneapolis MN [http://www.qhull.org] webcite