Abstract
Background
Ranked gene lists from microarray experiments are usually analysed by assigning significance to predefined gene categories, e.g., based on functional annotations. Tools performing such analyses are often restricted to a category score based on a cutoff in the ranked list and a significance calculation based on random gene permutations as null hypothesis.
Results
We analysed three publicly available data sets, in each of which samples were divided in two classes and genes ranked according to their correlation to class labels. We developed a program, Catmap (available for download at http://bioinfo.thep.lu.se/Catmap webcite), to compare different scores and null hypotheses in gene category analysis, using Gene Ontology annotations for category definition. When a cutoffbased score was used, results depended strongly on the choice of cutoff, introducing an arbitrariness in the analysis. Comparing results using random gene permutations and random sample permutations, respectively, we found that the assigned significance of a category depended strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller pvalues for large categories with many coexpressed genes.
Conclusions
In gene category analyses of ranked gene lists, a cutoff independent score is preferable. The choice of null hypothesis is very important; random gene permutations does not work well as an approximation to sample label permutations.
Background
In genomewide microarray experiments, it is possible to analyse the relevance of many different categories of genes, obtained from prior knowledge in the form of database annotations or from other experiments. These gene annotation analyses can unravel new information about pathways and cellular functions responsible for different phenotypes. Computational tools aiding in this process have recently been developed [18], most notably for annotations based on the Gene Ontology (GO) [9]. Generally, category relevance is calculated as the pvalue of a score, thus being dependent on both the choice of score and the choice of null hypothesis.
In microarray analyses such as clustering, which provide defined subsets of genes with no internal ranking, it is natural to base the score on the number of category genes in the relevant subset. However, ranking of genes appear in many techniques for microarray analysis, such as correlation of gene expression to target profiles [10] and scoring of genes by their ability to discriminate between experimental conditions [1113]. A separation of relevant and irrelevant genes can easily be constructed from ranked gene lists by introducing a cutoff, but the choice of cutoff becomes somewhat arbitrary and information in the list is lost. Tools addressing this problem, by using rankbased scores that are independent of a rank cutoff, have adopted the KolmogorovSmirnov score [1417], and a minimized cutoffbased pvalue [7,8], which optimizes the cutoff for each category. The Wilcoxon rank sum [18], investigated here, serves the same purpose.
To calculate a pvalue for the assigned score, a set of gene lists, ranked according to a chosen null hypothesis, are needed. The simplest choice of null hypothesis is just random gene permutations, and for some rankbased scores, the pvalue can then be calculated analytically, without explicitly performing the permutations. However, the random gene permutations null hypothesis assumes independence of gene expression over biological samples, and the pvalue is thus a combination of the pvalue of how important the category is and the pvalue for the genes of the category being coexpressed. When category genes behave similarly over a wide range of experimental conditions, the coexpression does not indicate relevance of the category for the question under study. In many analyses, a more appropriate null hypothesis is therefore sample label permutations, in which a set of ranked gene lists are generated based on the gene expression correlations to randomly permuted target values of the samples. This approach accounts for correlations between category genes and gives pvalues that are bounded from below by the number of possible permutations of the samples in the data set. The latter is particularly important in data sets with few samples. Despite this, publicly available tools for gene annotation analysis are restricted to gene permutations [18].
We present a program, Catmap, for gene category analysis based on ranked gene lists. The program uses either the number of genes above a cutoff or the Wilcoxon rank sum as score, and the significance of the score can be calculated from a user supplied set of ranked lists, thus allowing for sample label permutations. Furthermore, the program calculates corrections for multiple category testing, using permutation results to assess an effective number of independent categories, which enables Catmap to estimate very small multiple category pvalues, that would otherwise have been computationally infeasible. The input to the program is two files and some arguments. The first file contains the biologically relevant ranked list of genes and, if needed, additional ranked gene lists drawn from the null hypothesis. The second file contains the categories and their corresponding genes. The input arguments can either be specified on the command line or in a settings file, and are as follows: 1) a choice between the cutoff score the Wilcoxon rank sum score; 2) a choice of null hypothesis, which can be either the above mentioned usersupplied ranked lists or random gene permutations; 3) the number of permutations used in multiple category testing. If zero, no multiple category testing is performed.
The output of Catmap is two files. The main output file contains all the categories, one on each line ordered according to their significance. The line of a category contains the pvalue, the multiple comparison pvalue, the false discovery rate, the ROC area (which is a normalized way to represent the Wilcoxon rank sum), the number of genes in the category, and the 25th, 50th, and 75th percentiles of the ranks. The other output file, the companion file, contains all the categories, with all the genes and their ranks listed below. Each line contains a gene and its rank. The program can be downloaded at [19], where file format specification and example files are accessible as well.
Results and discussion
Comparing cutoff independent and cutoffbased score functions
We analysed the breast cancer data set of van 't Veer et al. [13] with a cutoffbased score function, using different cutoffs. Table 1 presents results for 15 categories with low pvalues from cutoff independent scoring, showing that the pvalue depends strongly on the choice of cutoff. This is further illustrated by the very different cutoffs at which the minimized cutoffbased pvalue was obtained. A table with all categories is provided as a supplement [see 2].
Table 1. Category pvalues for cutoffbased and cutoff independent score functions. The 15 GO categories with the lowest Wilcoxon rank sum pvalues from the ranked gene list, based on the data set of van't Veer et al., which comprises 5224 genes in total. Three columns show pvalues for cutoff based score functions, with cutoffs at position 100, 300 and 600 in the list. The columns "min p" and "cutoff" give the minimal cutoff based pvalue and the cutoff where this minimum was attained. The column "WRS" gives the pvalue calculated with the Wilcoxon rank sum as score function and random permutation of genes as null hypothesis, and the column marked #genes indicates the total number of genes in the category. A full table (sorted by WRS pvalue) is given as supplementary information [see 2]. The supplementary table also contains the ranking of each category using the different methods and the 25th, 50th and 75th percentiles of those genes in the ranked list.
Additional File 2. supplement to Table 1.
Format: CSV Size: 445KB Download file
Compared to the variations between the cutoffbased alternatives, the results shown in Table 1 are in reasonable agreement for two cutoff independent pvalues, using the Wilcoxon rank sum and the minimized cutoffbased pvalue, respectively. The pvalue based on the Wilcoxon rank sum was most often larger than the minimal cutoffbased pvalue. Since the latter is biased by a minimization process, it must be interpreted as a score, rather than a pvalue, thus requiring additional analyses to find statistical significance [7,8].
Comparing null hypotheses
Using the Wilcoxon rank sum, we compared the results of different null hypotheses. Three publicly available data sets were examined [11,13,20]. As can be seen in Figure 1, pvalues based on gene permutations tend to be lower than those based on sample label permutations. For categories with small pvalues, there are remarkable differences, in particular for large categories with more than 20 genes. Since the gene permutation null hypothesis assumes independent genes, we expect a GO category whose genes are uncorrelated to have roughly the same pvalue under the two different null hypotheses, whereas a significant category whose genes are highly correlated will get a lower pvalue using the gene permutation null hypothesis. To illustrate this coexpression effect, we picked two large categories, "carboxylic acid metabolism" and "M phase", which are encircled in Figure 1. In the data set of van 't Veer et al. [13], "carboxylic acid metabolism" has similar pvalues for the two null hypotheses, while "M phase" has a pvalue of 10^{7}using gene permutations but the much higher pvalue of 3 · 10^{2 }using sample label permutations. As seen in Figure 2, the most highly ranked genes of "M phase" are indeed more coexpressed than the most highly ranked genes of "carboxylic acid metabolism".
Figure 1. Comparing null hypotheses. Comparison of pvalues obtained by sample label permutations and gene permutations, using the data set of van 't Veer et al. [13] (left), Golub et al. [11] (middle), and Alon et al. [20] (right). Sample label permutation results were obtained with 100.000 permutations for the van 't Veer et al. data set and with 10.000 permutations for the other data sets. Gene permutation results were calculated as described in Methods. Red, green and blue colours represent categories with 1 to 5, 6 to 20, and over 20 genes, respectively. Encircled boxes in the left figure represent the categories "M phase" and "carboxylic acid metabolism", which are further discussed in the text.
Figure 2. Effects of coexpression on different null hypotheses. Expression profiles, over the 97 samples in van 't Veer et al. [13], of the 12 most highly ranked genes in the "M phase" category (left) and 13 most highly ranked genes in the "carboxylic acid metabolism" category (right), respectively. Some genes were inverted since the ranking was based on absolute correlation values to metastasis class. The metastasis free samples are to the left of the vertical line, and within each metastasis class, samples are ordered in increasing average expression of the examined genes. The expressions of each gene was normalized to zero average across samples. The narrower band of expressions in the left figure illustrates the higher Pearson correlation of M phase genes. Average absolute Pearson correlation between gene expressions was 0.74, with standard deviation of 0.16, for the M phase genes, and 0.44, with standard deviation of 0.27, for carboxylic acid metabolism genes.
In Table 2, the ranks of categories for the different null hypotheses are compared. There are distinct differences, with only a small overlap among top ten categories. One can clearly see the tendency for the gene permutation null hypothesis to find categories with very many genes, as discussed above. A table with all categories is provided in the supplement [see 3].
Table 2. Comparison of different cutoff independent approaches. The top ten categories and their corresponding ranks for each of the the four methods: Wilcoxon rank sum (WRS) with sample label permutation null hypothesis (s.l.p.), WRS with gene permutation null hypothesis (g.p.), the KolmogorovSmirnov score (KS) as used in GSEA [17], and the minimal cutoffbased pvalue (minp) [7, 8]. Percentile columns indicate the position of the 25th, 50th and 75th percentile in the ranked gene list which comprises 5224 genes and is based on the data set of van't Veer et al.. The last column indicates the number of genes in each category. The full table is available as a supplementary file [see 3].
Additional File 3. supplement to Table 2.
Format: CSV Size: 223KB Download file
Table 2 also shows category ranks obtained with two alternative cutoff independent score functions: the KolmogorovSmirnov score as used in GSEA [17] and the minimal cutoffbased pvalue used in FuncAssociate [7] and iGA [8]. These two alternatives do not calculate individual pvalues for categories, but ranks categories based on the chosen score. Nevertheless, they give results similar to those obtained with the Wilcoxon rank sum and gene permutation. This is expected, since the minimized pvalue is calculated with gene permutations, and the score adopted in GSEA [17] ranks categories similarly to what a KolmogorovSmirnov pvalue, based on gene permutations, would do. It should be noted that GSEA, FuncAssociate, and iGA calculate multiple hypotheses corrected pvalues, but these do not change the ranking of categories.
There is a possible difference (which does not reveal itself in Table 2) between the KolmogorovSmirnov score and minimized pvalue score on one hand, and the Wilcoxon rank sum on the other, in the treatment of categories for which only a subset of genes have expressions correlating significantly with the question under study. The important genes being in the top of the ranked list will give the category a good score with all three score functions, provided the remaining, seemingly insignificant, genes are distributed in the ranked list as expected by random. However, if these less important genes lie higher in the list than expected by random (though not high enough to affect the KolmogorovSmirnov or minp scores), the category will be considered more important by the Wilcoxon rank sum. Reversely, if the less important category genes prevail in the bottom of the list, the Wilcoxon rank sum score function will deem the category as unimportant, while the other two scores will give the category a high significance, based on the top ranked genes alone. Whether seemingly insignificant genes being ranked better or poorer than explainable by random expectations should be observed or ignored is of course a matter of taste, and a possibility is to use several score functions, that may complement each other. The differences are, however, much smaller than those related to choice of null hypothesis, as revealed in Table 2.
Multiple category testing
The more categories that are being tested, the more likely it is that at least one category gets a very small pvalue by chance. To better evaluate the statistical significance of the best scoring categories, we used Catmap to calculate false discovery rates and familywise error rates by permutation tests. This also gave us an effective number of independent categories, N_{eff}, as described in Methods.
The GO contains many small categories which would be reasonable to ignore in a study aiming at biological conclusions, and they were included in Figure 1 mainly to highlight the differences between the null hypotheses. When performing multiple category testing, we restricted the study to large categories, containing more than 20 genes. We tested the 3 subontologies (biological process, molecular function, and cellular component) both separately and together.
As expected from the discussion above, several categories with coexpressed genes got small p_{multiple }and small false discovery rates with random gene permutations. In contrast, when using sample label permutations, the smallest p_{multiple }was obtained in the data set of van 't Veer et al. [13] for the biological process category "organic acid metabolism", which contained 83 genes and had p = 3 · 10^{4 }and p_{multiple }= 0.02. Interestingly, organic acid metabolism is known in the literature to be relevant for breast cancer [21,22]. For this data set and the biological process categories, there was a 38% false discovery rate among the top 15 categories.
For all 3 subontologies, the effective number of categories, N_{eff}, was around half of the full number of categories, N. In the data set of van 't Veer et al. [13] the numbers were N_{eff }= 83 versus N = 166 for biological process, N_{eff }= 69 versus N = 119 for molecular function, and N_{eff }= 22 versus N = 42 for cellular component. For all categories together the real number of large categories was N = 327 whereas N_{eff }= 152. Using random gene permutations for the same data set and categories, we got N_{eff }= 170. The fact that N_{eff }for the two null hypotheses are so close is a general phenomena that we see in all our examples (data not shown). Furthermore, for all data sets and ontologies studied, N_{eff }was approximately half of the total number of categories. If this is a general feature for GO categories, the simple Bonferroni correction would not be totally unreasonable for small pvalues.
Figure 3 shows that the fit with an effective number of categories was good; in the range where permutations results were available it did not deviate more than a factor of two. The example in Figure 3 was obtained with 100.000 sample label permutations, and minimal pvalues were found for 1000 random gene lists.
Figure 3. Fitting an effective number of independent categories. The multiple category pvalue, p_{multiple}, versus pvalue for the data set of van 't Veer et al. [13], using 327 large Gene Ontology categories with more than 20 genes. The yellow band shows 95% confidence interval of sample label permutation results, based on 1000 random lists, and the blue curves show the results of Equation (1), with the fitted N_{eff }= 152 (solid line), the total number of categories N = 327 (dashed line), and also the Bonferroni correction (dotted line).
It should be noted that whenever several ranked lists are examined as part of a project, this additional source of multiple hypotheses testing should also be corrected for. An example of such a correction, for cutoffbased score functions, is presented by Corà et al. [23].
Conclusions
We developed a computer program for calculating the significance of gene categories in a ranked list of genes. Corrections for multiple category testing can be performed by the program. To investigate the properties of different scores and null hypotheses, we analyzed three publicly available data sets [11,13,20]. Commonly [16], a subset of relevant genes is defined from a ranked gene list by introducing a cutoff in the list. Our results show that the obtained pvalues of biologically relevant categories depend strongly on the choice of cutoff. The cutoff independent Wilcoxon rank sum score overcomes the problem, representing an alternative to the KolmogorovSmirnov score [1417] and the minimized cutoffbased pvalue [7,8]. The ranking of categories for the three cutoff independent scores are very similar.
Though sample label permutations in many situations represent a better null hypothesis than gene permutations, available gene annotation analysis tools are restricted to the latter. Our implementation allows for both null hypotheses, and we find that both the pvalues and the ranking of categories depend strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller pvalues for large categories with many coexpressed genes.
Methods
Algorithm
The implemented algorithm treats the categories sequentially and independently. As score function for category relevance, the program uses either the Wilcoxon rank sum or the number of genes above a given cutoff in the ranked list. The latter is implemented for method comparison and for the case of a defined subset of relevant genes, without internal ranking.
For the case of the Wilcoxon rank sum, the user can supply a set of ranked lists distributed according to an appropriate null hypothesis, or request random permutations of genes as the null hypothesis. In the latter case, the significance of the score is calculated analytically by the program, using either an exact calculation by an iterative method, a Gaussian approximation, or a continuous volume approximation. The program chooses method based on a balance between accuracy and computation time. Details are presented in supplementary information [see 1].
Additional File 1. pvalues for the Wilcoxon rank sum score.
Format: PDF Size: 73KB Download file
This file can be viewed with: Adobe Acrobat Reader
For the case of the cutoffbased score function, the pvalue of category relevance is determined with Fisher's exact test [24], corresponding to randomly permuted genes as null hypothesis.
When N independent categories are tested simultaneously, familywise error rate simply means calculating the probability,
p_{multiple }(q) = 1  (1  q)^{N}, (1)
that at least one category has a pvalue below any given number q by chance. For correlated categories, we make the assumption that the same functional form describes p_{multiple }(q), with N replaced by an effective number of independent categories N_{eff}. We find N_{eff }by generating a number, K, of ordered lists under the null hypothesis and calculating the pvalues of all categories. We fit N_{eff }using the maximum likelihood estimation
where p_{k }is the minimal pvalue for the k'th ordered list.
The false discovery rate for the j highest ranked categories is found by counting the number of pvalues from K permuted lists lower than the pvalue of the j:th category and divide this number with K · j. For the case of sample label permutations, when a user supplied set of ranked gene lists are used to represent the null hypothesis, the first K lists are used to find N_{eff }and false discovery rates, and the remaining lists are used to calculate pvalues for each of the K lists.
Implementation
The algorithm is implemented in the Perl program Catmap.pl and is released under the GNU General Public License (GPL). Catmap.pl, together with user instructions, is available for download at [19].
Public data sets
Using Catmap, we analysed three publicly available data sets with gene annotations from the Gene Ontology.
The data set of van 't Veer et al. [13] consists of 97 patients with primary sporadic breast cancer, of which 46 had metastases within five years following treatment. Quality filtering was performed as described in [13], and rendered about 5,000 genes which were ranked according to their absolute Pearson correlation to metastasis class. A Gene Ontology analysis of the data set has previously been performed with the 231 top genes as the subset of important genes and random gene permutations [25].
The data set of Golub et al. [11] consists of bone marrow samples from leukemia patients, 27 with AML and 11 with ALL. The published data contains expression levels for 5000 genes, which after removal of genes with no variance across samples rendered 4812 genes which were ranked according to their absolute Pearson correlation to leukemia type.
The data set of Alon et al. [20] consists of 40 tumour and 22 normal colon tissue samples. The 2000 genes in the published data set were ranked according to their absolute Pearson correlation to tissue type.
Gene ontology associations
All genes were first mapped to corresponding UniGene clusters [26]. For the data set of Golub et al. [11] this mapping was given from chip annotation files provided by Affymetrix, whereas for the other data sets [13,20], the mapping was done via GenBank accession numbers. GO annotations for UniGene clusters were extracted with ACID [27], and completed by back propagating all lower level associations on the GO graph.
Authors' contributions
TB and MK implemented the algorithms in Catmap. All authors participated in the design of the study, prepared, read, and approved the final manuscript.
Acknowledgments
MK and TB are grateful for financial support from the Swedish Foundation for Strategic Research. PE was supported by the Swedish Foundation for Strategic Research through the Lund Center for Stem Cell Biology and Cell Therapy. The authors also thank Kasper Astrup Eriksen, Peter Johansson, Henrik Johansson, Carsten Peterson and Markus Ringnér for fruitful discussions.
References

Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN: Gominer: a resource for biological interpretation of genomic and proteomic data.
Genome Biol 2003, 4:R28. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Robinson MD, Grigull J, Mohammad N, Hughes TR: Funspec: a webbased cluster interpreter for yeast.
BMC Bioinformatics 2002, 3:35. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Khatri P, Draghici S, Ostermeier GC, Krawetz SA: Profiling gene expression using ontoexpress.
Genomics 2002, 79:266270. PubMed Abstract  Publisher Full Text

Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR: Mappfinder: using gene ontology and genmapp to create a global geneexpression profile from microarray data.
Genome Biol 2003, 4:R7. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Beissbarth T, Speed T: GOstat: Find statistically overrepresented Gene Ontologies within a group of genes.
Bioinformatics 2004, 20:14641465. PubMed Abstract  Publisher Full Text

Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE.
Genome Biol 2003, 4:R70. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with funcassociate.
Bioinformatics 2003, 19:25022504. PubMed Abstract  Publisher Full Text

Breitling R, Amtmann A, Herzyk P: Iterative Group Analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments.
BMC Bioinformatics 2004, 5:34. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, IsselTarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. the gene ontology consortium.
Nat Genet 2000, 25:2529. PubMed Abstract  Publisher Full Text

Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen B, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycleregulated genes of the yeast saccharomyces cerevisiae by microarray hybridization.
Mol Biol Cell 1998, 9:32733297. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Science 1999, 286:531537. PubMed Abstract  Publisher Full Text

Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.
Nat Med 2001, 7:673679. PubMed Abstract  Publisher Full Text

van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer.
Nature 2002, 415:530536. PubMed Abstract  Publisher Full Text

Kolmogorov AN: Sulla determinazione empirica di una legge di distribuzione.

Smirnov NV: On the estimation of the discrepancy between empirical curves of distribution for two independent samples.

Jensen LJ, Knudsen S: Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation.
Bioinformatics 2000, 16:326333. PubMed Abstract  Publisher Full Text

Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC1alpharesponsive genes involved in oxidative phosphorylati on are coordinately downregulated in human diabetes.
Nat Genet 2003, 34(3):26773. PubMed Abstract  Publisher Full Text

Catmap website [http://bioinfo.thep.lu.se/Catmap] webcite

Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.
Proc Natl Acad Sci USA 1999, 96:67456750. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kuhajda FP: Fattyacid synthase and human cancer: new perspectives on its role in tumor biology.
Nutrition 2000, 16:202208. PubMed Abstract  Publisher Full Text

KumarSinha C, Ignatoski KW, Lippman ME, Ethier SP, Chinnaiyan AM: Transcriptome analysis of her2 reveals a molecular connection to fatty acid synthesis.
Cancer Res 2003, 63:132139. PubMed Abstract  Publisher Full Text

Cora D, Di Cunto F, Provero P, L Silengo P, Caselle M: Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs.
BMC Bioinformatics 2004, 5:57. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Fisher RA: The use of multiple measurements in taxonomic problems.

Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression.
Genomics 2003, 81:98104. PubMed Abstract  Publisher Full Text

UniGene [http://www.ncbi.nlm.nih.gov/UniGene] webcite

Ringnér M, Veerla S, Andersson S, Staaf J, Häkkinen J: ACID: a database for microarray clone information.
Bioinformatics 2004, 20:23052306. PubMed Abstract  Publisher Full Text