Open Access Research article

Identification of metagenes and their Interactions through Large-scale Analysis of Arabidopsis Gene Expression Data

Tyler J Wilson, Liming Lai, Yuguang Ban and Steven X Ge*

Author Affiliations

Department of Mathematics and Statistics, South Dakota State University, Box 2220, Brookings, SD, 57007, USA

BMC Genomics 2012, 13:237  doi:10.1186/1471-2164-13-237

Published: 13 June 2012

Additional files

Additional file 1:

metagenes. Excel spreadsheet containing the gene lists for each metagene as well as the gene ranks returned by NMF.

Format: XLSX Size: 105KB

Open Data

Additional file 2:

GOBP. Excel spreadsheet containing GSEA Enrichment results for Biological Processes.

Format: XLSX Size: 81KB

Open Data

Additional file 3:

GOCC. Excel spreadsheet containing GSEA Enrichment results for Cellular Components.

Format: XLSX Size: 56KB

Open Data

Additional file 4:

GOMF. Excel spreadsheet containing GSEA Enrichment results for Molecular Functions.

Format: XLSX Size: 68KB

Open Data

Additional file 5:

Figure S1. A plot showing the average correlation for 1000 randomly selected genes before and after the Empirical Bayes method was applied to adjust for batch effects.

Figure S2. A plot showing heat maps of the consensus matrix for k = {5,10,15,20}.

Figure S3. A plot showing heat maps of the consensus matrix for k = {30,35,40,45}.

Figure S4. A histogram of metagene coefficients, showing the δ = 0.2 cut-off. Genes with coefficients greater than δ were included in the metagenes, and those with coefficients less than δ were excluded.

Figure S5. Metagene correlation network for the gene ontology: cellular components. Each node in this network represents a metagene. The size of each node is proportional to the activity of the metagene within the dataset. The width of lines between a pair of nodes is proportional to the strength of the correlation between them. Positive correlations are denoted by red lines, and negative correlations by green lines. Only Spearman correlations with a p-value less than 10^-12 are visible. The pie slices within each node represent the amount of enrichment for specific gene ontologies (the NES score).

Figure S6. Metagene correlation network for the gene ontology: molecular functions. Each node in this network represents a metagene. The size of each node is proportional to the activity of the metagene within the dataset. The width of lines between a pair of nodes is proportional to the strength of the correlation between them. Positive correlations are denoted by red lines, and negative correlations by green lines. Only Spearman correlations with a p-value less than 10^-12 are visible. The pie slices within each node represent the amount of enrichment for specific gene ontologies (the NES score).

Figure S7. This heat map shows the z-values for all metagenes for the pathogen series in the dataset. Red indicates a metagene is more active in an experimental series, and green indicates it is suppressed.

Figure S8. Intersection p-values between clusters in the pathogen network from Atias and the metagenes active in the pathogen series of the AtGenExpress dataset. Bright green cells represent significant statistical overlap. The intensity of the cells represents a log10-transformed p-value returned by the hypergeometric test.

Figure S9. The NES score plotted in this heat map is a measure of metagene enrichment within a specific gene ontology involved with cellular components. Bright red cells indicate high enrichment.

Figure S10. The NES score plotted in this heat map is a measure of metagene enrichment within a specific gene ontology involved with molecular functions. Bright red cells indicate high enrichment.

Format: DOCX Size: 929KB

Open Data

Additional file 6:

kruskal_wallis. Excel spreadsheet containing z-values returned by the Kruskal-Wallis test on the encoding coefficients.

Format: XLSX Size: 30KB

Open Data