
This article is part of the supplement: Italian Society of Bioinformatics (BITS): Annual Meeting 2005

Open Access Research article

Mining published lists of cancer related microarray experiments: Identification of a gene expression signature having a critical role in cell-cycle control

Giacomo Finocchiaro1,2, Francesco Mancuso1,2 and Heiko Muller1,2*

Author Affiliations

1 European Institute of Oncology, Via Ripamonti 435, 20141 Milan, Italy

2 IFOM Foundation, Via Adamello 16, 20139 Milan, Italy


BMC Bioinformatics 2005, 6(Suppl 4):S14  doi:10.1186/1471-2105-6-S4-S14

Published: 1 December 2005



Routine application of gene expression microarray technology is rapidly producing large amounts of data that call for new approaches to analysis. The analysis of a specific microarray experiment benefits greatly from comparison with other experiments. This comparison is generally performed by numerical meta-analysis of published data, in which researchers choose the datasets to be analyzed based on assumptions about how those datasets relate biologically to their own data; this severely limits the chance of finding unexpected connections. Here we propose using a repository of published gene lists to identify interesting datasets that can then be subjected to more detailed numerical analysis.


We have compiled lists of genes that have been reported as differentially regulated in cancer related microarray studies. We searched these gene lists for statistically significant overlaps with lists of genes regulated by the tumor suppressors p16 and pRB. We identified a highly significant overlap of p16 and pRB target genes with genes regulated by the EWS/FLI fusion protein. Detailed numerical analysis of these data identified two sets of genes with clearly distinct roles in the G1/S and the G2/M phases of the cell cycle, as measured by enrichment of Gene Ontology categories.
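The abstract does not state which statistic was used to call an overlap between two gene lists "statistically significant". A common choice for this kind of list-versus-list comparison is a one-sided hypergeometric test, so the sketch below assumes that test; it is not necessarily the authors' method. It also assumes that all lists are drawn from a shared gene universe whose size (`universe_size`) must be supplied, and it adds a Bonferroni correction across the repository as an illustrative choice. The function names, dataset labels and gene identifiers are hypothetical.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene lists.

    Returns P(X >= k), where k is the observed overlap and X is the overlap
    expected if list_b were drawn at random from a universe of
    `universe_size` genes, len(list_a) of which belong to list_a.
    Assumption: both lists come from the same gene universe.
    """
    a, b = set(list_a), set(list_b)
    k = len(a & b)  # observed overlap
    big_k, n = len(a), len(b)
    if max(big_k, n) > universe_size:
        raise ValueError("gene lists cannot be larger than the universe")
    # Sum the hypergeometric probabilities from k up to the largest possible overlap.
    tail = sum(
        comb(big_k, i) * comb(universe_size - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return tail / comb(universe_size, n)


def rank_gene_lists(query, repository, universe_size):
    """Rank every list in `repository` by its overlap with `query`.

    `repository` maps a (hypothetical) dataset label to its gene list.
    Returns (label, overlap, raw p, Bonferroni-adjusted p) tuples,
    most significant first.
    """
    m = len(repository)
    ranked = []
    for label, genes in repository.items():
        p = overlap_pvalue(query, genes, universe_size)
        overlap = len(set(query) & set(genes))
        ranked.append((label, overlap, p, min(1.0, p * m)))
    ranked.sort(key=lambda row: row[2])
    return ranked


# Toy usage with hypothetical identifiers (not data from the study).
query = {"g1", "g2", "g3", "g4", "g5"}
repository = {
    "dataset_A": {"g1", "g2", "g3", "g9"},
    "dataset_B": {"g6", "g7", "g8"},
}
for row in rank_gene_lists(query, repository, universe_size=1000):
    print(row)
```

Because the test needs only list membership and the size of the gene universe, it fits the setting described here, in which published lists lack numerical expression values. The lists that rank highest would then be the candidates for the detailed numerical analysis.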


We show that mining published gene lists, even in the absence of numerical detail about gene expression levels, constitutes a fast, easy-to-perform, widely applicable and unbiased route towards the identification of biologically related gene expression microarray datasets.