Open Access Research article

Comparison and consolidation of microarray data sets of human tissue expression

Jenny Russ1 and Matthias E Futschik1,2*

  • * Corresponding author: Matthias E Futschik mfutschik@ualg.pt

  • † Equal contributors

Author Affiliations

1 Institute for Theoretical Biology, Charité, Humboldt University, Invalidenstrasse 43, 10115 Berlin, Germany

2 Institute for Biotechnology and Bioengineering (Laboratório Associado), Centre for Molecular and Structural Biomedicine, University of Algarve, 8005-139 Faro, Portugal

BMC Genomics 2010, 11:305  doi:10.1186/1471-2164-11-305

Published: 14 May 2010

Abstract

Background

Human tissue displays a remarkable diversity in structure and function. To understand how such diversity emerges from the same DNA, systematic measurements of gene expression across different tissues in the human body are essential. Several recent studies have addressed this formidable task using microarray technologies. These large tissue expression data sets have provided an important basis for biomedical research. However, it is well known that microarray data can be compromised by high noise levels and various experimental artefacts. Critical comparison of different data sets can help to reveal such errors and to avoid pitfalls in their application.

Results

We present here the first comparison and integration of four freely available tissue expression data sets generated using three different microarray platforms and containing a total of 377 microarray hybridizations. When assessing the tissue expression of genes, we found that the results depend considerably on the chosen data set. Nevertheless, the comparison also revealed statistically significant similarity of gene expression profiles across different platforms. This enabled us to construct consolidated lists of platform-independent tissue-specific genes using a set of complementary measures. Follow-up analyses showed that results based on consolidated data tend to be more reliable.

Conclusions

Our study strongly indicates that the consolidation of the four different tissue expression data sets can increase data quality and can lead to biologically more meaningful results. The provided compendium of platform-independent gene lists should facilitate the identification of novel tissue-specific marker genes.