Open Access Research article

Combining gene expression data from different generations of oligonucleotide arrays

Kyu-Baek Hwang*1, Sek Won Kong*2,3, Steve A Greenberg4,5 and Peter J Park5,6

1School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea

2Molecular Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Ave, Boston, MA 02215, USA

3Bauer Center for Genomics Research, Harvard University, 7 Divinity Ave, Cambridge, MA 02138, USA

4Department of Neurology, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA

5Children's Hospital Informatics Program, 300 Longwood Ave, Boston, MA 02115, USA

6Harvard-Partners Center for Genetics and Genomics, 77 Avenue Louis Pasteur, Boston, MA 02115, USA

* Contributed equally

BMC Bioinformatics 2004, 5:159 doi:10.1186/1471-2105-5-159

Published: 25 October 2004

Abstract

Background

One of the important challenges in microarray analysis is to take full advantage of previously accumulated data, both from one's own laboratory and from public repositories. A comparative analysis across a variety of datasets can give a more comprehensive view of the underlying mechanism or structure. However, as we discover in this work, continual changes in genomic sequence annotations and probe design criteria make it difficult to compare gene expression data even from different generations of the same microarray platform.

Results

We first describe the extent of discordance between the results derived from two generations of Affymetrix oligonucleotide arrays, as revealed in cluster analysis and in identification of differentially expressed genes. We then propose a method for increasing comparability. The dataset we use consists of 14 human muscle biopsy samples from patients with inflammatory myopathies that were hybridized on both HG-U95Av2 and HG-U133A human arrays. We find that the probe set matching table provided by Affymetrix for comparative analysis produces better results than matching by UniGene or LocusLink identifiers but still remains inadequate. Rescaling of expression values for each gene across samples and data filtering by expression values enhance comparability, but only for a few specific analyses. As a generic method for improving comparability, we select a subset of probes with overlapping sequence segments in the two array types and recalculate expression values based only on the selected probes. We show that this filtering of probes significantly improves comparability while retaining a sufficient number of probe sets for further analysis.
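The probe selection step described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how overlap is measured or how expression values are recalculated. Here overlap is assumed to mean a shared contiguous run of at least `min_overlap` bases between two probe sequences, and the recalculated value is assumed to be the median log2 intensity of the retained probes. The function names, the threshold of 20 bases, and the toy sequences are all hypothetical.

```python
from math import log2
from statistics import median


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        curr = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                # Extend the run that ended at the previous base of both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def select_overlapping_probes(probes_old, probes_new, min_overlap=20):
    """Indices of probes in each set that overlap some probe in the other set.

    min_overlap is an assumed threshold, not a value taken from the article.
    """
    keep_old, keep_new = set(), set()
    for i, p in enumerate(probes_old):
        for j, q in enumerate(probes_new):
            if longest_common_substring(p, q) >= min_overlap:
                keep_old.add(i)
                keep_new.add(j)
    return sorted(keep_old), sorted(keep_new)


def summarize(intensities, keep):
    """Median log2 intensity over the retained probes (assumed summary method).

    Returns None when no probe was retained for this probe set.
    """
    if not keep:
        return None
    return median(log2(intensities[i]) for i in keep)


# Toy 25-mer probes for one matched probe set on each array type.
old_probes = ["ACGTACGTTAGCCGATAGGCTAACG", "G" * 25]
new_probes = ["TACGTTAGCCGATAGGCTAACGTTA", "C" * 25]

keep_old, keep_new = select_overlapping_probes(old_probes, new_probes)
old_value = summarize([1024.0, 50.0], keep_old)
```

In this toy case only the first probe of each set is retained, because the second probes share no long run of bases with any probe on the other array. A probe set for which `summarize` returns `None` would be dropped from the combined analysis.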

Conclusions

Compatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively.

