BMC Bioinformatics
Methodology article
Correlation test to assess low-level processing of high-density oligonucleotide microarray data
Alexander Ploner1, Lance D Miller2, Per Hall1, Jonas Bergh3 and Yudi Pawitan1
1 Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
2 Genome Institute of Singapore, Singapore
3 Department of Oncology and Pathology, Cancer Center Karolinska, Radiumhemmet, Karolinska Institutet and University Hospital, Stockholm
BMC Bioinformatics 2005, 6:80 doi:10.1186/1471-2105-6-80
Abstract
Background
There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but no method exists to assess, without reference to external data, whether a particular technique is appropriate for a specific data set.
Results
We analyzed coregulation between genes in order to detect insufficient normalization between arrays, where coregulation is measured in terms of statistical correlation. In a large collection of genes, a random pair of genes should have zero correlation on average, which allows a correlation test. For all data sets that we evaluated, and for the three most commonly used low-level processing procedures (MAS5, RMA and MBEI), housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation for absent genes. We also found that a second round of normalization on the probe set level improved normalization significantly throughout.
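The idea behind the correlation criterion can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simulated data, and the simple z-statistic (which treats the sampled pairs as independent) are all assumptions made for illustration. The sketch draws random gene pairs from a genes-by-arrays expression matrix, computes their Pearson correlations, and checks whether the mean is close to zero. A shared per-array offset, standing in for an array effect that normalization failed to remove, pushes the mean correlation well above zero.

```python
import numpy as np


def random_pair_correlations(expr, n_pairs, rng):
    """Pearson correlations for randomly drawn pairs of distinct genes.

    expr: 2-D array, rows = genes (probe sets), columns = arrays.
    Assumes no gene is constant across arrays (nonzero standard deviation).
    """
    n_genes = expr.shape[0]
    i = rng.integers(0, n_genes, size=n_pairs)
    # Add an offset in 1..n_genes-1 so the second gene always differs from the first.
    j = (i + rng.integers(1, n_genes, size=n_pairs)) % n_genes
    # Standardize each gene across arrays; the mean of products is then Pearson's r.
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)
    return (z[i] * z[j]).mean(axis=1)


def mean_correlation_z(corrs):
    """z-statistic for H0: mean pairwise correlation is zero.

    Illustrative only: it treats the sampled pairs as independent.
    """
    return corrs.mean() / (corrs.std(ddof=1) / np.sqrt(corrs.size))


# Simulated data (hypothetical, not from the article).
rng = np.random.default_rng(0)
n_genes, n_arrays = 2000, 40
well_normalized = rng.normal(size=(n_genes, n_arrays))
# A shared per-array offset mimics an array effect left behind by insufficient normalization.
array_effect = rng.normal(scale=1.0, size=(1, n_arrays))
poorly_normalized = well_normalized + array_effect

r_good = random_pair_correlations(well_normalized, 5000, rng)
r_bad = random_pair_correlations(poorly_normalized, 5000, rng)

print("mean r, well normalized:  ", r_good.mean())
print("mean r, poorly normalized:", r_bad.mean())
print("z, poorly normalized:     ", mean_correlation_z(r_bad))
```

In this toy setting the well-normalized matrix gives a mean correlation near zero, while the matrix with a residual array effect gives a clearly positive mean. The same check could be applied to a subset of genes, for example genes called absent, by passing only those rows.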
Conclusion
Previous evaluation of low-level processing in the literature has been limited to artificial spike-in and mixture data sets. In the absence of a known gold standard, the correlation criterion allows us to assess the appropriateness of low-level processing of a specific data set and the success of normalization for subsets of genes.