BMC Bioinformatics Volume 7
Research article

How to decide? Different methods of calculating gene expression from short oligonucleotide array data will give different results

Frank F Millenaar1, John Okyere2, Sean T May2, Martijn van Zanten1, Laurentius ACJ Voesenek1 and Anton JM Peeters1

1 Plant Ecophysiology, Institute of Environmental Biology, Faculty of Science, Utrecht University, Sorbonnelaan 16, 3584 CA Utrecht, The Netherlands
2 Nottingham Arabidopsis Stock Centre (NASC), Plant Science Division, University of Nottingham, Sutton Bonington Campus, Loughborough LE12 5RD, UK
BMC Bioinformatics 2006, 7:137 doi:10.1186/1471-2105-7-137

Abstract
Background
Short oligonucleotide arrays for transcript profiling have been available for several years. Generally, raw data from these arrays are analysed with the aid of the Microarray Analysis Suite or GeneChip Operating Software (MAS or GCOS) from Affymetrix. Recently, more methods to analyse the raw data have become available. Ideally all these methods should come up with more or less the same results. We set out to evaluate the different methods and include work on our own data set, in order to test which method gives the most reliable results.
Results
Calculating gene expression from the same Arabidopsis data with six different algorithms (MAS5, dChip PMMM, dChip PM, RMA, GC-RMA and PDNN) yields different expression levels. Consequently, depending on the method used, different genes are identified as differentially regulated. Surprisingly, the overlap between the different methods was only 27 to 36%. Furthermore, 47.5% of the genes/probe sets showed a good correlation between the mismatch and perfect match intensities.
Conclusion
Of the six algorithms compared, RMA gave the most reproducible results and showed the highest correlation coefficients with Real Time RT-PCR data for genes identified as differentially expressed by all methods. However, we were not able to verify, by Real Time RT-PCR, the microarray results for most genes that were identified as differentially expressed by RMA alone. Furthermore, we conclude that subtracting the mismatch intensity from the perfect match intensity most likely results in a significant underestimation for at least 47.5% of the expression values. None of the algorithms produced significant expression values for genes present in quantities below 1 pmol. If the only purpose of the microarray experiment is to find new candidate genes, and too many genes are found, then mutual exclusion of the genes predicted by contrasting methods can be used to narrow down the list of new candidate genes by 64 to 73%.
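The narrowing step described in the conclusion can be illustrated with a minimal Python sketch. It assumes that each method's output has already been reduced to a set of probe set identifiers called as differentially expressed, and that "overlap" means the share of a method's list that is also found by every other method; the abstract does not spell out the exact definition, so this is an assumption. The gene identifiers, the toy lists and the function names `consensus_genes` and `overlap_report` are hypothetical and are not taken from the article.

```python
def consensus_genes(calls):
    """Return the genes flagged as differentially expressed by every method.

    `calls` maps a method name to an iterable of gene identifiers.
    """
    sets = [set(genes) for genes in calls.values()]
    return set.intersection(*sets) if sets else set()


def overlap_report(calls):
    """For each method, return (overlap %, reduction %).

    Overlap is the share of that method's list that all methods agree on
    (an assumed definition); reduction is the share that is dropped when
    only the consensus genes are kept.
    """
    common = consensus_genes(calls)
    report = {}
    for method, genes in calls.items():
        n = len(set(genes))
        if n == 0:
            report[method] = (0.0, 0.0)
            continue
        overlap = 100.0 * len(common) / n
        report[method] = (overlap, 100.0 - overlap)
    return report


# Hypothetical toy lists, for illustration only (not data from the article).
calls = {
    "MAS5": {"g1", "g2", "g3", "g4"},
    "RMA": {"g2", "g3", "g5", "g6", "g7"},
    "PDNN": {"g2", "g3", "g4", "g8"},
}

for method, (overlap, reduction) in overlap_report(calls).items():
    print(f"{method}: overlap {overlap:.0f}%, list reduced by {reduction:.0f}%")
```

Under this definition the reduction is simply 100% minus the overlap, which matches the figures in the abstract: an overlap of 27 to 36% corresponds to a candidate list that is 64 to 73% shorter.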