From hybridization theory to microarray data analysis: performance evaluation
Institute for Theoretical Physics, KULeuven, Celestijnenlaan 200D, B-3001 Leuven, Belgium
BMC Bioinformatics 2011, 12:464 doi:10.1186/1471-2105-12-464Published: 2 December 2011
Several preprocessing methods are available for the analysis of Affymetrix Genechips arrays. The most popular algorithms analyze the measured fluorescence intensities with statistical methods. Here we focus on a novel algorithm, AffyILM, available from Bioconductor, which relies on inputs from hybridization thermodynamics and uses an extended Langmuir isotherm model to compute transcript concentrations. These concentrations are then employed in the statistical analysis. We compared the performance of AffyILM and other traditional methods both in the old and in the newest generation of GeneChips.
Tissue mixture and Latin Square datasets (provided by Affymetrix) were used to assess the performances of the differential expression analysis depending on the preprocessing strategy. A correlation analysis conducted on the tissue mixture data reveals that the median-polish algorithm allows to best summarize AffyILM concentrations computed at the probe-level. Those correlation results are equivalent to the best correlations observed using popular preprocessing methods relying on intensity values. The performances of each tested preprocessing algorithm were quantified using the Latin Square HG-U133A dataset, thanks to the comparison of differential analysis results with the list of spiked genes. The figures of merit generated illustrates that the performances associated to AffyILM(medianpolish), inferred from the present statistical analysis, are comparable to the best performing strategies previously reported.
Converting probe intensities to estimates of target concentrations prior to the statistical analysis, AffyILM(medianpolish) is one of the best performing strategy currently available. Using hybridization theory, probe-level estimates of target concentrations should be identically distributed. In the future, a probe-level multivariate analysis of the concentrations should be compared to the univariate analysis of probe-set summarized expression data.