Open Access Research article

A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data

Hua Li1*, Dongxiao Zhu2,3 and Malcolm Cook1

Author Affiliations

1 Bioinformatics Center, Stowers Institute for Medical Research, 1000 E 50th St, Kansas City, MO 64110, USA

2 Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA

3 Research Institute for Children, Children's Hospital, New Orleans, LA 70118, USA

BMC Genomics 2008, 9:188 doi:10.1186/1471-2164-9-188

Published: 24 April 2008

Abstract

Background

Affymetrix GeneChip arrays typically contain multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. The most appropriate way of consolidating sibling probe sets for analysis is an open problem. We propose an Analysis of Variance (ANOVA) framework to decide which sibling probe sets can be consolidated.

Results

The ANOVA model allows us to separate the sibling probe sets into two types: those that behave similarly across treatments and those that behave differently across treatments. We found that consolidating sibling probe sets of the former type results in a large increase in the number of differentially expressed genes under various statistical criteria. The approach to selecting sibling probe sets suitable for consolidation is implemented in the R language and is freely available from http://research.stowers-institute.org/hul/affy/.
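To make the idea of separating the two types of sibling probe sets concrete, the following is a minimal sketch and not the authors' R implementation. It assumes a balanced two-way layout (probe set by treatment, with replicates) and uses the F statistic of the probe-set-by-treatment interaction as the indicator of whether siblings behave differently across treatments; the abstract does not spell out the exact model, so this choice is an assumption. The function name `interaction_f`, the probe set labels, and the toy intensities are all hypothetical.

```python
from statistics import mean


def interaction_f(data):
    """Return (F, df_interaction, df_error) for the probe set x treatment
    interaction in a balanced two-way layout.

    data maps probe set id -> treatment -> list of replicate log-intensities.
    This is an illustrative assumption about the model, not the published one.
    """
    probe_sets = sorted(data)
    treatments = sorted(data[probe_sets[0]])
    n = len(data[probe_sets[0]][treatments[0]])

    # The closed-form sums of squares below require a balanced design.
    if len(probe_sets) < 2 or len(treatments) < 2 or n < 2:
        raise ValueError("need >= 2 probe sets, >= 2 treatments, >= 2 replicates")
    for p in probe_sets:
        if sorted(data[p]) != treatments:
            raise ValueError("every probe set needs the same treatments")
        if any(len(data[p][t]) != n for t in treatments):
            raise ValueError("every cell needs the same number of replicates")

    # Cell, row (probe set), column (treatment) and grand means.
    cell = {(p, t): mean(data[p][t]) for p in probe_sets for t in treatments}
    row = {p: mean([cell[p, t] for t in treatments]) for p in probe_sets}
    col = {t: mean([cell[p, t] for p in probe_sets]) for t in treatments}
    grand = mean(list(cell.values()))

    # Interaction sum of squares: departure of cell means from additivity.
    ss_int = n * sum(
        (cell[p, t] - row[p] - col[t] + grand) ** 2
        for p in probe_sets for t in treatments
    )
    # Error sum of squares: replicate scatter within each cell.
    ss_err = sum(
        (y - cell[p, t]) ** 2
        for p in probe_sets for t in treatments for y in data[p][t]
    )
    if ss_err == 0:
        raise ValueError("zero within-cell variance; F is undefined")

    df_int = (len(probe_sets) - 1) * (len(treatments) - 1)
    df_err = len(probe_sets) * len(treatments) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical siblings that differ only by a constant offset (parallel profiles).
parallel = {
    "ps_a": {"ctrl": [1.0, 1.2], "trt": [2.0, 2.2]},
    "ps_b": {"ctrl": [3.0, 3.2], "trt": [4.0, 4.2]},
}
# Hypothetical siblings that respond in opposite directions to treatment.
divergent = {
    "ps_a": {"ctrl": [1.0, 1.2], "trt": [2.0, 2.2]},
    "ps_b": {"ctrl": [4.0, 4.2], "trt": [3.0, 3.2]},
}

for name, gene in (("parallel", parallel), ("divergent", divergent)):
    f_stat, df1, df2 = interaction_f(gene)
    print(f"{name}: F = {f_stat:.3f} on ({df1}, {df2}) df")
```

A small interaction F suggests the siblings track each other across treatments and are candidates for consolidation, whereas a large F flags siblings that behave differently. Turning F into a p-value requires the F distribution, which the Python standard library does not provide; one option is `scipy.stats.f.sf(f_stat, df1, df2)`.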

Conclusion

Our ANOVA analysis of sibling probe sets provides a statistical framework for selecting sibling probe sets for consolidation. Consolidating sibling probe sets by pooling their data greatly improves the estimate of a gene's expression level and results in the identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.