BMC Bioinformatics


Open Access Methodology article

Data analysis issues for allele-specific expression using Illumina's GoldenGate assay

Matthew E Ritchie1*, Matthew S Forrest2, Antigone S Dimas3,4, Caroline Daelemans5, Emmanouil T Dermitzakis4, Panagiotis Deloukas2 and Simon Tavaré6

Author Affiliations

1 Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria, 3052, Australia

2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

3 Wellcome Trust Center for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK

4 Department of Genetic Medicine and Development, University of Geneva Medical School, 1 Rue Michel-Servet, Geneva, 1211, Switzerland

5 Department of Obstetrics and Gynecology, Institute for Women's Health, University College London, 86-96 Chenies Mews, London, WC1E 6HX, UK

6 Department of Oncology, University of Cambridge, CRUK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK


BMC Bioinformatics 2010, 11:280 doi:10.1186/1471-2105-11-280

Published: 26 May 2010

Abstract

Background

High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets that use Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome.

Results

We analyze data from a mixture experiment in which genomic DNA samples from pairs of individuals of known genotypes are pooled to create allelic imbalances at varying levels for the majority of SNPs on the array. We observe that GoldenGate is less sensitive in detecting subtle allelic imbalances (around 1.3-fold) than extreme imbalances, and note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye bias, which can be reduced considerably by careful normalization. We also demonstrate the need to filter the data before downstream analysis to remove non-responding probes, which show either weak or non-specific signal for each allele. Throughout this paper, we find that a linear model analysis of the data from each SNP is a flexible modelling strategy that allows allelic imbalance to be tested in each sample when replicate hybridizations are available.
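As a rough illustration of testing one SNP in one sample from replicate hybridizations, the sketch below fits the simplest possible linear model, an intercept-only model on the replicate log2 allele ratios, which reduces to a one-sample t-statistic. This is an assumption made for illustration and not necessarily the model fitted in the paper; the function name `imbalance_t_statistic` and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def imbalance_t_statistic(log_ratios):
    """t-statistic for the intercept of an intercept-only linear model.

    `log_ratios` holds the replicate log2(allele A / allele B) values for
    one SNP in one sample. A value far from zero suggests allelic imbalance.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("at least two replicate hybridizations are required")
    standard_error = stdev(log_ratios) / sqrt(n)
    if standard_error == 0:
        raise ValueError("replicates are identical; the t-statistic is undefined")
    # The intercept estimate is the mean log-ratio across replicates.
    return mean(log_ratios) / standard_error


# Hypothetical replicate log-ratios for one SNP (illustrative values only).
t_value = imbalance_t_statistic([0.5, 0.3, 0.4, 0.4])
```

The statistic would be compared against a t-distribution with n - 1 degrees of freedom; a fuller linear model could add terms such as dye or sample effects.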

Conclusions

Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, provides optimal performance in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs that measure only non-specific signal. We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher density Infinium BeadChips.
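The two recommendations above can be sketched in a few lines. The example assumes background-corrected, positive intensities for a single array. It forces the red and green channels onto a common distribution (the rank-wise mean of the two sorted channels), then applies a simple filter on average log2 intensity. The function names and the threshold are hypothetical, and the filter rule is one plausible choice rather than the criterion used in the paper.

```python
from math import log2


def _ranks(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def quantile_normalize_channels(red, green):
    """Give both channels of one array the same intensity distribution."""
    if len(red) != len(green):
        raise ValueError("red and green must have one value per SNP")
    # Reference distribution: mean of the two channels at each rank.
    reference = [(r + g) / 2.0 for r, g in zip(sorted(red), sorted(green))]
    red_norm = [reference[k] for k in _ranks(red)]
    green_norm = [reference[k] for k in _ranks(green)]
    return red_norm, green_norm


def keep_snp(red, green, min_log_intensity):
    """Assumed filter: keep a SNP if its average log2 intensity is high enough."""
    return 0.5 * (log2(red) + log2(green)) >= min_log_intensity


# Illustrative intensities with a two-fold dye bias towards red.
red_norm, green_norm = quantile_normalize_channels(
    [100.0, 400.0, 200.0, 800.0], [50.0, 200.0, 100.0, 400.0]
)
```

After normalization the two channels share the same set of values, so a systematic shift of one dye relative to the other no longer appears as allelic imbalance.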