BMC Bioinformatics


Software (Open Access)

BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets

Mark J Alston (4, 1)*, John Seers (2), Jay CD Hinton (1, 5) and Sacha Lucchini (3)

Author Affiliations

1 Foodborne Bacterial Pathogens, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK

2 Bioinformatics & Statistics, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK

3 Integrated Biology of the GI Tract, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK

4 Current address: The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK

5 Current address: Department of Microbiology, Moyne Institute of Preventive Medicine, School of Genetics and Microbiology, Trinity College, Dublin 2, Ireland


BMC Bioinformatics 2010, 11:73 doi:10.1186/1471-2105-11-73

Published: 3 February 2010

Abstract

Background

The development of DNA microarrays has facilitated the generation of hundreds of thousands of transcriptomic datasets. The use of a common reference microarray design allows existing transcriptomic data to be readily compared and re-analysed in the light of new data, and the combination of this design with large datasets is ideal for 'systems'-level analyses. One issue is that these datasets are typically collected over many years and may be heterogeneous in nature: they can contain different microarray file formats and gene array layouts, include dye-swaps, and show varying scales of log2 expression ratios between microarrays. Excellent software exists for the normalisation and analysis of microarray data, but many datasets have yet to be analysed because existing methods struggle with heterogeneous data; the available options include normalising microarrays individually or by experimental group. Our solution was to develop the Batch Anti-Banana Algorithm in R (BABAR), an algorithm and software package that uses cyclic loess to normalise across the complete dataset. We have already used BABAR to analyse the function of Salmonella genes involved in the infection of mammalian cells.
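Cyclic loess, the technique named above as the core of BABAR, can be illustrated with a small pure-Python sketch. It assumes each array is represented as a list of log2 values sharing one gene order; the function names `lowess_fit` and `cyclic_loess`, the span `frac` and the fixed number of passes are hypothetical choices for illustration, not BABAR's R interface, and the abstract does not say which per-array values BABAR passes to the algorithm.

```python
import math
from itertools import combinations


def lowess_fit(x, y, frac=0.5):
    """Local linear fit of y on x with tricube weights.

    A plain loess smoother without robustness iterations; fitted values are
    returned at each input x. Quadratic time, so only for small examples.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))  # neighbours that define each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth: k-th nearest distance
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # at least 1, because x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def cyclic_loess(arrays, frac=0.5, iterations=3):
    """Cyclic loess across all arrays via pairwise M-A adjustments.

    `arrays` is a list of equal-length lists of log2 values (one per array,
    same gene order). Returns new, normalised lists; the input is unchanged.
    """
    arrays = [list(a) for a in arrays]
    for _ in range(iterations):
        for i, j in combinations(range(len(arrays)), 2):
            xi, xj = arrays[i], arrays[j]
            m = [p - q for p, q in zip(xi, xj)]        # log2 difference (M)
            a = [(p + q) / 2 for p, q in zip(xi, xj)]  # average level (A)
            fit = lowess_fit(a, m, frac)
            # split the fitted M-A trend equally between the two arrays
            arrays[i] = [v - f / 2 for v, f in zip(xi, fit)]
            arrays[j] = [v + f / 2 for v, f in zip(xj, fit)]
    return arrays


if __name__ == "__main__":
    base = [6.0, 7.5, 8.0, 9.2, 10.1, 11.3, 12.0, 13.4]
    # three toy arrays offset by 0, 1 and 2 log2 units
    toy = [base, [v + 1 for v in base], [v + 2 for v in base]]
    for row in cyclic_loess(toy, frac=0.5, iterations=10):
        print([round(v, 3) for v in row])
```

Splitting each fitted curve equally between the two arrays leaves the pair's average unchanged, so repeated passes draw all arrays towards a shared, intensity-dependent scale instead of towards one arbitrarily chosen array; in the toy run every row ends up close to `base` shifted by one log2 unit. Production implementations (for example limma's `normalizeCyclicLoess` in R) rely on optimised loess routines rather than this quadratic-time smoother.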

Results

BABAR requires only unprocessed GenePix or BlueFuse microarray data files as input. It provides a combination of 'within' and 'between' microarray normalisation steps, together with diagnostic boxplots. When applied to a real heterogeneous dataset, BABAR normalised the dataset to produce comparable scaling between the microarrays, and the normalised microarray data were in excellent agreement with RT-PCR analysis. When applied to a real non-heterogeneous dataset and to a simulated dataset, BABAR showed some benefits over standard techniques in identifying differentially expressed genes.
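The per-array steps and boxplot diagnostics mentioned above can be illustrated with a minimal sketch. It assumes positive, background-corrected intensities and a sample labelled in the red channel unless an array is a dye-swap, and it uses global median centring as a deliberately simple stand-in for a within-array normalisation; the abstract does not detail BABAR's actual within-array method. The names `log_ratios`, `median_centre` and `five_number_summary` are hypothetical and are not part of the BABAR package.

```python
import math
import statistics


def log_ratios(red, green, dye_swap=False):
    """Per-spot log2 ratios for one two-colour array, oriented sample/reference.

    Assumes positive, background-corrected intensities and that the sample is
    in the red channel unless `dye_swap` is True.
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    # on a dye-swapped array the reference is in red, so flip the sign
    return [-v for v in m] if dye_swap else m


def median_centre(m):
    """Global median centring: shift the array so its median log2 ratio is 0."""
    med = statistics.median(m)
    return [v - med for v in m]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum.

    These are the statistics a simple boxplot draws; R's boxplot() instead
    stops the whiskers at 1.5 x IQR and plots outliers separately.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


if __name__ == "__main__":
    red = [200, 400, 800, 1600]
    green = [100, 100, 100, 100]
    arrays = {
        "array1": log_ratios(red, green),
        # same sample and reference, hybridised with the dyes swapped
        "array2 (dye-swap)": log_ratios(green, red, dye_swap=True),
    }
    for name, m in arrays.items():
        print(name, five_number_summary(median_centre(m)))
```

Because the dye-swapped array is flipped back to sample/reference orientation, both toy arrays produce the same summary; lining such summaries up across every array in a dataset is what side-by-side diagnostic boxplots show visually.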

Conclusions

BABAR is an easy-to-use software tool that simplifies the simultaneous normalisation of heterogeneous two-colour common reference design cDNA microarray-based transcriptomic datasets. We show that BABAR transforms real and simulated datasets to allow the correct interpretation of these data, and that it is the ideal tool to facilitate the identification of differentially expressed genes or network inference analysis from transcriptomic datasets.