
Open Access Software

Goldsurfer2 (Gs2): A comprehensive tool for the analysis and visualization of genome wide association studies

Fredrik Pettersson1*, Andrew P Morris1, Michael R Barnes2 and Lon R Cardon1,3

Author Affiliations

1 Dept Bioinformatics, Wellcome Trust Centre, Oxford, UK

2 Molecular Discovery Informatics, GlaxoSmithKline Pharmaceuticals, Harlow, Essex, UK

3 Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

BMC Bioinformatics 2008, 9:138  doi:10.1186/1471-2105-9-138

Published: 4 March 2008

Abstract

Background

Genome wide association (GWA) studies are now widely undertaken with the aim of linking genetic variation to common diseases. Ideally, a well-powered GWA study will measure hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of individuals. The sheer volume of data generated by these experiments places very high demands on analysis. The analysis involves a number of important steps, many of which may present severe bottlenecks. The data must first be imported and reviewed for initial quality control (QC) before association testing can begin. Evaluation of results may involve further statistical analysis, such as permutation testing, or further QC of associated markers, for example reviewing raw genotyping intensities. Finally, significant associations need to be prioritised using functional and biological interpretation methods: browsing available biological annotation, pathway information and patterns of linkage disequilibrium (LD).
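
The abstract does not describe how Gs2 itself performs association or permutation testing. As an illustration of those two steps only, here is a minimal Python sketch of a single-SNP allelic chi-square test with a label-permutation p-value. It assumes genotypes coded as minor-allele counts (0, 1 or 2) and a binary case/control phenotype. The function names `allelic_chi_square` and `permutation_p_value` are hypothetical and are not part of Gs2.

```python
import random
from typing import Sequence


def allelic_chi_square(genotypes: Sequence[int], is_case: Sequence[bool]) -> float:
    """Pearson chi-square on the 2x2 table of allele counts.

    Assumption: genotypes are minor-allele counts (0, 1 or 2) per individual.
    Rows are controls/cases; columns are minor/major allele counts.
    """
    counts = [[0, 0], [0, 0]]  # counts[row][col]: row 0 = control, 1 = case
    for g, case in zip(genotypes, is_case):
        row = 1 if case else 0
        counts[row][0] += g      # minor alleles carried by this individual
        counts[row][1] += 2 - g  # major alleles (diploid genotype)
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("no alleles counted")
    row_totals = [sum(r) for r in counts]
    col_totals = [counts[0][j] + counts[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty margins (e.g. a monomorphic SNP)
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat


def permutation_p_value(genotypes: Sequence[int], is_case: Sequence[bool],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: shuffle case/control labels and count how often
    the permuted statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = allelic_chi_square(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if allelic_chi_square(genotypes, labels) >= observed:
            hits += 1
    # The +1 correction counts the observed labelling, so p is never zero.
    return (hits + 1) / (n_perm + 1)
```

For example, with `genotypes = [2, 2, 2, 1, 2, 0, 0, 1, 0, 0]` and the first five individuals as cases, the allele table is 9/1 minor/major in cases against 1/9 in controls, so `allelic_chi_square` returns 12.8. Permuting labels rather than relying only on the asymptotic chi-square distribution is what makes this kind of test expensive at the scale of hundreds of thousands of SNPs.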

Results

We have developed an interactive, user-friendly graphical application for use at every step of a GWA project, from initial data QC and analysis to biological evaluation and validation of results. The program is implemented in Java and runs on all platforms.

Conclusion

Very large data sets (e.g. 500,000 markers and 5000 samples) can be quality assessed, rapidly analysed and integrated with genomic sequence information. Candidate SNPs can then be selected and functionally evaluated.