Open Access | Methodology article

A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples

Paola Sebastiani¹, Zhenming Zhao¹, Maria M Abad-Grau², Alberto Riva³, Stephen W Hartley¹, Amanda E Sedgewick⁴, Alessandro Doria⁵, Monty Montano⁶, Efthymia Melista⁶, Dellara Terry⁷, Thomas T Perls⁷, Martin H Steinberg⁶ and Clinton T Baldwin⁶

¹ Department of Biostatistics, Boston University School of Public Health, Boston 02118 MA, USA

² Department of Software Engineering, University of Granada, Granada 18071, Spain

³ Department of Molecular Genetics, University of Florida at Gainesville, Gainesville 32611 FL, USA

⁴ Bioinformatics Program, Boston University School of Engineering, Boston 02116 MA, USA

⁵ Joslin Diabetes Center, Harvard Medical School, Boston 02215 MA, USA

⁶ Department of Medicine, Boston University School of Medicine, Boston 02118 MA, USA

⁷ Geriatric Section, Boston Medical Center, Boston 02118 MA, USA


BMC Genetics 2008, 9:6 doi:10.1186/1471-2156-9-6

Published: 14 January 2008

Abstract

Background

One of the challenges in the analysis of pooling-based genome-wide association studies is to identify authentic associations among potentially thousands of false-positive associations.
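The arithmetic below shows how thousands of false positives arise. It counts how many null SNPs would pass an unadjusted significance threshold. This is a minimal sketch that assumes independent tests of exact size alpha. The function names and the 500,000-SNP array size are hypothetical illustrations, not values taken from this study.

```python
def expected_false_positives(num_snps, alpha):
    """Expected number of SNPs passing an unadjusted threshold alpha
    when every SNP is truly null (hypothetical helper)."""
    return num_snps * alpha


def familywise_error(num_snps, alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_snps


def bonferroni_threshold(num_snps, family_alpha=0.05):
    """Per-SNP threshold that keeps the family-wise error rate at most family_alpha."""
    return family_alpha / num_snps


if __name__ == "__main__":
    m = 500_000  # hypothetical number of SNPs tested, all assumed null
    print(expected_false_positives(m, 0.05))
    print(familywise_error(m, 0.05))
    print(bonferroni_threshold(m))
```

At an unadjusted 5% level, a 500,000-SNP scan would be expected to yield about 25,000 null "hits." A Bonferroni threshold removes most of these false positives, but it does so at a cost in power.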

Results

We present a hierarchical and modular approach to the analysis of genome-wide genotype data that incorporates quality control, linkage disequilibrium, physical distance and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it can easily be generalized to the analysis of individually genotyped samples. We evaluate the approach using data sets from diverse genome-wide association studies, including studies of fetal hemoglobin levels in sickle cell anemia and of a sample of centenarians. We show that the approach is highly reproducible and allows for discovery at different levels of synthesis.
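The following minimal sketch illustrates an allelic association test on pooled DNA followed by grouping by physical distance. It rests on several assumptions:

- Each pool yields an estimated allele frequency.
- That estimate is converted to approximate allele counts (frequency × 2n for n diploid subjects).
- A standard 1-df allelic chi-square test is applied to those counts.
- Significant SNPs are chained into groups by physical distance alone.

The `PooledSnp` record, the helper names and the 50 kb gap are hypothetical. The sketch ignores pooling measurement error and linkage disequilibrium data, so it should not be read as the authors' procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledSnp:
    """Hypothetical record for one SNP measured in a case pool and a control pool."""
    snp_id: str
    chrom: str
    position: int          # base-pair coordinate on the chromosome
    freq_cases: float      # estimated allele frequency in the case pool
    freq_controls: float   # estimated allele frequency in the control pool


def allelic_chi2_pvalue(freq_cases, freq_controls, n_cases, n_controls):
    """1-df allelic chi-square test on approximate allele counts.

    Counts are approximated as frequency * 2n (diploid subjects). Pooling
    measurement error is ignored, so the test is anti-conservative.
    """
    a = freq_cases * 2 * n_cases           # tested allele, cases
    b = 2 * n_cases - a                    # other allele, cases
    c = freq_controls * 2 * n_controls     # tested allele, controls
    d = 2 * n_controls - c                 # other allele, controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # allele absent (or fixed) in both pools: nothing to test
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def group_by_distance(snps, max_gap_bp=50_000):
    """Chain SNPs on the same chromosome whose neighbours lie within max_gap_bp."""
    groups = []
    for snp in sorted(snps, key=lambda s: (s.chrom, s.position)):
        prev = groups[-1][-1] if groups else None
        if (prev is not None and prev.chrom == snp.chrom
                and snp.position - prev.position <= max_gap_bp):
            groups[-1].append(snp)
        else:
            groups.append([snp])
    return groups


if __name__ == "__main__":
    snps = [  # hypothetical pooled frequencies for 100 cases and 100 controls
        PooledSnp("snp1", "1", 1_000, 0.40, 0.25),
        PooledSnp("snp2", "1", 30_000, 0.38, 0.24),
        PooledSnp("snp3", "1", 900_000, 0.30, 0.31),
    ]
    hits = [s for s in snps
            if allelic_chi2_pvalue(s.freq_cases, s.freq_controls, 100, 100) < 0.01]
    for group in group_by_distance(hits):
        print([s.snp_id for s in group])
```

Grouping nearby hits treats clustered signals as one candidate region rather than many independent findings. In a fuller analysis, this grouping could be refined with linkage disequilibrium measures.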

Conclusion

Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that overly stringent thresholds are not needed to reduce the number of false-positive associations. The method yields increased power even with relatively small samples: our evaluation shows that it can reach almost 70% sensitivity with samples of only 100 subjects.
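The sketch below illustrates a Bayesian allelic test and the sensitivity measure mentioned above. It computes a generic beta-binomial Bayes factor that compares separate allele frequencies in cases and controls against a common frequency. It also computes sensitivity as the fraction of true associations recovered. The Beta(1, 1) prior, the function names and the use of (approximate) allele counts are assumptions made for illustration; the article's own Bayesian tests may differ.

```python
import math


def log_beta(a, b):
    """Log of the Beta function via log-gamma, for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def allelic_bayes_factor(x_cases, n_cases, x_controls, n_controls,
                         prior_a=1.0, prior_b=1.0):
    """Bayes factor for 'different allele frequencies' vs 'common frequency'.

    x_* are counts of the tested allele and n_* are total allele counts.
    Each frequency gets a Beta(prior_a, prior_b) prior; the binomial
    coefficients are identical under both hypotheses and cancel.
    """
    x_all, n_all = x_cases + x_controls, n_cases + n_controls
    log_prior = log_beta(prior_a, prior_b)
    log_m1 = (log_beta(prior_a + x_cases, prior_b + n_cases - x_cases)
              + log_beta(prior_a + x_controls, prior_b + n_controls - x_controls)
              - 2 * log_prior)
    log_m0 = log_beta(prior_a + x_all, prior_b + n_all - x_all) - log_prior
    return math.exp(log_m1 - log_m0)


def sensitivity(flagged, true_associations):
    """Fraction of truly associated SNPs that the procedure flags."""
    truth = set(true_associations)
    return len(truth & set(flagged)) / len(truth)


if __name__ == "__main__":
    # Hypothetical allele counts: 200 alleles (100 subjects) per group.
    print(allelic_bayes_factor(80, 200, 50, 200))    # favours association
    print(allelic_bayes_factor(100, 200, 100, 200))  # favours no association
    print(sensitivity(["snp1", "snp2"], ["snp1", "snp2", "snp3"]))
```

A Bayes factor above 1 favours different allele frequencies in cases and controls. For example, with 200 alleles per group, 100 versus 100 copies of the tested allele gives a value below 1.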
