Open Access Methodology article

Analysis of genome-wide association study data using the protein knowledge base

Sara Ballouz (1,2), Jason Y Liu (1), Martin Oti (3), Bruno Gaeta (2), Diane Fatkin (4,5), Melanie Bahlo (6) and Merridee A Wouters (7)*

Author Affiliations

1 Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, 2010, Australia

2 School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, 2052, Australia

3 Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands

4 School of Medical Sciences, University of New South Wales, Kensington, NSW, 2052, Australia

5 Molecular Cardiology and Biophysics Division, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, 2010, Australia

6 Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia

7 School of Life and Environmental Sciences, Deakin University, Geelong, VIC, 3217, Australia

BMC Genetics 2011, 12:98  doi:10.1186/1471-2156-12-98

Published: 13 November 2011

Additional files

Additional file 1:

Gentrepid validation gene sets and additional benchmarking results.

Table S1 OMIM phenotype associated genes used as seeds for the seeded mode and as the known disease gene validation set.

Table S2 Genes included in the known validation set.

Table S3 Genes included in the WTCCC validation set.

Table S4 Specificity, Sensitivity and Enrichment ratios for validation sets across all phenotypes.

Table S5 LD versus naïve clustering.

Table S6 Comparison of the number of significant Gentrepid predictions between LD and adjacent gene selection sets.

Table S7 Total numbers of significant predictions across Gentrepid, GRAIL and WebGestalt.

Table S8 Specificity, Sensitivity and Enrichment ratios for WTCCC validation set for Gentrepid, GRAIL and WebGestalt.

Format: DOCX, Size: 69 KB

Additional file 2:

Figure S1 Q-Q plots of expected versus observed p-values of the associated trend test, generated for each phenotype (black); the uniform distribution is shown in grey.

Format: PDF, Size: 112 KB

Additional file 3:

Figure S2 ROC curves for Gentrepid on the known and WTCCC validation sets. CPS is represented by dashed lines and CMP by solid lines. The colors indicate the SNP-to-gene mapping set used. From left to right, the columns show the known validation set in seeded mode, the known validation set in ab initio mode, the WTCCC validation set in seeded mode, and the WTCCC validation set in ab initio mode. From top to bottom, the rows show the HS, MHS, MWS and WS sets. The grey line in each plot represents the performance expected from random guessing. CPS lies above this line in most cases, whereas CMP lies below it. CPS with the 0.1 Mbp or adjacent set performs best.

Format: PDF, Size: 92 KB
