BMC Bioinformatics

This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)

Open Access Research

Comparison of automated candidate gene prediction systems using genes implicated in type 2 diabetes by genome-wide association studies

Erdahl T Teber1, Jason Y Liu1, Sara Ballouz1, Diane Fatkin1,2 and Merridee A Wouters1,2*

Author Affiliations

1 Victor Chang Cardiac Research Institute, 384 Victoria St, Darlinghurst, 2010, NSW, Australia

2 School of Medical Sciences, University of New South Wales, Sydney, Australia

BMC Bioinformatics 2009, 10(Suppl 1):S69 doi:10.1186/1471-2105-10-S1-S69

Published: 30 January 2009

Abstract

Background

Automated candidate gene prediction systems allow geneticists to home in on disease genes more rapidly by identifying the most probable candidate genes linked to the disease phenotypes under investigation. Here we assessed the ability of eight different candidate gene prediction systems to predict disease genes in intervals previously associated with type 2 diabetes by benchmarking their performance against genes implicated by recent genome-wide association studies.

Results

Using a search space of 9556 genes, all but one of the systems pruned the genome in favour of genes associated with moderate to highly significant SNPs. Of the 11 genes associated with highly significant SNPs identified by the genome-wide association studies, eight were flagged as likely candidates by at least one of the prediction systems. A list of candidates produced by a previous consensus approach did not match any of the genes implicated by 706 moderate to highly significant SNPs flagged by the genome-wide association studies. We prioritized genes associated with medium-significance SNPs.

Conclusion

The study appraises the relative success of several candidate gene prediction systems against independent genetic data. Even when confronted with challengingly large intervals, the candidate gene prediction systems can successfully select likely disease genes. Furthermore, they can be used to filter statistically less-well-supported genetic data to select more likely candidates. We suggest that consensus approaches fail because they penalize novel predictions made from independent underlying databases. To realize the full potential of these systems, further work is needed on the prioritization and annotation of genes.