Open Access Highly Accessed Research article

Mining the Gene Wiki for functional genomic knowledge

Benjamin M Good1, Douglas G Howe2, Simon M Lin3, Warren A Kibbe3 and Andrew I Su1*

Author Affiliations

1 Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

2 The Zebrafish Model Organism Database, University of Oregon, 5291, University of Oregon, Eugene, OR 97403, USA

3 Department of Biomedical Informatics, Northwestern University, 750 North Lake Shore Drive, Chicago, IL 60611, USA

BMC Genomics 2011, 12:603  doi:10.1186/1471-2164-12-603

Published: 13 December 2011

Abstract

Background

Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is valuable but costly. The Gene Wiki uses Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. It comprises more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.

Results

Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses.
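
As a rough illustration of what a precision estimate against a reference annotation set involves, the following minimal sketch counts the fraction of candidate (gene, term) pairs that also appear in a reference set. It is not the study's evaluation procedure, which also relied on manual review. The function name, gene symbols, and term identifiers are hypothetical placeholders, and exact-pair matching is a simplifying assumption that ignores the ontology hierarchy.

```python
def precision(candidates, reference):
    """Fraction of candidate (gene, term) pairs that appear in the reference set."""
    candidates = set(candidates)
    if not candidates:
        raise ValueError("no candidate annotations to evaluate")
    return len(candidates & set(reference)) / len(candidates)


# Hypothetical toy data; none of these pairs come from the study.
candidates = [
    ("GENE_A", "DOID:0001"),
    ("GENE_A", "DOID:0002"),  # not in the reference, counted as unsupported
    ("GENE_B", "DOID:0003"),
    ("GENE_C", "DOID:0004"),
]
reference = {
    ("GENE_A", "DOID:0001"),
    ("GENE_B", "DOID:0003"),
    ("GENE_C", "DOID:0004"),
    ("GENE_D", "DOID:0005"),
}

toy_precision = precision(candidates, reference)
print(f"precision = {toy_precision:.2f}")
```

A candidate missing from the reference set is not necessarily wrong, since reference sets are incomplete. Manual evaluation is one way to account for this, which may be why the estimates above are reported as ranges.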

Conclusions

The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and for performing knowledge-based statistical analyses.