This article is part of the supplement: The International Conference on Intelligent Biology and Medicine (ICIBM) – Genomics

Open Access Research

GLAD4U: deriving and prioritizing gene lists from PubMed literature

Jérôme Jourquin1,2, Dexter Duncan1, Zhiao Shi3,4 and Bing Zhang1,2*

Author affiliations

1 Department of Biomedical Informatics, Vanderbilt University School of Medicine, 400 Eskind Biomedical Library, 2209 Garland Avenue, Nashville, TN 37232, USA

2 Department of Cancer Biology, Vanderbilt University School of Medicine, 2220 Pierce Avenue, PRB771, Nashville, TN 37232, USA

3 Advanced Computing Center for Research & Education, Vanderbilt University, Nashville, TN 37240, USA

4 Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37240, USA

Citation and License

BMC Genomics 2012, 13(Suppl 8):S20  doi:10.1186/1471-2164-13-S8-S20

Published: 17 December 2012

Abstract

Background

Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors.

Results

We report GLAD4U (Gene List Automatically Derived For You), a new, free web-based gene retrieval and prioritization tool. GLAD4U takes advantage of existing NCBI resources to ensure computational efficiency. The quality of gene lists created by GLAD4U for three Gene Ontology (GO) terms and three disease terms was assessed using corresponding "gold standard" lists curated in public databases. For all queries, GLAD4U gene lists showed very high recall but low precision, leading to a low F-measure. By comparison, EBIMed's recall was consistently lower than GLAD4U's, but its precision was higher. To present the most relevant genes at the top of a list, we studied two prioritization methods, one based on publication count and one on the hypergeometric test, and compared the resulting ranked lists, as well as those generated by EBIMed, to the gold standards. Both GLAD4U methods outperformed EBIMed for all queries on a variety of quality metrics. Moreover, the hypergeometric method allowed better performance by thresholding out genes with low scores. In addition, manual examination suggests that many false positives could be explained by the incompleteness of the gold standards. The GLAD4U user interface accepts any valid PubMed query, and its output page displays the ranked gene list, information associated with each gene, and chronologically ordered supporting publications, along with a summary of the run and links for file export, functional enrichment analysis, and protein interaction network analysis.
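
As a rough illustration of how the two prioritization methods can disagree, the following Python sketch ranks genes by publication count and by a hypergeometric test. The abstract does not give the exact parameterization, so the one used here is an assumption: N is the total number of publications in the corpus, K the number retrieved by the query, n the number linked to a gene, and k the number in both sets. The function name and the toy counts are hypothetical and are not GLAD4U's implementation or data.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed meaning of the parameters (not taken from the paper):
    N: publications in the whole corpus
    K: publications retrieved by the query
    n: publications linked to the gene
    k: publications that are both retrieved and linked to the gene
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical toy counts, for illustration only.
N = 1000  # assumed corpus size
K = 50    # assumed number of publications retrieved by the query
genes = {
    "gene_A": {"n": 10, "k": 5},    # rarely published, strongly enriched
    "gene_B": {"n": 200, "k": 12},  # heavily published, weakly enriched
}

# Method 1: rank by the number of query publications mentioning the gene.
by_count = sorted(genes, key=lambda g: genes[g]["k"], reverse=True)

# Method 2: rank by hypergeometric p-value (smaller means more enriched).
by_pvalue = sorted(
    genes,
    key=lambda g: hypergeom_pvalue(genes[g]["k"], N, K, genes[g]["n"]),
)

print(by_count)
print(by_pvalue)
```

In this toy case gene_B has more query-related publications, but gene_A is far more enriched relative to its overall publication count, so the two rankings differ. A score cutoff on the p-value is one way the thresholding mentioned above could be applied.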

Conclusions

GLAD4U has high overall recall. Although precision is generally low, the prioritization methods successfully rank truly relevant genes at the top of the lists to facilitate efficient browsing. GLAD4U is simple to use, and its interface can be found at http://bioinfo.vanderbilt.edu/glad4u.
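
The contrast between low overall precision and good ranking can be illustrated with a short Python sketch of the standard set-based metrics (precision, recall, F-measure) and of precision among the top k ranked genes. The gene identifiers, the gold standard, and the function names are hypothetical, and the paper's own evaluation metrics may differ in detail.

```python
def precision_recall_f(retrieved, gold):
    """Set-based precision, recall and F-measure (harmonic mean of the two)."""
    retrieved, gold = set(retrieved), set(gold)
    true_pos = len(retrieved & gold)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked genes that appear in the gold standard."""
    top = ranked[:k]
    return sum(g in gold for g in top) / len(top) if top else 0.0


# Hypothetical ranked output and gold standard, for illustration only.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
gold = {"G1", "G2", "G3", "G7"}

p, r, f = precision_recall_f(ranked, gold)
top3 = precision_at_k(ranked, gold, 3)
print(p, r, f, top3)
```

Here every gold-standard gene is retrieved (recall 1.0) while only half of the retrieved genes are in the gold standard (precision 0.5), yet the top three ranked genes are all relevant.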