Open Access | Highly Accessed | Research article

PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

Jimmy Lin1,2

1National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA

2The iSchool, University of Maryland, College Park, Maryland, USA

BMC Bioinformatics 2008, 9:270 doi:10.1186/1471-2105-9-270

Published: 6 June 2008

Abstract

Background

Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval in the same manner as hyperlink graphs on the Web.

Results

We conducted a number of reranking experiments using the TREC 2005 genomics track test collection, in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. The experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.
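The reranking idea can be illustrated with a minimal sketch. The abstract does not say how the link-analysis scores and retrieval scores were combined, so the linear interpolation, the min-max normalization, the weight of 0.5, and the damping factor of 0.85 below are all assumptions made for illustration. The function names, citation identifiers, and scores are hypothetical, and the toy graph stands in for a related-article network.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank mass held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            targets = graph.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def rerank(retrieval_scores, graph, weight=0.5):
    """Interpolate retrieval scores with PageRank scores (assumed combination)."""
    pr = pagerank(graph)
    text = min_max(retrieval_scores)
    # Retrieved documents that are absent from the graph get a link score of 0.
    link = min_max({doc: pr.get(doc, 0.0) for doc in retrieval_scores})
    combined = {doc: weight * text[doc] + (1.0 - weight) * link[doc]
                for doc in retrieval_scores}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical related-article links and retrieval-engine scores.
related = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
engine_scores = {"A": 2.0, "B": 1.9, "C": 1.8, "D": 2.1}

for doc, score in rerank(engine_scores, related):
    print(doc, round(score, 3))
```

In this toy example, citation D has the highest retrieval score but no incoming links, so a well-connected citation can overtake it once link scores are mixed in. The weight controls how much the link structure influences the final order; a value of 1.0 reproduces the original retrieval ranking.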

Conclusion

The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.


© 1999-2009 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.