This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010

Open Access Proceedings

A robust approach to optimizing multi-source information for enhancing genomics retrieval performance

Qinmin Hu1,2, Jimmy Xiangji Huang1,3* and Jun Miao1,2

Author Affiliations

1 Information Retrieval and Knowledge Management Research Lab, York University, Toronto, ON, M3J1P3, Canada

2 Department of Computer Science & Engineering, York University, Toronto, ON, M3J1P3, Canada

3 School of Information Technology, York University, Toronto, ON, M3J1P3, Canada

BMC Bioinformatics 2011, 12(Suppl 5):S6  doi:10.1186/1471-2105-12-S5-S6

Published: 27 July 2011

Abstract

Background

Users want short, specific answers to their questions, placed in context through links to the original sources in the biomedical literature. Information retrieval systems index the data and retrieve information with a variety of pre-defined search techniques and functions, so ranking strategies differ from one source to another. In this paper, we propose a robust approach to optimizing multi-source information for improving genomics retrieval performance.

Results

In the proposed approach, we first consider a common scenario in which a metasearch system has access to multiple baselines, each retrieving and ranking documents or passages with its own model. Then, given baselines selected from multiple sources, we investigate three modified fusion methods (reciprocal, CombMNZ and CombSUM) to re-rank the candidates, and the re-ranked output is used for evaluation. Our empirical study on both the 2007 and 2006 genomics data sets demonstrates the viability of the proposed approach for obtaining better performance. Furthermore, the experimental results show that the reciprocal method provides notable improvements over the individual baselines, especially on the passage2-level MAP and the aspect-level MAP.
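As a rough illustration of the three fusion families named above, the following Python sketch shows their commonly used textbook forms. It is not the authors' modified variants, which this abstract does not spell out. The min-max normalization, the smoothing constant k = 60 in the reciprocal method, and all function names are assumptions made for the example.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one baseline's scores into [0, 1] (assumed normalization)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSUM: add up the normalized scores a candidate gets from each baseline."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of baselines that returned the candidate."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def reciprocal(runs, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over baselines (k is an assumed constant)."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)


def rerank(fused):
    """Return candidate ids ordered by fused score, best first."""
    return sorted(fused, key=fused.get, reverse=True)
```

Each baseline is passed as a dictionary mapping a candidate id (a document or passage) to that baseline's score, for example `rerank(reciprocal([run_a, run_b, run_c]))`. The reciprocal method uses only rank positions, so it needs no score normalization across baselines whose scores are on different scales.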

Conclusions

From extensive experiments on two TREC genomics data sets, we draw the following conclusions. Of the three fusion methods in the robust approach, the reciprocal method clearly outperforms CombMNZ and CombSUM, and CombSUM works well at the passage2 level when compared with CombMNZ. Across the multiple sources (DFR, BM25 and the language model), we observe that the "alliance of giants" achieves the best result. Meanwhile, within the same combination, the better a baseline performs, the more it contributes. These conclusions are useful for guiding fusion work in biomedical information retrieval.