This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010
A robust approach to optimizing multi-source information for enhancing genomics retrieval performance
1 Information Retrieval and Knowledge Management Research Lab, York University, Toronto, ON, M3J1P3, Canada
2 Department of Computer Science & Engineering, York University, Toronto, ON, M3J1P3, Canada
3 School of Information Technology, York University, Toronto, ON, M3J1P3, Canada
BMC Bioinformatics 2011, 12(Suppl 5):S6 doi:10.1186/1471-2105-12-S5-S6
Published: 27 July 2011
Users want short, specific answers to their questions, placed in context by links to the original sources in the biomedical literature. Information retrieval systems index data and retrieve information using a variety of pre-defined search techniques and functions, and different ranking strategies are designed for different sources. In this paper, we propose a robust approach to optimizing multi-source information to improve genomics retrieval performance.
In the proposed approach, we first consider a common scenario for a metasearch system that has access to multiple baselines, each of which retrieves and ranks documents/passages with its own model. Then, given selected baselines from multiple sources, we investigate three modified fusion methods, reciprocal, CombMNZ and CombSUM, to re-rank the candidates as the outputs for evaluation. Our empirical study on both the 2007 and 2006 genomics data sets demonstrates that the proposed approach can achieve better performance. Furthermore, the experimental results show that the reciprocal method provides notable improvements over the individual baselines, especially on the passage2-level MAP and the aspect-level MAP.
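To make the three fusion schemes concrete, the following is a minimal Python sketch of their standard textbook forms. It is not the modified variants studied in the paper, whose details are not given here. The run format (a dict mapping passage id to score), the min-max normalization, the constant k=60 in the reciprocal method, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

def min_max_normalize(run):
    """Scale one baseline's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def comb_sum(runs):
    """CombSUM: sum of a candidate's normalized scores over all baselines."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] += s
    return dict(fused)

def comb_mnz(runs):
    """CombMNZ: CombSUM score times the number of baselines returning the candidate."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}

def reciprocal_rank(runs, k=60):
    """Reciprocal fusion: sum of 1 / (k + rank) over baselines; k is an assumed constant."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)

def rerank(fused):
    """Order candidates by fused score, highest first; ties broken by id."""
    return sorted(fused, key=lambda d: (-fused[d], d))

if __name__ == "__main__":
    # Hypothetical toy runs standing in for two baselines (e.g. BM25 and DFR).
    bm25 = {"p1": 12.0, "p2": 9.0, "p3": 3.0}
    dfr = {"p2": 8.0, "p3": 6.0, "p4": 2.0}
    for name, method in [("CombSUM", comb_sum), ("CombMNZ", comb_mnz),
                         ("reciprocal", reciprocal_rank)]:
        print(name, rerank(method([bm25, dfr])))
```

On this toy input, CombSUM ranks p1 above p3, whereas CombMNZ and the reciprocal method rank p3 above p1 because p3 is returned by both baselines. The reciprocal method uses only ranks, so it needs no score normalization across heterogeneous models.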
From the extensive experiments on two TREC genomics data sets, we draw the following conclusions. Of the three fusion methods in the proposed approach, the reciprocal method clearly outperforms both CombMNZ and CombSUM, and CombSUM works well at the passage2 level when compared with CombMNZ. Using the multiple sources of DFR, BM25 and the language model, we observe that the "alliance of giants", that is, a combination of strong baselines, achieves the best result. Meanwhile, within the same combination, the better a baseline performs, the more it contributes. These conclusions are useful for guiding fusion work in the field of biomedical information retrieval.