Identification of highly related references about gene-disease association
Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
BMC Bioinformatics 2014, 15:286. doi:10.1186/1471-2105-15-286. Published: 25 August 2014
Curation of gene-disease associations published in the literature should be based on a careful and frequent survey of the references that are highly related to specific gene-disease associations. Retrieving these references is thus essential for timely and complete curation.
We present a technique, CRFref (Conclusive, Rich, and Focused References), that, given a gene-disease pair <g, d>, ranks high those biomedical references that are likely to provide conclusive, rich, and focused results about g and d. Such references are expected to be highly related to the association between g and d. CRFref ranks candidate references by their scores. To score a reference r, CRFref estimates and integrates three measures of r with respect to <g, d>: degree of conclusiveness, degree of richness, and degree of focus. To evaluate CRFref, experiments were conducted on over one hundred thousand references for over one thousand gene-disease pairs. The results show that CRFref performs significantly better than several typical types of baselines in ranking high those references that expert curators selected to develop the summaries for specific gene-disease associations.
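The ranking step described above can be illustrated with a minimal sketch. It assumes the three degrees have already been estimated for each candidate reference and that they are integrated by a plain unweighted sum; the abstract does not state the actual estimation or integration formulas, so `ScoredReference`, `combined_score`, `rank_references`, and all numeric values here are hypothetical placeholders rather than CRFref's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredReference:
    """A candidate reference with hypothetical, precomputed measures for one <g, d> pair."""
    ref_id: str
    conclusiveness: float  # degree of conclusiveness (placeholder value)
    richness: float        # degree of richness (placeholder value)
    focus: float           # degree of focus (placeholder value)


def combined_score(ref: ScoredReference) -> float:
    # ASSUMPTION: an unweighted sum stands in for the integration step,
    # which the abstract does not specify.
    return ref.conclusiveness + ref.richness + ref.focus


def rank_references(refs: list[ScoredReference]) -> list[ScoredReference]:
    # Highest combined score first, so the most related references rank high.
    return sorted(refs, key=combined_score, reverse=True)


# Made-up candidates for a single gene-disease pair.
candidates = [
    ScoredReference("ref_a", conclusiveness=0.2, richness=0.5, focus=0.1),
    ScoredReference("ref_b", conclusiveness=0.9, richness=0.7, focus=0.8),
    ScoredReference("ref_c", conclusiveness=0.6, richness=0.4, focus=0.5),
]

ranked_ids = [ref.ref_id for ref in rank_references(candidates)]
print(ranked_ids)
```

Keeping the integration in a single function means a different combination rule (for example a weighted sum or a product) could be swapped in without touching the ranking code.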
CRFref is an effective technique for ranking high those references that are highly related to specific gene-disease associations. It can be incorporated into existing search engines to prioritize biomedical references for curators and researchers, and into text mining systems that study gene-disease associations.