Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Research article

A novel method for gathering and prioritizing disease candidate genes based on construction of a set of disease-related MeSH® terms

Toshihide Ono12 and Satoru Kuhara1*

Author Affiliations

1 Department of Genetic Resources Technology, Faculty of Agriculture, Kyushu University, 6-10-1 Hakozaki Higashi-ku, Fukuoka 812-8581, Japan

2 Institute of Biomedical Innovation, Otsuka Pharmaceutical Co. Ltd., 463-10 Kagasuno Kawauchi-cho, Tokushima 771-0192, Japan

For all author emails, please log on.

BMC Bioinformatics 2014, 15:179  doi:10.1186/1471-2105-15-179

Published: 10 June 2014

Abstract

Background

Understanding the molecular mechanisms involved in disease is critical for the development of more effective and individualized strategies for prevention and treatment. The amount of disease-related literature, including new genetic information on the molecular mechanisms of disease, is rapidly increasing. Extracting beneficial information from literature can be facilitated by computational methods such as the knowledge-discovery approach. Several methods for mining gene-disease relationships using computational methods have been developed, however, there has been a lack of research evaluating specific disease candidate genes.

Results

We present a novel method for gathering and prioritizing specific disease candidate genes. Our approach involved the construction of a set of Medical Subject Headings (MeSH) terms for the effective retrieval of publications related to a disease candidate gene. Information regarding the relationships between genes and publications was obtained from the gene2pubmed database. The set of genes was prioritized using a “weighted literature score” based on the number of publications and weighted by the number of genes occurring in a publication. Using our method for the disease states of pain and Alzheimer’s disease, a total of 1101 pain candidate genes and 2810 Alzheimer’s disease candidate genes were gathered and prioritized. The precision was 0.30 and the recall was 0.89 in the case study of pain. The precision was 0.04 and the recall was 0.6 in the case study of Alzheimer’s disease. The precision-recall curve indicated that the performance of our method was superior to that of other publicly available tools.

Conclusions

Our method, which involved the use of a set of MeSH terms related to disease candidate genes and a novel weighted literature score, improved the accuracy of gathering and prioritizing candidate genes by focusing on a specific disease.