A novel method for gathering and prioritizing disease candidate genes based on construction of a set of disease-related MeSH® terms
1 Department of Genetic Resources Technology, Faculty of Agriculture, Kyushu University, 6-10-1 Hakozaki Higashi-ku, Fukuoka 812-8581, Japan
2 Institute of Biomedical Innovation, Otsuka Pharmaceutical Co. Ltd., 463-10 Kagasuno Kawauchi-cho, Tokushima 771-0192, Japan
BMC Bioinformatics 2014, 15:179 doi:10.1186/1471-2105-15-179Published: 10 June 2014
Understanding the molecular mechanisms involved in disease is critical for the development of more effective and individualized strategies for prevention and treatment. The amount of disease-related literature, including new genetic information on the molecular mechanisms of disease, is rapidly increasing. Extracting beneficial information from literature can be facilitated by computational methods such as the knowledge-discovery approach. Several methods for mining gene-disease relationships using computational methods have been developed, however, there has been a lack of research evaluating specific disease candidate genes.
We present a novel method for gathering and prioritizing specific disease candidate genes. Our approach involved the construction of a set of Medical Subject Headings (MeSH) terms for the effective retrieval of publications related to a disease candidate gene. Information regarding the relationships between genes and publications was obtained from the gene2pubmed database. The set of genes was prioritized using a “weighted literature score” based on the number of publications and weighted by the number of genes occurring in a publication. Using our method for the disease states of pain and Alzheimer’s disease, a total of 1101 pain candidate genes and 2810 Alzheimer’s disease candidate genes were gathered and prioritized. The precision was 0.30 and the recall was 0.89 in the case study of pain. The precision was 0.04 and the recall was 0.6 in the case study of Alzheimer’s disease. The precision-recall curve indicated that the performance of our method was superior to that of other publicly available tools.
Our method, which involved the use of a set of MeSH terms related to disease candidate genes and a novel weighted literature score, improved the accuracy of gathering and prioritizing candidate genes by focusing on a specific disease.