Open Access Research article

A MEDLINE categorization algorithm

Stefan J Darmoni¹,²*, Aurelie Névéol¹,², Jean-Marie Renard³, Jean-Francois Gehanno², Lina F Soualmia¹,², Badisse Dahamna¹ and Benoit Thirion¹

Author Affiliations

1 CISMeF, Rouen University Hospital, 1, rue de Germont – 76031 Rouen, France

2 Perception and Information Systems Laboratory & GCSIS, Medical School, University of Rouen, France

3 CERIM, EA-2694, Medical School, University of Lille2, 1, Place de Verdun 59045 Lille Cedex, France


BMC Medical Informatics and Decision Making 2006, 6:7  doi:10.1186/1472-6947-6-7

Published: 7 February 2006



Categorization is designed to enhance resource description by organizing content description so that the reader can grasp the main topics of a resource quickly and easily. The objective of this work is to propose a categorization algorithm to classify a set of scientific articles indexed with the MeSH thesaurus, in particular those of the MEDLINE bibliographic database. In a large bibliographic database such as MEDLINE, finding material of particular interest to a specialty group, or relevant to a particular audience, can be difficult. Categorization refines the retrieval of indexed material. In the CISMeF terminology, metaterms can be considered super-concepts; they were primarily conceived to improve recall in the CISMeF quality-controlled health gateway.


The MEDLINE categorization algorithm (MCA) is based on semantic links between MeSH terms and metaterms on the one hand, and between MeSH subheadings and metaterms on the other. These links are used to automatically infer a list of metaterms from any MeSH term/subheading indexing. The semantic links are selected manually by medical librarians.
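To make the inference step concrete, the following Python sketch maps one citation's MeSH term/subheading pairs to metaterms through two link tables. It is not the MCA implementation: the names `TERM_TO_METATERMS`, `SUBHEADING_TO_METATERMS` and `infer_metaterms` are hypothetical, and the table entries are illustrative placeholders rather than the links actually chosen by CISMeF librarians.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical link tables for illustration only; the real links are
# selected manually by CISMeF medical librarians and are not reproduced here.
TERM_TO_METATERMS: Dict[str, Set[str]] = {
    "Medical Informatics": {"medical informatics"},
    "Information Storage and Retrieval": {"information science"},
    "Hospital Administration": {"organization and administration"},
}
SUBHEADING_TO_METATERMS: Dict[str, Set[str]] = {
    "organization & administration": {"organization and administration"},
}

# One indexing pair: a MeSH term plus an optional subheading.
IndexingPair = Tuple[str, Optional[str]]


def infer_metaterms(indexing: Iterable[IndexingPair]) -> Set[str]:
    """Return every metaterm linked to a term or subheading in the indexing."""
    metaterms: Set[str] = set()
    for term, subheading in indexing:
        # Follow the MeSH term -> metaterm links (unlinked terms add nothing).
        metaterms |= TERM_TO_METATERMS.get(term, set())
        # Follow the MeSH subheading -> metaterm links, if a subheading is present.
        if subheading is not None:
            metaterms |= SUBHEADING_TO_METATERMS.get(subheading, set())
    return metaterms


if __name__ == "__main__":
    citation = [
        ("Medical Informatics", None),
        ("Hospital Administration", "organization & administration"),
        ("Humans", None),  # no link in the toy tables: contributes nothing
    ]
    print(sorted(infer_metaterms(citation)))
    # prints ['medical informatics', 'organization and administration']
```

Returning a set means a metaterm reached through several terms or subheadings of the same citation is listed only once for that citation.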


The MEDLINE categorization algorithm lists the medical specialties relevant to a MEDLINE file in decreasing order of importance. The MEDLINE categorization algorithm is available on a Web site and can run on any MEDLINE file in batch mode. As an example, for the set of 60 articles published in BioMed Central Medical Informatics & Decision Making that are currently indexed in MEDLINE, the top three medical specialties are information science, organization and administration, and medical informatics.
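The ranking over a whole MEDLINE file can be sketched in the same spirit. This Python example assumes each citation in the file has already been reduced to the set of metaterms inferred from its indexing (parsing the MEDLINE file itself is out of scope), and it assumes that "importance" is simply the number of citations a metaterm appears in; the published algorithm may weight terms differently. The function name `rank_specialties` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_specialties(per_citation: Iterable[Set[str]]) -> List[Tuple[str, int]]:
    """Rank metaterms by the number of citations they were inferred for.

    Assumption: importance = citation count. Because each citation is a set,
    a metaterm is counted at most once per citation.
    """
    counts = Counter()
    for metaterms in per_citation:
        counts.update(metaterms)
    # Decreasing count; ties broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy file of four citations (hypothetical data, not MEDLINE results).
    toy_file = [
        {"cardiology", "pharmacology"},
        {"cardiology"},
        {"genetics", "cardiology"},
        {"pharmacology"},
    ]
    print(rank_specialties(toy_file))
    # prints [('cardiology', 3), ('pharmacology', 2), ('genetics', 1)]
```

Taking the first three entries of the returned list gives a "top 3" list of the kind reported above.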


We have presented a MEDLINE categorization algorithm that classifies the medical specialties addressed in any MEDLINE file as a ranked list of relevant specialties. The categorization method introduced in this paper relies on the manual indexing of resources by NLM indexers with MeSH term/subheading pairs. This algorithm may be used as a new bibliometric tool.