Open Access Research article

Evaluating alignment quality between iconic language and reference terminologies using similarity metrics

Nicolas Griffon1,2,3,4*, Gaetan Kerdelhué1, Lina F Soualmia1,2,3,4, Tayeb Merabti1, Julien Grosjean1, Jean-Baptiste Lamy2,3,4, Alain Venot2,3,4, Catherine Duclos2,3,4 and Stefan J Darmoni1,2,3,4

Author Affiliations

1 CISMeF, Rouen University Hospital, Normandy & TIBS, LITIS EA 4108, Institute for Research and Innovation in Biomedicine, Rouen, France

2 INSERM, U1142, LIMICS, F-75006 Paris, France

3 Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006 Paris, France

4 Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430 Villetaneuse, France

BMC Medical Informatics and Decision Making 2014, 14:17  doi:10.1186/1472-6947-14-17

Published: 11 March 2014

Abstract

Background

Visualization of Concepts in Medicine (VCM) is a compositional iconic language that aims to ease information retrieval in Electronic Health Records (EHR), clinical guidelines or other medical documents. Using the VCM language in medical applications requires alignment with medical reference terminologies. Alignments from the Medical Subject Headings (MeSH) thesaurus and the International Classification of Diseases – tenth revision (ICD10) to VCM are presented here. The aim of this study was to evaluate the quality of the alignment between VCM and these terminologies, using different measures of inter-alignment agreement, before integration into EHRs.

Methods

For medical literature retrieval and EHR browsing, the MeSH thesaurus and ICD10, both organized hierarchically, were aligned to the VCM language. Some MeSH to VCM alignments were generated automatically, whereas others were performed manually and validated. The ICD10 to VCM alignment was performed entirely manually. Inter-alignment agreement was assessed on ICD10 codes and MeSH descriptors sharing the same Concept Unique Identifier in the Unified Medical Language System (UMLS). Three metrics were used to compare two VCM icons: binary comparison, the crude Dice Similarity Coefficient (DSCcrude), and the semantic Dice Similarity Coefficient (DSCsemantic), which is based on Lin similarity. An analysis of discrepancies was also performed.
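The three metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how an icon is decomposed or how Lin similarity is combined into DSCsemantic. The sketch assumes an icon can be modelled as a set of component labels, uses the standard Dice formula 2|A∩B| / (|A| + |B|) and the standard Lin formula 2·IC(lcs) / (IC(x) + IC(y)), and takes a best-match "soft" Dice as one plausible semantic variant. The component names, the toy hierarchy and the information content (IC) values are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# Assumption: an icon is modelled as a set of component labels.
Icon = FrozenSet[str]

# Hypothetical toy hierarchy (child -> parent) and information content values.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "cardiovascular": "root",
    "heart": "cardiovascular",
    "artery": "cardiovascular",
    "inflammation": "root",
}
IC: Dict[str, float] = {
    "root": 0.0, "cardiovascular": 1.0,
    "heart": 2.0, "artery": 2.0, "inflammation": 2.0,
}


def ancestors(node: str) -> List[str]:
    """Return the node followed by its ancestors, ordered from node to root."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def lowest_common_subsumer(x: str, y: str) -> str:
    """Return the deepest node that is an ancestor of both x and y."""
    seen = set(ancestors(x))
    return next(n for n in ancestors(y) if n in seen)


def lin_similarity(x: str, y: str) -> float:
    """Lin similarity: 2 * IC(lcs) / (IC(x) + IC(y))."""
    if x == y:
        return 1.0
    denom = IC[x] + IC[y]
    return 0.0 if denom == 0 else 2 * IC[lowest_common_subsumer(x, y)] / denom


def binary_agreement(a: Icon, b: Icon) -> int:
    """1 if the two icons are identical, 0 otherwise."""
    return int(a == b)


def dice_crude(a: Icon, b: Icon) -> float:
    """Crude Dice coefficient on exact component matches."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def dice_semantic(a: Icon, b: Icon,
                  sim: Callable[[str, str], float] = lin_similarity) -> float:
    """Soft Dice (assumed variant): each component is credited with its best match."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    best_a = sum(max(sim(x, y) for y in b) for x in a)
    best_b = sum(max(sim(y, x) for x in a) for y in b)
    return (best_a + best_b) / (len(a) + len(b))


icon_a: Icon = frozenset({"heart", "inflammation"})
icon_b: Icon = frozenset({"artery", "inflammation"})
print(binary_agreement(icon_a, icon_b))  # 0
print(dice_crude(icon_a, icon_b))        # 0.5
print(dice_semantic(icon_a, icon_b))     # 0.75
```

In this toy case the two icons differ on one component, so binary comparison scores 0 and the crude coefficient 0.5, while the semantic variant gives partial credit (0.75) because the differing components share a close ancestor. With an exact-match similarity function, this soft Dice reduces to the crude coefficient.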

Results

MeSH to VCM alignment resulted in 10,783 relations, of which 1,830 were performed manually and 8,953 were automatically inherited. ICD10 to VCM alignment led to 19,852 relations. UMLS provided 1,887 alignments between ICD10 and MeSH, only 1,606 of which were used in this study. Using only validated MeSH to VCM alignments, inter-alignment agreement was 74.2% (95% CI 70.5-78.0), DSCcrude was 0.93 (95% CI 0.91-0.94), and DSCsemantic was 0.96 (95% CI 0.95-0.96). Discrepancy analysis revealed that two thirds of the errors came from the reviewers, while UMLS was responsible for the remaining third.
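For readers unfamiliar with interval estimates on an agreement rate, the sketch below computes a 95% confidence interval for a proportion using the normal (Wald) approximation. The abstract does not state which interval method the authors used or how many pairs the validated subset contained, so both the method and the counts in the example are assumptions chosen only for illustration.

```python
import math
from typing import Tuple


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the normal approximation."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1], since the approximation can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 80 agreeing pairs out of 100 compared pairs.
p, lower, upper = wald_ci(80, 100)
print(f"{p:.1%} (95% CI {lower:.1%}-{upper:.1%})")  # 80.0% (95% CI 72.2%-87.8%)
```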

Conclusions

This study has shown strong overall inter-alignment agreement between MeSH to VCM and ICD10 to VCM manual alignments. VCM icons have now been integrated into a guideline search engine (http://www.cismef.org) and a health terminologies portal (http://www.hetop.eu).

Keywords:
Terminology as topic; International classification of diseases; Medical subject headings; Vocabulary, controlled; Alignment; Iconic language; Compositional language; Semantic distances; Inter-alignment agreement