Open Access Highly Accessed Research article

Creating a medical English-Swedish dictionary using interactive word alignment

Mikael Nyström1*, Magnus Merkel2, Lars Ahrenberg2, Pierre Zweigenbaum3,4,5, Håkan Petersson1 and Hans Åhlfeldt1

Author Affiliations

1 Department of Biomedical Engineering, Linköpings universitet, SE-58185 Linköping, Sweden

2 Department of Computer and Information Science, Linköpings universitet, SE-58183 Linköping, Sweden

3 Assistance Publique-Hôpitaux de Paris, F-75683 Paris Cedex 14, France

4 Inserm, U729, F-75270 Paris Cedex 06, France

5 Inalco, CRIM, F-75343 Paris Cedex 07, France

BMC Medical Informatics and Decision Making 2006, 6:35  doi:10.1186/1472-6947-6-35

Published: 12 October 2006

Abstract

Background

This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P and its use for semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many West European language pairs other than English-Swedish.

Methods

The medical terminology systems were collected in electronic format in both English and Swedish and the rubrics were extracted in parallel language pairs. Initially, interactive word alignment was used to create training data from a sample. Then the training data were utilised in automatic word alignment in order to generate candidate term pairs. The last step was manual verification of the term pair candidates.
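The pipeline above can be illustrated with a minimal sketch. It is not the authors' tooling: the interactive and automatic word alignment steps are replaced here by a simple Dice co-occurrence score, and the function names, the threshold, the rubric keys and the three toy rubrics are all assumptions made for illustration. The output is only a list of candidate term pairs, which would still need manual verification.

```python
from collections import Counter
from itertools import product


def pair_rubrics(english, swedish):
    """Pair rubrics that share the same key in both language versions."""
    shared = sorted(english.keys() & swedish.keys())
    return [(english[key], swedish[key]) for key in shared]


def candidate_term_pairs(pairs, threshold=0.9):
    """Score word pairs by the Dice coefficient of their co-occurrence.

    A stand-in for word alignment, not the method used in the paper.
    """
    en_count, sv_count, both = Counter(), Counter(), Counter()
    for en_rubric, sv_rubric in pairs:
        en_words = set(en_rubric.lower().split())
        sv_words = set(sv_rubric.lower().split())
        en_count.update(en_words)
        sv_count.update(sv_words)
        both.update(product(en_words, sv_words))
    scored = [
        ((en, sv), 2 * n / (en_count[en] + sv_count[sv]))
        for (en, sv), n in both.items()
    ]
    kept = [item for item in scored if item[1] >= threshold]
    # Highest score first, then alphabetical, so the review order is stable.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy data with hypothetical keys; not taken from any terminology system.
toy_english = {
    "r1": "acute bronchitis",
    "r2": "chronic bronchitis",
    "r3": "acute sinusitis",
}
toy_swedish = {
    "r1": "akut bronkit",
    "r2": "kronisk bronkit",
    "r3": "akut sinuit",
}

toy_pairs = pair_rubrics(toy_english, toy_swedish)
toy_candidates = candidate_term_pairs(toy_pairs)
for (en_word, sv_word), score in toy_candidates:
    print(f"{en_word}\t{sv_word}\t{score:.2f}")
```

Keying the rubrics by a shared identifier keeps the two language versions in step even when one version lacks an entry, and the threshold trades the number of candidates a reviewer must check against the risk of missing a valid pair.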

Results

A dictionary of 31,000 verified entries has been created in less than three man weeks, with considerably less time and effort than a manual approach would need, and without compromising quality. As a side effect of our work we found 40 different translation problems in the terminology systems, which indicates the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may make dictionary creation with similar tools even more expedient. Finally, the contribution is discussed in relation to other ongoing efforts to construct medical lexicons for non-English languages.

Conclusion

In three man weeks we were able to produce a medical English-Swedish dictionary consisting of 31,000 entries and also found hidden translation errors in the utilised medical terminology systems.