
Software

An edit script for taxonomic classifications

Roderic DM Page1* and Gabriel Valiente2

Author Affiliations

1 DEEB, IBLS, University of Glasgow, Glasgow G12 8QQ, UK

2 Department of Software, Technical University of Catalonia, E-08034 Barcelona, Spain


BMC Bioinformatics 2005, 6:208  doi:10.1186/1471-2105-6-208

Published: 25 August 2005



The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases, but users are currently forced to formulate queries according to a single taxonomic classification. Because there is no universal agreement on the classification of organisms, providing a single classification constrains the questions biologists can ask. However, maintaining multiple classifications is burdensome when the NCBI classification itself is constantly growing.


In this paper, we present a solution to the problem of generating modifications of the NCBI taxonomy, based on computing an edit script that summarises the differences between two classification trees. Our algorithms find the shortest possible edit script by identifying all shared subtrees, and they take only quasi-linear time in the size of the trees because classification trees have unique node labels.
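The following is a minimal Python sketch of why unique labels make this tractable. It assumes each classification is stored as a hypothetical parent map (node label to parent label, root mapped to None). It is not the authors' implementation and is not guaranteed to produce the shortest edit script the paper describes. It shows how shared subtrees can be detected in one post-order pass with hashed lookups, and how a simple delete/insert/move script follows from comparing parents. The names `shared_subtrees`, `maximal_shared_subtrees`, `edit_script` and `apply_script`, and the tuple encoding of operations, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a tree maps each node label to its parent label
# (the root maps to None). Labels are assumed unique, as in the NCBI taxonomy,
# so the same label denotes the same taxon in both classifications.
ParentMap = Dict[str, Optional[str]]
Op = Tuple[str, str, Optional[str]]  # (operation, label, new parent or None)


def children_of(tree: ParentMap) -> Dict[str, Set[str]]:
    """Invert a parent map into a map from each node to its set of children."""
    kids: Dict[str, Set[str]] = defaultdict(set)
    for node, parent in tree.items():
        if parent is not None:
            kids[parent].add(node)
    return kids


def shared_subtrees(t1: ParentMap, t2: ParentMap) -> Set[str]:
    """Labels whose entire subtrees are identical in t1 and t2.

    Because labels are unique, the subtree rooted at v is shared iff v occurs
    in both trees, has the same child labels in both, and every child subtree
    is itself shared. One post-order pass with hashed lookups suffices.
    """
    k1, k2 = children_of(t1), children_of(t2)
    shared: Set[str] = set()

    def visit(v: str) -> bool:
        # Visit every child first (a list, not a generator, so no child is
        # skipped by short-circuiting). Recursion is fine for a sketch; very
        # deep trees would need an explicit stack.
        child_ok = [visit(c) for c in k1.get(v, set())]
        ok = v in t2 and k1.get(v, set()) == k2.get(v, set()) and all(child_ok)
        if ok:
            shared.add(v)
        return ok

    for root in [v for v, p in t1.items() if p is None]:
        visit(root)
    return shared


def maximal_shared_subtrees(t1: ParentMap, t2: ParentMap) -> Set[str]:
    """Roots of shared subtrees not contained in a larger shared subtree."""
    shared = shared_subtrees(t1, t2)
    return {v for v in shared if t1[v] not in shared}


def edit_script(t1: ParentMap, t2: ParentMap) -> List[Op]:
    """Delete, insert and move operations that turn t1 into t2.

    Nodes strictly inside a shared subtree keep their parent, so they never
    generate an operation; a moved shared subtree costs one move of its root.
    Sorting only makes the output deterministic.
    """
    script: List[Op] = [("delete", v, None) for v in sorted(set(t1) - set(t2))]
    script += [("insert", v, t2[v]) for v in sorted(set(t2) - set(t1))]
    script += [("move", v, t2[v]) for v in sorted(set(t1) & set(t2)) if t1[v] != t2[v]]
    return script


def apply_script(tree: ParentMap, script: List[Op]) -> ParentMap:
    """Apply an edit script to a parent map, returning a new map."""
    result = dict(tree)
    for op, label, parent in script:
        if op == "delete":
            del result[label]
        else:  # "insert" and "move" both (re)set the label's parent
            result[label] = parent
    return result


if __name__ == "__main__":
    # Toy classifications (illustrative only): t2 adds a grouping node above
    # Animals and Fungi.
    t1 = {"Life": None, "Animals": "Life", "Fungi": "Life",
          "Mammals": "Animals", "Birds": "Animals", "Yeasts": "Fungi"}
    t2 = dict(t1, Opisthokonts="Life", Animals="Opisthokonts", Fungi="Opisthokonts")
    print(sorted(maximal_shared_subtrees(t1, t2)))  # ['Animals', 'Fungi']
    print(edit_script(t1, t2))
```

Applying `edit_script(t1, t2)` to `t1` with `apply_script` yields `t2` as a parent map. Intermediate states need not be valid trees, so a real tool would probably need to order operations (for example, inserting new parents before moving children under them).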


These algorithms have recently been implemented, and the software is freely available for download online.