Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Research article

Automatic detection of anchor points for multiple sequence alignment

Florian Pitschi1, Claudine Devauchelle2 and Eduardo Corel23*

Author Affiliations

1 Partner Institute for Computational Biology, CAS-MPG, 320 Yue Yang Rd, 200031 Shanghai, China

2 Laboratoire Statistique et Génome (LSG), CNRS UMR 8071, INRA 1152, Université d'Evry, Tour Evry2, Place des Terrasses, 91034 Evry Cedex, France

3 Georg-August-Universität, Institut für Mikrobiologie und Genetik, Goldschmidtstraße 1, 37077 Göttingen, Germany

For all author emails, please log on.

BMC Bioinformatics 2010, 11:445  doi:10.1186/1471-2105-11-445

Published: 2 September 2010

Abstract

Background

Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are be expected to be more adequate, as more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment.

Results

We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearence. In this paper, we describe first how to get, with the help of this method, subsets of positions that could form partial columns in an alignment. We introduce next a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. Partial columns can be used, for the time being, as guide only by a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We perform tests on the effect of introducing these columns on the popular benchmark BAliBASE 3.

Conclusions

We show that the inclusion of our partial alignment columns, as anchor points, improve on the whole the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.