Open Access research article

Comprehensive comparison of graph-based multiple protein sequence alignment strategies

Ilya Plyusnin1* and Liisa Holm1,2

Author Affiliations

1 Institute of Biotechnology, University of Helsinki, P.O. Box 56, Viikinkaari 5, FIN-00014 Helsinki, Finland

2 Department of Biological and Environmental Sciences, University of Helsinki, P.O. Box 56, Viikinkaari 5, FIN-00014 Helsinki, Finland

BMC Bioinformatics 2012, 13:64. doi:10.1186/1471-2105-13-64

Published: 29 April 2012

Abstract

Background

Multiple alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strictly modular structure, which makes it possible to swap individual components of the alignment process and thus to investigate their contributions to alignment quality and computation time. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on the BAliBASE and SABmark benchmarks.
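
To illustrate what swapping one component means in practice, the sketch below builds a guiding tree for progressive alignment by naive agglomerative clustering, with the linkage rule as a pluggable parameter ("single" or UPGMA-style "average"). It is a minimal sketch under stated assumptions, not the authors' SeqAn code: the function name guide_tree, the dict-of-frozensets distance format and the nested-tuple tree representation are all hypothetical choices made for illustration.

```python
from itertools import combinations


def guide_tree(names, dist, linkage="single"):
    """Build a rooted binary guiding tree by naive agglomerative clustering.

    Hypothetical helper (not the SeqAn API).
    names   -- sequence identifiers (leaves)
    dist    -- dict mapping frozenset({a, b}) to the pairwise distance of a and b
    linkage -- "single" (minimum) or "average" (UPGMA-style mean)
    Returns the tree as nested 2-tuples of leaf names.
    """

    def cluster_dist(leaves_a, leaves_b):
        # All leaf-to-leaf distances between the two clusters
        ds = [dist[frozenset((a, b))] for a in leaves_a for b in leaves_b]
        if linkage == "single":
            return min(ds)
        if linkage == "average":
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    # cluster id -> (subtree, leaves in that subtree)
    clusters = {i: (name, [name]) for i, name in enumerate(names)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters under the chosen linkage rule
        i, j = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_dist(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        tree_i, leaves_i = clusters.pop(i)
        tree_j, leaves_j = clusters.pop(j)
        clusters[next_id] = ((tree_i, tree_j), leaves_i + leaves_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.1, frozenset(("C", "D")): 0.2,
    frozenset(("A", "C")): 0.5, frozenset(("A", "D")): 0.9,
    frozenset(("B", "C")): 0.6, frozenset(("B", "D")): 0.4,
}
print(guide_tree(names, dist, linkage="single"))  # (('A', 'B'), ('C', 'D'))
```

Because the linkage rule is the only thing that changes between runs, the rest of the pipeline can be held fixed while the guiding-tree strategy is compared. The cubic-time search is chosen for readability; a production implementation would update a distance matrix incrementally.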

Results

Our results identify the best alignment strategy among the alternatives compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high-quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm for building a guiding tree for progressive alignment. Third, triplet library extension, with the introduction of new edges, is the most efficient of the consistency transformations compared. Alternatively, tree-dependent partitioning can be applied as a post-processing step; it performed comparably to the best consistency transformation in both running time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal.
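
Triplet library extension can be sketched in the spirit of the T-Coffee library extension: for every residue y aligned to residues x and z of two other sequences, the weight min(w(x, y), w(y, z)) is added to the edge x–z, and that edge is created if the library did not contain it yet. This is a minimal sketch under stated assumptions; the function name triplet_extension and the representation of the library as a dict keyed by frozensets of (sequence_id, position) residues are hypothetical and do not reflect the SeqAn data structures, and the exact weighting used by the authors may differ.

```python
from collections import defaultdict
from itertools import combinations


def triplet_extension(library):
    """Extend a residue-pair library through intermediate residues.

    Hypothetical helper (not the SeqAn API).
    library -- dict mapping frozenset({x, z}) to a weight, where each residue
               is a (sequence_id, position) tuple from two different sequences.
    For every residue y and every pair of its neighbours x, z in different
    sequences, min(w(x, y), w(y, z)) is added to edge x-z (created if absent).
    """
    # Adjacency view of the original library: residue -> {neighbour: weight}
    neighbours = defaultdict(dict)
    for edge, weight in library.items():
        x, z = tuple(edge)
        neighbours[x][z] = weight
        neighbours[z][x] = weight

    extended = dict(library)
    for y, nbrs in neighbours.items():
        for x, z in combinations(nbrs, 2):
            if x[0] == z[0]:
                continue  # both residues in the same sequence: no alignment edge
            key = frozenset((x, z))
            # Uses original weights only, so the result is one extension pass
            extended[key] = extended.get(key, 0) + min(nbrs[x], nbrs[z])
    return extended


a1, b1, c1 = ("A", 1), ("B", 1), ("C", 1)
lib = {frozenset((a1, b1)): 10, frozenset((b1, c1)): 6}
ext = triplet_extension(lib)
print(ext[frozenset((a1, c1))])  # 6: new edge A1-C1 introduced via B1
```

The new A1–C1 edge is what "introduction of new edges" refers to: consistency evidence from the third sequence becomes visible when sequences A and C are aligned directly, even though their pairwise alignment did not propose that residue pair.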

Conclusions

This is the first time that multiple protein alignment strategies have been compared comprehensively and clearly on a single implementation platform. In particular, we show which of the existing consistency transformations and iterative refinement techniques are the most effective. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1).

Additional file 1. Source code, Makefile, installation instructions and test alignments (GZ archive).