Open Access Highly Accessed Methodology article

MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments

Peter W Collingridge1 and Steven Kelly2,3,4,5*

Author affiliations

1 Marine Biological Association of the United Kingdom, The Laboratory, Citadel Hill, Plymouth, PL1 2PB, Devon, UK

2 Department of Plant Sciences, University of Oxford, South Parks Road, OX1 3RB, Oxford, UK

3 Centre for Mathematical Biology, Mathematical Institute, University of Oxford, 24-29 St Giles’, OX1 3LB, Oxford, UK

4 Oxford Centre for Interactive Systems Biology, Department of Biochemistry, University of Oxford, South Parks Road, OX1 3QU, Oxford, UK

5 Sir William Dunn School of Pathology, University of Oxford, South Parks Road, OX1 3RE, Oxford, UK

Citation and License

BMC Bioinformatics 2012, 13:117  doi:10.1186/1471-2105-13-117

Published: 30 May 2012

Abstract

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus, improving MSA accuracy and identifying potential errors in MSAs are important for a wide range of post-genomic research. We present a novel method called MergeAlign, which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.
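The idea of scoring each column by its agreement across independent MSAs can be illustrated with a small sketch. This is not the MergeAlign implementation, and the abstract does not specify how the consensus is built; the function names `columns` and `column_support` are hypothetical, and the score shown (the fraction of input alignments containing an identical column) is only one simple assumption about what a per-column support value might look like.

```python
from collections import Counter


def columns(msa):
    """Convert an MSA (list of equal-length gapped strings) into columns.

    Each column is a tuple holding, for every sequence, the 0-based index
    of the residue placed in that column, or None where there is a gap.
    Using residue indices makes columns comparable across alignments.
    """
    next_index = [0] * len(msa)
    cols = []
    for j in range(len(msa[0])):
        col = []
        for i, row in enumerate(msa):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(next_index[i])
                next_index[i] += 1
        cols.append(tuple(col))
    return cols


def column_support(reference, alignments):
    """Return, for each column of `reference`, the fraction of
    `alignments` that contain an identical column (illustrative score)."""
    counts = Counter()
    for msa in alignments:
        # A set, so each alignment votes at most once per distinct column.
        counts.update(set(columns(msa)))
    return [counts[col] / len(alignments) for col in columns(reference)]


# Three hypothetical alignments of the same two toy sequences (ACG, ACTG),
# standing in for MSAs produced with different substitution matrices.
msa_a = ["AC-G", "ACTG"]
msa_b = ["ACG-", "ACTG"]
msa_c = ["A-CG", "ACTG"]

scores = column_support(msa_a, [msa_a, msa_b, msa_c])
print(scores)
```

In this toy case the first column of `msa_a` appears in all three alignments, so it receives full support, while the gapped third column appears only in `msa_a` itself and receives the lowest support.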

Results

Using conventional benchmark tests, we demonstrate that, on average, MergeAlign MSAs are more accurate than MSAs generated using any single sequence substitution matrix. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families, we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.

Conclusion

Using multiple tests of alignment performance, we demonstrate that this novel method has broad general application in biological research.