MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments
1 Marine Biological Association of the United Kingdom, The Laboratory, Citadel Hill, Plymouth, PL1 2PB, Devon, UK
2 Department of Plant Sciences, University of Oxford, South Parks Road, OX1 3RB, Oxford, UK
3 Centre for Mathematical Biology, Mathematical Institute, University of Oxford, 24-29 St Giles’, OX1 3LB, Oxford, UK
4 Oxford Centre for Interactive Systems Biology, Department of Biochemistry, University of Oxford, South Parks Road, OX1 3QU, Oxford, UK
5 Sir William Dunn School of Pathology, University of Oxford, South Parks Road, OX1 3RE, Oxford, UK
BMC Bioinformatics 2012, 13:117 doi:10.1186/1471-2105-13-117. Published: 30 May 2012
The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Improving MSA accuracy and identifying potential errors in MSAs are therefore important for a wide range of post-genomic research. We present a novel method, MergeAlign, which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.
Using conventional benchmark tests, we demonstrate that, on average, MergeAlign MSAs are more accurate than MSAs generated using any single sequence substitution matrix. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families, we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.
Using multiple tests of alignment performance, we demonstrate that this novel method has broad general application in biological research.
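The idea of scoring each column by its agreement across independent MSAs can be illustrated with a short sketch. This is not the MergeAlign algorithm itself, which reconstructs a consensus alignment from the input MSAs; it only shows one simple way a per-column support value could be computed, assuming every alignment contains the same sequences in the same order. The function names `columns` and `column_support` are hypothetical and chosen for illustration.

```python
def columns(alignment):
    """Describe each alignment column as a tuple of residue indices.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each column becomes a tuple holding, for every sequence, the index of
    the residue placed in that column, or None where the sequence has a gap.
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for pos in range(len(alignment[0])):
        col = []
        for i, seq in enumerate(alignment):
            if seq[pos] == "-":
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def column_support(reference, alternatives):
    """Fraction of alternative alignments containing each reference column.

    Assumption for this sketch: a column counts as supported only when an
    alternative alignment contains exactly the same column.
    """
    alt_sets = [set(columns(a)) for a in alternatives]
    return [
        sum(col in s for s in alt_sets) / len(alt_sets)
        for col in columns(reference)
    ]


# Three toy alignments of the same two sequences (ACGT and ACAGT).
msa_a = ["AC-GT", "ACAGT"]
msa_b = ["ACG-T", "ACAGT"]
msa_c = ["AC-GT", "ACAGT"]

scores = column_support(msa_a, [msa_a, msa_b, msa_c])
```

In this toy case the first, second and last columns of `msa_a` appear in all three alignments, while the two middle columns are absent from `msa_b`, so their support is lower. Columns with low support are the ones a consensus approach would flag as less reliable.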