Evolutionary rates at codon sites may be used to align sequences and infer protein domain function
1 Evolutionary Medicine Unit, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
2 Plasmodium Molecular Research Unit, Department of Molecular Medicine and Haematology, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
3 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
4 School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
BMC Bioinformatics 2010, 11:151 doi:10.1186/1471-2105-11-151
Published: 24 March 2010
Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution.
A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the resulting alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignments have the potential to be used to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or sequences with very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms, indicating that the conventional approach of aligning residues, as opposed to evolutionary rates, remains the method of choice in these cases.
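The idea of aligning per-codon evolutionary rates rather than residues can be illustrated with a minimal sketch. It is not the FIRE implementation: it assumes a standard Needleman-Wunsch global alignment applied to two lists of dN/dS values, with a hypothetical similarity score of 1 minus the absolute rate difference and a hypothetical linear gap penalty. The function name `align_rates` and the rate values are illustrative only.

```python
from typing import List, Optional, Tuple

Rate = Optional[float]  # None marks a gap in an aligned rate profile


def align_rates(
    a: List[float], b: List[float], gap: float = -0.5
) -> Tuple[float, List[Rate], List[Rate]]:
    """Globally align two per-codon dN/dS profiles (Needleman-Wunsch).

    Assumed scoring: similarity = 1 - |a_i - b_j|, linear gap penalty `gap`.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # move[i][j] records the step taken: "D" diagonal, "U" up, "L" left
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "L"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = 1.0 - abs(a[i - 1] - b[j - 1])
            candidates = [
                (score[i - 1][j - 1] + sim, "D"),  # pair codon i with codon j
                (score[i - 1][j] + gap, "U"),      # gap in b
                (score[i][j - 1] + gap, "L"),      # gap in a
            ]
            # max() keeps the first of equal scores, so ties prefer "D"
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the alignment
    out_a: List[Rate] = []
    out_b: List[Rate] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif step == "U":
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Illustrative dN/dS profiles: the second lacks the conserved site (0.2)
total, aligned_a, aligned_b = align_rates([0.1, 0.2, 1.5, 0.3], [0.1, 1.5, 0.3])
print(total, aligned_a, aligned_b)
```

In this toy case the fast-evolving site (dN/dS of 1.5) in each profile is paired, and a gap is placed opposite the extra conserved site, without any reference to the underlying residues. A real analysis would first need per-site dN/dS estimates from a codon-substitution model, and the scoring function would need to be chosen and validated for that purpose.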
FIRE provides proof of concept that it is possible to align sequences and infer domain function by using evolutionary rates rather than residue similarity. This represents a new approach to sequence analysis with a wide range of potential applications in molecular biology.