Comparing sequences without using alignments: application to HIV/SIV subtyping
1 Institut Mathématique de Luminy, UMR 6206, Campus de Luminy, Case 907, 13288 Marseille Cedex 9, France
2 Equipe Bioinfo, LIFL, USTL, cité scientifique, Batiment M3, 59655 Villeneuve d'Ascq, France
3 Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen. Goettingen 37077, Germany
4 Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
5 Laboratoire Statistique et Génome, UMR 8071, Tour Evry 2, 523 Place des Terrasses, 91034 Evry, France
BMC Bioinformatics 2007, 8:1 doi:10.1186/1471-2105-8-1
Published: 2 January 2007
In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.
In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian Immunodeficiency Virus) sequence data are used to evaluate this method. The program produces tree topologies that are identical to those obtained by a combination of standard methods detailed in the HIV Sequence Compendium. Manual alignment editing is not necessary at any stage. Furthermore, only one user-specified parameter is needed for constructing trees.
Extensive tests on HIV/SIV subtyping showed that the virus classifications produced by our method are in good agreement with our best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat) regions, which are not tractable by regular alignment methods because of frequent duplications, insertions and deletions. Our method, however, is not limited to HIV/SIV subtyping. It provides an alternative way to construct trees without a time-consuming alignment step.
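To make the idea of a dissimilarity matrix computed without alignment concrete, the following minimal sketch uses a generic k-mer composition distance. This is not the algorithm evaluated in this paper, which is not specified in the abstract; it is a swapped-in, commonly used alignment-free technique chosen only for illustration. The function names, the Euclidean distance, the parameter `k` and the toy sequences are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k):
    """Return relative frequencies of all overlapping k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def dissimilarity_matrix(seqs, k=3):
    """Build a symmetric dissimilarity matrix (nested dict) with no alignment step."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    matrix = {a: {b: 0.0 for b in seqs} for a in seqs}
    for a, b in combinations(seqs, 2):
        d = kmer_distance(profiles[a], profiles[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Toy sequences invented for illustration only (not real HIV/SIV data).
toy = {
    "seqA": "ACGTACGTACGTTGCA",
    "seqB": "ACGTACGTACGATGCA",
    "seqC": "TTTTGGGGCCCCAAAA",
}
D = dissimilarity_matrix(toy, k=3)
```

In this sketch the k-mer length `k` plays the role of a single user-chosen parameter, and a matrix such as `D` could in principle be passed to any distance-based tree-building method. Whether such a simple measure would reproduce the subtype classifications reported here is not something this example can show.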