Open Access Methodology article

GapCoder automates the use of indel characters in phylogenetic analysis

Nelson D Young1* and John Healy2

Author Affiliations

1 Department of Biological Sciences, Duquesne University, Pittsburgh, PA 15219, USA

2 Biology Department, Trinity University, 715 Stadium Dr., San Antonio, TX 78212, USA

BMC Bioinformatics 2003, 4:6  doi:10.1186/1471-2105-4-6

Published: 19 February 2003

Abstract

Background

Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In this method, each indel with a distinct start and/or end position is treated as a separate character. The presence or absence of each of these indel characters in each sequence is then added to the data set.
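The coding step described above can be illustrated with a minimal Python sketch. This is not GapCoder's own code. The function names, the toy alignment, and the use of "-" as the gap symbol are assumptions made for illustration. The sketch scores only presence (1) or absence (0), treats leading and trailing gaps like any other gap, and does not address how a sequence should be scored when a longer gap spans a shorter one.

```python
def find_gaps(seq, gap="-"):
    """Return (start, end) pairs, 0-based and inclusive, for each run of gaps."""
    gaps = []
    start = None
    for i, ch in enumerate(seq):
        if ch == gap:
            if start is None:
                start = i          # a new run of gap symbols begins here
        elif start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:          # run of gaps reaching the end of the sequence
        gaps.append((start, len(seq) - 1))
    return gaps


def code_indels(alignment):
    """Code each distinct (start, end) gap as one presence/absence character.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths assumed).
    Returns the sorted list of distinct indels and a dict of 0/1 strings.
    """
    gaps_by_taxon = {name: set(find_gaps(seq)) for name, seq in alignment.items()}
    # Indels that differ in start and/or end position are separate characters.
    indels = sorted(set().union(*gaps_by_taxon.values()))
    matrix = {
        name: "".join("1" if indel in gaps else "0" for indel in indels)
        for name, gaps in gaps_by_taxon.items()
    }
    return indels, matrix


# Hypothetical toy alignment, not taken from the article.
alignment = {
    "taxon_A": "ACGT--ACGT",
    "taxon_B": "ACGT--ACGT",
    "taxon_C": "ACG---ACGT",
    "taxon_D": "ACGTTTACGT",
}
indels, matrix = code_indels(alignment)
for name, row in matrix.items():
    print(name, alignment[name] + row)
```

In this toy case, taxon_A and taxon_B share one indel (columns 5 to 6, counting from 1), while taxon_C has a gap with a different start position, so it is coded as a second, separate character.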

Algorithm

We have written a program, GapCoder, to automate this procedure. The program reads aligned data sets in PIR format, finds the indels, and adds the indel-based characters. The output is a NEXUS format file, which includes a table showing the region on which each indel character is based. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion.
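The use of such a table can be sketched as follows. This is an illustration only and does not reproduce GapCoder's actual table layout. It assumes that indel characters are numbered consecutively after the sequence characters, that columns are counted from 1, and that any overlap between an indel's region and an excluded region is grounds for exclusion. The function names and the example values are hypothetical.

```python
def indel_table(indels, n_sites):
    """Build a table linking each indel character to its alignment columns.

    indels: list of (start, end) pairs, 0-based and inclusive.
    n_sites: number of sequence characters; indel characters are assumed
             to be numbered n_sites + 1, n_sites + 2, ...
    """
    return [
        {"character": n_sites + k, "first_col": start + 1, "last_col": end + 1}
        for k, (start, end) in enumerate(indels, start=1)
    ]


def chars_to_exclude(table, excluded_first, excluded_last):
    """Return indel characters whose region overlaps the excluded columns (1-based)."""
    return [
        row["character"]
        for row in table
        if row["first_col"] <= excluded_last and row["last_col"] >= excluded_first
    ]


# Hypothetical example: a 10-column alignment with two indels.
table = indel_table([(3, 5), (4, 5)], n_sites=10)
for row in table:
    print(f"character {row['character']}: columns {row['first_col']}-{row['last_col']}")

# Indel characters to drop if columns 1-4 are excluded from the analysis.
print(chars_to_exclude(table, 1, 4))
```

Whether partial overlap should trigger exclusion, or only full containment, is a choice left to the analyst; the overlap test above is the more conservative of the two.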

Discussion

Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets in which indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful in procedures that require phylogenetic analyses to be repeated many times, such as when different alignments or different taxon or character sets are being explored. GapCoder is currently available for Windows from http://www.home.duq.edu/~youngnd/GapCoder.