Research article (Open Access)

Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms

J Gordon Burleigh1,2*, Khidir W Hilu3 and Douglas E Soltis2

Author Affiliations

1 National Evolutionary Synthesis Center (NESCent), Durham, NC 27705, USA

2 Department of Botany and Zoology, University of Florida, Gainesville, FL 32611, USA

3 Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA

BMC Evolutionary Biology 2009, 9:61  doi:10.1186/1471-2148-9-61

Published: 17 March 2009



Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al.


We performed maximum likelihood bootstrap analyses of the complete, 3-gene, 567-taxon data matrix and the incomplete, 5-gene, 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (bootstrap support, BS = 97%) rather than in Cornales (BS = 100% in the 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data.
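The taxon counts quoted above are mutually consistent, which can be checked by inclusion-exclusion. The following is a minimal sketch that uses only the numbers stated in this abstract; the function and constant names are illustrative assumptions and do not come from the authors' analysis.

```python
def taxa_with_neither(total, with_a, with_b, with_both):
    """Count taxa that received neither added gene, by inclusion-exclusion."""
    # Taxa with at least one added gene = |A| + |B| - |A and B|
    with_at_least_one = with_a + with_b - with_both
    return total - with_at_least_one


# Counts quoted in the abstract (names are hypothetical, for illustration only)
TOTAL_TAXA = 567       # taxa in the matrix
MATK_ADDED = 378       # taxa with an added matK sequence
RDNA_26S_ADDED = 240   # taxa with an added 26S rDNA sequence
BOTH_ADDED = 170       # taxa with both sequences added

neither = taxa_with_neither(TOTAL_TAXA, MATK_ADDED, RDNA_26S_ADDED, BOTH_ADDED)
print(neither)  # 119, matching the number of taxa reported as having neither gene
```

By the same arithmetic, 448 of the 567 taxa gained at least one of the two added genes.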


Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible phylogenetic hypotheses.