Phylogenetic representativeness: a new method for evaluating taxon sampling in evolutionary studies
1 Department of "Biologia Evoluzionistica Sperimentale", University of Bologna, Via Selmi, 3 - 40126 Bologna, Italy
2 Department of Biology and Evolution, University of Ferrara, Via Borsari, 46 - 44100 Ferrara, Italy
BMC Bioinformatics 2010, 11:209 doi:10.1186/1471-2105-11-209
Published: 27 April 2010
Taxon sampling is a major concern in phylogenetic studies. Incomplete, biased, or improper taxon sampling can lead to misleading results in reconstructing evolutionary relationships. Several theoretical methods are available to optimize taxon choice in phylogenetic analyses. However, most involve some knowledge about the genetic relationships of the group of interest (i.e., the ingroup), or even a well-established phylogeny itself; these data are not always available in general phylogenetic applications.
We propose a new method to assess taxon sampling that builds on the Clarke and Warwick statistics. The method aims to measure the "phylogenetic representativeness" of a given sample or set of samples, and it is based entirely on the pre-existing taxonomy of the ingroup, which is commonly known to investigators. Moreover, our method also accounts for instability and discordance in taxonomies. A Python-based script suite, called PhyRe, has been developed to implement all the analyses we describe in this paper.
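To make the taxonomy-based idea concrete, the following minimal Python sketch computes Clarke and Warwick's average taxonomic distinctness, i.e. the mean path length through the taxonomy between all pairs of sampled taxa. It is an illustration only and is not taken from PhyRe: the function names, the representation of each taxon as a tuple of ranks ordered from highest to lowest, the unit weight given to every rank step, and the taxon labels are all assumptions made for this example.

```python
from itertools import combinations


def path_length(a, b):
    """Rank steps from the lowest rank up to the lowest rank shared by a and b.

    Each taxon is a tuple of names ordered from the highest rank to the
    lowest (hypothetical layout: family, genus, species). Every step is
    given unit weight, which is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("taxa must be described at the same ranks")
    for depth, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # The taxa diverge at this rank; count the ranks from here down.
            return len(a) - depth
    return 0  # identical classification


def average_taxonomic_distinctness(sample):
    """Mean pairwise path length over all distinct pairs in the sample."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("at least two taxa are required")
    return sum(path_length(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ingroup taxonomy: (family, genus, species).
clustered = [
    ("FamA", "Gen1", "sp1"),
    ("FamA", "Gen1", "sp2"),
    ("FamA", "Gen2", "sp3"),
]
spread = [
    ("FamA", "Gen1", "sp1"),
    ("FamA", "Gen2", "sp3"),
    ("FamB", "Gen3", "sp4"),
]

print(average_taxonomic_distinctness(clustered))
print(average_taxonomic_distinctness(spread))
```

With these made-up labels, the sample drawn from two families scores higher than the sample confined to one family, which is the sense in which a taxonomically broader sample is more distinct.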
We show that this method is sensitive and allows direct discrimination between representative and unrepresentative samples. It also indicates which taxa could be added to improve taxonomic coverage of the ingroup. Although the investigators' expertise remains essential in this field, phylogenetic representativeness provides an objective touchstone for planning phylogenetic studies.