When species matches are unavailable are DNA barcodes correctly assigned to higher taxa? An assessment using sphingid moths
1 Department of Integrative Biology & Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, N1G 2W1, Canada
2 Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Department of Entomology, Natural History Museum, London, SW7 5BD, UK
4 Honorary Attaché, Muséum National d'Histoire Naturelle de Paris, Le Roc, F-47310 Laplume, France
5 ECODIV, Université de Rouen, Bâtiment IRESE A, Place Emile Blondel, F-76821 Mont Saint Aignan Cedex, France
BMC Ecology 2011, 11:18 doi:10.1186/1472-6785-11-18Published: 1 August 2011
When a specimen belongs to a species not yet represented in DNA barcode reference libraries there is disagreement over the effectiveness of using sequence comparisons to assign the query accurately to a higher taxon. Library completeness and the assignment criteria used have been proposed as critical factors affecting the accuracy of such assignments but have not been thoroughly investigated. We explored the accuracy of assignments to genus, tribe and subfamily in the Sphingidae, using the almost complete global DNA barcode reference library (1095 species) available for this family. Costa Rican sphingids (118 species), a well-documented, diverse subset of the family, with each of the tribes and subfamilies represented were used as queries. We simulated libraries with different levels of completeness (10-100% of the available species), and recorded assignments (positive or ambiguous) and their accuracy (true or false) under six criteria.
A liberal tree-based criterion assigned 83% of queries accurately to genus, 74% to tribe and 90% to subfamily, compared to a strict tree-based criterion, which assigned 75% of queries accurately to genus, 66% to tribe and 84% to subfamily, with a library containing 100% of available species (but excluding the species of the query). The greater number of true positives delivered by more relaxed criteria was negatively balanced by the occurrence of more false positives. This effect was most sharply observed with libraries of the lowest completeness where, for example at the genus level, 32% of assignments were false positives with the liberal criterion versus < 1% when using the strict. We observed little difference (< 8% using the liberal criterion) however, in the overall accuracy of the assignments between the lowest and highest levels of library completeness at the tribe and subfamily level.
Our results suggest that when using a strict tree-based criterion for higher taxon assignment with DNA barcodes, the likelihood of assigning a query a genus name incorrectly is very low, if a genus name is provided it has a high likelihood of being accurate, and if no genus match is available the query can nevertheless be assigned to a subfamily with high accuracy regardless of library completeness. DNA barcoding often correctly assigned sphingid moths to higher taxa when species matches were unavailable, suggesting that barcode reference libraries can be useful for higher taxon assignments long before they achieve complete species coverage.