Optimizations for the EcoPod field identification tool
- Equal contributors
1 Department of Computer Science, Stanford University, Stanford, CA 94305, USA
2 Department of Biology, Stanford University, Stanford, CA 94305, USA
BMC Bioinformatics 2008, 9:150 doi:10.1186/1471-2105-9-150
Published: 17 March 2008
We sketch our species identification tool for palm-sized computers, which helps knowledgeable observers with census activities. An algorithm turns an identification matrix into a minimal-length series of questions that guides the operator towards an identification. Historic observation data from the geographic area of the census helps minimize the number of questions. We explore how much historic data is required to boost performance, and whether the use of history negatively impacts the identification of rare species. We also explore how characteristics of the matrix interact with the algorithm, and how best to predict the probability of observing a previously unseen species.
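As an illustration of how observation history can shorten a question series, the following is a minimal sketch and not the EcoPod algorithm itself. It assumes binary (yes/no) characters and a greedy rule that asks the character splitting the remaining probability mass most evenly. The names `pick_question` and `count_questions`, the four-species matrix, and the prior values are all hypothetical.

```python
def pick_question(matrix, candidates, prior, asked):
    """Return the index of the unasked character that splits the remaining
    probability mass most evenly, or None if no character separates the
    candidates. (Greedy rule assumed for illustration only.)"""
    total = sum(prior[s] for s in candidates)
    n_chars = len(next(iter(matrix.values())))
    best, best_gap = None, None
    for c in range(n_chars):
        if c in asked:
            continue
        yes_species = [s for s in candidates if matrix[s][c]]
        # Skip characters that all remaining candidates answer the same way.
        if len(yes_species) in (0, len(candidates)):
            continue
        gap = abs(sum(prior[s] for s in yes_species) - total / 2)
        if best_gap is None or gap < best_gap:
            best, best_gap = c, gap
    return best


def count_questions(matrix, prior, target):
    """Simulate an observer who answers every question correctly for
    `target`, and return how many questions were asked."""
    candidates = set(matrix)
    asked = set()
    while len(candidates) > 1:
        c = pick_question(matrix, candidates, prior, asked)
        if c is None:  # remaining candidates are indistinguishable
            break
        asked.add(c)
        answer = matrix[target][c]
        candidates = {s for s in candidates if matrix[s][c] == answer}
    return len(asked)


# Hypothetical identification matrix: species -> tuple of 0/1 character states.
matrix = {
    "A": (1, 0, 0),
    "B": (0, 1, 0),
    "C": (0, 0, 1),
    "D": (0, 0, 0),
}
uniform = {s: 0.25 for s in matrix}                  # baseline: no history
history = {"A": 0.1, "B": 0.1, "C": 0.7, "D": 0.1}   # made-up: "C" is common

print(count_questions(matrix, uniform, "C"))
print(count_questions(matrix, history, "C"))
```

In this toy setup the frequently observed species "C" is reached in fewer questions when the history-based prior is used than with the uniform baseline.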
Point counts of birds taken at Stanford University's Jasper Ridge Biological Preserve between 2000 and 2005 were used to examine the algorithm. A computer simulated the identification of each species by answering the algorithm's questions correctly and counting them. We also explored how the character density of the key matrix and the theoretical minimum number of questions for each bird in the matrix influenced the algorithm. Our investigation of the required probability smoothing determined whether Laplace smoothing of observation probabilities was sufficient, or whether the more complex Good-Turing technique was required.
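For readers unfamiliar with the smoothing techniques named above, here is a minimal sketch of additive (Laplace) smoothing over observation counts, together with the basic Good-Turing estimate of the total probability of unseen species (the number of species seen exactly once divided by the total number of observations). Simple Good-Turing additionally smooths the frequency-of-frequency counts, which is not shown. The function names, the `alpha` parameter, and the counts are hypothetical and are not taken from the study.

```python
def laplace_probabilities(counts, species, alpha=1.0):
    """Additive smoothing: every species in the matrix receives `alpha`
    pseudo-observations, so unseen species get a nonzero probability."""
    total = sum(counts.get(s, 0) for s in species) + alpha * len(species)
    return {s: (counts.get(s, 0) + alpha) / total for s in species}


def good_turing_unseen_mass(counts):
    """Basic Good-Turing estimate of the total probability of all unseen
    species: N1 / N, where N1 is the number of species observed once."""
    n_total = sum(counts.values())
    n_once = sum(1 for c in counts.values() if c == 1)
    return n_once / n_total if n_total else 0.0


# Made-up counts: species "C" is in the matrix but was never observed.
counts = {"A": 8, "B": 2}
probs = laplace_probabilities(counts, ["A", "B", "C"])
print(probs)
print(good_turing_unseen_mass({"A": 8, "B": 1, "D": 1}))
```

With `alpha=1.0`, the never-observed species "C" receives one pseudo-observation out of thirteen, so it remains reachable by a history-weighted question series.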
Historic data improved identification speed, but only for the 25% most frequently observed birds. For rare birds, the history-based algorithms did not impose a noticeable penalty in the number of questions required for identification. For our dataset, neither the age of the historic data nor the number of observation years impacted the algorithm. The density of characters for different taxa in the identification matrix did not impact the algorithms. Intrinsic differences in identifying different birds did affect the algorithm, but these differences affected the baseline method, which uses no historic data, to exactly the same degree. We found that Laplace smoothing performed better for rare species than Simple Good-Turing, and that, contrary to expectation, the technique did not then adversely affect identification performance for frequently observed birds.