Conceptual flowchart of the computer simulation used to measure the number of questions required to identify a focal species. The process shown in the figure identifies one focal species at a time. Parameter settings (left) were varied in two sets of experiments: (1) the number and the age of the years of historical observations were varied to measure their impact on the number of required questions; (2) the probability smoothing algorithm used for the decision tree computation was varied to examine how best to account for previously unobserved birds. For each focal species, the simulation requested character states from the oracle, which always provided the correct state. Once enough characters had been determined to identify the focal species uniquely, the number of required questions was recorded and another species was selected as the focal species. This process continued until all 104 birds observed at Jasper Ridge had been identified. For the smoothing algorithm experiment, an additional random selection of 50 previously unobserved birds was also identified.
Manoharan et al. BMC Bioinformatics 2008 9:150 doi:10.1186/1471-2105-9-150
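The loop described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the species, characters, and observation counts are invented toy data, add-one (Laplace) smoothing stands in for whichever smoothing algorithms the study compared, and a greedy choice of the character with the lowest expected remaining entropy stands in for the decision tree computation. All function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the paper): species -> {character: state}.
SPECIES = {
    "A": {"size": "small", "color": "brown", "bill": "short"},
    "B": {"size": "small", "color": "brown", "bill": "long"},
    "C": {"size": "large", "color": "brown", "bill": "short"},
    "D": {"size": "large", "color": "black", "bill": "long"},
}

# Hypothetical historical observation counts; "C" was never observed.
COUNTS = {"A": 5, "B": 3, "C": 0, "D": 2}


def laplace_prior(counts):
    """Add-one smoothing (an assumption) so unobserved species keep a nonzero prior."""
    total = sum(counts.values()) + len(counts)
    return {sp: (n + 1) / total for sp, n in counts.items()}


def expected_entropy(character, candidates, species, prior):
    """Expected entropy (bits) over candidates after learning one character's state."""
    groups = defaultdict(list)
    for sp in candidates:
        groups[species[sp][character]].append(sp)
    total = sum(prior[sp] for sp in candidates)
    result = 0.0
    for members in groups.values():
        weight = sum(prior[sp] for sp in members)
        h = -sum((prior[sp] / weight) * math.log2(prior[sp] / weight) for sp in members)
        result += (weight / total) * h
    return result


def questions_needed(focal, species, prior):
    """Count the questions asked until only the focal species remains."""
    candidates = set(species)
    unasked = set(next(iter(species.values())))
    asked = 0
    while len(candidates) > 1 and unasked:
        # Greedy choice; sorting makes tie-breaking deterministic.
        best = min(sorted(unasked),
                   key=lambda c: expected_entropy(c, candidates, species, prior))
        unasked.remove(best)
        answer = species[focal][best]  # the oracle always returns the correct state
        candidates = {sp for sp in candidates if species[sp][best] == answer}
        asked += 1
    return asked


def run_simulation(species, prior):
    """Treat each species in turn as the focal species and record its question count."""
    return {sp: questions_needed(sp, species, prior) for sp in sorted(species)}


uniform = {sp: 1 / len(SPECIES) for sp in SPECIES}
results_uniform = run_simulation(SPECIES, uniform)
results_smoothed = run_simulation(SPECIES, laplace_prior(COUNTS))
```

Comparing `results_uniform` with `results_smoothed` mirrors, in miniature, how changing the prior (historical observations or smoothing) can change the number of questions needed per species.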