Simple regression models as a threshold for selecting AFLP loci with reduced error rates
1 Department of Agronomy, University of Wisconsin-Madison, 1575 Linden Dr, Madison, WI 53706, USA
2 USDA-ARS, U.S. Dairy Forage Research Center, 1925 Linden Dr, Madison, WI, 53706, USA
BMC Bioinformatics 2012, 13:268 doi:10.1186/1471-2105-13-268Published: 16 October 2012
Amplified fragment length polymorphism is a popular DNA marker technique that has applications in multiple fields of study. Technological improvements and decreasing costs have dramatically increased the number of markers that can be generated in an amplified fragment length polymorphism experiment. As datasets increase in size, the number of genotyping errors also increases. Error within a DNA marker dataset can result in reduced statistical power, incorrect conclusions, and decreased reproducibility. It is essential that error within a dataset be recognized and reduced where possible, while still balancing the need for genomic diversity.
Using simple regression with a second-degree polynomial term, a model was fit to describe the relationship between locus-specific error rate and the frequency of present alleles. This model was then used to set a moving error rate threshold that varied based on the frequency of present alleles at a given locus. Loci with error rates greater than the threshold were removed from further analyses. This method of selecting loci is advantageous, as it accounts for differences in error rate between loci of varying frequencies of present alleles. An example using this method to select loci is demonstrated in an amplified fragment length polymorphism dataset generated from the North American prairie species big bluestem. Within this dataset the error rate was reduced from 12.5% to 8.8% by removal of loci with error rates greater than the defined threshold. By repeating the method on selected loci, the error rate was further reduced to 5.9%. This reduction in error resulted in a substantial increase in the amount of genetic variation attributable to regional and population variation.
This paper demonstrates a logical and computationally simple method for selecting loci with a reduced error rate. In the context of a genetic diversity study, this method resulted in an increased ability to detect differences between populations. Further application of this locus selection method, in addition to error-reducing methodological precautions, will result in amplified fragment length polymorphism datasets with reduced error rates. This reduction in error rate should result in greater power to detect differences and increased reproducibility.