Computer-aided identification of polymorphism sets diagnostic for groups of bacterial and viral genetic variants
Cooperative Research Centre for Diagnostics, Institute of Health and Biomedical Innovation, Queensland University of Technology, Cnr Blamey St and Musk Ave, Kelvin Grove, Australia
BMC Bioinformatics 2007, 8:278 doi:10.1186/1471-2105-8-278Published: 1 August 2007
Single nucleotide polymorphisms (SNPs) and genes that exhibit presence/absence variation have provided informative marker sets for bacterial and viral genotyping. Identification of marker sets optimised for these purposes has been based on maximal generalized discriminatory power as measured by Simpson's Index of Diversity, or on the ability to identify specific variants. Here we describe the Not-N algorithm, which is designed to identify small sets of genetic markers diagnostic for user-specified subsets of known genetic variants. The algorithm does not treat the user-specified subset and the remaining genetic variants equally. Rather Not-N analysis is designed to underpin assays that provide 0% false negatives, which is very important for e.g. diagnostic procedures for clinically significant subgroups within microbial species.
The Not-N algorithm has been incorporated into the "Minimum SNPs" computer program and used to derive genetic markers diagnostic for multilocus sequence typing-defined clonal complexes, hepatitis C virus (HCV) subtypes, and phylogenetic clades defined by comparative genome hybridization (CGH) data for Campylobacter jejuni, Yersinia enterocolitica and Clostridium difficile.
Not-N analysis is effective for identifying small sets of genetic markers diagnostic for microbial sub-groups. The best results to date have been obtained with CGH data from several bacterial species, and HCV sequence data.