Software for selecting the most informative sets of genomic loci for multi-target microbial typing
Centre for Infectious Diseases and Microbiology and Sydney Institute for Emerging Infections and Biosecurity, University of Sydney, Westmead Hospital, Hawkesbury Road, Westmead, NSW 2145, Australia
BMC Bioinformatics 2013, 14:148 doi:10.1186/1471-2105-14-148Published: 1 May 2013
High-throughput sequencing can identify numerous potential genomic targets for microbial strain typing, but identification of the most informative combinations requires the use of computational screening tools. This paper describes novel software – Automated Selection of Typing Target Subsets (AuSeTTS) - that allows intelligent selection of optimal targets for pathogen strain typing. The objective of this software is to maximise both discriminatory power, using Simpson’s index of diversity (D), and concordance with existing typing methods, using the adjusted Wallace coefficient (AW). The program interrogates molecular typing results for panels of isolates, based on large target sets, and iteratively examines each target, one-by-one, to determine the most informative subset.
AuSeTTS was evaluated using three target sets: 51 binary targets (13 toxin genes, 16 phage-related loci and 22 SCCmec elements), used for multilocus typing of 153 methicillin-resistant Staphylococcus aureus (MRSA) isolates; 17 MLVA loci in 502 Streptococcus pneumoniae isolates from the MLVA database (http://www.mlva.eu webcite) and 12 MLST loci for 98 Cryptococcus spp. isolates.
The maximum D for MRSA, 0.984, was achieved with a subset of 20 targets and a D value of 0.954 with 7 targets. Twelve targets predicted MLST with a maximum AW of 0.9994. All 17 S. pneumoniae MLVA targets were required to achieve maximum D of 0.997, but 4 targets reached D of 0.990. Twelve targets predicted pneumococcal serotype with a maximum AW of 0.899 and 9 predicted MLST with maximum AW of 0.963. Eight of the 12 MLST loci were sufficient to achieve the maximum D of 0.963 for Cryptococcus spp.
Computerised analysis with AuSeTTS allows rapid selection of the most discriminatory targets for incorporation into typing schemes. Output of the program is presented in both tabular and graphical formats and the software is available for free download from http://www.cidmpublichealth.org/pages/ausetts.html webcite.