Open Access Research article

Design and validation of a supragenome array for determination of the genomic content of Haemophilus influenzae isolates

Rory A Eutsey1, N Luisa Hiller1,2, Joshua P Earl1, Benjamin A Janto1,3, Margaret E Dahlgren1, Azad Ahmed1, Evan Powell1, Matthew P Schultz1, Janet R Gilsdorf4,5, Lixin Zhang4, Arnold Smith6, Timothy F Murphy7, Sanjay Sethi7, Kai Shen1,3,8, J Christopher Post1,3,8, Fen Z Hu1,3,8* and Garth D Ehrlich1,3,8*

Author Affiliations

1 Center for Genomic Sciences, Allegheny Singer Research Institute, Allegheny General Hospital, 320 East North Avenue, 11th Floor, South Tower, Pittsburgh, PA 15212, USA

2 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA

3 Department of Microbiology and Immunology, Drexel University College of Medicine, Allegheny Campus, Pittsburgh, PA, USA

4 Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA

5 Department of Pediatrics and Communicable Diseases, University of Michigan School of Public Health, Ann Arbor, MI, USA

6 Center for Childhood Infections, Seattle Children’s Hospital Research Institute, Seattle, WA, USA

7 Department of Medicine, University at Buffalo, State University of New York, Buffalo, NY, USA

8 Department of Otolaryngology Head and Neck Surgery, Drexel University College of Medicine, Allegheny Campus, Pittsburgh, PA, USA

BMC Genomics 2013, 14:484  doi:10.1186/1471-2164-14-484

Published: 17 July 2013



Haemophilus influenzae colonizes the human nasopharynx as a commensal, and is etiologically associated with numerous opportunistic infections of the airway; it is also less commonly associated with invasive disease. Clinical isolates of H. influenzae display extensive genomic diversity and plasticity. The development of strategies to successfully prevent, diagnose and treat H. influenzae infections depends on tools to ascertain the gene content of individual isolates.


We describe and validate a Haemophilus influenzae supragenome hybridization (SGH) array that can be used to characterize the full genic complement of any strain within the species, as well as strains from several highly related species. The array contains 31,307 probes that collectively cover essentially all alleles of the 2,890 gene clusters identified from the whole genome sequencing of 24 clinical H. influenzae strains. The finite supragenome model predicts that these data include more than 85% of all non-rare genes (where rare genes are defined as those present in less than 10% of sequenced strains). The accuracy of the array was tested by comparing the whole genome sequences of eight strains with their hybridization data obtained using the supragenome array. The array predictions were correct and reproducible for ~98% of the gene content of all of the sequenced strains. This technology was then applied to an investigation of the gene content of 193 geographically and clinically diverse H. influenzae clinical strains. These strains came from multiple locations on five continents and Papua New Guinea and include isolates from the middle ears of persons with otitis media and otorrhea; lung aspirates and sputum samples from pneumonia and COPD patients; blood specimens from patients with sepsis; cerebrospinal fluid from patients with meningitis; and pharyngeal specimens from healthy persons.
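To make two of the quantities above concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes gene presence is stored as a mapping from gene cluster to the set of strains carrying it, and that array and sequence results are simple present/absent calls per cluster. The function names, cluster names, strain names, and toy data are all hypothetical; only the 10% rare-gene threshold and the idea of comparing array calls against whole genome sequence come from the text.

```python
from typing import Dict, Set


def rare_clusters(presence: Dict[str, Set[str]], n_strains: int,
                  threshold: float = 0.10) -> Set[str]:
    """Return clusters present in fewer than `threshold` of the sequenced strains."""
    return {cluster for cluster, strains in presence.items()
            if len(strains) / n_strains < threshold}


def concordance(array_calls: Dict[str, bool],
                sequence_calls: Dict[str, bool]) -> float:
    """Fraction of shared clusters where the array call matches the sequence call."""
    shared = array_calls.keys() & sequence_calls.keys()
    if not shared:
        raise ValueError("no gene clusters in common")
    agree = sum(array_calls[c] == sequence_calls[c] for c in shared)
    return agree / len(shared)


# Hypothetical toy data: 24 sequenced strains named s0..s23.
strains = [f"s{i}" for i in range(24)]
presence = {
    "clusterA": set(strains),      # present in all 24 strains
    "clusterB": set(strains[:2]),  # 2/24, below the 10% threshold
    "clusterC": set(strains[:3]),  # 3/24, above the 10% threshold
}
rare = rare_clusters(presence, n_strains=len(strains))

# Hypothetical present/absent calls for one strain.
array_calls = {"clusterA": True, "clusterB": False, "clusterC": True, "clusterD": True}
sequence_calls = {"clusterA": True, "clusterB": False, "clusterC": False, "clusterD": True}
agreement = concordance(array_calls, sequence_calls)
```

In this toy case only `clusterB` counts as rare, and the array agrees with the sequence for three of the four clusters. The reported ~98% figure would correspond to this kind of per-strain agreement computed over all gene clusters, although the exact scoring used in the study is not given in the abstract.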


These analyses provided the most comprehensive and detailed genomic/phylogenetic look at this species to date, and identified a subset of highly divergent strains that form a separate lineage within the species. This array provides a cost-effective and high-throughput tool to determine the gene content of any H. influenzae isolate or lineage. Furthermore, the method for probe selection can be applied to any species, given a group of available whole genome sequences.