Open Access Research article

Comparative supragenomic analyses among the pathogens Staphylococcus aureus, Streptococcus pneumoniae, and Haemophilus influenzae Using a modification of the finite supragenome model

Robert Boissy¹,⁸, Azad Ahmed¹, Benjamin Janto¹, Josh Earl¹, Barry G Hall¹,², Justin S Hogg³, Gordon D Pusch⁴,⁵, Luisa N Hiller¹, Evan Powell¹, Jay Hayes¹, Susan Yu¹, Sandeep Kathju¹,⁶,⁷, Paul Stoodley¹,⁶, J Christopher Post¹,⁶,⁷, Garth D Ehrlich¹,⁶,⁷* and Fen Z Hu¹,⁶,⁷*

Author Affiliations

1 Center for Genomic Sciences, Allegheny-Singer Research Institute, 320 East North Ave, Pittsburgh, PA 15212, USA

2 Bellingham Research Institute, 218 Chuckanut Point Rd, Bellingham, WA 98229, USA

3 Joint Carnegie Mellon University - University of Pittsburgh Doctoral Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA, 15260, USA

4 Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA

5 Computation Institute, University of Chicago, Chicago, IL 60637, USA

6 Department of Microbiology and Immunology, Drexel University College of Medicine, Allegheny Campus, Allegheny General Hospital, Pittsburgh, PA 15212, USA

7 Department of Otolaryngology-Head and Neck Surgery, Drexel University College of Medicine, Allegheny Campus, Allegheny General Hospital, Pittsburgh, PA 15212, USA

8 Department of Internal Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA


BMC Genomics 2011, 12:187  doi:10.1186/1471-2164-12-187

Published: 13 April 2011



Staphylococcus aureus is associated with a spectrum of symbiotic relationships with its human host, from carriage to sepsis, and is frequently associated with nosocomial and community-acquired infections; thus, the differential gene content among strains is of interest.


We sequenced three clinical strains and combined these data with the genomes of 13 publicly available human isolates and one bovine strain for comparative genomic analyses. All genomes were annotated using RAST, and their gene similarities and differences were then delineated. Gene clustering yielded 3,155 orthologous gene clusters, of which 2,266 were core, 755 were distributed, and 134 were unique. Individual genomes contained between 2,524 and 2,648 genes. Gene-content comparisons among all possible S. aureus strain pairs (n = 136) revealed a mean difference of 296 genes and a maximum difference of 476 genes. We developed a revised version of our finite supragenome model to estimate the size of the S. aureus supragenome (3,221 genes, with 2,245 core genes) and compared it with those of Haemophilus influenzae and Streptococcus pneumoniae. There was excellent agreement between RAST's annotations and our CDS clustering procedure, providing for high-fidelity metabolomic subsystem analyses that extend our comparative genomic characterization of these strains.
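The bookkeeping behind the cluster counts and pairwise comparisons can be illustrated with a minimal Python sketch. It assumes the orthologous clusters have already been computed and are available as a map from a cluster ID to the set of genomes carrying it; the function names and toy data are hypothetical, and this is neither the authors' clustering algorithm nor the finite supragenome model. It also assumes the usual definitions (core: present in every genome; unique: present in exactly one; distributed: present in more than one but not all) and takes the gene-content difference between two strains to be the number of clusters present in one but not the other. With 17 genomes there are 17 × 16 / 2 = 136 unordered pairs, matching n = 136 above.

```python
from itertools import combinations
from statistics import mean


def classify_clusters(presence, genomes):
    """Split clusters into core / distributed / unique.

    presence: hypothetical map of cluster id -> set of genome ids carrying it.
    Assumed definitions: core = in every genome, unique = in exactly one,
    distributed = in more than one genome but not all.
    """
    n = len(genomes)
    classes = {"core": [], "distributed": [], "unique": []}
    for cluster, carriers in presence.items():
        k = len(carriers)
        if k == 0:
            raise ValueError(f"cluster {cluster!r} has no carriers")
        if k == n:
            classes["core"].append(cluster)
        elif k == 1:
            classes["unique"].append(cluster)
        else:
            classes["distributed"].append(cluster)
    return classes


def genome_contents(presence):
    """Invert the cluster -> genomes map into genome -> set of clusters."""
    contents = {}
    for cluster, carriers in presence.items():
        for genome in carriers:
            contents.setdefault(genome, set()).add(cluster)
    return contents


def pairwise_differences(presence, genomes):
    """Gene-content difference for every unordered strain pair.

    The difference is assumed to be the size of the symmetric difference:
    clusters present in one genome of the pair but absent from the other.
    """
    contents = genome_contents(presence)
    return {
        (a, b): len(contents.get(a, set()) ^ contents.get(b, set()))
        for a, b in combinations(genomes, 2)
    }


if __name__ == "__main__":
    # Toy data, purely illustrative.
    genomes = ["A", "B", "C"]
    presence = {
        "c1": {"A", "B", "C"},
        "c2": {"A", "B", "C"},
        "c3": {"A", "B"},
        "c4": {"C"},
    }
    classes = classify_clusters(presence, genomes)
    print({name: len(ids) for name, ids in classes.items()})
    # prints {'core': 2, 'distributed': 1, 'unique': 1}

    diffs = pairwise_differences(presence, genomes)
    print("pairs:", len(diffs), "mean:", mean(diffs.values()), "max:", max(diffs.values()))

    # 17 genomes yield 136 unordered pairs.
    print(len(list(combinations(range(17), 2))))  # prints 136
```

Keeping the data as cluster-to-genome sets makes both the classification (a count of carriers) and the pairwise comparison (a set symmetric difference) one-line operations per cluster or pair.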


Using a multi-species comparative supragenomic analysis enabled by an improved version of our finite supragenome model, we provide data and an interpretation explaining the relatively larger core genome of S. aureus compared with other opportunistic nasopharyngeal pathogens. In addition, we provide independent validation of the efficiency and effectiveness of our orthologous gene clustering algorithm.