Open Access Highly Accessed Research article

Proteome driven re-evaluation and functional annotation of the Streptococcus pyogenes SF370 genome

Akira Okamoto* and Keiko Yamada

Author Affiliations

Department of Molecular Bacteriology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550, Japan

BMC Microbiology 2011, 11:249  doi:10.1186/1471-2180-11-249

Published: 10 November 2011

Additional files

Additional file 1:

Cross-sectional genome overview of GAS. Thirteen chromosomal DNA sequences were obtained from the NCBI database. For each genome, the CDS length and coverage, the number of genes, the number of protein-coding genes, and the average length of protein-coding genes were calculated from the genome annotation. The CDS region is the total length of the genes annotated in each genome. The number of genes is the count of features tagged as "gene" in a given genome, and the number of protein-coding genes is the count of genes annotated as protein-coding regions. Genomes are listed by the year in which they were submitted or updated. a) The gene predictor used for this strain was not clearly stated in the manuscript and was inferred from its citations. b) The CDS coverage and the number of genes for Manfredo were not analyzed (NA) because its annotation format differs from that of the other genomes.
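The summary statistics described in this caption can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the input layout (1-based inclusive start and end coordinates per protein-coding gene), and the definition of coverage as summed CDS length divided by genome length are all assumptions.

```python
from statistics import mean

def genome_overview(genome_length, cds_coords):
    """Summarize protein-coding annotation for one genome.

    cds_coords: list of (start, end) pairs, 1-based and inclusive
    (hypothetical input layout). Overlapping genes are counted twice,
    so the coverage is an upper bound when genes overlap.
    """
    lengths = [end - start + 1 for start, end in cds_coords]
    total = sum(lengths)
    return {
        "cds_length": total,
        "cds_coverage_percent": 100.0 * total / genome_length,
        "n_protein_coding_genes": len(lengths),
        "avg_gene_length": mean(lengths),
    }
```

For example, `genome_overview(1000, [(1, 300), (401, 600)])` describes a toy 1,000 bp genome carrying two genes of 300 bp and 200 bp.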

Format: XLS Size: 34KB

This file can be viewed with: Microsoft Excel Viewer

Open Data

Additional file 2:

Overview of the shotgun proteomic analysis. GAS SF370 was grown under three culture conditions (static: without shaking; CO2: under 5% CO2 without shaking; shake: with shaking), and the tryptic-digested peptides were analyzed by LC-MS/MS. Approximately 7,000 spectra were queried on a MASCOT server against a real and a randomized decoy version of each of two databases: the six-frame database and the refined amino acid database (read DB) consisting of 1,707 CDSs. The identification certainty was evaluated by the false discovery rate (FDR).
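A common way to estimate the FDR from a target/decoy search is to divide the number of decoy matches above a score threshold by the number of target matches above the same threshold. The sketch below assumes that simple ratio and uses hypothetical score lists; the exact calculation performed by the MASCOT server for this study is not given here and may differ.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR as decoy hits / target hits at a score threshold.

    target_scores, decoy_scores: match scores from the real and the
    randomized decoy database searches (hypothetical inputs).
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # no accepted matches, so nothing can be a false discovery
    return n_decoy / n_target
```

With three target matches and one decoy match passing the threshold, the estimated FDR is one in three.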

Format: XLS Size: 32KB

Additional file 3:

Candidate CDSs found in this study. ORFs to which more than two unique sequences were assigned are listed in this table with their Gene Ontology annotation. The total average number of identified unique sequences is listed for each experimental group. mRNA encoding each CDS candidate was either amplified by RT-PCR (+) or not (-). Abbreviations: ORF ID, unique number of the ORF in the six-frame database used in this study; Mw and pI, molecular weight and isoelectric point deduced from the amino acid sequence; SNT, supernatant fraction; SOL, soluble fraction; INS, insoluble fraction; n/a, not available.
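A six-frame database is built by translating the genome in all three reading frames on each strand. The following is a minimal sketch of that translation step only, using the standard codon assignments; the function names are hypothetical, and how the authors split the translations into numbered ORFs is not described here.

```python
from itertools import product

# Standard codon assignments, listed with the bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate one reading frame; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Stop codons appear as `*` in the output, so candidate ORFs can be obtained afterwards by splitting each frame at `*`.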

Format: XLS Size: 175KB

Additional file 4:

Table of proteins identified with the in-house refined database. Abbreviations: a) Synonym, tag number in the SF370 genome; b) Gene, gene name; c) PID, GI number of the protein in the NCBInr database; d) COGs code, abbreviation of the functional category in the Clusters of Orthologous Groups project; each one-letter abbreviation is detailed in the manuscript and in Additional files 5 and 6; e) MSD, the number of membrane-spanning domains calculated by the SOSUI program; f) SP, the probability score of the signal peptide prediction with the SignalP 3.0 program (Hidden Markov Model); g) abbreviations in the "static", "CO2", and "shake" columns: score, MASCOT score; %AA, percent coverage of the amino acid sequence; seq, number of spectra matched to unique sequences; emPAI, exponentially modified Protein Abundance Index.
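For readers unfamiliar with the emPAI column, the index is conventionally defined as 10 raised to the ratio of observed to observable peptides, minus 1. The sketch below assumes that standard definition; the function and parameter names are hypothetical, and the values in the table were reported by the search software rather than computed this way by hand.

```python
def empai(n_observed, n_observable):
    """Exponentially modified Protein Abundance Index.

    n_observed: number of unique peptides observed for the protein.
    n_observable: number of peptides theoretically observable for it.
    """
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    pai = n_observed / n_observable  # Protein Abundance Index
    return 10 ** pai - 1
```

A protein with no observed peptides has an emPAI of 0, and the index grows exponentially as a larger share of its observable peptides is detected.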

Format: XLS Size: 519KB

Additional file 5:

Annotations for "Conserved hypothetical proteins". "Conserved hypothetical proteins" to which more than two unique sequences were assigned are listed in this table with homology search-based annotation, such as Gene Ontology. The total average number of identified unique sequences is listed for each experimental group. Abbreviations in the description column: Synonym, tag number in the SF370 genome. a) Abbreviations in the "location" column: S, secreted protein (supernatant fraction); C, cytoplasmic protein (soluble fraction); W, cell wall-associated protein (insoluble fraction); uni, universally identified in all cellular fractions. The number indicates the average number of MS/MS spectra assigned to unique peptide sequences. b) Abbreviations in the "condition" column: sta, culture under static growth conditions; co, culture under 5% CO2 conditions; sha, culture under shaking conditions; uni, universally identified in all three culture conditions. The number indicates the average number of MS/MS spectra assigned to unique peptide sequences. c) COGs, abbreviation of the functional category in the Clusters of Orthologous Groups project: "D", Cell cycle control, cell division, chromosome partitioning; "E", Amino acid transport and metabolism; "G", Carbohydrate transport and metabolism; "H", Coenzyme transport and metabolism; "I", Lipid transport and metabolism; "J", Translation, ribosomal structure and biogenesis; "K", Transcription; "M", Cell wall/membrane/envelope biogenesis; "O", Posttranslational modification, protein turnover, chaperones; "P", Inorganic ion transport and metabolism; "Q", Secondary metabolites biosynthesis, transport and catabolism; "R", General function prediction only; "S", Function unknown; "T", Signal transduction mechanisms; "U", Intracellular trafficking, secretion, and vesicular transport; "V", Defense mechanisms; and "-", Not classified into COGs. d) MSD, the number of membrane-spanning domains calculated by the SOSUI program (Reference 48). e) SP, the probability score of the signal peptide prediction with the SignalP 3.0 program (Hidden Markov Model) (References 29, 30).

Format: XLS Size: 65KB

Additional file 6:

Annotations for "Hypothetical proteins". "Hypothetical proteins" to which more than two unique sequences were assigned are listed in this table with homology search-based annotation, such as Gene Ontology. The total average number of identified unique sequences is listed for each experimental group. Abbreviations in the description column: Synonym, tag number in the SF370 genome. a) Abbreviations in the "location" column: S, secreted protein (supernatant fraction); C, cytoplasmic protein (soluble fraction); W, cell wall-associated protein (insoluble fraction); uni, universally identified in all cellular fractions. The number indicates the average number of MS/MS spectra assigned to unique peptide sequences. b) Abbreviations in the "condition" column: sta, culture under static growth conditions; co, culture under 5% CO2 conditions; sha, culture under shaking conditions; uni, universally identified in all three culture conditions. The number indicates the average number of MS/MS spectra assigned to unique peptide sequences. c) COGs, abbreviation of the functional category in the Clusters of Orthologous Groups project: "D", Cell cycle control, cell division, chromosome partitioning; "E", Amino acid transport and metabolism; "G", Carbohydrate transport and metabolism; "H", Coenzyme transport and metabolism; "I", Lipid transport and metabolism; "J", Translation, ribosomal structure and biogenesis; "K", Transcription; "M", Cell wall/membrane/envelope biogenesis; "O", Posttranslational modification, protein turnover, chaperones; "P", Inorganic ion transport and metabolism; "Q", Secondary metabolites biosynthesis, transport and catabolism; "R", General function prediction only; "S", Function unknown; "T", Signal transduction mechanisms; "U", Intracellular trafficking, secretion, and vesicular transport; "V", Defense mechanisms; and "-", Not classified into COGs. d) MSD, the number of membrane-spanning domains calculated by the SOSUI program (Reference 48). e) SP, the probability score of the signal peptide prediction with the SignalP 3.0 program (Hidden Markov Model) (References 29, 30).

Format: XLS Size: 48KB

Additional file 7:

Primers used for the RT-PCR assay. The RT-PCR procedure is detailed in the Methods section. The sequence of each primer, the number of amplification cycles, and the estimated product size are listed.

Format: XLS Size: 31KB