Research article

Enhanced whole genome sequence and annotation of Clostridium stercorarium DSM8532T using RNA-seq transcriptomics and high-throughput proteomics

John J Schellenberg1*, Tobin J Verbeke1, Peter McQueen2, Oleg V Krokhin2, Xiangli Zhang3, Graham Alvare3, Brian Fristensky3, Gerhard G Thallinger5,6, Bernard Henrissat7,8, John A Wilkins2, David B Levin4 and Richard Sparling1

Author Affiliations

1 Department of Microbiology, University of Manitoba, Winnipeg, Canada

2 Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada

3 Department of Plant Sciences, University of Manitoba, Winnipeg, Canada

4 Department of Biosystems Engineering, University of Manitoba, Winnipeg, Canada

5 Core Facility Bioinformatics, Austrian Centre of Industrial Biotechnology (ACIB), Graz, Austria

6 Institute for Genomics and Bioinformatics, Graz University of Technology, Graz, Austria

7 Architecture et Fonction des Macromolécules Biologiques, Université Aix-Marseille, Marseille, France

8 Centre National de Recherche Scientifique, UMR 7257, 163 ave. de Luminy, Marseille 13288, France


BMC Genomics 2014, 15:567  doi:10.1186/1471-2164-15-567

Published: 7 July 2014



Growing interest in cellulolytic clostridia with potential for consolidated biofuels production is tempered by low conversion of raw substrates to desired end products. Strategies to improve conversion are likely to benefit from emerging techniques to define the molecular systems biology of these organisms. Clostridium stercorarium DSM8532T is an anaerobic thermophile with demonstrated high ethanol production on cellulose and hemicellulose. Although several lignocellulolytic enzymes in this organism have been well characterized, details concerning carbohydrate transporters and central metabolism have not been described. Therefore, the goal of this study is to define an improved whole genome sequence (WGS) for this organism using in-depth molecular profiling by RNA-seq transcriptomics and tandem mass spectrometry-based proteomics.


A paired-end Roche/454 WGS assembly was closed through application of an in silico algorithm designed to resolve repetitive sequence regions, resulting in a circular replicon with one gap and a region of 2 kilobases with 10 ambiguous bases. RNA-seq transcriptomics resulted in nearly complete coverage of the genome, identifying errors in homopolymer length attributable to 454 sequencing. Peptide sequences resulting from high-throughput tandem mass spectrometry of trypsin-digested protein extracts were mapped to 1,755 annotated proteins (68% of all protein-coding regions). Proteogenomic analysis confirmed the quality of annotation and improvement pipelines, identifying a missing gene and an alternative reading frame. Peptide coverage of genes hypothetically involved in substrate hydrolysis, transport and utilization confirmed multiple pathways for glycolysis, pyruvate conversion and recycling of intermediates. No sequences homologous to transaldolase, a central enzyme in the pentose phosphate pathway, were observed by any method, despite demonstrated growth of this organism on xylose and xylan hemicellulose.
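The peptide-to-protein mapping step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: real proteomics workflows score spectra with a dedicated search engine, whereas this example only digests annotated proteins in silico with a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and checks which proteins have at least one exactly matching observed peptide. The protein names, sequences, observed peptides and the `min_len` cutoff are all hypothetical.

```python
def tryptic_peptides(protein, min_len=6):
    """In silico trypsin digest: cleave after K or R unless followed by P.

    Simplified rule with no missed cleavages; min_len is an assumed cutoff.
    """
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            pep = protein[start:i + 1]
            if len(pep) >= min_len:
                peptides.add(pep)
            start = i + 1
    return peptides


def proteins_with_evidence(proteome, observed):
    """Return names of proteins with at least one observed tryptic peptide."""
    observed = set(observed)
    return {name for name, seq in proteome.items()
            if tryptic_peptides(seq) & observed}


# Hypothetical toy data, not taken from the study.
proteome = {
    "prot_A": "MKTAYIAKQRQISFVKSHFSR",
    "prot_B": "MSLLNDPEKGGAVLLTRPEEK",
    "prot_C": "MAAAAAAK",
}
observed = ["QISFVK", "GGAVLLTRPEEK"]

supported = proteins_with_evidence(proteome, observed)
fraction = len(supported) / len(proteome)
print(sorted(supported), f"{fraction:.0%} of annotated proteins")
```

The fraction computed at the end is the toy analogue of the proportion of protein-coding regions with peptide support reported in the text. Exact string matching is a deliberate simplification; it ignores modifications, missed cleavages and peptides shared between proteins.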


Complementary omics techniques confirm the quality of genome sequence assembly, annotation and error-reporting. Nearly complete genome coverage by RNA-seq likely indicates background DNA in RNA extracts; however, these preparations enabled both WGS enhancement and transcriptome profiling in a single Illumina run. The absence of detectable transaldolase by any method, despite xylose utilization by this organism, indicates an alternative pathway for sedoheptulose-7-phosphate degradation. This report combines next-generation omics techniques to elucidate previously undefined features of substrate transport and central metabolism for this organism and its potential for consolidated biofuels production from lignocellulose.

Keywords: Genome; Proteome; Transcriptome; RNA-seq; Tandem mass spectrometry; Proteogenomics; Glycolysis; Pentose phosphate pathway; Transaldolase