Research article

An iterative workflow for mining the human intestinal metaproteome

Koos Rooijers1, Carolin Kolmeder2, Catherine Juste3, Joël Doré3, Mark de Been2, Sjef Boeren4, Pilar Galan5, Christian Beauvallet6, Willem M de Vos2,7 and Peter J Schaap1*

Author Affiliations

1 Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands

2 Department of Veterinary Biosciences, Division of Microbiology and Epidemiology, University of Helsinki, P.O. Box 66, FIN-00014 Helsinki, Finland

3 INRA, UMR 1319, Micalis, 78350 Jouy-en-Josas, France

4 Laboratory of Biochemistry, Wageningen University, Dreijenlaan 10, 6703 HB Wageningen, The Netherlands

5 UMR U557 INSERM, U1125 INRA, CNAM, Université Paris 13, F-93017 Bobigny, France

6 INRA, IsoCellExpress (ICE), UMR 1313 GABI, 78350 Jouy-en-Josas, France

7 Laboratory of Microbiology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands


BMC Genomics 2011, 12:6  doi:10.1186/1471-2164-12-6

Published: 5 January 2011



Peptide spectrum matching (PSM) is the standard method in shotgun proteomics data analysis. It relies on the availability of an accurate and complete sample proteome against which the spectra are interpreted. Although this procedure has proven effective in many proteomics studies, it has limitations when applied to complex samples of microbial communities, such as those found in the human intestinal tract. Metagenome studies have indicated that the human intestinal microbiome contains over 100 times more genes than the human genome, and it has been estimated that this ecosystem contains over 5,000 bacterial species. The genomes of the vast majority of these species have not yet been sequenced, and hence their proteomes remain unknown. To enable analysis of shotgun proteomics data using PSM and to circumvent the lack of a defined matched metaproteome, an iterative workflow was developed. It is based on a synthetic metaproteome and on the developing metagenomic databases, both of which are representative of, but do not necessarily originate from, the sample of interest.


Two human fecal samples for which metagenomic data had been collected were analyzed for their metaproteome using liquid chromatography-mass spectrometry and used to benchmark the developed iterative workflow against other methods. The results show that the developed method detects over 3,000 peptides per fecal sample from the spectral data, circumventing the lack of a defined proteome without relying on naive translation of matched metagenomes or on cross-species peptide identification.


The developed iterative workflow achieved an approximately two-fold increase in the number of identified spectra at a false discovery rate of 1% and can be applied in metaproteomic studies of the human intestinal tract or other complex ecosystems.
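
To make the 1% false discovery rate criterion concrete, the following is a minimal sketch of how a score cutoff might be chosen for a list of peptide spectrum matches. The abstract does not state how the false discovery rate was estimated; the sketch assumes a target-decoy strategy, in which the rate is approximated as the number of decoy matches divided by the number of target matches above the cutoff. The `PSM` class, its fields, and the `accept_at_fdr` function are hypothetical names introduced only for illustration and are not part of the workflow described above.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide spectrum match (hypothetical structure for illustration)."""
    spectrum_id: str
    score: float      # assumption: a higher score means a better match
    is_decoy: bool    # assumption: True if the match is to a decoy sequence


def accept_at_fdr(psms, max_fdr=0.01):
    """Return the target PSMs in the longest score-ranked prefix whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


if __name__ == "__main__":
    # Synthetic data: 50 high-scoring targets, 5 decoys, 50 low-scoring targets.
    example = (
        [PSM(f"t_hi_{i}", 200.0 + i, False) for i in range(50)]
        + [PSM(f"d_{i}", 150.0 + i, True) for i in range(5)]
        + [PSM(f"t_lo_{i}", 10.0 + i, False) for i in range(50)]
    )
    accepted = accept_at_fdr(example, max_fdr=0.01)
    print(len(accepted), "target PSMs accepted at an estimated FDR of 1%")
```

Under this assumption, comparing workflows "at a false discovery rate of 1%" means counting the accepted spectra from each method at the same estimated error rate, so that a gain in identifications is not simply the result of a looser cutoff.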