Open Access Research article

Protein abundance profiling of the Escherichia coli cytosol

Yasushi Ishihama1,2*, Thorsten Schmidt3, Juri Rappsilber2,7, Matthias Mann2,4, F Ulrich Hartl5, Michael J Kerner6 and Dmitrij Frishman3,8*

Author Affiliations

1 Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan

2 Center for Experimental BioInformatics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark

3 Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, D-85350 Freising, Germany

4 Department of Proteomics and Signal Transduction, Max Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany

5 Department of Cellular Biochemistry, Max Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany

6 Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, Kemitorvet 208, DK-1726 Lyngby, Denmark

7 Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3JR, UK

8 Institute for Bioinformatics, GSF National Research Center for Environment and Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany


BMC Genomics 2008, 9:102  doi:10.1186/1471-2164-9-102

Published: 27 February 2008



Knowledge of the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information is becoming readily accessible.


Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps, we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach, which takes into account the number of sequenced peptides per protein. The abundance values span a broad range and accurately reflect independently measured copy numbers per cell.
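
To make the emPAI idea concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the commonly cited definition emPAI = 10^(N_observed / N_observable) - 1, where N_observed is the number of sequenced peptides for a protein and N_observable is the number of peptides that could in principle be observed, and it expresses each protein's share as emPAI divided by the sum of all emPAI values. The function names and the peptide counts in the usage lines are hypothetical and serve only as illustration.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """Return emPAI, assuming emPAI = 10 ** (observed / observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    if n_observed < 0:
        raise ValueError("n_observed must not be negative")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_by_protein: dict[str, float]) -> dict[str, float]:
    """Convert emPAI values to each protein's share (mol %) of the total."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("the sum of emPAI values must be positive")
    return {name: 100 * value / total for name, value in empai_by_protein.items()}


# Hypothetical peptide counts: (sequenced peptides, observable peptides).
counts = {"protein_A": (12, 15), "protein_B": (3, 20), "protein_C": (1, 25)}
scores = {name: empai(obs, possible) for name, (obs, possible) in counts.items()}
shares = molar_percent(scores)
```

Because the index is exponential in the fraction of observable peptides that were sequenced, a protein with most of its peptides detected receives a far larger score than one with only a few detected, which is what allows the values to span a broad range.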

As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism, as well as those with binding function, were also found in high copy number, while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be essential. Additionally, we observe a significant correlation between protein and mRNA abundance in E. coli cells.
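
One way such a protein-to-mRNA comparison could be carried out is with a rank correlation, sketched below. This is an illustration under stated assumptions: the abstract does not say which statistic was used, so Spearman's coefficient is chosen here only because it is insensitive to the wide dynamic range of abundance values. The helper names are hypothetical, and any input values would have to come from matched protein and mRNA measurements.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(ranks(x), ranks(y))
```

Calling `spearman(protein_abundance, mrna_abundance)` on two lists ordered by the same gene identifiers returns a value between -1 and 1; assessing significance would additionally require a test such as a permutation test, which is omitted here.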


The abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell to date. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.