BMC Genomics

Open Access Research article

Comparison of theoretical proteomes: Identification of COGs with conserved and variable pI within the multimodal pI distribution

Soumyadeep Nandi1, Nipun Mehra1, Andrew M Lynn1 and Alok Bhattacharya1,2*

Author Affiliations

1 Centre for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110067, India

2 School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India

BMC Genomics 2005, 6:116 doi:10.1186/1471-2164-6-116

Published: 9 September 2005

Abstract

Background

Theoretical proteome plots, generated by plotting the theoretical isoelectric point (pI) against the molecular mass of every protein encoded by a genome, show a multimodal distribution for pI. This multimodal distribution is an effect of the allowed combinations of charged amino acids and is not due to evolutionary causes. Variation in this distribution can be correlated with the organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes.
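For readers unfamiliar with how a theoretical pI is obtained from sequence alone, the following is a minimal sketch and not the procedure used in this study. It assumes the Henderson-Hasselbalch model of side-chain ionization and one illustrative set of pKa values; the function names `net_charge` and `theoretical_pi` are hypothetical. The pI is taken as the pH at which the computed net charge is zero, located here by bisection.

```python
# Illustrative pKa values: an assumption for this sketch, not taken from the article.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}            # groups charged +1 when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # groups charged -1 when deprotonated
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq, ph):
    """Estimate the net charge of a protein sequence at a given pH."""
    # Terminal amino group (positive) and terminal carboxyl group (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def theoretical_pi(seq, tol=1e-4):
    """Find the pH of zero net charge by bisection on the interval [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge decreases monotonically with pH, so a positive
        # charge means the pI lies above the midpoint.
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, a sequence rich in aspartate and glutamate yields a pI in the acidic range and one rich in lysine and arginine yields a pI in the basic range, which is the sense in which combinations of charged residues shape the pI distribution.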

Results

Ortholog pI values showed trimodal distributions for all prokaryotic genomes analyzed, similar to whole-proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional grouping of orthologs, five groups vary significantly from the population of orthologs, which is attributed either to conservation at the level of sequence or to a bias for positively or negatively charged residues contributing to the function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and orthologs that best represent the variation in levels of the acidic and basic regions are listed.

Conclusion

Analysis of the pI distribution using orthologs provides a basis for resolving theoretical proteome comparisons at the level of individual proteins. The orthologs identified as varying significantly between the major acidic and basic regions may be used as representatives of the variation of the entire proteome.