The relationship of protein conservation and sequence length
National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
BMC Evolutionary Biology 2002, 2:20 doi:10.1186/1471-2148-2-20Published: 1 November 2002
In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints.
We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds.
There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.