Linking evolution of protein structures through fragments

Abeln, Sanne; Deane, Charlotte M

doi:10.1186/1752-0509-1-S1-S12

Volume 1 Supplement 1

BioSysBio 2007: Systems Biology, Bioinformatics, Synthetic Biology

Oral presentation
Open access
Published: 08 May 2007

BMC Systems Biology volume 1, Article number: S12 (2007)

Motivation

At present there is no universal understanding of how proteins can change topology during evolution, or of how such pathways can be determined in a systematic way. The ability to create links between fold topologies would have important consequences for structural classification, structure prediction and homology modelling. Several methods based on geometrical measures have been proposed to create links between topologies, e.g. [1, 2]. It has proven difficult, however, to show the evolutionary relevance of such links. Here we use our previously developed age measure for protein superfamilies [3] to investigate the relationship between structural fragments and protein structure evolution.

Results and discussion

We used a set of pairwise fragments to create a network of structural links between superfamilies. In total 1.2 × 10^8, 1.5 × 10^7, 2.7 × 10^6 and 1.1 × 10^5 fragments were generated of lengths 10, 15, 20 and 30 respectively. When comparing the number of fragment-links that young and old superfamilies make with other superfamilies, it becomes clear that the distribution for younger folds is skewed towards fewer links (Figure 1). Similarly, we can compare the number of links that each superfamily has with a set of young and a set of old superfamilies. Again, most superfamilies share significantly fewer links with the group of young superfamilies (Figure 2). New proteins are thought to be created through duplication and point mutations of structural domains. Here we show the first evidence that this might also occur on a scale below the domain level: fragments are shared more often with older superfamilies, which is expected in a model where new topologies can be built through an assembly of, or multiple insertions of, fragments from existing proteins. Some care has to be taken here, as these results could also be caused by convergent evolution, which would drive the inclusion of more stable fragments. However, the differences between age groups become stronger with increased fragment length (Figure 2). Since the probability of convergence should decrease as fragment length increases, this trend argues against the convergence explanation. These results have important implications for structure prediction, as they may explain why current 'fragment based' modelling approaches are so successful.
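The comparison of link counts between age groups can be sketched as follows. This is only an illustration under stated assumptions: the input formats (a list of superfamily pairs and a dictionary of age labels) and the function names are hypothetical, and the median is used purely as a simple summary, not as the statistic behind Figures 1 and 2.

```python
from collections import Counter
from statistics import median


def link_counts_by_group(links, group_of):
    """Collect the number of links per superfamily, split by age group.

    links: iterable of (superfamily_i, superfamily_j) pairs, one per link.
    group_of: dict mapping superfamily -> "young" or "old"; superfamilies
    missing from the dict are ignored (hypothetical input format).
    """
    degree = Counter()
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    groups = {"young": [], "old": []}
    for superfamily, label in group_of.items():
        # A superfamily without any link counts as zero.
        groups[label].append(degree[superfamily])
    return groups


def median_links(groups):
    """Median link count per age group; empty groups map to None."""
    return {label: (median(counts) if counts else None)
            for label, counts in groups.items()}
```

Comparing the two lists returned by `link_counts_by_group` (for example as histograms) corresponds to the kind of distribution comparison described above.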

Methods

Fragments

The fragment library generated for this study contains fragment-pairs of length 10, 15, 20 and 30, with maximum allowed gap-lengths of 2, 3, 4 and 6 respectively. All fragments are based on pairwise comparisons between structural domains as defined by SCOP. The pairs are scored for similarity purely on structural grounds, using the coordinates of the Cα atoms, to avoid bias based on sequence similarity. All possible pairwise fragments of the given lengths between two domains are first screened and aligned using a method similar to the pre-filter used by MAMMOTH [4]. Each fragment pair with an alignment score above a threshold is then superimposed, giving the Cα RMSD score for the fragment pair.
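To give a feel for the size of the search space, the sketch below enumerates candidate fragment pairs between two domains. It is a simplification and not the screening method itself: only contiguous (ungapped) windows are considered, the MAMMOTH-like pre-filter and the superposition step are omitted, and all function names are hypothetical.

```python
# Fragment length -> maximum allowed gap length, as listed in the text.
MAX_GAP = {10: 2, 15: 3, 20: 4, 30: 6}


def window_starts(domain_length, fragment_length):
    """Start indices of all contiguous windows of the given length."""
    return range(max(domain_length - fragment_length + 1, 0))


def candidate_pairs(len_a, len_b, fragment_length):
    """Yield (start_a, start_b) for every ungapped window pair."""
    for i in window_starts(len_a, fragment_length):
        for j in window_starts(len_b, fragment_length):
            yield i, j


def count_candidates(len_a, len_b, fragment_length):
    """Number of ungapped window pairs between two domains."""
    return (len(window_starts(len_a, fragment_length))
            * len(window_starts(len_b, fragment_length)))
```

Under these assumptions the number of candidates grows with the product of the two domain lengths, which is one reason a fast screening step before superposition is useful.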

Age estimates

Age estimates for protein folds or superfamilies are generated using fold recognition of structural domains on a set of completed genomes. The occurrence patterns of such predictions are analysed with a parsimony algorithm to estimate an age for a superfamily; for more details see [3]. The age of a superfamily is a score in the range [0.0, 1.0], with 1.0 indicating that the superfamily was estimated to be present at the root of the species tree (oldest), and 0.0 indicating that the superfamily was estimated to have been created at the leaf level (youngest). Here an 'old' fold is defined as a fold with an age of 1.0, and a 'young' fold as one with an age < 0.5.
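The age thresholds above translate directly into a small helper. The function name and the 'intermediate' label for ages in [0.5, 1.0) are assumptions added for illustration; the text only defines 'old' and 'young'.

```python
def classify_age(age):
    """Label a superfamily by its age score in [0.0, 1.0]."""
    if not 0.0 <= age <= 1.0:
        raise ValueError(f"age must lie in [0.0, 1.0], got {age}")
    if age == 1.0:
        return "old"      # estimated to be present at the root of the species tree
    if age < 0.5:
        return "young"
    return "intermediate"  # hypothetical label; not named in the text
```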

Linking Folds

Some fragments might be over-represented (e.g. secondary structure is not considered); therefore the number of shared fragments needs to be normalised for the number of times a fragment occurs. Friedberg and Godzik (2005) used a superfamily based normalisation to overcome this problem [2]. We use a similar approach, although the fragment-pairs in this study are based on structural similarity only, whereas Friedberg and Godzik (2005) used a combination of sequence and structural similarity. A link between two superfamilies (I and J) is established when f(I, J) > 0.1, where f is calculated as:

$f(I, J) = \frac{\mathrm{Sim}(I, J)}{\min\left(\mathrm{Sim}(A - I, I),\ \mathrm{Sim}(A - J, J)\right)} \quad \text{if } I \neq J$

Here Sim(X, Y) is the number of fragments shared between two sets of domains X and Y (e.g. superfamilies), and A is the set of all domains, so that A - I denotes all domains outside superfamily I. In this study we do not consider self-similarity of superfamilies.
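A minimal sketch of the link criterion follows. It assumes that each shared fragment pair is recorded as a tuple of the two superfamilies it connects and that Sim simply counts such pairs; this input format, the function names and the handling of a zero denominator are illustrative assumptions, not details given in the text.

```python
from collections import Counter

LINK_THRESHOLD = 0.1  # from the text: a link requires f(I, J) > 0.1


def count_shared(fragment_pairs):
    """Count fragment pairs between superfamilies.

    fragment_pairs: iterable of (superfamily_a, superfamily_b) tuples, one
    per structurally similar fragment pair (hypothetical input format).
    Returns (between, outside), where between[frozenset({I, J})] is
    Sim(I, J) and outside[I] is Sim(A - I, I).
    """
    between = Counter()
    outside = Counter()
    for a, b in fragment_pairs:
        if a == b:
            continue  # self-similarity of superfamilies is not considered
        between[frozenset((a, b))] += 1
        outside[a] += 1
        outside[b] += 1
    return between, outside


def link_score(i, j, between, outside):
    """f(I, J) = Sim(I, J) / min(Sim(A - I, I), Sim(A - J, J)) for I != J."""
    if i == j:
        raise ValueError("f(I, J) is only defined for I != J")
    denominator = min(outside[i], outside[j])
    if denominator == 0:
        return 0.0  # assumption: no shared fragments means no link
    return between[frozenset((i, j))] / denominator


def find_links(fragment_pairs):
    """Return the set of (I, J) pairs, sorted within each pair, with f > 0.1."""
    between, outside = count_shared(fragment_pairs)
    links = set()
    for pair in between:
        i, j = sorted(pair)
        if link_score(i, j, between, outside) > LINK_THRESHOLD:
            links.add((i, j))
    return links
```

Dividing by the smaller of the two totals means that a superfamily with few shared fragments overall can still form a link when a sizeable fraction of them point to one partner.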

Conclusion

We show that younger folds share relatively fewer fragments with other folds than old protein folds do. This may indicate that evolutionary links above the superfamily or fold level could be established through such shared fragments.

References

1. Taylor WR: A 'periodic table' for protein structures. Nature. 2002, 416 (6881): 657-660. 10.1038/416657a
2. Friedberg I, Godzik A: Fragnostic: walking through protein structure space. Nucleic Acids Res. 2005, 33 (Web Server issue): W249-W251.
3. Winstanley HF, Abeln S, Deane CM: How old is your fold?. Bioinformatics. 2005, 21 (Suppl 1): i449-i458. 10.1093/bioinformatics/bti1008
4. Ortiz AR, Strauss CE, Olmea O: MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 2002, 11 (11): 2606-2621. 10.1110/ps.0215902

Author information

Authors and Affiliations

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK
Sanne Abeln & Charlotte M Deane

Corresponding author

Correspondence to Sanne Abeln.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Abeln, S., Deane, C.M. Linking evolution of protein structures through fragments. BMC Syst Biol 1 (Suppl 1), S12 (2007). https://doi.org/10.1186/1752-0509-1-S1-S12
