Discovering putative prion sequences in complete proteomes using probabilistic representations of Q/N-rich domains
1 Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, Zaragoza 50009, Spain
2 Institute for Biocomputation and Physics of Complex Systems (BIFI). Universidad de Zaragoza, Mariano Esquillor, Edificio I + D, Zaragoza 50018, Spain
3 Joint Unit BIFI-IQFR (CSIC), Serrano 119, Madrid 28006, Spain
4 Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
5 Departament de Bioquimica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
Citation and License
BMC Genomics 2013, 14:316 doi:10.1186/1471-2164-14-316Published: 10 May 2013
Prion proteins conform a special class among amyloids due to their ability to transmit aggregative folds. Prions are known to act as infectious agents in neurodegenerative diseases in animals, or as key elements in transcription and translation processes in yeast. It has been suggested that prions contain specific sequential domains with distinctive amino acid composition and physicochemical properties that allow them to control the switch between soluble and β-sheet aggregated states. Those prion-forming domains are low complexity segments enriched in glutamine/asparagine and depleted in charged residues and prolines. Different predictive methods have been developed to discover novel prions by either assessing the compositional bias of these stretches or estimating the propensity of protein sequences to form amyloid aggregates. However, the available algorithms hitherto lack a thorough statistical calibration against large sequence databases, which makes them unable to accurately predict prions without retrieving a large number of false positives.
Here we present a computational strategy to predict putative prion-forming proteins in complete proteomes using probabilistic representations of prionogenic glutamine/asparagine rich regions. After benchmarking our predictive model against large sets of non-prionic sequences, we were able to filter out known prions with high precision and accuracy, generating prediction sets with few false positives. The algorithm was used to scan all the proteomes annotated in public databases for the presence of putative prion proteins. We analyzed the presence of putative prion proteins in all taxa, from viruses and archaea to plants and higher eukaryotes, and found that most organisms encode evolutionarily unrelated proteins with susceptibility to behave as prions.
To our knowledge, this is the first wide-ranging study aiming to predict prion domains in complete proteomes. Approaches of this kind could be of great importance to identify potential targets for further experimental testing and to try to reach a deeper understanding of prions’ functional and regulatory mechanisms.