Bioinformatics prediction of overlapping frameshifted translation products in mammalian transcripts
1 Genedata AG, Maulbeerstrasse 46, CH-4016 Basel, Switzerland
2 Basilea Pharmaceutica AG, Grenzacherstrasse 487, CH-4005 Basel, Switzerland
3 Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
BMC Genomics 2008, 9:122 doi:10.1186/1471-2164-9-122
Published: 6 March 2008
Exceptionally, a single nucleotide sequence can be translated in vivo in two different frames to yield distinct proteins. In the case of the G-protein alpha subunit XL-alpha-s transcript, a frameshifted open reading frame (ORF) in exon 1 is translated to yield a structurally distinct protein called Alex, which plays a role in platelet aggregation and neurological processes. We carried out a novel bioinformatics screen for other possible dual-frame translated sequences, based on comparative genomics.
Our method searched human, mouse and rat transcripts in frames +1 and -1 for ORFs that are unusually well conserved at the amino acid level. We name these conserved frameshifted overlapping ORFs 'matreshkas' to reflect their nested character. Among the findings: the G-protein coupled receptor GPR27 is entirely contained within a frame -1 matreshka; thrombopoietin contains a matreshka that spans ~70% of its length; platelet glycoprotein IIIa (ITGB3) contains a matreshka with the predicted characteristics of a secreted peptide hormone; and the potassium channel KCNK12 contains a matreshka spanning >400 amino acids.
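As an illustration of the first step of such a screen, the following minimal sketch finds the longest stop-free stretch in frames +1 and -1 of a coding sequence. It is not the pipeline used in this study: the function names are hypothetical, the frames are assumed to be defined relative to the annotated reading frame on the same strand (so frame -1 corresponds to an offset of 2 within the coding sequence), and the cross-species amino acid conservation scoring that defines a matreshka is not covered.

```python
# Illustrative sketch only; names and frame conventions are assumptions.
STOPS = {"TAA", "TAG", "TGA"}

def longest_open_stretch(seq, offset):
    """Return (start, end) of the longest stop-free codon run read from offset.

    Coordinates are 0-based and end-exclusive, in nucleotides.
    """
    best = (offset, offset)
    run_start = offset
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            # Close the current run at the stop codon and keep it if longest.
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3
    # Close the final run at the last complete codon.
    end = offset + ((len(seq) - offset) // 3) * 3
    if end - run_start > best[1] - best[0]:
        best = (run_start, end)
    return best

def shifted_frame_orfs(cds):
    """Longest open stretch in each shifted frame of an annotated CDS.

    Assumption: frame +1 starts one nucleotide downstream of the annotated
    start; frame -1 starts one nucleotide upstream, i.e. offset 2 in the CDS.
    """
    cds = cds.upper()
    return {
        "+1": longest_open_stretch(cds, 1),
        "-1": longest_open_stretch(cds, 2),
    }
```

A candidate stretch found this way would then need to be compared across the human, mouse and rat orthologues to assess whether its translation is unusually well conserved.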
Although the in vivo existence of translated matreshkas has not been experimentally verified, this genome-wide analysis provides strong evidence that substantial overlapping coding sequences exist in a number of human and rodent transcripts.