Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation
1 Department of Cell and Molecular Biology, Lundberg Laboratory, Göteborg University, PO Box 462 SE-405 30 Göteborg, Sweden
2 Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
3 Max-Planck Institute for Molecular Genetics, Ihnestraße 63, D-14195 Berlin, Germany
4 Biochemistry Department, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
BMC Bioinformatics 2007, 8:295 doi:10.1186/1471-2105-8-295. Published: 8 August 2007
The translational efficiency of an mRNA can be modulated by upstream open reading frames (uORFs) present in certain genes. A uORF can attenuate translation of the main ORF by interfering with translational reinitiation at the main start codon. uORFs also occur by chance in the genome, in which case they do not have a regulatory role. Since the sequence determinants for functional uORFs are not understood, it is difficult to discriminate functional from spurious uORFs by sequence analysis.
We have used comparative genomics to identify novel uORFs in yeast with a high likelihood of having a translational regulatory role. We examined uORFs, previously shown to play a role in regulation of translation in Saccharomyces cerevisiae, for evolutionary conservation within seven Saccharomyces species. Inspection of the set of conserved uORFs yielded the following three characteristics useful for discrimination of functional from spurious uORFs: a length between 4 and 6 codons, a distance from the start of the main ORF between 50 and 150 nucleotides, and finally a lack of overlap with, and clear separation from, neighbouring uORFs. These derived rules are inherently associated with uORFs with properties similar to the GCN4 locus, and may not detect most uORFs of other types. uORFs with high scores based on these rules showed a much higher evolutionary conservation than randomly selected uORFs. In a genome-wide scan in S. cerevisiae, we found 34 conserved uORFs from 32 genes that we predict to be functional; subsequent analysis showed the majority of these to be located within transcripts. A total of 252 genes were found containing conserved uORFs with properties indicative of a functional role; all but 7 are novel. Functional content analysis of this set identified an overrepresentation of genes involved in transcriptional control and development.
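The three rules above can be illustrated with a minimal sketch. It is not the authors' pipeline, and it makes several assumptions that the abstract does not specify: uORF length counts the start codon but not the stop codon, distance is measured from the end of the uORF stop codon to the main start codon, and "clear separation" is reduced to a hypothetical `min_gap` parameter. The function names `find_uorfs` and `passes_rules` are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader):
    """Return (start, end) index pairs for every ATG...stop ORF that lies
    entirely within the 5' leader. `end` is the index just past the stop."""
    leader = leader.upper()
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


def passes_rules(uorf, all_uorfs, leader_len, min_gap=1):
    """Apply the three heuristics from the text to one uORF.

    Assumptions (not stated in the source):
    - length counts the start codon and excludes the stop codon;
    - distance runs from the end of the uORF to the main start codon,
      which is taken to sit immediately after the leader;
    - min_gap is a hypothetical separation threshold in nucleotides.
    """
    start, end = uorf
    codons = (end - start) // 3 - 1
    distance = leader_len - end
    isolated = all(
        other == uorf
        or other[1] + min_gap <= start
        or end + min_gap <= other[0]
        for other in all_uorfs
    )
    return 4 <= codons <= 6 and 50 <= distance <= 150 and isolated


# Toy leader: one 4-codon uORF ending 100 nt upstream of the main start codon.
leader = "C" * 10 + "ATG" + "GCT" * 3 + "TAA" + "C" * 100
candidates = find_uorfs(leader)
selected = [u for u in candidates if passes_rules(u, candidates, len(leader))]
```

In this toy sequence the single uORF satisfies all three rules. A real analysis would additionally require the conservation check across species described in the text, which this sketch does not attempt.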
Evolutionary conservation of uORFs in yeasts can be traced across up to 100 million years of separation. The conserved uORFs share characteristics with respect to their length, their distance from each other and from the main start codon, and the folding energy of the sequence. These newly identified characteristics can be used to facilitate detection of other conserved uORFs.