Comparative genomic analysis of novel conserved peptide upstream open reading frames in Drosophila melanogaster and other dipteran species
Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
BMC Genomics 2008, 9:61 doi:10.1186/1471-2164-9-61
Published: 1 February 2008
Upstream open reading frames (uORFs) are elements found in the 5'-region of an mRNA transcript. They can regulate protein production from the largest, or major, ORF (mORF) and affect organismal development and growth in fungi, plants, and animals. In Drosophila, approximately 40% of transcripts contain upstream start codons (uAUGs), but there is little evidence that these are translated or that they affect their associated mORF.
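To make the terms above concrete, the following is a minimal Python sketch of how uAUGs and the uORFs they initiate could be located in a 5' leader sequence. It is an illustration only, not the pipeline used in this study: the function name `find_uorfs`, the `min_codons` parameter, and the example sequence are hypothetical, and the sketch assumes the standard stop codons and reports only uORFs that terminate within the leader.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code


def find_uorfs(leader, min_codons=2):
    """Return (start, end, n_codons) for each ATG-initiated ORF that
    terminates inside the 5' leader. Coordinates are 0-based, end-exclusive,
    and n_codons counts the start codon but not the stop codon."""
    seq = leader.upper().replace("U", "T")  # accept RNA or DNA input
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue  # not an upstream start codon (uAUG)
        # Walk downstream in frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    uorfs.append((start, pos + 3, n_codons))
                break
    return uorfs


if __name__ == "__main__":
    # Hypothetical leader containing one uAUG followed by an in-frame stop.
    print(find_uorfs("CCATGGCTTAAGG"))
```

A uAUG with no in-frame stop inside the leader is skipped here, so uORFs that overlap the mORF start would need separate handling. Finding such candidates is also only a first step: judging whether a uORF encodes a conserved peptide requires comparing homologous sequences across species.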
By analyzing 19,389 Drosophila melanogaster transcript annotations and 666,153 dipteran EST sequences, we identified 44 putative conserved peptide uORFs (CPuORFs) in Drosophila melanogaster that show evidence of negative selection and are therefore likely to be translated. Transcripts with CPuORFs constitute approximately 0.3% of the total number of transcripts, a frequency similar to that in the Arabidopsis genome. The CPuORFs have a mean length of 70 amino acids, much larger than the mean length of plant CPuORFs (40 amino acids). There is a statistically significant clustering of CPuORFs at cytological band 57 (p = 10⁻⁵), a phenomenon that has never been described for uORFs. Based on GO term and Interpro domain analyses, genes in the uORF dataset include mORFs implicated in mitochondrial import (p < 0.01) and mORFs encoding methyltransferases (p < 0.02) at higher frequencies than the genome as a whole.
Based on these data, it is clear that Drosophila contains putative CPuORFs at frequencies similar to those found in plants. They are distinguished, however, by the type of mORF they tend to associate with: Drosophila CPuORFs occur preferentially in transcripts encoding mitochondrial proteins and methyltransferases. This provides a basis for the study of CPuORFs and their putative regulatory role in mitochondrial function and disease.