Stem-loop structures in prokaryotic genomes
1 CEINGE Biotecnologie Avanzate scarl Via Comunale Margherita 482, 80145 Napoli, Italy
2 Dipartimento di Biologia e Patologia Cellulare e Molecolare, Università Federico II Via S. Pansini 5, 80131 Napoli, Italy
3 Dipartimento SAVA Università del Molise Via De Sanctis, 86100 Campobasso, Italy
4 Dipartimento di Biochimica e Biotecnologie Mediche, Università Federico II Via S. Pansini 5, 80131 Napoli, Italy
BMC Genomics 2006, 7:170 doi:10.1186/1471-2164-7-170
Published: 4 July 2006
Predicting secondary structures in the expressed sequences of bacterial genomes makes it possible to investigate the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops contribute significantly to the formation of more complex secondary structures, which are often important for the activity of sequence elements controlling gene expression.
A systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly sequenced bacterial genomes is presented. SLSs were searched for as stems of at least 12 bp bordering loops 5 to 100 nt in length. G-U pairing in the stems was allowed. SLSs found in natural genomes are consistently more numerous and more stable than those expected to form at random in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions, but enrichment of specific, non-random SLS sub-populations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher-stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature an additional A-rich stretch at the 5'-end, complementary to the 3' uridines. In all species, a clearly biased SLS distribution was observed within the intergenic space, with most SLSs concentrated on the 3'-end side of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families.
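The search criteria stated above (stems of at least 12 bp, loops of 5 to 100 nt, G-U pairing allowed) can be illustrated with a minimal Python sketch. It is not the software used in the study: the function name `find_stem_loops` and its parameters are hypothetical, and the sketch assumes perfectly paired stems with no mismatches or bulges, a point on which the criteria above are silent. It performs no stability (free energy) calculation.

```python
# Minimal sketch of a stem-loop scan; NOT the authors' implementation.
# Assumption: stems are perfectly paired (no mismatches or bulges).

# Pairs allowed in a stem: Watson-Crick plus G-U wobble (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=12, min_loop=5, max_loop=100):
    """Return (start, stem_len, loop_len) tuples for every position where
    a left arm of min_stem nt pairs with a right arm of min_stem nt,
    separated by a loop of min_loop..max_loop nt.

    Hits may overlap; longer stems are reported as several min_stem hits.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(rna)
    hits = []
    for i in range(n):  # i is the 5' start of the left arm
        for loop in range(min_loop, max_loop + 1):
            end = i + 2 * min_stem + loop  # exclusive end of the right arm
            if end > n:
                break  # longer loops cannot fit either
            # Base i+k of the left arm pairs with base end-1-k of the right arm.
            if all((rna[i + k], rna[end - 1 - k]) in PAIRS for k in range(min_stem)):
                hits.append((i, min_stem, loop))
    return hits


if __name__ == "__main__":
    # Toy sequence: 12 G, a 5-nt loop, then 12 U (all G-U wobble pairs).
    print(find_stem_loops("G" * 12 + "A" * 5 + "U" * 12))  # [(0, 12, 5)]
```

The scan is quadratic in the worst case and is meant only to make the definition concrete; a genome-scale analysis would need a faster search, and comparing natural against randomized sequences would additionally require a stability measure.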
In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed, and might be involved in the control of transcription termination, or might serve as RNA elements that enhance either the stability or the turnover of cotranscribed mRNAs. Three previously undescribed families of repeated sequences were found in Yersiniae, Bordetellae and Enterococci.