Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures
1 Program in Bioinformatics and Proteomics/Genomics, University of Toledo Health Science Campus, Toledo, OH 43614, USA
2 Dept. of Physiology and Pharmacology, University of Toledo Health Science Campus, Toledo, OH 43614, USA
3 Dept. of Neuroscience, University of Toledo Health Science Campus, Toledo, OH 43614, USA
4 Department of Cardiovascular and Metabolic Diseases, University of Toledo Health Science Campus, Toledo, OH 43614, USA
5 Dept. of Medicine, University of Toledo Health Science Campus, Toledo, OH 43614, USA
6 Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
7 Luye Pharmaceutical LTD, Rm1107, Zhubang 2000 Business Center, Chaoyang District, Beijing 100025, PR China
BMC Genomics 2008, 9:284 doi:10.1186/1471-2164-9-284
Published: 12 June 2008
Genomes possess different levels of non-randomness, in particular an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short range, where neighboring nucleotides influence the choice of base at a site, to the long range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression.
We present novel data and approaches showing that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. We carried out a whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, strong local SS (< -25 kcal/mol) was two to ten times more abundant than expected under a random model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus eliminating short-range signals as possible contributors to the observed excess.
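As an illustration of what a randomization that preserves short-range inhomogeneity can look like, the sketch below generates a random sequence from a low-order Markov model fitted to a natural sequence, so that each base is drawn conditionally on its preceding neighbor(s). This is only one common way to build such a control and is not necessarily the procedure used in this study; the function name `markov_randomize` and its parameters `order` and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def markov_randomize(seq, order=1, seed=0):
    """Return a random sequence of len(seq) drawn from an order-k Markov
    model fitted to seq (hypothetical helper, for illustration only)."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if len(seq) <= order:
        return seq  # too short to fit a model; return unchanged

    rng = random.Random(seed)

    # For every k-mer context, record each base that follows it in seq.
    # Sampling from these lists reproduces the observed transition frequencies.
    followers = defaultdict(list)
    for i in range(len(seq) - order):
        followers[seq[i:i + order]].append(seq[i + order])

    # Start from the natural sequence's first context.
    out = list(seq[:order])
    while len(out) < len(seq):
        context = "".join(out[-order:])
        choices = followers.get(context)
        if not choices:
            # Context occurs only at the very end of seq:
            # fall back to the overall base composition.
            choices = seq
        out.append(rng.choice(choices))
    return "".join(out)
```

Because each base depends only on the preceding `order` bases, neighbor-level signals are retained on average, while any composition pattern extending over tens to hundreds of nucleotides is scrambled.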
We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little-explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
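One simple way to look for composition patterns at this scale is to slide a fixed-length window along a sequence and record the fraction of a chosen base set in each window. The sketch below does this under stated assumptions: the function names, the 50 nt window and the 0.7 threshold are hypothetical choices made for illustration, not values or tools taken from this study.

```python
def window_fractions(seq, bases="GC", window=50, step=1):
    """Fraction of `bases` in each sliding window of length `window`."""
    seq = seq.upper()
    target = set(bases.upper())
    fractions = []
    # Only full-length windows are scored; a sequence shorter than
    # `window` yields an empty list.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append(sum(b in target for b in chunk) / window)
    return fractions


def count_rich_windows(seq, bases="GC", window=50, threshold=0.7, step=1):
    """Number of windows whose `bases` fraction is at least `threshold`."""
    return sum(f >= threshold
               for f in window_fractions(seq, bases, window, step))
```

Comparing the count of composition-rich windows in a natural sequence with the count in a randomized control of the same length gives a rough indication of whether such regions are over-represented.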