Conservation and implications of eukaryote transcriptional regulatory regions across multiple species
1 School of Mathematical Sciences, Peking University, Beijing 100871, PR China
2 Center for Theoretical Biology, Peking University, Beijing 100871, PR China
3 State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, PR China
4 Department of Epidemiology, Michigan State University, East Lansing, Michigan 48824, USA
5 MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100871, PR China
6 Molecular and Computational Biology Program, University of Southern California, Los Angeles, California 90089, USA
BMC Genomics 2008, 9:623 doi:10.1186/1471-2164-9-623Published: 20 December 2008
Increasing evidence shows that whole genomes of eukaryotes are almost entirely transcribed into both protein coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the underlying regulatory mechanisms of transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they are found. We will call these transcriptional regulation regions, or TRRs, which can be considered functional regions containing a cluster of regulatory elements that cooperatively recruit transcriptional factors for binding and then regulating the expression of transcripts.
We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained based on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to pervasively locate in the genomes.
Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) identified core TRRs by HSL are proper candidates for the further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant numbers of ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we also found that the TRRs of protein coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.