Two different classes of co-occurring motif pairs found by a novel visualization method in human promoter regions
1 Integrated Database Group, Japan Biological Information Research Center (JBIRC), Japan Biological Informatics Consortium, Aomi 2-41, Koto-ku, Tokyo, 135-0064, Japan
2 Integrated Database Group, Biological Information Research Center (BIRC), National Institute of Advanced Industrial Science and Technology (AIST), Aomi 2-41, Koto-ku, Tokyo, 135-0064, Japan
3 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
4 Department of Genetics, The Graduate University for Advanced Studies, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
5 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, Tokyo, 108-8639, Japan
6 Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency (JST), Japan
BMC Genomics 2008, 9:112 doi:10.1186/1471-2164-9-112
Published: 1 March 2008
It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet our knowledge remains limited on points such as the combinatorial use of these elements and their positional distribution.
We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions between -2000 and +1000 bp of the transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. Among the 8,454 significantly correlated motif pairs, two major classes were observed: 248 Class 1 pairs were found mainly around TSSs, whereas 4,020 Class 2 pairs appeared at rather arbitrary distances from TSSs. These classes are distinct in a number of respects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor Class 1 pairs are more likely to be CpG-rich and to be expressed ubiquitously than those that harbor Class 2 pairs. Third, the 'hub' motifs, which participate in many different motif pairs, differ between the two classes. In addition, many of the transcription factors that correspond to the Class 2 hub motifs contain domains rich in specific amino acids; these domains may form disordered regions important for protein-protein interaction.
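To make the notion of a correlated, non-overlapping motif pair concrete, the following Python sketch counts promoters in which two motifs occur without overlapping inside the -2000 to +1000 bp window and scores the co-occurrence with a hypergeometric tail probability. This is an illustration only: the abstract does not state which statistical test was used, so the hypergeometric test, the input layout (`hits`), and the function names `overlaps`, `promoters_with_pair` and `hypergeom_sf` are all assumptions introduced here, not the authors' method.

```python
from math import comb  # requires Python 3.8+

# Hypothetical input layout (an assumption for this sketch):
# promoter id -> list of (motif, start, end) hits, coordinates relative to the TSS (TSS = 0).

def overlaps(a, b):
    """True if the half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def promoters_with_pair(hits, m1, m2, lo=-2000, hi=1000):
    """Return (n1, n2, both): promoters containing m1, containing m2, and
    containing at least one non-overlapping m1/m2 pair, all within [lo, hi]."""
    n1 = n2 = both = 0
    for sites in hits.values():
        s1 = [(s, e) for m, s, e in sites if m == m1 and lo <= s and e <= hi]
        s2 = [(s, e) for m, s, e in sites if m == m2 and lo <= s and e <= hi]
        n1 += bool(s1)
        n2 += bool(s2)
        both += any(not overlaps(a, b) for a in s1 for b in s2)
    return n1, n2, both

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

# Toy data, invented purely to exercise the functions.
hits = {
    "g1": [("A", -100, -90), ("B", -50, -40)],  # non-overlapping pair
    "g2": [("A", 10, 20), ("B", 15, 25)],       # the two hits overlap
    "g3": [("A", -3000, -2990)],                # outside the window
    "g4": [("B", 100, 110)],
}
n1, n2, both = promoters_with_pair(hits, "A", "B")
p_value = hypergeom_sf(both, len(hits), n1, n2)
```

In a real analysis one such p-value would be computed for every motif pair, so a multiple-testing correction would be needed before calling any pair significant; the choice of correction is likewise left open here.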
There exist at least two classes of motif pairs with respect to TSSs in human promoters, possibly reflecting compositional differences between promoters and enhancers. We anticipate that our visualization method may be useful for the further characterisation of promoters.