Human promoter genomic composition demonstrates non-random groupings that reflect general cellular function
1 The Advanced Technology Center, Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, Bethesda, Maryland 20892-4605, USA
2 The University of Texas Southwestern Medical Center at Dallas, TX, USA
3 Bristol-Myers Squibb, Syracuse, NY, USA
BMC Bioinformatics 2005, 6:259 doi:10.1186/1471-2105-6-259 Published: 18 October 2005
The purpose of this study is to determine whether cis-regulatory elements are grouped non-randomly within gene promoters in a way that can be detected independently of gene expression data and, if so, whether this grouping correlates with the biological function of the genes.
Using ProSpector, a web-based promoter search and annotation tool, we applied an unbiased approach to analyze the transcription factor binding site frequencies of 1400-base-pair genomic segments spanning 1200 base pairs upstream to 200 base pairs downstream of the transcriptional start site of 7298 commonly studied human genes. Partitional clustering of the transcription factor binding site composition of these promoter segments reveals a small number of gene groups that are selectively enriched for gene ontology terms consistent with distinct aspects of cellular function. Significance ranking of the class-determining transcription factor binding sites within these clusters shows substantial overlap between the gene ontology terms of the transcription factors associated with those binding sites and the gene ontology terms of the regulated genes in each group.
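The promoter-window extraction and composition clustering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the partitional algorithm, so plain k-means (Lloyd's algorithm) stands in for it; the coordinate convention (0-based, half-open intervals, with the TSS base counted as downstream) is assumed; and the factor names and site counts are hypothetical placeholders rather than ProSpector output. Each promoter is represented by the relative frequencies of its predicted binding sites, so the clustering compares composition rather than raw site number.

```python
import math
import random

# Window from the study: 1200 bp upstream to 200 bp downstream of the TSS (1400 bp).
UPSTREAM = 1200
DOWNSTREAM = 200


def promoter_window(tss: int, strand: str) -> tuple[int, int]:
    """Half-open [start, end) interval of the promoter segment around a TSS.

    Assumption: 0-based coordinates, with the TSS base counted as downstream.
    On the minus strand, "upstream" lies at higher coordinates.
    """
    if strand == "+":
        return tss - UPSTREAM, tss + DOWNSTREAM
    if strand == "-":
        return tss - DOWNSTREAM + 1, tss + UPSTREAM + 1
    raise ValueError(f"unknown strand: {strand!r}")


def composition_vector(site_hits: dict[str, int], factors: list[str]) -> list[float]:
    """Convert raw binding-site counts into relative frequencies over a fixed factor list."""
    counts = [site_hits.get(f, 0) for f in factors]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def nearest(v: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(v, centroids[j]))


def kmeans(vectors: list[list[float]], k: int, iterations: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's k-means; assumption: stands in for the unspecified partitional method."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = []  # previous assignment; empty before the first pass
    for _ in range(iterations):
        # Assignment step: each promoter joins its nearest centroid.
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # converged: no promoter changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean composition of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    factors = ["TF_A", "TF_B", "TF_C"]  # hypothetical factor names
    hits = {  # hypothetical per-promoter binding-site counts
        "gene1": {"TF_A": 9, "TF_B": 1},
        "gene2": {"TF_A": 8, "TF_B": 2},
        "gene3": {"TF_B": 1, "TF_C": 7},
        "gene4": {"TF_C": 9},
    }
    print(promoter_window(10_000, "+"))  # (8800, 10200)
    genes = sorted(hits)
    vectors = [composition_vector(hits[g], factors) for g in genes]
    print(dict(zip(genes, kmeans(vectors, k=2))))
```

With these toy counts, the two promoters dominated by TF_A fall into one cluster and the two dominated by TF_C into the other. Normalizing to frequencies is one design choice among several; raw or z-scored counts would weight promoters with many predicted sites differently.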
Thus, sorting genes by promoter composition alone produces partitions in which the "regulated" genes and their "regulators" cosegregate into similar functional classes. These findings demonstrate that transcription factor binding site composition is distributed non-randomly among gene promoters in a manner that reflects, and partially defines, general gene class function.
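The enrichment and overlap comparisons behind this conclusion can be illustrated with two small functions. The abstract does not state which statistics were used, so the sketch below swaps in common choices as assumptions: a one-sided hypergeometric test for whether a cluster contains more genes annotated with a given GO term than expected by chance, and a Jaccard index for the overlap between the GO terms of a cluster's class-determining transcription factors and those of its member genes. The term identifiers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k) for GO-term enrichment in one cluster.

    N: genes in the full set (7298 in the study), K: genes in the full set annotated
    with the term, n: genes in the cluster, k: annotated genes inside the cluster.
    Assumption: this test stands in for the unspecified enrichment statistic.
    """
    top = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return tail / comb(N, n)


def go_overlap(regulator_terms: set[str], regulated_terms: set[str]) -> float:
    """Jaccard index of two GO term sets: 0.0 if disjoint, 1.0 if identical."""
    union = regulator_terms | regulated_terms
    return len(regulator_terms & regulated_terms) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy numbers: a 2-gene cluster drawn from 4 genes, 2 of which carry the term.
    print(hypergeom_enrichment_p(k=2, n=2, K=2, N=4))  # 1/6, about 0.1667
    # Hypothetical term IDs for the regulators and the regulated genes of one cluster.
    print(go_overlap({"T1", "T2"}, {"T2", "T3"}))  # 1/3, about 0.3333
```

In a real analysis, p-values from many term-cluster pairs would also need a multiple-testing correction, which this sketch omits.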