This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB06)
Frequency distribution of TATA Box and extension sequences on human promoters
School of Engineering and Information Technology, Deakin University, 221 Burwood Hwy, Burwood, VIC 3125, Australia
BMC Bioinformatics 2006, 7(Suppl 4):S2 doi:10.1186/1471-2105-7-S4-S2
Published: 12 December 2006
The TATA box is one of the most important transcription factor binding sites, but the exact sequences that constitute it are still not well defined.
In this study, we conduct a dedicated analysis of the frequency distribution of the TATA box and its extension sequences on human promoters. Sixteen TATA elements derived from the TATA box motif, TATAWAWN, are classified into three distribution patterns: peak, bottom-peak, and bottom. Fourteen TATA extension sequences are predicted to be new TATA box elements because of their high motif factors, which indicate their statistical significance. Statistical analysis of the promoters of mouse, zebrafish and Drosophila melanogaster verifies seven of these elements. We also observe that the distribution of TATA elements on the promoters of human housekeeping genes is very similar to their distribution on the promoters of human tissue-specific genes.
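As an illustration of how sixteen elements arise from the motif TATAWAWN, the following minimal sketch expands the IUPAC ambiguity codes (W = A or T, N = any base) and tallies the start positions of one element across a set of sequences. The function names `expand_motif` and `position_counts` and the toy sequences are hypothetical, introduced only for this example; the sketch does not reproduce the motif factor or the promoter data used in the study.

```python
from collections import Counter
from itertools import product

# IUPAC codes needed for TATAWAWN: W = A or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expand_motif(motif):
    """Return every concrete sequence matched by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in motif))]


def position_counts(element, sequences):
    """Count occurrences of `element` per start offset (overlaps included)."""
    counts = Counter()
    for seq in sequences:
        start = seq.find(element)
        while start != -1:
            counts[start] += 1
            start = seq.find(element, start + 1)
    return counts


elements = expand_motif("TATAWAWN")
print(len(elements))  # 2 * 2 * 4 = 16 concrete elements

# Hypothetical toy sequences, not real promoter data.
toy_promoters = ["GGCTATAAAAGGC", "CCCTATAAAAGCC", "TATATATAGGGCC"]
print(position_counts("TATAAAAG", toy_promoters))
```

Plotting such per-position counts for each element along aligned promoters is one plausible way to inspect whether a distribution looks like a peak, a bottom-peak, or a bottom pattern.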
The dedicated statistical analysis of the TATA box and its extension sequences yields new TATA elements. The statistical significance of these elements has been verified against random data sets by calculating their p-values.