With multiple metazoan genomes in each family being sequenced, promoter analysis is becoming a useful tool in genomic analysis. Aligning the promoter regions of C. elegans and C. briggsae identifies conserved promoter elements. While not all promoter elements are conserved and not all conserved regions are promoter elements, we find that conservation is a useful basis for estimating promoter complexity. Promoter complexity highlights genes with particularly interesting regulation: it identifies gene groups with a strong promoter complexity signal, as well as cases where a gene's promoter complexity differs from that of its group.
We identify potential promoter sequence using several local sequence alignment methods. Rather than studying individual promoter elements, we look at patterns of promoter complexity; the total conserved sequence for each gene serves as our measure of promoter complexity. Monte Carlo random sampling is used to identify Gene Ontology and KEGG Pathway annotated gene groups that have significantly low or high complexity.
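The group-level test described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes each gene's complexity score is its total conserved sequence length in base pairs, that the group statistic is the mean score, and that random gene sets of the same size are drawn without replacement from all scored genes. The function name, parameters, and the add-one correction to the empirical p-values are hypothetical choices made for illustration.

```python
import random


def monte_carlo_group_test(scores, group, n_samples=10000, seed=0):
    """Compare a gene group's mean complexity with random gene sets.

    scores: dict mapping gene id -> complexity score (assumed here to be
            the total conserved promoter sequence, in bp).
    group:  iterable of gene ids sharing an annotation (e.g. one GO term).
    Returns (observed_mean, p_low, p_high), where p_low / p_high estimate
    how often a random set of the same size scores as low / as high.
    """
    rng = random.Random(seed)
    members = [g for g in group if g in scores]
    if not members:
        raise ValueError("no group member has a complexity score")

    all_scores = list(scores.values())
    k = len(members)
    observed = sum(scores[g] for g in members) / k

    low = high = 0
    for _ in range(n_samples):
        # Draw k genes at random, without replacement, from all scored genes.
        sample_mean = sum(rng.sample(all_scores, k)) / k
        if sample_mean <= observed:
            low += 1
        if sample_mean >= observed:
            high += 1

    # Add-one correction so an empirical p-value is never exactly zero.
    p_low = (low + 1) / (n_samples + 1)
    p_high = (high + 1) / (n_samples + 1)
    return observed, p_low, p_high
```

Under these assumptions, a small p_low flags a group with unusually low complexity and a small p_high flags one with unusually high complexity. Testing many annotation groups this way would also call for a multiple-testing correction, which the sketch leaves out.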
Developmental genes were found to have low complexity, while growth genes had high complexity. Other groups that we expected to show high significance showed none at all or had low promoter complexity. Genes contributing to the extracellular region scored high in promoter complexity, while basal transcription factors often scored low. Genes annotated with GO terms for transcription factors and signalling, genes with multiple alternative splice products, and developmental genes had significant promoter scores.
We examined gene expression in the published C. elegans microarray experiments and found a strong positive correlation between gene group expression variation and promoter complexity. Promoter complexity tends to be an accurate predictor of the complexity of a gene's pattern of expression and also gives us another tool to find anomalous genes.
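As a rough illustration of the comparison described above, the sketch below correlates per-group expression variation with per-group promoter complexity. It rests on assumptions not stated in the text: expression variation is taken as the mean, over a group's genes, of each gene's population variance across arrays; group complexity is the mean of the member genes' scores; and the correlation is Pearson's r. All function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def group_variation_vs_complexity(groups, expression, complexity):
    """Correlate group expression variation with group promoter complexity.

    groups:     dict mapping group name -> list of gene ids
    expression: dict mapping gene id -> list of expression values per array
    complexity: dict mapping gene id -> promoter complexity score
    """
    variation, mean_complexity = [], []
    for genes in groups.values():
        # Use only genes that have both expression data and a score.
        usable = [g for g in genes if g in expression and g in complexity]
        if not usable:
            continue
        variation.append(mean(pvariance(expression[g]) for g in usable))
        mean_complexity.append(mean(complexity[g] for g in usable))
    return pearson(variation, mean_complexity)
```

A value near +1 would correspond to the strong positive relationship reported here. A rank-based measure such as Spearman's would be a reasonable alternative if the relationship is monotonic but not linear.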
This work was supported by Grant Number P20RR-16481 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).