Evolutionary hierarchies of conserved blocks in 5'-noncoding sequences of dicot rbcS genes
1 Cardiff School of Computer Science, Cardiff University, Queen's Buildings, 5 The Parade, Roath, Cardiff, CF24 3AA, UK
2 Department of Biological Sciences, University of Central Lancashire, Preston, PR1 2HE, UK
3 Institute of Grassland and Environmental Research, Aberystwyth, Ceredigion, SY23 3EB, UK
4 Institute of Biological Sciences, University of Wales, Aberystwyth, Ceredigion, SY23 3DA, UK
BMC Evolutionary Biology 2007, 7:51 doi:10.1186/1471-2148-7-51
Published: 2 April 2007
Evolutionary processes in gene regulatory regions are major determinants of organismal evolution, but are exceptionally challenging to study. We explored the possibilities of evolutionary analysis of phylogenetic footprints in 5'-noncoding sequences (NCS) from 27 ribulose-1,5-bisphosphate carboxylase small subunit (rbcS) genes from three dicot families (Brassicaceae, Fabaceae and Solanaceae).
Sequences of up to 400 bp encompassing proximal promoter and 5'-untranslated regions were analyzed. We conducted phylogenetic footprinting by several alternative methods: generalized Lempel-Ziv complexity (CLZ), multiple alignments with DIALIGN and ALIGN-M, and the MOTIF SAMPLER Gibbs sampling algorithm. These tools collectively defined 36 conserved blocks of mean length 12.8 bp. On average, 12.5 blocks were found in each 5'-NCS. The blocks occurred in arrays whose relative order was absolutely conserved, confirming the existence of 'conserved modular arrays' in promoters. Identities of half of the blocks confirmed past rbcS research, including versions of the I-box, G-box, and GT-1 sites such as Box II. Over 90% of blocks overlapped DNase-protected regions in tomato 5'-NCS. Regions characterized by low CLZ in sliding-window analyses were also frequently associated with DNase-protection. Blocks could be assigned to evolutionary hierarchies based on taxonomic distribution and estimated age. Lineage divergence dates implied that 13 blocks found in all three plant families were of Cretaceous antiquity, while other family-specific blocks were much younger. Blocks were also dated by formation of multigene families, using genome and coding sequence information. Dendrograms of evolutionary relations of the 5'-NCS were produced by several methods, including: cluster analysis using pairwise CLZ values; evolutionary trees of DIALIGN sequence alignments; and cladistic analysis of conserved blocks.
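The sliding-window complexity analysis mentioned above can be illustrated with a minimal Python sketch. It counts phrases in the classical Lempel-Ziv (1976) exhaustive parsing, which is a simpler stand-in and not the generalized CLZ measure used in the study. The function names, the window size and the step size are assumptions chosen for illustration only.

```python
def lz76_phrase_count(seq: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive parsing of seq.

    Assumption: plain LZ76 is used here as a stand-in; the study's
    generalized CLZ measure is not reproduced.
    """
    i, count, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from a position
        # starting before i (the copy may overlap the phrase itself).
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def sliding_complexity(seq: str, window: int = 20, step: int = 1):
    """Return (start, phrase_count) for each full window along seq.

    The window and step defaults are illustrative, not values from the paper.
    """
    return [
        (start, lz76_phrase_count(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Windows with low phrase counts are repetitive or compositionally simple. In the study, regions of low CLZ were the ones frequently associated with DNase protection, so in a sketch like this the windows of interest would be those at the low end of the returned counts.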
Dicot 5'-NCS contain conserved modular arrays of recurrent sequence blocks, which are coincident with functional elements. These blocks are amenable to evolutionary interpretation as hierarchies in which ancient, taxonomically widespread blocks can be distinguished from more recent, taxon-specific ones.
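As a rough illustration of how blocks might be sorted into a hierarchy by taxonomic distribution, the sketch below groups blocks by the set of families in which they occur. The tier labels, function names and the example block identifiers are hypothetical placeholders, not the block names or data reported in the paper.

```python
from collections import defaultdict

# The three dicot families sampled in the study.
FAMILIES = frozenset({"Brassicaceae", "Fabaceae", "Solanaceae"})


def classify_block(families_present) -> str:
    """Assign a block to a tier from the families in which it is found.

    Tier labels are assumptions made for this sketch.
    """
    present = frozenset(families_present)
    if not present:
        raise ValueError("a block must occur in at least one family")
    if present - FAMILIES:
        raise ValueError(f"unknown families: {sorted(present - FAMILIES)}")
    if present == FAMILIES:
        return "all three families"
    if len(present) == 1:
        return "family-specific"
    return "shared by two families"


def build_hierarchy(block_to_families: dict) -> dict:
    """Group block identifiers by tier, in sorted block order."""
    tiers = defaultdict(list)
    for block, fams in sorted(block_to_families.items()):
        tiers[classify_block(fams)].append(block)
    return dict(tiers)


# Hypothetical example input; identifiers and distributions are invented.
example = {
    "block_A": {"Brassicaceae", "Fabaceae", "Solanaceae"},
    "block_B": {"Solanaceae"},
    "block_C": {"Brassicaceae", "Fabaceae"},
}
print(build_hierarchy(example))
```

Under the reasoning in the abstract, blocks in the widest tier would be interpreted as the oldest, since they predate the divergence of the three families, while family-specific blocks would be younger.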