Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes
1 Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
2 Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
3 Department of Biosciences at Novum, Karolinska Institutet, Stockholm, Sweden
4 Centre for Molecular Medicine, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
BMC Genomics 2004, 5:99 doi:10.1186/1471-2164-5-99
Published: 21 December 2004
Evolutionarily conserved sequences within or adjoining orthologous genes often serve as critical cis-regulatory regions. Recent studies have identified long, non-coding genomic regions that are perfectly conserved between human and mouse, termed ultra-conserved regions (UCRs). Here, we focus on UCRs that cluster around genes involved in early vertebrate development, genes that have been conserved over 450 million years of vertebrate evolution.
Based on a high-resolution detection procedure, our UCR set enables novel insights into vertebrate genome organization and the regulation of developmentally important genes. We find that the genomic positions of deeply conserved UCRs are strongly associated with the locations of genes encoding key regulators of development, with particularly strong positional correlation to transcription factor-encoding genes. Of particular importance is the observation that most UCRs are clustered into arrays that span hundreds of kilobases around their presumptive target genes. Such a hallmark signature is present around several uncharacterized human genes predicted to encode developmentally important DNA-binding proteins.
The genomic organization of UCRs, combined with previous findings, suggests that UCRs act as essential long-range modulators of gene expression. The exceptional sequence conservation and clustered structure suggest that UCR-mediated molecular events involve greater complexity than traditional DNA binding by transcription factors. The high-resolution UCR collection presented here provides a wealth of target sequences for future experimental studies to determine the nature of the biochemical mechanisms involved in the preservation of arrays of nearly identical non-coding sequences over the course of vertebrate evolution.