Long-range regulation is a major driving force in maintaining genome integrity
1 McGill Centre for Bioinformatics, McGill University, Montreal, Canada
2 Research Institute of McGill University Health Centre, McGill University and Genome Quebec Innovation Centre, Montreal, Canada
3 Departments of Human Genetics and Experimental Medicine, McGill University, Montreal, Canada
4 School of Computer Science, McGill University, Montreal, Canada
BMC Evolutionary Biology 2009, 9:203 doi:10.1186/1471-2148-9-203
Published: 15 August 2009
The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of this non-uniform distribution of breakpoints during vertebrate evolution is poorly understood.
We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. According to our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, which are likely to have simpler transcriptional regulation.
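To make the idea of a region classifier concrete, the following is a minimal sketch only. The abstract does not state which learning algorithm or input features were used, so everything here is an assumption for illustration: the choice of logistic regression, the two features (`cne_density` and `housekeeping_fraction`), the function names, and the toy windows, which are invented numbers and not data from the study.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent.

    NOTE: logistic regression is a stand-in chosen for this sketch;
    the source abstract does not say which learner was used.
    """
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_refractory(x, weights, bias):
    """Return the estimated probability that a window is breakpoint-refractory."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical toy windows: (cne_density, housekeeping_fraction), label.
# Label 1 = no breakpoint observed (treated as refractory),
# label 0 = breakpoint observed (treated as susceptible).
# These values are made up purely to exercise the code.
TOY_WINDOWS = [
    ((0.9, 0.1), 1),
    ((0.8, 0.2), 1),
    ((0.7, 0.1), 1),
    ((0.2, 0.8), 0),
    ((0.1, 0.9), 0),
    ((0.3, 0.7), 0),
]

features = [x for x, _ in TOY_WINDOWS]
labels = [y for _, y in TOY_WINDOWS]
weights, bias = train_logistic(features, labels)
```

On this toy input, a window rich in conserved non-coding elements, such as `predict_refractory((0.85, 0.15), weights, bias)`, scores above 0.5, while a window dominated by housekeeping genes scores below it. A real analysis would need many more windows, held-out evaluation, and the feature set actually used in the study.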
We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not tolerated. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.