<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-8-99</ui>
   <ji>1471-2148</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Both selective and neutral processes drive GC content evolution in the human genome</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Pozzoli</snm>
               <fnm>Uberto</fnm>
               <insr iid="I1"/>
               <email>uberto.pozzoli@bp.lnf.it</email>
            </au>
            <au id="A2">
               <snm>Menozzi</snm>
               <fnm>Giorgia</fnm>
               <insr iid="I1"/>
               <email>giorgia.menozzi@bp.lnf.it</email>
            </au>
            <au id="A3">
               <snm>Fumagalli</snm>
               <fnm>Matteo</fnm>
               <insr iid="I1"/>
               <email>matteo.fumagalli@bp.lnf.it</email>
            </au>
            <au id="A4">
               <snm>Cereda</snm>
               <fnm>Matteo</fnm>
               <insr iid="I1"/>
               <email>matteo.cereda@bp.lnf.it</email>
            </au>
            <au id="A5">
               <snm>Comi</snm>
               <mi>P</mi>
               <fnm>Giacomo</fnm>
               <insr iid="I2"/>
               <email>giacomo.comi@unimi.it</email>
            </au>
            <au id="A6">
               <snm>Cagliani</snm>
               <fnm>Rachele</fnm>
               <insr iid="I1"/>
               <email>rachele.cagliani@bp.lnf.it</email>
            </au>
            <au id="A7">
               <snm>Bresolin</snm>
               <fnm>Nereo</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>nereo.bresolin@bp.lnf.it</email>
            </au>
            <au id="A8" ca="yes">
               <snm>Sironi</snm>
               <fnm>Manuela</fnm>
               <insr iid="I1"/>
               <email>manuela.sironi@bp.lnf.it</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Scientific Institute IRCCS E. Medea, Bioinformatic Lab, Via don L. Monza 20, 23842 Bosisio Parini (LC), Italy</p>
            </ins>
            <ins id="I2">
               <p>Dino Ferrari Centre, Department of Neurological Sciences, University of Milan, IRCCS Ospedale Maggiore Policlinico, Mangiagalli and Regina Elena Foundation, Via F. Sforza 35, 20100 Milan, Italy</p>
            </ins>
         </insg>
         <source>BMC Evolutionary Biology</source>
         <issn>1471-2148</issn>
         <pubdate>2008</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>99</fpage>
         <url>http://www.biomedcentral.com/1471-2148/8/99</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18371205</pubid>
               <pubid idtype="doi">10.1186/1471-2148-8-99</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Pozzoli et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Mammalian genomes consist of regions differing in GC content, referred to as isochores or GC-content domains. The scientific debate is still open as to whether such compositional heterogeneity is a selected or neutral trait.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we analyze SNP allele frequencies, retrotransposon insertion polymorphisms (RIPs), as well as fixed substitutions accumulated in the human lineage since its divergence from chimpanzee to indicate that biased gene conversion (BGC) has been playing a role in within-genome GC content variation. Yet, a distinct contribution to GC content evolution is accounted for by a selective process. Accordingly, we searched for independent evidence that GC content distribution does not conform to neutral expectations. Indeed, after correcting for possible biases, we show that intron GC content and size display isochore-specific correlations.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We consider that the most parsimonious explanation for our results is that GC content is subject to the action of both weak selection and BGC in the human genome, with features such as nucleosome positioning or chromatin conformation possibly representing the final target of selective processes. This view might reconcile previous contrasting findings and add some theoretical background to recent evidence suggesting that GC content domains display different behaviors with respect to highly regulated biological processes such as developmental stage-related gene expression and programmed replication timing during neural stem cell differentiation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Mammalian genomes are non-homogeneous with respect to base composition; striking variations in GC content occur over scales of hundreds of kilobases to megabases. The so-called isochoric structure of the human genome was initially described by Bernardi and coworkers <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and isochores were conceived as long genomic regions fairly homogeneous in their GC composition. Full sequencing of the human genome <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> indicated that the isochore model might need slight revision in that long regions are less compositionally homogeneous than previously thought and transitions between compositional domains are less sharp, so that the term "GC-content domain" was proposed instead of "isochore". Whatever designation we decide to adopt, the fact remains that isochores/GC content domains represent a large-scale genomic feature lacking a satisfactory interpretation. Indeed, the scientific debate is still open as to whether such a compositional heterogeneity is a selected or neutral trait and different hypotheses have been proposed <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. The biased gene conversion (BGC) model <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> envisages a situation whereby recombination drives GC content in mammalian genomes through the preferential fixation of GC alleles following parental chromosome heteroduplex formation at meiosis. The effect is due to the bias toward GC nucleotides over AT during DNA repair at mismatched bases <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.
The model therefore conceives of GC content variation as a by-product of recombination and, although supported by extensive evidence <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, its ability to explain isochore formation and maintenance has recently been criticized on different grounds. Spencer et al. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> have indicated that recombination rates are too fast-evolving to have permanent effects on base composition; the authors therefore suggested that the cause-consequence relationship might be the other way round, with GC-rich regions promoting the occurrence of recombination hotspots. Also, several studies have suggested that GC content variation results from a selective process <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. In particular, a role for GC content in chromatin organization and, therefore, gene regulation has been proposed <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Indeed, GC content has been shown to covary with genomic properties such as regulated replication or expression timing <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, DNA bendability <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and ability to undergo B-Z transition <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, while the existence of a relationship between gene expression level (or breadth) and GC content is still controversial <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. Nonetheless, a positive effect of increased coding sequence GC content on transcriptional efficiency has recently been experimentally demonstrated <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
         <p>Up to now, with the exception of the above-mentioned studies on gene expression, evidence of selection acting on GC content <it>per se </it>has been scant (see <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> for review). This might partially be due to the difficulty of discriminating between BGC and weak selection.</p>
         <p>Here we analyze SNP allele frequencies, retrotransposon insertion polymorphisms (RIPs), as well as fixed substitutions accumulated in the human lineage since its divergence from chimpanzee to indicate that both biased gene conversion (BGC) and selection have been playing a role in GC content variation.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data retrieval</p>
            </st>
            <p>Gene and intergenic sequences as well as intron/exon boundaries were obtained from the UCSC genome annotation database <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, assembly hg17. Gene selection was performed as previously described <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Isochore boundary coordinates were derived from a previous work <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Fine-scale recombination rates and recombination hotspot locations were obtained from the UCSC database; they are based on HapMap Phase I data <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Pseudogene sequences and genomic locations derive from Pseudogene.org <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>; only duplicated pseudogenes were selected and genes that generated more than one pseudogene were discarded (this procedure limits the number of observations but avoids multiple ties in statistical analysis). Also, we retained only gene-pseudogene pairs located in the same isochore type (for example, both gene and pseudogene located in isochores H1). The final data set consisted of 364 gene-pseudogene pairs. Duplicated pseudogenes often represent gene fragments; we therefore aligned gene-pseudogene couples using ClustalW <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and corresponding intron-pseudointron pairs were retained only if they were both longer than 25 bp. Expression data were obtained as previously described <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and derive from microarray data on 72 healthy human tissues. Mean expression level was calculated as the mean averaged over all tissues (counting as zero all tissues in which there is no detectable expression). Peak expression was calculated as the maximum expression level across all tissues and expression breadth was the number of distinct tissues expressing the gene.</p>
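            <p>The three expression summaries defined above can be illustrated with a minimal Python sketch. The function name expression_summary and the detection_threshold parameter are hypothetical and do not come from the original analysis; in particular, the sketch assumes that "detectable expression" can be represented by a simple cutoff on the microarray signal.</p>

```python
def expression_summary(levels, detection_threshold=0.0):
    """Return (mean level, peak level, breadth) for one gene.

    levels: expression values for the gene, one per tissue.
    detection_threshold: hypothetical cutoff; values at or below it are
    treated as "no detectable expression" and counted as zero.
    """
    detected = [v if v > detection_threshold else 0.0 for v in levels]
    mean_level = sum(detected) / len(detected)       # averaged over all tissues
    peak_level = max(detected)                       # maximum across tissues
    breadth = sum(1 for v in detected if v > 0.0)    # tissues expressing the gene
    return mean_level, peak_level, breadth
```

            <p>For a gene measured in four tissues with values 0, 2, 4 and 0, this sketch gives a mean level of 1.5, a peak level of 4 and a breadth of 2.</p>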
         </sec>
         <sec>
            <st>
               <p>Polymorphism data</p>
            </st>
            <p>Biallelic SNP locations and allele frequencies were downloaded from the HapMap web site <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> (non-redundant dataset, release 21a). Since previous authors <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> have indicated African populations as having genetic variation patterns most compatible with a constant population size, SNP allele frequencies were obtained for Yoruba (YRI), and derive from the genotyping of 60 individuals. The ancestral allele was inferred by alignment with the chimpanzee sequence (UCSC genome browser, assembly panTro1); SNPs were discarded when orthologous chimpanzee regions were unavailable or did not match either human allele. A total of about 2.2 million GC->AT and 1.7 million AT->GC SNPs were retained. We next purged SNPs at CpG sites, as well as those with no associated allele frequencies: the final dataset comprised more than 2 million SNPs.</p>
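            <p>The polarization step described above can be sketched as follows, under stated assumptions. The function name classify_snp is hypothetical; the sketch only covers the inference of the ancestral allele from the chimpanzee base and the resulting GC->AT or AT->GC label, and it does not implement the CpG or allele frequency filters.</p>

```python
STRONG = {"G", "C"}
WEAK = {"A", "T"}


def classify_snp(human_alleles, chimp_allele):
    """Polarize a biallelic SNP, taking the chimpanzee base as ancestral.

    Returns "AT->GC", "GC->AT", or None when the SNP is discarded
    (chimpanzee base matches neither allele) or does not change GC content.
    """
    first, second = human_alleles
    if chimp_allele not in (first, second):
        return None  # chimpanzee matches neither human allele: discard
    ancestral = chimp_allele
    derived = second if ancestral == first else first
    if ancestral in WEAK and derived in STRONG:
        return "AT->GC"
    if ancestral in STRONG and derived in WEAK:
        return "GC->AT"
    return None  # A/T or G/C SNP: GC content is unchanged
```

            <p>For example, a human A/G SNP with chimpanzee base A is labeled AT->GC, whereas the same SNP with chimpanzee base C is discarded.</p>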
            <p>For the analysis of substitution rates and stationary GC content (GC*), we used SNPs from the Seattle SNP database <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, which were identified through resequencing experiments; for 206 human genes in the Seattle SNP dataset both chimpanzee and macaque orthologous loci could be retrieved.</p>
            <p>Data on polymorphic repeat insertions were obtained through the UCSC genome database (RIPs track) and derive from the dbRIP database <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>; RIPs which have been associated with a human genetic disease were discarded. Also, polymorphic insertions were not included in the study if fewer than 10 instances were described for the same retrotransposon subfamily. Fixed transposon instances were identified and categorized using the UCSC annotation tables that rely on RepeatMasker. Since fixed and polymorphic repeat instances derive from different sources, we verified that no systematic bias occurs in the detection of either type of insertion event by calculating the correlation between polymorphic and fixed chromosomal frequencies; significant correlations were retrieved for Alus, SVAs and L1s (Spearman rho = 0.854, 0.439 and 0.923, respectively; all <it>p </it>values &lt; 0.05). Reference sequences for different retrotransposon subtypes were derived from Repbase Update <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of allele frequency spectra</p>
            </st>
            <p>Introns/intergenic spacers were divided into 1 kb windows (1 kbseqs) starting from the most 5' nucleotide position (with respect to the chromosome orientation) and extending through the intron/intergenic region in 1000 bp non-overlapping steps (residual nucleotides in 3' were discarded). The following features were then calculated (or retrieved) for all 1 kbseqs: (1) fine scale recombination rate, (2) GC content, (3) allele frequencies of comprised SNPs, (4) expression parameters (peak, mean level and breadth) of the corresponding genes. In order to analyze allele frequency spectra after controlling for recombination rate, we applied the following procedure: starting from all 1 kbseqs, we identified couples of 1 kbseqs that differed by less than 10% in recombination rate but displayed extremely different GC contents; in particular, we required one partner of the recombination-coupled 1 kbseqs to be located below the 30<sup>th </sup>percentile in the distribution of 1 kbseqs GC content and the other one above the 70<sup>th </sup>percentile. This approach yielded two groups of sequences having extremely similar recombination rates (the equality of medians was checked using the Wilcoxon Rank Sum Test) but very different GC contents. A similar procedure was applied to analyze allele frequency spectra after controlling for GC content; in particular, 1 kbseq couples were created having similar GC content (a difference lower than 5% was required) but extremely different recombination rates. Again, two groups of sequences were obtained and used for comparisons.</p>
            <p>The same approach described above can be extended to control for two variables: for example, in order to compare allele frequency spectra between highly and lowly expressed sequences, 1 kbseqs couples were identified that displayed both similar GC content and recombination rate (less than 5% and 10% difference, respectively) but extremely different expression levels. To allow comparisons between introns and intergenic spacers, percentile values were calculated over the complete set of 1 kbseqs, irrespective of their location.</p>
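            <p>The coupling of 1 kbseqs can be sketched in Python as shown below for the case in which recombination rate is controlled and GC content is contrasted. This is a minimal illustration and not the procedure actually implemented for the study: the function names, the nearest-rank percentile, the greedy one-to-one matching and the definition of a "10% difference" as relative to the larger of the two rates are all assumptions.</p>

```python
def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of a non-empty list; q is an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling of q*n/100
    return ordered[rank - 1]


def pair_windows(windows, max_rel_diff=0.10, low_q=30, high_q=70):
    """Couple low-GC and high-GC windows having similar recombination rates.

    windows: list of (window_id, recombination_rate, gc_content) tuples.
    Returns a list of (low_gc_id, high_gc_id); each window is used at most once.
    """
    gc_values = [gc for _, _, gc in windows]
    low_cut = nearest_rank_percentile(gc_values, low_q)
    high_cut = nearest_rank_percentile(gc_values, high_q)
    # Candidate pools, sorted by recombination rate for a two-pointer scan.
    low = sorted((w for w in windows if low_cut > w[2]), key=lambda w: w[1])
    high = sorted((w for w in windows if w[2] > high_cut), key=lambda w: w[1])
    pairs = []
    i = j = 0
    while len(low) > i and len(high) > j:
        r_low, r_high = low[i][1], high[j][1]
        larger = max(r_low, r_high)
        # Assumed definition: difference relative to the larger of the two rates.
        if larger == 0 or max_rel_diff * larger > abs(r_low - r_high):
            pairs.append((low[i][0], high[j][0]))
            i += 1
            j += 1
        elif r_high > r_low:
            i += 1  # this low-GC window recombines too little to be matched
        else:
            j += 1  # this high-GC window recombines too little to be matched
    return pairs
```

            <p>Swapping the roles of the two variables (and using a 5% tolerance) would give the complementary comparison, in which GC content is controlled and recombination rate is contrasted.</p>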
            <p>In order to quantify the displacement of GC vs AT derived allele frequency distributions observed in Quantile-Quantile plots, differences between corresponding percentiles in the two distributions were summed. These measures were used to compare different groups of sequences selected on the basis of relevant variables (for example high and low GC content or recombination rate). We used bootstrapping procedures to assess the statistical significance of differences in allele frequency shifts. In particular, 1000 permutations were performed and <it>p </it>values were calculated after normality assessment through the Shapiro-Wilk Test.</p>
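            <p>The displacement measure described above can be written as a short Python sketch. The function names are hypothetical, and the choice of the 1st to 99th nearest-rank percentiles is an assumption, since the exact set of percentiles is not specified; the resampling step used to obtain <it>p </it>values is not reproduced here.</p>

```python
def percentiles_1_to_99(frequencies):
    """Nearest-rank 1st..99th percentiles of a non-empty list of frequencies."""
    ordered = sorted(frequencies)
    n = len(ordered)
    return [ordered[max(1, (q * n + 99) // 100) - 1] for q in range(1, 100)]


def qq_displacement(at_to_gc_freqs, gc_to_at_freqs):
    """Sum of percentile-wise differences (AT->GC minus GC->AT).

    Positive values mean that GC derived alleles segregate at higher
    frequencies than AT derived alleles.
    """
    gc_derived = percentiles_1_to_99(at_to_gc_freqs)
    at_derived = percentiles_1_to_99(gc_to_at_freqs)
    return sum(g - a for g, a in zip(gc_derived, at_derived))
```

            <p>Two identical distributions give a displacement of zero, so the measure can be compared directly between groups of sequences (for example, high vs low recombination rate).</p>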
         </sec>
         <sec>
            <st>
               <p>Multispecies alignments, substitution rates and stationary GC content</p>
            </st>
            <p>Orthologous human-chimpanzee-macaque regions were retrieved using the liftOver utility from UCSC (assemblies: panTro1 and rheMac2) with a cutoff of at least 70% remapping bases. Three-way species alignments were performed using MAVID <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
            <p>In order to calculate substitution rates and GC* after controlling for ancestral GC content or recombination rate, a procedure similar to the one described above for SNP allele frequencies was applied, with the only difference that the inferred ancestral GC content was used instead of human GC content. In particular, 1 kbseqs couples were created (on the basis of either recombination rate or ancestral GC content) and their position subsequently mapped onto the 3-way species (human/chimpanzee/macaque) alignments; at this stage windows containing fewer than 600 perfectly aligning bases (i.e. the same nucleotide in the 3 species) were discarded and, for the remaining ones, the ancestral sequence was reconstructed by parsimony (only positions where the macaque was identical to either human or chimpanzee were considered).</p>
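            <p>The parsimony rule stated above can be illustrated with the following Python sketch, which lists substitutions assigned to the human lineage in a three-way alignment. The function name is hypothetical, and skipping any column that contains a gap or an ambiguous base is an assumption of this sketch.</p>

```python
def human_lineage_substitutions(human, chimp, macaque):
    """List (position, ancestral, derived) for changes on the human branch.

    The three arguments are aligned sequences of equal length. A human-chimp
    difference is assigned to the human lineage only when the macaque base
    is identical to the chimpanzee base.
    """
    if not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    changes = []
    columns = zip(human.upper(), chimp.upper(), macaque.upper())
    for pos, (h, c, m) in enumerate(columns):
        if not {h, c, m}.issubset(valid):
            continue  # assumption: skip gaps and ambiguous bases
        if h != c and m == c:
            changes.append((pos, c, h))  # ancestral = chimp/macaque base
        # m == h points to a chimpanzee-lineage change; a macaque base that
        # differs from both is not resolved by parsimony and is ignored.
    return changes
```

            <p>Applied to the aligned columns human ACGTA, chimpanzee ACATA and macaque ACATC, the sketch reports a single A->G change at the third position of the alignment.</p>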
            <p>The number of 1 kbseqs couples and the corresponding number of sites (in Mb) that were analyzed for each comparison are reported in table notes. The number of sites does not exactly correspond to the number of sequences multiplied by 1000 because the presence of gaps in the human sequence (as compared to the two primates) can result in alignments longer than 1000 bp.</p>
            <p>Substitution rates and stationary GC content were calculated using a previously developed neighbor-dependent substitution model <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. For each comparison, the two 1 kbseqs groups were then divided into 20 paired sub-samples of equal size; GC* and substitution rates were calculated for each sub-sample; average values are reported in the tables, together with <it>p </it>values obtained from two-tailed Wilcoxon Rank Sum Tests for paired samples.</p>
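            <p>As an aid to intuition only, the notion of stationary GC content can be illustrated with the simplest two-state case, in which each AT site changes to GC at one rate and each GC site changes to AT at another. This is not the neighbor-dependent model used in the study; the function name and the two-rate simplification are assumptions made for illustration.</p>

```python
def stationary_gc(rate_at_to_gc, rate_gc_to_at):
    """Equilibrium GC fraction under a simple two-state substitution model.

    rate_at_to_gc: substitutions per AT site; rate_gc_to_at: substitutions
    per GC site. At equilibrium the two fluxes balance, which gives
    GC* = rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at).
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return rate_at_to_gc / total
```

            <p>In this simplified setting, a GC->AT rate three times higher than the AT->GC rate corresponds to a GC* of 0.25.</p>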
            <p>For the analysis of intron GC content in relation to size, we discarded first introns (due to their increased sequence constraints) and introns shorter than 750 bp (in order to spare constrained sequences at splice sites).</p>
            <p>For the analysis of recombination rates in long and short introns, two introns were selected for each gene so that one was longer than the 80<sup>th </sup>and the other shorter than the 20<sup>th </sup>percentile of the intron length distribution. If no introns satisfied the criteria, the gene was not analyzed. Recombination rates were calculated for 500 bp centered around the median position of each intron. Differences in recombination rates were evaluated using the Wilcoxon Rank Sum Test.</p>
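            <p>The selection of one short and one long intron per gene can be sketched as follows. The function names are hypothetical; the length cutoffs are assumed to be precomputed from the overall intron length distribution, and picking the most extreme intron when several qualify is an assumption, since the choice among candidates is not specified.</p>

```python
def pick_intron_pair(intron_lengths, short_cut, long_cut):
    """Pick (short_intron_id, long_intron_id) for one gene, or None.

    intron_lengths: dict mapping intron id to length in bp.
    short_cut, long_cut: 20th and 80th percentile lengths, precomputed.
    """
    shorts = {k: v for k, v in intron_lengths.items() if short_cut > v}
    longs = {k: v for k, v in intron_lengths.items() if v > long_cut}
    if not shorts or not longs:
        return None  # the gene is not analyzed
    # Assumption: take the shortest and the longest qualifying introns.
    return min(shorts, key=shorts.get), max(longs, key=longs.get)


def central_window(start, end, size=500):
    """Window of the given size centered on the intron midpoint (0-based, half-open)."""
    mid = (start + end) // 2
    return mid - size // 2, mid + size // 2
```

            <p>For an intron spanning positions 1000 to 3000, the central 500 bp window runs from 1750 to 2250.</p>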
            <p>For the analysis of fixed variations in recombination hotspots, we selected 897 hotspots on the basis of their size (smaller than 5 kb) and recombination rate (above the 80<sup>th </sup>percentile of the distribution of all hotspots); in 790 cases both chimpanzee and macaque orthologous regions could be retrieved.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical analysis</p>
            </st>
            <p>All statistical analyses were performed using R <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. For loess fittings <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> a smoothing span of 0.5 was used.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Analysis of SNP allele frequencies</p>
            </st>
            <p>The analysis of SNP allele frequencies is a convenient strategy to study GC content evolution for two main reasons. First, when SNP allele frequencies are analyzed, no requirement for base composition stationarity is needed; this is relevant since base composition has been shown not to be at equilibrium in mammals <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. Second, given the fast evolution of recombination rates and hotspots <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>, allele frequencies of SNPs, which represent relatively recent variations, should carry the most evident signature of recombination-associated fixation biases.</p>
            <p>Starting from our gene set, we therefore used the chimpanzee sequence to infer the ancestral allele so that variations could be classified as either GC->AT or AT->GC (SNPs at CpG sites were excluded). As previously noted <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, treating SNPs as independent data, despite the extensive presence of linkage disequilibrium in the human genome <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, introduces no bias since linkage is expected to be independent of the GC/AT status of individual SNPs.</p>
            <p>BGC and selection are both expected to result in AT->GC mutations segregating at higher frequencies compared to GC->AT. Yet, this effect is expected to be stronger in highly recombining regions if BGC is involved. Conversely, selection on GC content should be acting differentially depending on the background GC content of SNP flanking regions; in particular, AT->GC variations are expected to segregate at higher frequency in GC-rich regions, irrespective of recombination rate. In order to disentangle a possible selective effect from BGC, we analyzed SNP allele frequencies in noncoding genomic sequences after correcting for either GC content or recombination rate. As further detailed in Methods, genomic regions were divided into 1 kb sub-sequences. The latter were arranged in couples having very similar recombination rate and extremely different GC content for the comparison of allele frequencies between GC-rich and -poor sequences. Similarly, for the comparison between high- and low-recombining sequences, sequences were arranged in couples showing very similar GC content but extremely different recombination rates.
The results of SNP frequency spectra analysis are reported in figure <figr fid="F1">1</figr> as quantile-quantile plots; in agreement with previous findings <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and consistent with the action of BGC, GC derived alleles display higher frequencies than AT alleles but the effect is significantly (<it>p </it>&lt; 10<sup>-17</sup>) stronger in highly recombining regions for both introns (Figure <figr fid="F1">1A</figr>) and intergenic spacers (see Additional file <supplr sid="S1">1</supplr>); yet, when allele frequencies were compared after fixing recombination rate, a residual effect of GC content was observed: derived GC alleles segregate at significantly higher frequencies in regions showing a high GC content (<it>p </it>= 2.07 &#215; 10<sup>-10</sup>) compared to AT-rich regions (Figure <figr fid="F1">1B</figr> and Additional file <supplr sid="S1">1</supplr> for intergenic spacers).</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Analysis of allele frequency spectra for intergenic regions. The figures provided represent quantile-quantile plots of GC->AT and AT->GC derived allele frequencies for 5' and 3' intergenic sequences.</p>
               </text>
               <file name="1471-2148-8-99-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Comparison of allele frequency spectra</p>
               </caption>
               <text>
                  <p><b>Comparison of allele frequency spectra</b>. (<b>A</b>) Quantile-quantile plots of GC->AT and AT->GC derived allele frequencies for high (red) and low (blue) recombining intronic regions after fixing GC content. (<b>B</b>) The same as (<b>A</b>), but in this case we fixed recombination rates and compared high (red) vs low (blue) GC regions. (<b>C</b>) The same as (<b>A</b>), but in this case we fixed both GC content and recombination rates in order to compare regions from highly (red) vs lowly (blue) expressed genes.</p>
               </text>
               <graphic file="1471-2148-8-99-1"/>
            </fig>
            <p>These data suggest that GC content or other related features affect SNP allele segregation independently of recombination rates, although we cannot formally rule out the possibility that extinct recombination hotspots have played a role in the allele frequency spectra we observe. Indeed, as reported above, recombination hotspots are fast evolving <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp> and, therefore, the observed increased segregation of GC alleles in GC-rich regions might have been caused by the presence of a hotspot which is now inactive. Yet, if the latter were the case, given the relatively small effect that recombination has played in recent primate history on GC variation (see below), and given that most SNPs are specific to humans (and, therefore, relatively young), a direct role for GC content in promoting recombination events must be postulated to explain our results.</p>
            <p>Since GC content has been shown to increase transcriptional activity <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and some authors detected a positive correlation between gene expression parameters and GC content <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B46">46</abbr></abbrgrp>, we wished to determine whether expression level, rather than GC content <it>per se</it>, was responsible for increased segregation of GC alleles. Yet, after controlling for both GC content and recombination rates (as described in Methods, we used an approach similar to the one described above) we detected no significant difference in SNP allele frequencies between high- and low-level expressed genes (Figure <figr fid="F1">1C</figr> and Additional file <supplr sid="S1">1</supplr>). These data are not consistent with selection acting on highly or broadly expressed human genes to increase (or maintain) their GC content, although we cannot exclude that such a selection has acted during vertebrate evolution and subsequently relaxed in humans (further data on gene expression level and GC content evolution are reported below).</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of retrotransposon insertion polymorphisms</p>
            </st>
            <p>It has been suggested <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> that, if selection is acting on base composition, it should also affect the fixation probabilities of transposable elements; indeed, fixed Alu and LINE-1 (L1) elements (average GC content of reference sequences = 0.53 and 0.41, respectively) are differentially represented in the human genome depending on GC content <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, despite both having a preference for AT-rich integration sites <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. Yet, a better estimation of fixation versus integration probabilities might be obtained by the comparison of polymorphic and fixed transposable elements. We retrieved all available instances of retrotransposon insertion polymorphisms. For both Alu and L1 repeats, we restricted the analysis of fixed repeats to the same subfamilies showing at least 10 instances of polymorphic insertions. Such subfamilies represent relatively young insertion events; yet, given the previously reported preference of older Alus for GC-rich regions <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B49">49</abbr></abbrgrp>, we further purged all fixed Alu elements showing a divergence higher than 5%. As shown in figure <figr fid="F2">2A</figr>, both SVA (GC content of reference sequence = 0.63) and Alu fixed elements are located in regions with significantly higher GC content compared to their polymorphic counterparts (Wilcoxon Rank Sum Test, two-tailed, p = 0.028 and p &lt; 10<sup>-21</sup>, respectively); conversely, fixed and polymorphic L1 flanking regions do not show different average GC contents, being both relatively GC poor (as L1 sequences are).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Analysis of fixed versus polymorphic retrotransposon insertions</p>
               </caption>
               <text>
                  <p><b>Analysis of fixed versus polymorphic retrotransposon insertions</b>. (<b>A</b>) Analysis of average GC content flanking polymorphic (P, white) and fixed (F, gray) retrotransposons. GC content was calculated in 5 kb flanking the repeat. The number of repeat instances is also indicated. GC content is significantly higher for regions flanking fixed compared to polymorphic Alus; the same holds for SVAs. (<b>B</b>) Analysis of polymorphic (white) and fixed (gray) retrotransposon relative frequency in different isochores (L1 to H3, ordered from 1 to 5, as described in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>). Fixed Alus are significantly enriched in heavy isochores compared to polymorphic instances.</p>
               </text>
               <graphic file="1471-2148-8-99-2"/>
            </fig>
            <p>We next wished to verify whether polymorphic and fixed repetitive elements were differently distributed depending on isochore type. Isochores were classified according to a recent description <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and are referred to as L1, L2, H1, H2, and H3, in order of increasing GC levels. The results of transposable element distribution are reported in figure <figr fid="F2">2B</figr> and indicate that fixed Alus are significantly enriched (Chi Square Test, p &lt; 10<sup>-5</sup>) within heavy isochores compared to polymorphic instances, while no different isochore distribution of fixed vs polymorphic repeats was evident for SVAs (possibly because of the small number of polymorphic insertions, <it>n </it>= 60) or L1s. For further confirmation we performed this same analysis using IsoFinder isochores <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and the same results were obtained (see Additional file <supplr sid="S2">2</supplr>). These data confirm the preferential integration of Alus and L1s in AT-rich regions (polymorphic L1 and Alu distributions are relatively similar, Figure <figr fid="F2">2B</figr>), but indicate that additional forces, which relate to GC content, drive their fixation.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Analysis of fixed versus polymorphic retrotransposon insertions. The figure provides an analysis of polymorphic and fixed retrotransposon relative frequency in different isochores (identified as described in <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>).</p>
               </text>
               <file name="1471-2148-8-99-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>We believe that our results differ from previous reports showing no difference in the GC content surrounding polymorphic and fixed Alus <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp> because of the larger sample of polymorphic elements we analyzed.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of substitution rates and stationary GC content</p>
            </st>
            <p>We next wished to infer nucleotide changes fixed in the human lineage after divergence from chimpanzee by using macaque as an outgroup. In analogy to the procedure we applied for SNP frequency spectra, we analyzed substitution rates and stationary GC content (GC*, i.e. the GC content toward which sequences are evolving according to measured substitution rates) after controlling for recombination rates or ancestral GC content. Data are reported in table <tblr tid="T1">1</tblr> and indicate that GC* is significantly higher for both highly recombining and GC-rich sequences compared to their less recombining and GC-poorer counterparts. Yet, different processes seem to explain the GC* increase in the two comparisons. All substitution rates increase with recombination, an observation consistent with recombination being mutagenic, as previously suggested <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B53">53</abbr></abbrgrp>; in particular, the AT->GC substitution rate shows the most marked difference between highly and weakly recombining regions. Conversely, when recombination was controlled for, we observed a moderate increase of the AT->GC rate in GC-rich compared to GC-poor regions, while all other substitution rates (including GC->AT) decrease. This observation, verified in both introns and intergenic spacers (see Additional file <supplr sid="S3">3</supplr>), rules out the possibility that the confounding effects of extinct recombination hotspots account for substitution rates and increased GC* in GC-rich regions. Indeed, if previously active hotspots had left a molecular signature in GC-rich regions, causing an increase in GC content, we would expect substitution rates in GC-rich regions to display a trend similar to that observed in highly recombining regions and, as shown in table <tblr tid="T1">1</tblr>, this is not the case.</p>
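            <p>As a rough illustration of how a stationary GC content follows from substitution rates, the sketch below assumes a simple two-state model in which GC* = u / (u + v), with u the total AT->GC rate and v the total GC->AT rate. This is an assumption made for illustration only: it ignores CpG hypermutability, it is not necessarily the procedure used to obtain the values in table 1, and the function name <it>stationary_gc</it> is hypothetical.</p>

```python
def stationary_gc(at_to_gc_rate: float, gc_to_at_rate: float) -> float:
    """Equilibrium GC fraction under a two-state (AT / GC) substitution model.

    Assumption: GC* = u / (u + v); CpG-specific rates are not modelled.
    """
    total = at_to_gc_rate + gc_to_at_rate
    if total == 0:
        raise ValueError("at least one rate must be non-zero")
    return at_to_gc_rate / total


# Illustrative input: Table 1, low-recombination column, CpG row left out.
at_to_gc = 0.00068 + 0.00287  # A/T -> C/G plus A/T -> G/C
gc_to_at = 0.00086 + 0.00300  # C/G -> A/T plus C/G -> T/A
print(round(stationary_gc(at_to_gc, gc_to_at), 3))
```

            <p>With these inputs the simplified estimate is about 0.479, higher than the 0.40750 reported in table 1, which is the direction expected when the CpG->TpG contribution to GC loss is left out.</p>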
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Substitution rates and GC* in intergenic regions. The data provided represent tables of substitution rates and GC* for 3' and 5' intergenic regions.</p>
               </text>
               <file name="1471-2148-8-99-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Substitution rates and GC* in intronic regions</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>Substitution type</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Fixed GC content</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Fixed recombination rate</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Low rec.<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>High rec.<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>p</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Low GC</p>
                     </c>
                     <c ca="center">
                        <p>High GC</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>p</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> C/G</p>
                     </c>
                     <c ca="right">
                        <p>0.00068</p>
                     </c>
                     <c ca="right">
                        <p>0.00091</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00070</p>
                     </c>
                     <c ca="right">
                        <p>0.00075</p>
                     </c>
                     <c ca="right">
                        <p>5.6 &#215; 10<sup>-3</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> G/C</p>
                     </c>
                     <c ca="right">
                        <p>0.00287</p>
                     </c>
                     <c ca="right">
                        <p>0.00366</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00275</p>
                     </c>
                     <c ca="right">
                        <p>0.00324</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> T/A</p>
                     </c>
                     <c ca="right">
                        <p>0.00059</p>
                     </c>
                     <c ca="right">
                        <p>0.00065</p>
                     </c>
                     <c ca="right">
                        <p>6.3 &#215; 10<sup>-5</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00067</p>
                     </c>
                     <c ca="right">
                        <p>0.00056</p>
                     </c>
                     <c ca="right">
                        <p>5.7 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> G/C</p>
                     </c>
                     <c ca="right">
                        <p>0.00096</p>
                     </c>
                     <c ca="right">
                        <p>0.00114</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00104</p>
                     </c>
                     <c ca="right">
                        <p>0.00101</p>
                     </c>
                     <c ca="right">
                        <p>1.2 &#215; 10<sup>-2</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> A/T</p>
                     </c>
                     <c ca="right">
                        <p>0.00086</p>
                     </c>
                     <c ca="right">
                        <p>0.00099</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-5</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00108</p>
                     </c>
                     <c ca="right">
                        <p>0.00087</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> T/A</p>
                     </c>
                     <c ca="right">
                        <p>0.00300</p>
                     </c>
                     <c ca="right">
                        <p>0.00336</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00336</p>
                     </c>
                     <c ca="right">
                        <p>0.00300</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CpG -> TpG</p>
                     </c>
                     <c ca="right">
                        <p>0.02427</p>
                     </c>
                     <c ca="right">
                        <p>0.027713</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.03117</p>
                     </c>
                     <c ca="right">
                        <p>0.02352</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GC*</p>
                     </c>
                     <c ca="right">
                        <p>0.40750</p>
                     </c>
                     <c ca="right">
                        <p>0.43310</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.37349</p>
                     </c>
                     <c ca="right">
                        <p>0.42967</p>
                     </c>
                     <c ca="right">
                        <p>1.9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of sites (Mb)</p>
                     </c>
                     <c ca="right">
                        <p>19.42</p>
                     </c>
                     <c ca="right">
                        <p>19.42</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                     <c ca="right">
                        <p>17.38</p>
                     </c>
                     <c ca="right">
                        <p>17.33</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a </sup>rec.: recombination rate</p>
               </tblfn>
            </tbl>
            <p>Still, the data we report here are consistent with selection acting to maintain GC content but also with the presence of mutation biases operating in regions of different GC content. To evaluate the latter possibility we calculated substitution rates and GC* using either fixed variations or SNPs; while SNPs can reasonably be thought to reflect mutation rates, fixed variations depend on both mutation rates and fixation probabilities. In this case, in order to avoid biases towards high frequency variants, the analysis was restricted to intronic regions deriving from 206 fully resequenced genes (see Methods). Also, given the influence, documented above, of recombination on mutation rates, we used only gene regions (1 kb windows) showing low crossover rates.</p>
            <p>As shown in table <tblr tid="T2">2</tblr>, and in agreement with previous findings <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, a very similar (Wilcoxon Rank Sum Tests for paired samples, p = 0.37) intronic GC* is obtained when SNPs are used to infer substitution rates, irrespective of background GC content. Conversely, when fixed variations were taken into account, GC* was significantly higher for GC-rich than for GC-poor sequences. These data suggest that mutation biases, which would be recapitulated by SNP mutations, do not account for the difference in GC* we observe when genomic regions displaying different background GC contents are analyzed; rather, such differences derive from different fixation probabilities. These data are therefore fully consistent with the analysis of SNP allele frequency spectra we reported above.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Substitution rates and GC* calculated for fixed substitutions and SNPs</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>Substitution type</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Fixed substitutions</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>SNPs</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Low GC</p>
                     </c>
                     <c ca="center">
                        <p>High GC</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>p</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Low GC</p>
                     </c>
                     <c ca="center">
                        <p>High GC</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>p</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> C/G</p>
                     </c>
                     <c ca="right">
                        <p>0.00049</p>
                     </c>
                     <c ca="right">
                        <p>0.00057</p>
                     </c>
                     <c ca="right">
                        <p>4.1 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00020</p>
                     </c>
                     <c ca="right">
                        <p>0.00014</p>
                     </c>
                     <c ca="right">
                        <p>7.6 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> G/C</p>
                     </c>
                     <c ca="right">
                        <p>0.00214</p>
                     </c>
                     <c ca="right">
                        <p>0.00257</p>
                     </c>
                     <c ca="right">
                        <p>6.9 &#215; 10<sup>-2</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00069</p>
                     </c>
                     <c ca="right">
                        <p>0.00074</p>
                     </c>
                     <c ca="right">
                        <p>7.3 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A/T -> T/A</p>
                     </c>
                     <c ca="right">
                        <p>0.00049</p>
                     </c>
                     <c ca="right">
                        <p>0.00043</p>
                     </c>
                     <c ca="right">
                        <p>5.0 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00018</p>
                     </c>
                     <c ca="right">
                        <p>0.00014</p>
                     </c>
                     <c ca="right">
                        <p>6.6 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> G/C</p>
                     </c>
                     <c ca="right">
                        <p>0.00088</p>
                     </c>
                     <c ca="right">
                        <p>0.00080</p>
                     </c>
                     <c ca="right">
                        <p>5.5 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00031</p>
                     </c>
                     <c ca="right">
                        <p>0.00028</p>
                     </c>
                     <c ca="right">
                        <p>9.0 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> A/T</p>
                     </c>
                     <c ca="right">
                        <p>0.00095</p>
                     </c>
                     <c ca="right">
                        <p>0.00075</p>
                     </c>
                     <c ca="right">
                        <p>1.3 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00035</p>
                     </c>
                     <c ca="right">
                        <p>0.00027</p>
                     </c>
                     <c ca="right">
                        <p>2.9 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C/G -> T/A</p>
                     </c>
                     <c ca="right">
                        <p>0.00275</p>
                     </c>
                     <c ca="right">
                        <p>0.00268</p>
                     </c>
                     <c ca="right">
                        <p>7.3 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.00105</p>
                     </c>
                     <c ca="right">
                        <p>0.00109</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CpG -> TpG</p>
                     </c>
                     <c ca="right">
                        <p>0.03226</p>
                     </c>
                     <c ca="right">
                        <p>0.02081</p>
                     </c>
                     <c ca="right">
                        <p>7.3 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.01464</p>
                     </c>
                     <c ca="right">
                        <p>0.01300</p>
                     </c>
                     <c ca="right">
                        <p>6.7 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GC*</p>
                     </c>
                     <c ca="right">
                        <p>0.35548</p>
                     </c>
                     <c ca="right">
                        <p>0.41816</p>
                     </c>
                     <c ca="right">
                        <p>1.3 &#215; 10<sup>-4</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.35703</p>
                     </c>
                     <c ca="right">
                        <p>0.38604</p>
                     </c>
                     <c ca="right">
                        <p>3.7 &#215; 10<sup>-1</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of sites (Mb)</p>
                     </c>
                     <c ca="right">
                        <p>0.96</p>
                     </c>
                     <c ca="right">
                        <p>0.44</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                     <c ca="right">
                        <p>0.96</p>
                     </c>
                     <c ca="right">
                        <p>0.44</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Note: as stated in the text, only regions displaying low recombination rates were analyzed.</p>
               </tblfn>
            </tbl>
            <p>Finally, we wished to verify whether analysis of substitution rates and GC* confirmed our above indication that gene expression levels have not influenced base composition evolution in recent human history. In addition to serving as a useful confirmation, this approach allows analysis of fixed variations at CpGs, which is not feasible using SNP allele frequency spectra (due to recurrent mutations at these dinucleotides); this is relevant to the topic we are addressing since previous authors have indicated that both gene GC content and CpG level correlate with gene expression parameters <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Again, we analyzed substitution rates and GC* in genes displaying narrow and wide expression breadth, after controlling for both GC content and recombination rates: we found no significant differences in either substitution rates (including CpG->TpG) or GC* between the two groups of sequences (not shown).</p>
         </sec>
         <sec>
            <st>
               <p>Local excess of AT->GC fixed variations at recombination hotspots</p>
            </st>
            <p>The possibility that BGC has permanent effects on base composition has recently been questioned <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, on the grounds that its effect is too weak and hotspots are too ephemeral. The availability of an outgroup species now allows orientation of the substitution events that accumulated after human/chimpanzee divergence; therefore, an excess of fixed AT->GC mutations should be observed at recombination hotspots if BGC exerts a strong enough bias. We selected 897 human recombination hotspots on the basis of their size (smaller than 5 kb) and recombination rate (above the 80<sup>th </sup>percentile of the distribution of all hotspots); in 790 cases both chimpanzee and macaque orthologous regions could be retrieved. As controls, we used 20 samples of randomly selected sequences, each matching a hotspot in size and differing from it by less than 1% in GC content. The frequency (table <tblr tid="T3">3</tblr>) of fixed AT->GC mutations is significantly (Wilcoxon Rank Sum Test for paired samples with Bonferroni correction for multiple tests, maximum <it>p </it>= 0.0285) but only slightly (1.08 fold) higher in hotspots compared to control sequences, while no difference is observed for the other substitution types. Yet, no difference in AT->GC fixation was observed when the hotspot 4 kb flanking sequences were compared to their control counterparts. These data are consistent with recombination hotspots having a very small and local effect on GC allele fixation frequency.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Average frequency of fixed substitutions in recombination hotspots and control regions</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c ca="center">
                        <p>Substitution</p>
                     </c>
                     <c ca="center">
                        <p>Hotspot</p>
                     </c>
                     <c ca="center">
                        <p>Control</p>
                     </c>
                     <c ca="center">
                        <p>Maximum <it>p</it></p>
                     </c>
                     <c ca="center">
                        <p>5' hotspot flank</p>
                     </c>
                     <c ca="center">
                        <p>5' control flank</p>
                     </c>
                     <c ca="center">
                        <p>Maximum <it>p</it></p>
                     </c>
                     <c ca="center">
                        <p>3' hotspot flank</p>
                     </c>
                     <c ca="center">
                        <p>3' control flank</p>
                     </c>
                     <c ca="center">
                        <p>Maximum <it>p</it></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AT -> GC</p>
                     </c>
                     <c ca="center">
                        <p>0.0051</p>
                     </c>
                     <c ca="center">
                        <p>0.0047</p>
                     </c>
                     <c ca="center">
                        <p>0.0285</p>
                     </c>
                     <c ca="center">
                        <p>0.0048</p>
                     </c>
                     <c ca="center">
                        <p>0.0046</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0048</p>
                     </c>
                     <c ca="center">
                        <p>0.0046</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>GC -> AT</p>
                     </c>
                     <c ca="center">
                        <p>0.0066</p>
                     </c>
                     <c ca="center">
                        <p>0.0066</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0064</p>
                     </c>
                     <c ca="center">
                        <p>0.0065</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0067</p>
                     </c>
                     <c ca="center">
                        <p>0.0064</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AT -> AT</p>
                     </c>
                     <c ca="center">
                        <p>0.0008</p>
                     </c>
                     <c ca="center">
                        <p>0.0009</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0008</p>
                     </c>
                     <c ca="center">
                        <p>0.0008</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0007</p>
                     </c>
                     <c ca="center">
                        <p>0.0008</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>GC -> CG</p>
                     </c>
                     <c ca="center">
                        <p>0.0013</p>
                     </c>
                     <c ca="center">
                        <p>0.0014</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0013</p>
                     </c>
                     <c ca="center">
                        <p>0.0013</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.0014</p>
                     </c>
                     <c ca="center">
                        <p>0.0013</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>Note: </it>Frequencies were calculated as the number of fixed substitutions over the number of potentially mutable sites (i.e. for AT -> GC the frequency was calculated as the number of substitutions over the total number of AT nucleotides in the human/chimpanzee ancestor). For both hotspots and controls, 2 kb flanking sequences were analyzed. <it>p </it>values for each comparison (between hotspots and each control sample) were calculated using the Wilcoxon Rank Sum Test for paired samples. The Bonferroni correction for multiple tests was applied and the maximum <it>p </it>value is reported.</p>
               </tblfn>
            </tbl>
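            <p>The frequency definition and the multiple-test correction described in the table note can be illustrated with a minimal sketch. The function names, the counts and the raw <it>p </it>values below are hypothetical and chosen only for illustration; they are not taken from the data set.</p>

```python
def substitution_frequency(n_substitutions, n_mutable_sites):
    """Fixed substitutions divided by the number of potentially mutable ancestral sites."""
    if n_mutable_sites == 0:
        raise ValueError("no mutable sites in the ancestral sequence")
    return n_substitutions / n_mutable_sites


def bonferroni_max(p_values):
    """Bonferroni-correct each p value (capped at 1) and return the largest one."""
    n_tests = len(p_values)
    return max(min(1.0, p * n_tests) for p in p_values)


# Hypothetical counts: 66 GC -> AT substitutions over 10,000 ancestral GC sites
freq = substitution_frequency(66, 10_000)

# Hypothetical raw p values for hotspots versus three control samples
p_reported = bonferroni_max([0.40, 0.55, 0.72])
```

            <p>With these illustrative inputs the corrected <it>p </it>values all reach the cap of 1, which is the situation in which a maximum corrected value of 1 would be reported.</p>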
         </sec>
         <sec>
            <st>
               <p>Intron GC distribution deviates from neutral expectations</p>
            </st>
            <p>Finally, we wished to determine whether GC content in human introns conforms to neutral expectations. As shown in Figure <figr fid="F3">3A</figr>, human introns located in light and heavy isochores yield two relatively distinct distributions when their GC content is plotted against size; the effect is not due to the presence of transposable elements, since similar trends are observed when GC content is calculated after masking repetitive sequences (see Additional file <supplr sid="S4">4</supplr>). Yet, a similar relationship is to some extent expected: shorter introns are likely to display more extreme GC values due to sampling biases. In order to verify that this is not the sole explanation for our findings, for each intron and after masking repetitive sequences, we calculated the GC content in a 200 bp window (GC<sub>200</sub>) centered on its median position. In particular, only introns longer than 500 bp were analyzed (in order to avoid splice site constraints) and GC content was calculated only if repeats covered less than 20% of the 200 central nucleotides. This procedure ensures that the same number of intronic nucleotides is used for GC content calculations, so that sampling biases (due to extreme variations in intron size) are avoided. Introns from either light or heavy isochores were then grouped in six percentile size classes and their GC<sub>200 </sub>was analyzed: a significant decrease of GC<sub>200 </sub>with increasing residual intron size (intron size calculated after repeat removal) is observed for introns located in heavy isochores, while an increasing trend is evident for those located in GC-poor isochores (Figure <figr fid="F3">3B</figr>).</p>
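            <p>A minimal sketch of the central-window calculation may help to fix ideas. It assumes a soft-masked intron sequence (repeats in lowercase, nonrepetitive DNA in uppercase) and computes GC content over the unmasked bases of the window; both choices, as well as the function name and its parameters, are assumptions made for illustration and not a description of the scripts actually used.</p>

```python
def gc200(intron, window=200, min_length=500, max_repeat_fraction=0.20):
    """Return GC content of the central window, or None if the intron is excluded.

    Assumes a soft-masked sequence: repeats in lowercase, unique DNA in uppercase.
    Any character other than uppercase A, C, G or T is treated as masked.
    """
    if min_length >= len(intron):
        return None  # only introns longer than min_length are analyzed
    start = len(intron) // 2 - window // 2
    central = intron[start:start + window]
    unmasked = [base for base in central if base in "ACGT"]
    repeat_fraction = 1 - len(unmasked) / window
    if repeat_fraction >= max_repeat_fraction:
        return None  # central window too heavily covered by repeats
    gc_count = sum(base in "GC" for base in unmasked)
    return gc_count / len(unmasked)


# Hypothetical 700 bp intron whose central 200 bp are half G/C and repeat-free
example = "A" * 250 + "GC" * 50 + "AT" * 50 + "A" * 250
example_gc200 = gc200(example)
```

            <p>Introns of 500 bp or less, and introns whose central window is at least 20% masked, return no value and would simply be dropped from the analysis.</p>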
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Analysis of GC content distribution and size for human introns. The data provide an analysis of intron size and GC content calculated after repeat removal. Additional data concerning the relationship between intron size and GC content are also shown: in particular, a different isochore identification procedure was applied, and gene GC content was used instead of isochore location.</p>
               </text>
               <file name="1471-2148-8-99-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Analysis of GC content distribution in human introns with different isochoric location</p>
               </caption>
               <text>
                  <p><b>Analysis of GC content distribution in human introns with different isochoric location</b>. Isochore definition is as described in [20]. (A) Scatter plot and loess fitting of intron size and GC content in light (blue) and heavy (red) isochores. (B) Analysis of GC<sub>200 </sub>(see text). GC<sub>200 </sub>significantly decreases or increases with residual size (percentile classes are shown) for introns located in heavy (red; breaks in bp = 681, 934, 1309, 1960, 3665) or light (blue; breaks in bp = 810, 1181, 1714, 2638, 5476) isochores, respectively (Kruskal-Wallis test, <it>p </it>= 1.3 &#215; 10<sup>-34 </sup>and 7.9 &#215; 10<sup>-7</sup>, respectively). The number of introns in each size class amounted to 2490 and 1567 for heavy and light isochores, respectively. (C) Distributions of within-gene correlation coefficients. For each gene having more than 15 introns (n = 500 and 1021 for light and heavy isochores, respectively) we calculated correlation coefficients between masked GC content and residual size. Hatched and dotted lines represent envelopes (1<sup>st </sup>and 99<sup>th </sup>percentiles, respectively) of correlation coefficient distributions obtained by randomization. (D) Scatter plot and loess fits of GC content over intron size (log<sub>10 </sub>values) for introns (upper panel) and pseudointrons (lower panel). Spearman correlation coefficients (<it>rho</it>) are also shown (all <it>p </it>values were &lt; 0.01). Introns and pseudointrons were divided on the basis of their isochoric location: blue for light isochores (501 intron-pseudointron pairs), red for heavy ones (926 pairs).</p>
               </text>
               <graphic file="1471-2148-8-99-3"/>
            </fig>
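            <p>The size classes quoted in the legend of Figure 3B can be reproduced by assigning each residual intron size to the interval delimited by the reported break points. The sketch below is only illustrative: the function name is hypothetical, and placing a size equal to a break point in the upper class is an assumption, since the legend does not state how boundaries were handled.</p>

```python
from bisect import bisect_right

# Break points (bp) quoted in the Figure 3 legend
HEAVY_BREAKS = [681, 934, 1309, 1960, 3665]
LIGHT_BREAKS = [810, 1181, 1714, 2638, 5476]


def size_class(residual_size, breaks):
    """Return the size class (0 = shortest, 5 = longest) for a residual intron size.

    Assumption: a size exactly equal to a break point falls in the upper class.
    """
    return bisect_right(breaks, residual_size)


# Hypothetical residual sizes for one heavy-isochore and one light-isochore intron
heavy_class = size_class(1500, HEAVY_BREAKS)
light_class = size_class(1500, LIGHT_BREAKS)
```

            <p>Because the break points differ between isochore types, the same residual size can fall in different classes for heavy and light isochores.</p>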
            <p>We speculated that these results might originate from the preferential location of genes with short introns in regions displaying an extreme GC content. Yet, we verified that this is not the case, since introns belonging to the same gene tend to recapitulate the distributions observed above; in particular, introns belonging to genes located in light isochores tend to display an increase in GC content with size; those located in heavy isochores behave in the opposite manner. This is shown in figure <figr fid="F3">3</figr>: we selected genes having more than 15 introns and calculated, for each one, the correlation coefficient between the masked GC content of its intervening regions and their residual size; the distributions of correlation coefficients are shifted to positive and negative values for genes located in light and heavy isochores (Figure <figr fid="F3">3C</figr>), respectively. The significance of this finding was assessed by re-sampling (GC content and intron size were randomly assorted 1000 times for each gene).</p>
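            <p>The within-gene randomization can be sketched as follows. The text does not specify which correlation coefficient was used for this analysis, so the Spearman coefficient is assumed here; the function names, the random seed and the example gene are likewise hypothetical, and the sketch assumes that GC and size values are not all identical within a gene.</p>

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        while len(order) > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def randomized_envelope(gc, size, n_iter=1000, seed=0):
    """Approximate 1st and 99th percentiles of rho after shuffling GC among introns."""
    rng = random.Random(seed)
    shuffled = list(gc)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(spearman(shuffled, size))
    null.sort()
    return null[n_iter // 100], null[-(n_iter // 100) - 1]


# Hypothetical gene with 16 introns whose masked GC content rises with residual size
sizes = [600 + 150 * i for i in range(16)]
gc_values = [0.35 + 0.005 * i for i in range(16)]
rho = spearman(gc_values, sizes)
low, high = randomized_envelope(gc_values, sizes)
```

            <p>An observed coefficient lying outside the interval between the two envelope values would be regarded as unlikely under random assortment of GC content and intron size within that gene.</p>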
            <p>All these analyses have been performed after removal of transposable elements from both GC and size calculations; still, it might be argued that old, unrecognizable transposable elements have contributed to both intronic GC content and size, therefore explaining the observed distributions. In order to verify that this is not the sole explanation for our findings, we analyzed nonrepetitive GC content and residual intron length in intron-pseudointron pairs: old transposable elements gave the same contribution to both introns and pseudointrons (as their insertion predated pseudogene duplication) and therefore, once recognizable transposable elements have been masked, any difference in GC distribution is expected to be accounted for by repeat-independent events. Data are reported in Figure <figr fid="F3">3D</figr> and show the homogenization of GC content in short pseudointrons (compared to real ones) located in light or heavy isochores.</p>
            <p>It should be noted that many different isochore-identification methods have been described. We therefore verified that the results above were also obtained using the IsoFinder <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> isochore definition; the same results are also obtained when gene GC content (rather than isochore attribution) is used to define "light" (average GC content &lt; 0.41) and "heavy" genes (see Additional file <supplr sid="S4">4</supplr> for figures and details of both analyses).</p>
            <p>In summary, these data indicate that intron GC content and size do not evolve independently; even when possible confounding effects such as size variation, presence of transposable elements and skewed genomic location are taken into account, isochore-specific correlations exist between intron size and GC content. Although there is no theoretical basis to expect it, we verified that no significant difference exists between recombination rates of long and short introns in both heavy and light isochores (not shown, see Methods for details). Therefore, the data we report here can hardly be reconciled with a view whereby BGC alone drives GC content evolution; rather, these findings might be consistent with a role of both base composition and intron size in gene regulation mediated by nucleosome positioning or chromatin conformation, as previously proposed <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B23">23</abbr></abbrgrp>. In agreement with this view, it has recently been shown <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> that a considerable amount of human intronic sequence is weakly selected, possibly due to its functioning in chromatin structure and transcription regulation.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>A possible caveat of the data we report here concerns the accuracy of recombination rate measures: the data we used derive from HapMap and refer to crossover rates (not gene conversion rates). Evidence suggests that, although crossovers and conversions arise from the same recombination-initiating events <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, the ratio of conversions to crossovers can vary among hotspots <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. It is therefore possible that correction for recombination rates leaves a residual; still, there is no a priori reason to expect the residual error to be skewed depending on background GC content. Also, as stated above, substitution rates in GC-poor vs GC-rich regions do not parallel rates in low- vs high-recombining regions, which would be expected if the same effect (i.e. BGC) were operating in both comparisons. Given this premise, and taking into account the analysis of polymorphic repeat insertions and intron GC content distribution, we consider that the most parsimonious explanation for our results is that GC content is subject to the action of both weak selection and BGC in the human genome, with features such as nucleosome positioning or chromatin conformation possibly representing the final target of selective processes. This view might reconcile previous contrasting findings <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp> and add some theoretical background to recent evidence suggesting that GC content domains display different behaviors with respect to highly regulated biological processes such as developmental stage-related gene expression <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and programmed replication timing during neural stem cell differentiation <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>UP conceived and designed the study, and wrote the paper; GM retrieved data and performed analyses concerning allele frequency spectra; MF retrieved data and performed analyses concerning substitution rates; MC retrieved data and performed analyses concerning retrotransposon insertion polymorphisms; GPC and NB coordinated the study; RC analyzed the relationship between intron size and GC content; MS conceived and designed the study, and wrote the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank Roberto Giorda for discussion and helpful comments.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The mosaic genome of warm-blooded vertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Olofsson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Filipski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zerial</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salinas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cuny</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Meunier-Rotival</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rodier</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1985</pubdate>
            <volume>228</volume>
            <fpage>953</fpage>
            <lpage>958</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.4001930</pubid>
                  <pubid idtype="pmpid" link="fulltext">4001930</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Initial sequencing and analysis of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The evolution of isochores</p>
            </title>
            <aug>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>549</fpage>
            <lpage>555</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35080577</pubid>
                  <pubid idtype="pmpid" link="fulltext">11433361</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A new perspective on isochore evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Galtier</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2006</pubdate>
            <volume>385</volume>
            <fpage>71</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2006.04.030</pubid>
                  <pubid idtype="pmpid" link="fulltext">16971063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>GC-content evolution in mammalian genomes: the biased gene conversion hypothesis</p>
            </title>
            <aug>
               <au>
                  <snm>Galtier</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Piganeau</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>159</volume>
            <fpage>907</fpage>
            <lpage>911</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461818</pubid>
                  <pubid idtype="pmpid" link="fulltext">11693127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Biased gene conversion: implications for genome and sex evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Marais</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>330</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00116-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12801726</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Jiricny</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1988</pubdate>
            <volume>54</volume>
            <fpage>705</fpage>
            <lpage>711</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(88)80015-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">2842064</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Birdsell</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>1181</fpage>
            <lpage>1197</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12082137</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The evolution of isochores: evidence from SNP frequency distributions</p>
            </title>
            <aug>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2002</pubdate>
            <volume>162</volume>
            <fpage>1805</fpage>
            <lpage>1810</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1462390</pubid>
                  <pubid idtype="pmpid" link="fulltext">12524350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Recombination explains isochores in mammalian genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Montoya-Burgos</snm>
                  <fnm>JI</fnm>
               </au>
               <au>
                  <snm>Boursot</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Galtier</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>128</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00021-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12615004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Gene conversion and GC-content evolution in mammalian Hsp70</p>
            </title>
            <aug>
               <au>
                  <snm>Kudla</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Helwak</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lipinski</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>1438</fpage>
            <lpage>1444</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh146</pubid>
                  <pubid idtype="pmpid" link="fulltext">15084682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Recombination drives the evolution of GC-content in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Meunier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>984</fpage>
            <lpage>990</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh070</pubid>
                  <pubid idtype="pmpid" link="fulltext">14963104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Male-driven biased gene conversion governs the evolution of base composition in human Alu repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Webster</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Hultin-Rosenberg</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Arndt</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Ellegren</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1468</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi136</pubid>
                  <pubid idtype="pmpid" link="fulltext">15772377</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The influence of recombination on human genetic diversity</p>
            </title>
            <aug>
               <au>
                  <snm>Spencer</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Deloukas</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mullikin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Silverman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Donnelly</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McVean</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e148</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1575889</pubid>
                  <pubid idtype="pmpid" link="fulltext">17044736</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Bendable genes of warm-blooded vertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>2195</fpage>
            <lpage>2200</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11719569</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Isochores and tissue-specificity</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>5212</fpage>
            <lpage>5220</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">212799</pubid>
                  <pubid idtype="pmpid" link="fulltext">12930973</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <aug>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Structural and evolutionary genomics. Natural Selection in Genome Evolution</source>
            <publisher>Amsterdam: Elsevier</publisher>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Noncoding DNA, isochores and gene expression: nucleosome formation potential</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>559</fpage>
            <lpage>563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548339</pubid>
                  <pubid idtype="pmpid" link="fulltext">15673716</pubid>
                  <pubid idtype="doi">10.1093/nar/gki184</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>639</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.09.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">16202472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>An isochore map of human chromosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Costantini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clay</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Auletta</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>536</fpage>
            <lpage>541</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457033</pubid>
                  <pubid idtype="pmpid" link="fulltext">16597586</pubid>
                  <pubid idtype="doi">10.1101/gr.4910606</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Differentiation-induced replication-timing changes are restricted to AT-rich/long interspersed nuclear element (LINE)-rich isochores</p>
            </title>
            <aug>
               <au>
                  <snm>Hiratani</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Leskovar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>16861</fpage>
            <lpage>16866</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534734</pubid>
                  <pubid idtype="pmpid" link="fulltext">15557005</pubid>
                  <pubid idtype="doi">10.1073/pnas.0406687101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprint by models of stem cell differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Ren</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gao</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ding</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R35</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1868930</pubid>
                  <pubid idtype="pmpid" link="fulltext">17349061</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-3-r35</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>DNA helix: the importance of being GC-rich</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>1838</fpage>
            <lpage>1844</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">152811</pubid>
                  <pubid idtype="pmpid" link="fulltext">12654999</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance</p>
            </title>
            <aug>
               <au>
                  <snm>Semon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>421</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddi038</pubid>
                  <pubid idtype="pmpid" link="fulltext">15590696</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>High guanine and cytosine content increases mRNA levels in mammalian cells</p>
            </title>
            <aug>
               <au>
                  <snm>Kudla</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lipinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Caffin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Helwak</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zylicz</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>e180</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1463026</pubid>
                  <pubid idtype="pmpid" link="fulltext">16700628</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0040180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The UCSC genome annotation database</p>
            </title>
            <url>http://genome.ucsc.edu</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Intron size in mammals: complexity comes to terms with economy</p>
            </title>
            <aug>
               <au>
                  <snm>Pozzoli</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Menozzi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Comi</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Cagliani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bresolin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sironi</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>20</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2006.10.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">17070957</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The HapMap web site</p>
            </title>
            <url>http://www.hapmap.org/</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Karro</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cayting</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D55</fpage>
            <lpage>D60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1669708</pubid>
                  <pubid idtype="pmpid" link="fulltext">17099229</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Pseudogene.org</p>
            </title>
            <url>http://www.pseudogene.org/</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
                  <pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels</p>
            </title>
            <aug>
               <au>
                  <snm>Frisse</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hudson</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Bartoszewicz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wall</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Donfack</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Di Rienzo</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>2001</pubdate>
            <volume>69</volume>
            <fpage>831</fpage>
            <lpage>843</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1226068</pubid>
                  <pubid idtype="pmpid" link="fulltext">11533915</pubid>
                  <pubid idtype="doi">10.1086/323612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The Seattle SNP database</p>
            </title>
            <url>http://pga.gs.washington.edu/</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grover</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Azrak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Batzer</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2006</pubdate>
            <volume>27</volume>
            <fpage>323</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1855216</pubid>
                  <pubid idtype="pmpid" link="fulltext">16511833</pubid>
                  <pubid idtype="doi">10.1002/humu.20307</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Repbase Update, a database of eukaryotic repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapitonov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Pavlicek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Klonowski</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kohany</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Walichiewicz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cytogenet Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>110</volume>
            <fpage>462</fpage>
            <lpage>467</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1159/000084979</pubid>
                  <pubid idtype="pmpid" link="fulltext">16093699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The Repbase Update</p>
            </title>
            <url>http://www.girinst.org/repbase/update/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>MAVID: Constrained Ancestral Alignment of Multiple Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>693</fpage>
            <lpage>699</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383315</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060012</pubid>
                  <pubid idtype="doi">10.1101/gr.1960404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Identification and measurement of neighbor-dependent nucleotide substitution processes</p>
            </title>
            <aug>
               <au>
                  <snm>Arndt</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Hwa</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>2322</fpage>
            <lpage>2328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti376</pubid>
                  <pubid idtype="pmpid" link="fulltext">15769841</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <url>http://evogen.molgen.mpg.de/</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The R project</p>
            </title>
            <url>http://www.r-project.org/</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Robust locally weighted regression and smoothing scatterplots</p>
            </title>
            <aug>
               <au>
                  <snm>Cleveland</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>J Amer Statist Assoc</source>
            <pubdate>1979</pubdate>
            <volume>74</volume>
            <fpage>829</fpage>
            <lpage>836</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2286407</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>A haplotype map of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International HapMap Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>1299</fpage>
            <lpage>1320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1880871</pubid>
                  <pubid idtype="pmpid" link="fulltext">16255080</pubid>
                  <pubid idtype="doi">10.1038/nature04226</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Vanishing GC-rich isochores in mammalian genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Semon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Piganeau</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Galtier</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2002</pubdate>
            <volume>162</volume>
            <fpage>1837</fpage>
            <lpage>1847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1462357</pubid>
                  <pubid idtype="pmpid" link="fulltext">12524353</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Fine-scale recombination patterns differ between chimpanzees and humans</p>
            </title>
            <aug>
               <au>
                  <snm>Ptak</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hinds</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Koehler</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nickel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Patil</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Ballinger</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Przeworski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2005</pubdate>
            <volume>37</volume>
            <issue>4</issue>
            <fpage>429</fpage>
            <lpage>434</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng0405-445</pubid>
                  <pubid idtype="pmpid" link="fulltext">15723063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Comparison of fine-scale recombination rates in humans and chimpanzees</p>
            </title>
            <aug>
               <au>
                  <snm>Winckler</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Onofrio</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Bontrop</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>McVean</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Gabriel</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Reich</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Donnelly</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>107</fpage>
            <lpage>111</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1105322</pubid>
                  <pubid idtype="pmpid" link="fulltext">15705809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>A unification of mosaic structures in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Urrutia</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Pavlicek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2003</pubdate>
            <volume>12</volume>
            <fpage>2411</fpage>
            <lpage>2415</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddg251</pubid>
                  <pubid idtype="pmpid" link="fulltext">12915446</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Integration site preferences of the Alu family and similar repetitive DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Daniels</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Deininger</snm>
                  <fnm>PL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1985</pubdate>
            <volume>13</volume>
            <fpage>8939</fpage>
            <lpage>8954</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">318963</pubid>
                  <pubid idtype="pmpid" link="fulltext">3001654</pubid>
                  <pubid idtype="doi">10.1093/nar/13.24.8939</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition</p>
            </title>
            <aug>
               <au>
                  <snm>Feng</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Kazazian</snm>
                  <fnm>HH</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Boeke</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1996</pubdate>
            <volume>87</volume>
            <fpage>905</fpage>
            <lpage>916</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)81997-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">8945517</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Retroelement distributions in the human genome: variations associated with age and proximity to genes</p>
            </title>
            <aug>
               <au>
                  <snm>Medstrand</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>van de Lagemaat</snm>
                  <fnm>LN</fnm>
               </au>
               <au>
                  <snm>Mager</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1483</fpage>
            <lpage>1495</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187529</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368240</pubid>
                  <pubid idtype="doi">10.1101/gr.388902</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>IsoFinder: computational prediction of isochores in genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Oliver</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Carpena</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hackenberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bernaola-Galv&#225;n</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W287</fpage>
            <lpage>W292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441537</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215396</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh399</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>A test of whether selection maintains isochores using sites polymorphic for Alu and L1 element insertions</p>
            </title>
            <aug>
               <au>
                  <snm>Belle</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2002</pubdate>
            <volume>160</volume>
            <fpage>815</fpage>
            <lpage>817</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461991</pubid>
                  <pubid idtype="pmpid">11898794</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Recently integrated Alu retrotransposons are essentially neutral residents of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Cordaux</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dinoso</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Batzer</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2006</pubdate>
            <volume>373</volume>
            <fpage>138</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2006.01.020</pubid>
                  <pubid idtype="pmpid" link="fulltext">16527433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Human SNP variability and mutation rate are higher in regions of high recombination</p>
            </title>
            <aug>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>337</fpage>
            <lpage>340</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02669-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12127766</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Compositional evolution of noncoding DNA in the human and chimpanzee genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Webster</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Ellegren</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>278</fpage>
            <lpage>286</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg037</pubid>
                  <pubid idtype="pmpid" link="fulltext">12598695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>"Genome design" model: evidence from conserved intronic sequence in human-mouse comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>347</fpage>
            <lpage>354</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.4318206</pubid>
                  <pubid idtype="pmpid" link="fulltext">16461636</pubid>
                  <pubid idtype="pmcid">1415212</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot</p>
            </title>
            <aug>
               <au>
                  <snm>Jeffreys</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Neumann</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>2277</fpage>
            <lpage>2287</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddi232</pubid>
                  <pubid idtype="pmpid" link="fulltext">15987698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Intense and highly localized gene conversion activity in human meiotic crossover hot spots</p>
            </title>
            <aug>
               <au>
                  <snm>Jeffreys</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>May</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>151</fpage>
            <lpage>156</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1287</pubid>
                  <pubid idtype="pmpid" link="fulltext">14704667</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

