<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-418</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Association of the Matrix Attachment Region Recognition Signature with coding regions in <it>Caenorhabditis elegans</it></p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Anthony</snm>
               <fnm>Alasdair</fnm>
               <insr iid="I1"/>
               <email>al.anthony@ed.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Blaxter</snm>
               <fnm>Mark</fnm>
               <insr iid="I1"/>
               <email>mark.blaxter@ed.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>418</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/418</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18005410</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-418</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>04</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>15</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>15</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Anthony and Blaxter; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Matrix attachment regions (MAR) are the sites on genomic DNA that interact with the nuclear matrix. There is increasing evidence for the involvement of MAR in regulation of gene expression. The unsuitability of experimental detection of MAR for genome-wide analyses has led to the development of computational methods of detecting MAR. The MAR recognition signature (MRS) has been reported to be associated with a significant fraction of MAR in <it>C. elegans </it>and has also been found in MAR from a wide range of other eukaryotes. However, the effectiveness of the MRS in specifically and sensitively identifying MAR remains unresolved.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using custom software, we have mapped the occurrence of MRS across the entire <it>C. elegans </it>genome. We find that MRS have a distinctive chromosomal distribution, in which they appear more frequently in the gene-rich chromosome centres than in arms. Comparison to distributions of MRS estimated from chromosomal sequences randomised using mono-, di-, tri- and tetra-nucleotide frequency patterns showed that, while MRS are less common in real sequence than would be expected from nucleotide content alone, they are more frequent than would be predicted from short-range nucleotide structure. In comparison to the rest of the genome, MRS frequency was elevated in 5' and 3' UTRs, and striking peaks of average MRS frequency flanked <it>C. elegans </it>coding sequence (CDS). Genes associated with MRS were significantly enriched for receptor activity annotations, but not for expression level or other features.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Through a genome-wide analysis of the distribution of MRS in <it>C. elegans </it>we have shown that they have a distinctive distribution, particularly in relation to genes. Due to their association with untranslated regions, it is possible that MRS could have a post-transcriptional role in the control of gene expression. A role for MRS in nuclear scaffold attachment is not supported by these analyses.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>As genome sequencing and annotation have progressed, it has become clear that even relatively compact eukaryotic genomes have large amounts of non-coding DNA. This DNA harbours elements that control genomic activity such as gene regulators, non-coding RNAs and less well characterised elements that position the chromosomes on the nuclear matrix. The nuclear matrix forms a three-dimensional protein network onto which chromatin fibres are attached. Interaction between chromatin and the nuclear matrix is believed to occur at specific sites from 300 bp to several kb long, termed matrix attachment regions (MAR) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>There is increasing evidence for the involvement of MAR in gene regulation. For example, expression levels of some genes alter depending on their position relative to the matrix <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. MAR have also been associated with enhanced transcription, notably in transgene constructs where flanking transgenes with MAR results in higher and more stable expression (for review see <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>). A role for MAR as a boundary between functional chromatin domains has been proposed <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. The effects of long-range enhancers may be restricted by the positioning of MAR <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. MAR have also been implicated in the positioning of chromosomal territories <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr></abbrgrp>. Coordinated spatial positioning of sequences on different chromosomes can facilitate interactions <it>in trans</it>. For example, active genes from different chromosomes have been shown to migrate through the nuclear space to converge on "transcriptional factories" <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Localisation of genes in this way is likely to involve control of higher order chromosome structure and there is evidence that some chromatin loop attachments are under developmental control <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>Experimentally, MAR have been defined as either DNA fragments that remain bound to the nuclear matrix after chromatin proteins and other DNA have been removed, or DNA that binds to extracted nuclear matrix in the presence of competitor DNA <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. The most common experimental method for identifying MAR uses re-association assays to define DNA fragments that bind to the nuclear matrix <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. However, as experimental methods are poorly amenable to genome-wide analysis, computational methods have been sought for identifying MAR.</p>
         <p>MAR-associated sequences for approximately 500 experimentally defined MAR are catalogued in the S/MAR transaction Database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The overriding feature of many MAR is that they are AT rich, but several other more specific sequence motifs have also been identified <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. MAR sequences also show elevated DNA unwinding potential, through stress-induced DNA duplex destabilisation (SIDD) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Computational tools based on these sequence characteristics have been used to identify MAR using DNA sequence information alone. MARfinder uses 20 motifs within a set of higher order rules. The density of rule occurrences is then used to identify MAR <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. SMARTest is based on a density analysis of a set of MAR sequences represented by position weight matrices <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. An <it>in silico</it>, genome-wide mapping of MAR in <it>Arabidopsis thaliana </it>using SMARTest revealed that genes containing predicted MARs had low transcription levels <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. SIDD identifies putative MAR based on the predicted sites of stress-induced DNA duplex destabilisation <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. ChrClass uses multivariate linear discriminant analysis to compare MAR sequences and develop a classification system <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The limited effectiveness of these methods in reliably identifying MAR is discussed in a recent comparative study of MAR prediction software <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
         <p>The most complex motif associated with MAR sequences is the bipartite MAR recognition signature (MRS). The MRS was identified through analysis of MAR from three independent genomic regions of >30 kb in <it>A. thaliana </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. To assess the effectiveness of the MRS, van Drunen <it>et al</it>. mapped all the MRS and experimentally detected MAR on a single <it>C. elegans </it>genomic DNA segment, ~40 kb long. All MRS were located in six of the seven MAR sites. Further analysis of >300 kb of genomic sequence from 7 other eukaryotic organisms showed that MRS were present in 80% of MAR, leading van Drunen <it>et al</it>. to suggest that the MRS was a specific sequence element representative of a subset of MAR <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>Donev <it>et al</it>. used the MRS to identify novel MAR in the human major histocompatibility complex class II region <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The regions they identified were found to bind the nuclear matrix and a subset were also able to bind the mRNA processing protein hnRNP-A1 during transcriptional up-regulation of nearby genes. The MRS has also been used to identify MAR in <it>Entamoeba histolytica </it>and was found in MAR from <it>Bombyx mori </it><abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. However, MAR mapping studies in mammals have shown that MRS are sometimes identified outside known MAR sites <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In their analysis of 1 Mb of the mouse genome, Purbowasito <it>et al</it>. reported that MAR prediction based on MRS had a specificity of 41%, with 29 of 49 predictions lying outside experimentally defined MAR <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. There is, therefore, some doubt as to the effectiveness of the MRS as a marker for MAR.</p>
         <p>We have undertaken a genome-wide mapping and analysis of MRS in <it>C. elegans </it>in an attempt to determine the validity of the MRS. If MRS constitute a feature with real biological meaning, their distribution would be expected to be non-random with respect to other genome features. We found that the MRS had a distinctive pattern of distribution along chromosomes, similar to that of genes. Further, we show that there is a marked increase in the frequency of MRS in the regions flanking <it>C. elegans </it>coding sequence (CDS).</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>The MRS is a degenerate bipartite motif consisting of a 16 bp pattern, AWWRTAANNWWGNNNC (where W = A or T, R = A or G, N = A, C, G or T), within which one mismatch is allowed, and an 8 bp pattern, AATAAYAA (where Y = C or T) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. To be scored as an MRS, both these sequences must lie within 200 bp of each other, although they may overlap and they may be on either strand of the DNA duplex <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Existing MRS finding programs were designed to under-report closely apposed MRS <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. To allow full control over data reported, a custom program, MRSfinder, was designed. MRSfinder was used to map the location of MRS across the entire <it>C. elegans </it>genome.</p>
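         <p>The motif definition above can be illustrated with a minimal Python sketch. This is not the MRSfinder program itself: the function names are hypothetical, the 200 bp limit is assumed to be measured between motif start positions (the text does not say how the distance is measured), and every pairing of a 16 bp hit with an 8 bp hit is reported.</p>

```python
# Illustrative sketch only; not the authors' MRSfinder implementation.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "R": "AG", "Y": "CT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LONG_MOTIF = "AWWRTAANNWWGNNNC"   # 16 bp pattern, one mismatch allowed
SHORT_MOTIF = "AATAAYAA"          # 8 bp pattern, exact match
MAX_DISTANCE = 200                # assumption: distance between motif start positions


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_starts(seq, motif, max_mismatch):
    """Forward-strand start positions where the motif matches on either strand."""
    hits = set()
    size = len(motif)
    for strand_seq, is_reverse in ((seq, False), (reverse_complement(seq), True)):
        for i in range(len(strand_seq) - size + 1):
            mismatches = sum(
                base not in IUPAC[code]
                for base, code in zip(strand_seq[i:i + size], motif)
            )
            if mismatches > max_mismatch:
                continue
            # Convert reverse-strand offsets back to forward-strand coordinates.
            hits.add(len(seq) - size - i if is_reverse else i)
    return sorted(hits)


def find_mrs(seq):
    """Return (long_start, short_start) pairs lying within MAX_DISTANCE of each other."""
    seq = seq.upper()
    long_hits = motif_starts(seq, LONG_MOTIF, 1)
    short_hits = motif_starts(seq, SHORT_MOTIF, 0)
    return [
        (long_start, short_start)
        for long_start in long_hits
        for short_start in short_hits
        if MAX_DISTANCE >= abs(long_start - short_start)
    ]
```

         <p>The all-against-all pairing keeps the sketch short; a genome-scale scan would need a more efficient search and an explicit rule for merging closely apposed hits.</p>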
         <p>MRS were found across all 6 <it>C. elegans </it>chromosomes at an average frequency of 249 per Mb, similar to the frequency of genes (228 per Mb). At small scales (&lt;100 kb), the motif distribution was noisy (see Additional File <supplr sid="S1">1</supplr>). As would be expected of an AT-rich motif, there was some correlation with regions of high AT% (see below).</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p>Distribution of genes and MRS in <it>C. elegans </it>chromosomes at window sizes of 100 kb and 500 kb. Number of gene (black) and MRS (red) start positions in non-overlapping 100 kb and 500 kb windows. To account for short sequence length in the end window, the number of genes and MRS in the last window was scaled to the full window size.</p>
            </text>
            <file name="1471-2164-8-418-S1.png">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>However, at a chromosomal level distinct patterns emerged. Analyses of non-overlapping 2 Mb windows along the chromosomes showed that MRS were significantly more abundant in the centres than in the arms of all chromosomes except chromosome IV (Figure <figr fid="F1">1</figr> and Additional File <supplr sid="S2">2</supplr>). The division between chromosome arms and centres is characteristic of several genomic features in <it>C. elegans</it>. Centres tend to be gene rich, with a high concentration of essential, well conserved and highly expressed genes <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. By comparison, the chromosome arms exhibit a higher meiotic recombination rate, and are enriched for transposons and repeats <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Thus, at the chromosome level, MRS are more likely to be found in the vicinity of highly expressed and essential genes.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Distribution of genes and MRS in <it>C. elegans </it>chromosomes</p>
            </caption>
            <text>
               <p><b>Distribution of genes and MRS in <it>C. elegans </it>chromosomes</b>. Number of gene (black) and MRS (red) start positions in non-overlapping 2 Mb windows. To account for short sequence length in the end window, the number of genes and MRS in the last window has been scaled to 2 Mb.</p>
            </text>
            <graphic file="1471-2164-8-418-1"/>
         </fig>
         <suppl id="S2">
            <title>
               <p>Additional file 2</p>
            </title>
            <text>
               <p>Correlation between MRS frequency and distance to centre of chromosome. Each 2 Mb chromosome window was given a number based on its distance from the centre of the chromosome. The windows at the far ends of the chromosome were assigned 1, the next windows towards the chromosome centre were assigned 2 and so on until all windows had been assigned a number. The correlation between the MRS frequency in each window and its number was then calculated using Pearson's r correlation coefficient.</p>
            </text>
            <file name="1471-2164-8-418-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
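         <p>The window numbering and correlation described for Additional file 2 can be sketched as follows. The function names are hypothetical and the sketch assumes one MRS count per 2 Mb window, ordered along the chromosome; it illustrates the calculation rather than reproducing the authors' spreadsheet.</p>

```python
from math import sqrt


def distance_ranks(n_windows):
    """Number windows 1 at both chromosome ends, increasing towards the centre."""
    return [min(i, n_windows - 1 - i) + 1 for i in range(n_windows)]


def pearson_r(xs, ys):
    """Pearson's r correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def centre_correlation(mrs_counts):
    """Correlate per-window MRS counts with the window's rank towards the centre."""
    return pearson_r(distance_ranks(len(mrs_counts)), mrs_counts)
```

         <p>A positive coefficient from <b>centre_correlation</b> would indicate that MRS counts rise towards the chromosome centre.</p>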
         <sec>
            <st>
               <p>MRS frequency in real sequence is different to that in randomised sequence</p>
            </st>
            <p>Although the distribution of MRS appeared to correlate broadly with several other genome features, the specific nucleotide composition of each sequence window will influence the number of MRS. By randomising the genome sequence whilst maintaining nucleotide composition (mononucleotide randomisation), we estimated the number of MRS expected in the sequence due to nucleotide composition alone. Additional randomisation models were used in order to account for relationships between adjacent bases. The mononucleotide randomisation model generated sequence in which the frequency of each of the four nucleotides matched that observed in the chromosomal sequence. More complex first, second and third order Markov chain randomisation processes reflected the di-, tri- and tetra-nucleotide content of the chromosomal sequence. For each 2 Mb non-overlapping window used in Figure <figr fid="F1">1</figr>, the nucleotide sequence was randomised 1000 times, and MRSfinder was used to map and count the number of MRS in each randomised sequence. A comparison of MRS counts for chromosome I under each randomisation process is shown in Figure <figr fid="F2">2</figr> (results for second order Markov chain randomisation of the other chromosomes can be found in Additional file <supplr sid="S3">3</supplr>). The observed number of MRS in mononucleotide randomised sequence was similar to that found in real sequence, while the first, second and third order Markov chain randomised sequence yielded far fewer MRS. As MRS occurrence was best modelled by the mononucleotide randomisation process, subsequent analyses focussed on this method of randomisation.</p>
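            <p>The randomisation models described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the zero-order model is implemented here as a shuffle (which preserves base counts exactly), the higher-order models are fitted to each window separately, and all function names are hypothetical.</p>

```python
import random
from collections import Counter, defaultdict


def shuffle_sequence(seq, rng):
    """Zero-order (mononucleotide) model: permute bases, preserving base counts."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def markov_sequence(seq, order, rng):
    """Generate len(seq) bases from an order-k Markov chain fitted to seq."""
    transitions = defaultdict(Counter)
    for i in range(len(seq) - order):
        transitions[seq[i:i + order]][seq[i + order]] += 1
    overall = Counter(seq)
    out = list(seq[:order])  # seed with the first k real bases
    while len(seq) > len(out):
        context = "".join(out[-order:]) if order else ""
        # Fall back to overall base frequencies if this context has no successor.
        counts = transitions.get(context) or overall
        bases, weights = zip(*counts.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)


def randomise_in_windows(seq, window, rng, order=None):
    """Randomise each non-overlapping window independently.

    order=None shuffles each window; order=k uses an order-k Markov chain.
    """
    pieces = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        if order is None:
            pieces.append(shuffle_sequence(chunk, rng))
        else:
            pieces.append(markov_sequence(chunk, order, rng))
    return "".join(pieces)
```

            <p>An order of 1, 2 or 3 corresponds to matching di-, tri- or tetra-nucleotide content respectively; passing a seeded <b>random.Random</b> instance makes repeated randomisations reproducible.</p>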
            <p>Figure <figr fid="F3">3</figr> shows the difference in observed MRS count for each 2 Mb window from the mean count in the mononucleotide randomised sequences, in terms of standard deviations from the mean. Throughout the length of each chromosome, the number of MRS in real sequence was generally lower than in the mononucleotide randomised sequence. The arms were particularly poor in MRS and the chromosome centres were at most only slightly enriched for MRS. In contrast to the autosomes, the distribution of MRS along chromosome X (Figure <figr fid="F3">3</figr>, broken line) was much more even and similar to that found in mononucleotide randomised chromosome X sequence.</p>
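            <p>The measure plotted in Figure 3 is a standard score. A minimal sketch, assuming the sample standard deviation is used (the text does not specify sample or population) and using a hypothetical function name:</p>

```python
from statistics import mean, stdev


def standard_deviations_from_mean(observed, randomised_counts):
    """Observed count expressed in standard deviations from the randomised mean."""
    centre = mean(randomised_counts)
    spread = stdev(randomised_counts)  # sample standard deviation (assumption)
    return (observed - centre) / spread
```

            <p>Negative values correspond to windows with fewer MRS in real sequence than in the randomised sequences.</p>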
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>MRS in second order Markov chain randomised chromosomes I, II, III, IV, V and X. The chromosomes were randomised in non-overlapping 2 Mb windows using a second order Markov chain process. The average number of MRS over 1000 randomisations (+/- one standard deviation) in the 2 Mb windows (black) is compared with the number of MRS in real sequence (red).</p>
               </text>
               <file name="1471-2164-8-418-S3.png">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Comparison of MRS distribution in <it>C. elegans </it>chromosome I under various randomisations</p>
               </caption>
               <text>
                  <p><b>Comparison of MRS distribution in <it>C. elegans </it>chromosome I under various randomisations</b>. The number of MRS in non-overlapping 2 Mb windows in real <it>C. elegans </it>chromosome I sequence is shown in red. The chromosome was randomised in non-overlapping 2 Mb sections using four different Markov chain processes. The average number of MRS +/- one standard deviation for the 2 Mb windows for zero (mononucleotide, black), first (orange), second (green) and third (blue) order Markov chain process randomisation is shown.</p>
               </text>
               <graphic file="1471-2164-8-418-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Distribution of MRS along <it>C. elegans </it>chromosomes, relative to average number of MRS in chromosome sequence randomised in 2 Mb sections</p>
               </caption>
               <text>
                  <p><b>Distribution of MRS along <it>C. elegans </it>chromosomes, relative to average number of MRS in chromosome sequence randomised in 2 Mb sections</b>. The sequence of each chromosome was randomised using a mononucleotide process in non-overlapping sections of 2 Mb; MRS were then mapped in this sequence using MRSfinder. This was repeated 1000 times and the average and standard deviation of MRS frequency in the 2 Mb sections were obtained. This graph shows the distribution of MRS in actual <it>C. elegans </it>sequence, as the number of standard deviations from the mean MRS frequency in the randomised sequence.</p>
               </text>
               <graphic file="1471-2164-8-418-3"/>
            </fig>
            <p>One effect of randomising the genome sequence in relatively large sections of 2 Mb is that nucleotide content (or local nucleotide pattern) becomes more uniform across each section, eliminating, for example, local peaks of very high AT%. To identify the effects of local areas of extreme nucleotide composition, mononucleotide randomisation was applied to <it>C. elegans </it>chromosome I in sections of different sizes (10 bp, 100 bp, 1 kb, 50 kb, 2 Mb and the whole chromosome length). The number of MRS found in the whole chromosome under each mononucleotide randomisation regime, averaged over 1000 iterations, is shown in Figure <figr fid="F4">4</figr>. The numbers of MRS found when the chromosome was randomised along its entire length in one section and in 50 kb sections were very similar to that found in the 2 Mb randomised sequence (about 10% higher than in the actual sequence). However, when the randomised sections were smaller than 50 kb, the total number of MRS found rose dramatically. A similar effect was observed in the second order Markov chain process randomised sequence (data not shown). Compared to actual genomic sequence, the average number of MRS observed in mononucleotide randomised sequence doubled when the chromosome was randomised in sections of 10 bp.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Frequency of MRS in <it>C. elegans </it>chromosome I and various randomisations of it</p>
               </caption>
               <text>
                  <p><b>Frequency of MRS in <it>C. elegans </it>chromosome I and various randomisations of it</b>. Chromosome I was randomised using a mononucleotide process in non-overlapping sections of various lengths (10 bp, 100 bp, 1 kb, 50 kb, 2 Mb and the entire length of the chromosome), and MRSfinder was used to identify MRS in each sequence. The randomisation and MRS mapping were repeated 1000 times for each section length. The bar height shows the average number of MRS in the chromosome and the error bars represent +/- 1 standard deviation. The actual number of MRS in <it>C. elegans </it>chromosome I is shown for comparison.</p>
               </text>
               <graphic file="1471-2164-8-418-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Different genome feature types have different MRS frequencies</p>
            </st>
            <p>The above results show that the number and distribution of MRS in the <it>C. elegans </it>genome are distinct from those found in random sequence. To investigate how this distribution is related to other genome features, the degree of overlap between MRS and different functional parts of the genome was assessed. The number of MRS occupying the same genome space as exons, introns, 3' untranslated regions (UTR), 5' UTR, genes and intergenic regions is given in Table <tblr tid="T1">1</tblr>. The expected score indicates how many MRS would be expected to lie in a feature, based on the total size of the feature and assuming a uniform distribution of MRS across the genome.</p>
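            <p>The actual/expected ratio reported in Table 1 can be recomputed from the tabulated counts, as in the short sketch below. The counts are copied from Table 1; the function name is hypothetical, and the expected scores are taken as given because their derivation is described in Methods. The recomputed ratios agree with the table to two decimal places except for 5' UTR, where the small expected score appears to have been rounded for display.</p>

```python
# Counts copied from Table 1; the expected scores are taken as published.
FEATURES = ["genes", "exons", "introns", "5' UTR", "3' UTR", "intergenic"]
ACTUAL = [11368, 1955, 7094, 139, 691, 12683]
EXPECTED = [14303, 4218, 5883, 33, 246, 10070]


def actual_to_expected(features, actual, expected):
    """Return the ratio of actual to expected MRS counts for each feature type."""
    return {name: a / e for name, a, e in zip(features, actual, expected)}


ratios = actual_to_expected(FEATURES, ACTUAL, EXPECTED)
for name in FEATURES:
    print(f"{name:12s} {ratios[name]:.2f}")
```

            <p>A ratio above 1 indicates more MRS in a feature type than a uniform distribution would predict, before any correction for AT content.</p>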
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Number of MRS in genic and non-genic portions of the genome.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>genes</p>
                     </c>
                     <c ca="center">
                        <p>exons</p>
                     </c>
                     <c ca="center">
                        <p>introns</p>
                     </c>
                     <c ca="center">
                        <p>5' UTR</p>
                     </c>
                     <c ca="center">
                        <p>3' UTR</p>
                     </c>
                     <c ca="center">
                        <p>intergenic</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Size of feature (bp)</p>
                     </c>
                     <c ca="center">
                        <p>58734823</p>
                     </c>
                     <c ca="center">
                        <p>25497325</p>
                     </c>
                     <c ca="center">
                        <p>30586607</p>
                     </c>
                     <c ca="center">
                        <p>456649</p>
                     </c>
                     <c ca="center">
                        <p>1616413</p>
                     </c>
                     <c ca="center">
                        <p>41740777</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of features in genome</p>
                     </c>
                     <c ca="center">
                        <p>18719</p>
                     </c>
                     <c ca="center">
                        <p>124049</p>
                     </c>
                     <c ca="center">
                        <p>100853</p>
                     </c>
                     <c ca="center">
                        <p>8293</p>
                     </c>
                     <c ca="center">
                        <p>9103</p>
                     </c>
                     <c ca="center">
                        <p>18832</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Feature AT%</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Actual number of MRS in feature</p>
                     </c>
                     <c ca="center">
                        <p>11368</p>
                     </c>
                     <c ca="center">
                        <p>1955</p>
                     </c>
                     <c ca="center">
                        <p>7094</p>
                     </c>
                     <c ca="center">
                        <p>139</p>
                     </c>
                     <c ca="center">
                        <p>691</p>
                     </c>
                     <c ca="center">
                        <p>12683</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Expected number of MRS in feature</p>
                     </c>
                     <c ca="center">
                        <p>14303</p>
                     </c>
                     <c ca="center">
                        <p>4218</p>
                     </c>
                     <c ca="center">
                        <p>5883</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>246</p>
                     </c>
                     <c ca="center">
                        <p>10070</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ratio (actual/expected)</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>1.21</p>
                     </c>
                     <c ca="center">
                        <p>4.22</p>
                     </c>
                     <c ca="center">
                        <p>2.81</p>
                     </c>
                     <c ca="center">
                        <p>1.26</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AT% corrected ratio (score system 1)</p>
                     </c>
                     <c ca="center">
                        <p>1.05</p>
                     </c>
                     <c ca="center">
                        <p>1.66</p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>8.80</p>
                     </c>
                     <c ca="center">
                        <p>1.71</p>
                     </c>
                     <c ca="center">
                        <p>1.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AT% corrected ratio (score system 2)</p>
                     </c>
                     <c ca="center">
                        <p>1.09</p>
                     </c>
                     <c ca="center">
                        <p>1.83</p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                     <c ca="center">
                        <p>2.08</p>
                     </c>
                     <c ca="center">
                        <p>1.34</p>
                     </c>
                     <c ca="center">
                        <p>1.03</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AT% corrected ratio (score system 3)</p>
                     </c>
                     <c ca="center">
                        <p>1.07</p>
                     </c>
                     <c ca="center">
                        <p>1.77</p>
                     </c>
                     <c ca="center">
                        <p>0.68</p>
                     </c>
                     <c ca="center">
                        <p>2.98</p>
                     </c>
                     <c ca="center">
                        <p>1.46</p>
                     </c>
                     <c ca="center">
                        <p>1.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AT correction factor</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>1.64</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>1.64</p>
                     </c>
                     <c ca="center">
                        <p>1.24</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The number of MRS overlapping genes, exons, introns, 3' UTR, 5' UTR and intergenic regions of the genome was used to calculate an overlap score as described in Methods. The expected overlap score was calculated assuming a uniform distribution of MRS across the genome, using the formulae described in Methods. The ratio of the actual to expected score is shown. This ratio was divided by the AT correction factor (see Methods) to give the AT corrected ratio. Score system 1 was used except where indicated otherwise.</p>
               </tblfn>
            </tbl>
            <p>The ratios of actual and expected MRS numbers showed large differences in MRS abundance in each of the genome features. MRS were particularly rare in exons, which contained less than half the MRS expected. As a result, the number of MRS in genes was also lower than expected, despite enrichment for MRS in introns and untranslated regions. Intergenic regions had slightly more MRS than expected. However, the 5' UTR and 3' UTR were by far the most MRS-enriched parts of the genome, by factors of 4.2 and 2.8 respectively. The relative enrichment of introns, 5' UTR and 3' UTR for MRS provides an explanation for the spatial relationship between genes and MRS described in Figure <figr fid="F1">1</figr>.</p>
            <p>The MRS is AT rich and so is more likely to occur in AT rich sequence (see Additional File <supplr sid="S4">4</supplr>). To control for this bias, an AT-correction factor was used to adjust the expected number of MRS. The correction factor was based on the number of MRS found in mononucleotide random sequence with AT content equivalent to that of each feature, as a proportion of the number of MRS found in random sequence with AT content equivalent to that of the whole genome. When this correction was applied, the AT-poor exons appeared enriched for MRS, while the AT-rich introns had fewer than expected. Both genes and intergenic regions had approximately the number of MRS expected.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Number of MRS in random sequence of defined AT content. The number of MRS in 2 Mb of random sequence with AT content ranging from 90% to 50% was calculated. Random sequence for each AT value was generated 1000 times, error bars show +/- 1 standard deviation.</p>
               </text>
               <file name="1471-2164-8-418-S4.png">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>However, even with AT correction, the untranslated regions, particularly the 5' UTR, showed strong enrichment for MRS. Alternative overlap scoring systems that take into account partial MRS-feature overlaps did not affect these results. Although UTR form only a small part of the genome and contain only a small proportion of the total MRS, the degree of MRS enrichment and their proximity to genes points to a functional role for MRS.</p>
         </sec>
         <sec>
            <st>
               <p>Striking peaks of MRS and AT content at CDS boundaries</p>
            </st>
            <p>To clarify the relationship between MRS and genes, especially their 5' and 3' UTRs, the frequency of MRS in the regions surrounding gene boundaries was investigated. Using the data from MRSfinder, MRS locations were plotted on a section of sequence extending 1000 bp upstream of the translation start site (ATG codon) through the first 400 bp of the coding sequence (CDS) from each <it>C. elegans </it>gene. The same analysis was carried out on sequence from the last 400 bp of the CDS through to 1000 bp downstream of the stop codon (Figure <figr fid="F5">5A</figr>).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>MRS distribution and AT content near genes in <it>C. elegans </it>(A) and <it>C. briggsae </it>(B)</p>
               </caption>
               <text>
                  <p><b>MRS distribution and AT content near genes in <it>C. elegans </it>(A) and <it>C. briggsae </it>(B)</b>. Average AT% in 50 bp (blue line) and 10 bp (red line) non-overlapping windows and number of MRS per CDS in 50 bp non-overlapping windows (black line) are displayed. The windows extend from 1000 bp upstream of the translation start site (ATG codon) through the first 400 bp of the CDS and from the last 400 bp of the CDS, through to 1000 bp downstream of the translation stop site (stop codon).</p>
               </text>
               <graphic file="1471-2164-8-418-5"/>
            </fig>
            <p>As expected from the overlap of MRS with genes and intergenic regions reported in Table <tblr tid="T1">1</tblr>, the frequency of MRS in regions outside the CDS was higher than in the CDS itself. The enrichment of MRS in the 5' and 3' UTRs shown in Table <tblr tid="T1">1</tblr> correlates with striking increases in MRS frequency in the regions immediately flanking genes. The MRS frequency rose and fell sharply over a span of 350 bp, peaking 50&#8211;100 bp upstream of the CDS start. At the 3' end of the CDS the MRS frequency spike had an even greater amplitude, increasing more than 3-fold within 200 bp.</p>
            <p>One explanation for the MRS spikes bounding CDS is that they are related to AT content of these areas. For example, in the case of 3' UTR the apparent over-representation of MRS was reduced when AT content was taken into account (Table <tblr tid="T1">1</tblr>). Plotting AT content in the region surrounding CDS revealed a pattern of sharp spikes similar to that observed for MRS frequency (Figure <figr fid="F5">5</figr>). However, on closer inspection there were subtle differences between the MRS frequency and AT content variation. Firstly, the upstream AT peak occurred in the 50 bp immediately preceding the start codon, 50&#8211;100 bp after the MRS peak. Similarly at the downstream end, the AT peak occurred in the 50 bp immediately following the stop codon, again 50&#8211;100 bp separate from the MRS peak.</p>
            <p>Another difference was that the AT content dropped to 58% in the first 50 bp of the CDS, then rose to about 62% for the middle part of the CDS. The pattern was similar at the end of the CDS, where the AT dropped to near 58% in the last 50&#8211;100 bp. In both locations this AT dip was not matched by a dip in the MRS frequency. The variation in AT content in the vicinity of gene boundaries is an intriguing observation. A similar pattern was described previously by Zhang et al. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> but further discussion of this phenomenon is beyond the scope of this paper.</p>
            <p>An analysis of the MRS frequency surrounding gene boundaries was also performed on a related nematode, <it>Caenorhabditis briggsae </it>(Figure <figr fid="F5">5B</figr>). As in <it>C. elegans</it>, the frequency of MRS was higher in <it>C. briggsae </it>intergenic regions near genes than in CDS. However, from 1 kb upstream to 1 kb downstream of the CDS, the frequency of MRS was generally lower in <it>C. briggsae </it>than in <it>C. elegans</it>. The main difference in the pattern of MRS frequency between the species was that while <it>C. briggsae </it>displayed the same striking increase in average MRS frequency at the 3' end of the CDS, it lacked any increase in frequency at the 5' end. The possibility that less robust gene annotation in <it>C. briggsae </it>could have led to this discrepancy was addressed by filtering the dataset to ensure all CDS started with ATG and ended with a stop codon, and that the selected sequence was complete and of high quality (i.e. no Ns). However, the possibility that the <it>C. briggsae </it>gene set is systematically lacking upstream exons cannot be excluded.</p>
            <p>The difference between MRS frequency and AT content is even more marked in <it>C. briggsae </it>than in <it>C. elegans</it>. Although <it>C. briggsae </it>lacked an upstream MRS peak, an increase in AT content from about 63% to 66% was evident in the 50 bp immediately preceding the CDS start. In common with <it>C. elegans</it>, the downstream AT peak occurred 50 bp before the MRS peak and the AT dip at the start and end of the CDS was not matched by a dip in MRS frequency.</p>
         </sec>
         <sec>
            <st>
               <p>MRS conservation between <it>C. elegans </it>and <it>C. briggsae</it></p>
            </st>
            <p>The distinctive increase in MRS frequency at the downstream end of both <it>C. elegans </it>and <it>C. briggsae </it>CDS could be due to conservation of MRS in specific genes, or simply a reflection of a general tendency. To investigate this, the occurrence of MRS within 200 bp of the CDS stop codon in <it>C. elegans </it>genes was compared to MRS occurrence in the same region of the corresponding <it>C. briggsae </it>ortholog (Table <tblr tid="T2">2</tblr>). Surprisingly, of the 224 <it>C. briggsae </it>genes annotated as orthologs of <it>C. elegans </it>genes with an MRS within 200 bp of the CDS stop codon, only 18 had an MRS in a similar position. Nonetheless, a small but significant degree of correlation between <it>C. elegans </it>genes and their <it>C. briggsae </it>orthologs for the presence or absence of MRS was detected (log odds ratio = 0.641, <it>p </it>value = 0.006). Therefore, the peak of average MRS frequency at the downstream end of <it>C. elegans </it>and <it>C. briggsae </it>CDS was due partly to apparent conservation of MRS in specific genes.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>MRS within 200 bp downstream of translation stop sites of <it>C. briggsae </it>orthologs of <it>C. elegans </it>genes.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Number of <it>C. elegans </it>genes in ortholog set</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MRS within 200 bp of CDS stop</p>
                     </c>
                     <c ca="center">
                        <p>No MRS within 200 bp of CDS stop</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center"/>
                     <c ca="left">
                        <p>MRS within 200 bp of CDS stop</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>172</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Number of <it>C. briggsae </it>genes in ortholog set</p>
                     </c>
                     <c/>
                     <c/>
                     <c cspan=""/>
                  </r>
                  <r>
                     <c cspan="">
                        <p/>
                     </c>
                     <c ca="left">
                        <p>No MRS within 200 bp of CDS stop</p>
                     </c>
                     <c ca="center">
                        <p>206</p>
                     </c>
                     <c ca="center">
                        <p>3736</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The filtered set of <it>C. elegans </it>and <it>C. briggsae </it>orthologs was assessed to identify the number of genes in the set from each organism that had an MRS within 200 bp of the CDS stop codon. The association between orthologs for the presence or absence of 3' MRS was significant (log odds 0.641, <it>p </it>value = 0.006).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Functional classification of MRS associated genes</p>
            </st>
            <p>If the MRS is related to a <it>cis </it>regulatory function then the presence of an MRS near a gene may be associated with a particular functional group of genes. This possibility was examined by identifying the set of <it>C. elegans </it>genes with an MRS within 200 bp of the CDS stop codon, and searching for over-represented Gene Ontology (GO) terms within this set. The most over-represented GO slim terms are shown in Figure <figr fid="F6">6</figr>. The most over-represented term was the molecular function "receptor activity": 89/509 genes in the MRS set had this annotation (17.5%) compared to 1122/9102 genes in the reference set (12.3%). None of the other terms was significantly over-represented after correction for multiple testing. Analyses were conducted to detect correlation of MRS-associated genes with other genomic and functional genomic features, including expression pattern (as determined by Serial Analysis of Gene Expression data) and position in operons, but no significant associations were obtained (data not shown).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Over-represented GO terms for <it>C. elegans </it>genes with an MRS within 200 bp of CDS stop codon</p>
               </caption>
               <text>
                  <p><b>Over-represented GO terms for <it>C. elegans </it>genes with an MRS within 200 bp of CDS stop codon</b>. The log odds ratios and 95% confidence intervals (two-tailed test) for the most over-represented GO slim terms for genes with an MRS within 200 bp of the CDS stop codon. The GO terms are split into three ontologies: cellular component (CC), biological process (BP) and molecular function (MF). The number above each bar represents the <it>p </it>value. Only the term "receptor activity" was significant after correction for multiple testing.</p>
               </text>
               <graphic file="1471-2164-8-418-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In describing and analysing MRS frequency in the genome of <it>C. elegans</it>, we have shown these sites to have a specific distribution, particularly in relation to genes. These observations support the validity of the MRS as a real genomic feature, though not necessarily indicative of MAR, and may also provide an insight into specific roles for MRS.</p>
         <p>At the chromosomal level, MRS density had features similar to that of protein-coding genes, with more MRS per kilobase in chromosome centres compared to arms. Chromosome X was distinct in having no such pattern in gene density, and MRS on the X also had a flat distribution. The MRS signature is AT rich, and thus some correlation with local AT% of the genome would be expected (see Additional File <supplr sid="S4">4</supplr>). We investigated whether the distribution of the MRS signature was merely a by-product of the local nucleotide content of the genome, and/or of the local content of di-, tri- and tetra-nucleotides. When genome sequence was randomised in 2 Mb sections, the frequency of MRS observed in the real chromosomal DNA was less than that predicted from simple (mononucleotide) randomisation, and approximately double that found in second and third order Markov model randomisations. Thus, we conclude that the distribution of the MRS signature in the <it>C. elegans </it>genome is not simply a product of small- or large-scale base-compositional biases. MRS frequency in some classes of genomic regions was elevated compared to the surrounding sequence. Coincidence of MRS and genes was apparent from their similar chromosomal distributions (as shown in Figure <figr fid="F1">1</figr>). By analysing the overlap of MRS with different functional parts of the genome, we found that MRS had relatively high incidence in the non-coding parts of genes, specifically 5' and 3' UTRs and introns. These results contrast with experimental identification of a high incidence of MAR in intergenic and intronic regions, rather than UTRs. This suggests that MRS may not be representative of a large portion of MAR.</p>
         <p>There were striking peaks of average MRS frequency at the 3' and 5' ends of <it>C. elegans </it>CDS, which were distinct from similar peaks in average AT content in the same regions. Interestingly, the average MRS frequency surrounding <it>C. briggsae </it>CDS showed no peak at the 5' end, though the pattern of average AT content was very similar to <it>C. elegans</it>. However, the peak at the 3' end of CDS was maintained in <it>C. briggsae </it>and there was evidence for conservation of MRS in this region.</p>
         <p>Although <it>C. briggsae </it>orthologs of <it>C. elegans </it>genes that had 3' MRS were more likely also to have an MRS than were orthologs of genes that lacked an MRS, it was surprising that the MRS was conserved in only 10% of orthologs. It is possible that the MRS, as currently defined, does not accurately represent the potential functional element. The non-conserved MRS from both <it>C. elegans </it>and <it>C. briggsae </it>could represent a high 'false positive' rate, giving rise to a background level of MRS that masks the degree of conservation of the underlying functional element. Alternatively, the apparent low level of conservation of MRS could reflect rapid evolution of the MRS. The association of MRS with the start and stop of genes means they are in a position to influence the control of transcription. The over-representation of the GO term "receptor activity" in genes with an MRS near the 3' end was significant but small. However, if, as discussed above, the MRS does not accurately represent an underlying functional element and is subject to a high false positive rate, then the true degree of association with specific annotations may be underestimated. Efforts were made to correlate MRS-associated genes with other genomic and functional genomic features, including expression pattern (as determined by Serial Analysis of Gene Expression data) and position in operons, but no significant associations were obtained. The presence of MRS in <it>C. elegans </it>5' and 3' UTRs suggests that they may be transcribed and therefore also have a role in mRNA stability or translational control. The MRS is therefore an element that is perhaps of limited value in predicting MAR, but serves as a clear marker of some CDS boundaries.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have carried out a genome-wide analysis of the distribution of MRS in <it>C. elegans</it>. Two distinct patterns of MRS frequency were identified. MRS were less frequent than would be predicted by nucleotide content but more frequent than predicted by di-, tri- and tetra-nucleotide pattern. In comparison to the rest of the genome, there were striking peaks of average MRS frequency flanking <it>C. elegans </it>CDS. Although <it>C. briggsae </it>surprisingly lacked a peak in average MRS frequency upstream of CDS, <it>C. briggsae </it>orthologs showed conservation of MRS in the region immediately downstream of the CDS. The results presented here reveal the MRS to have a non-random genomic distribution, with particularly close association with genes. The results further suggest that, rather than acting as a marker for MAR, the MRS is an indicator of CDS, and may have a role in control of gene expression.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>MRSfinder</p>
            </st>
            <p>The identification of MRS on a genome-wide scale was automated through the use of a custom Perl program, MRSfinder. Using the description of the MRS given by van Drunen <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, MRSfinder locates all occurrences of the MRS in a given sequence in either orientation and reports their start and stop positions. The program is freely available <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p><it>C. elegans </it>genome sequence data</p>
            </st>
            <p>Version WS150 of the <it>C. elegans </it>genome was downloaded from the WormBase ftp site <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The associated gene annotation for WS150 was downloaded using WormMart <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and additional annotation was downloaded from the WormBase genome browser <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p><it>C. briggsae </it>genome sequence data</p>
            </st>
            <p>Version cb25 of the <it>C. briggsae </it>genome was downloaded from the WormBase ftp site <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. This version is assembled into 578 contigs. The associated annotation was downloaded using WormMart <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>MRS and gene distribution in 2 Mb windows</p>
            </st>
            <p>Each chromosome was divided into consecutive, non-overlapping 2 Mb windows, with the first window starting at chromosome base position 1. Where the final window did not contain 2 Mb, the counts for that window were scaled proportionally. For each window, the number of MRS (from MRSfinder) and gene start positions (from WormBase) were assessed. Where a gene was annotated as having more than one transcript or gene model, one transcript and model was randomly selected.</p>
         </sec>
         <sec>
            <st>
               <p>Mononucleotide randomisation of the genome sequence in a variety of window sizes</p>
            </st>
            <p>For randomisation of sequences &#8805; 32,000 bp, a roulette wheel selection algorithm was used in which a nucleotide's chance of selection was based on its frequency in the original sequence. Because this randomisation method is stochastic, the nucleotide frequency of each randomised sequence was verified to ensure it fell within 0.2% of that found in the original sequence. For sequences &lt; 32,000 bp, the sequence was randomised using a Fisher-Yates shuffle. Each sequence was randomised 1000 times. Each chromosomal sequence was split into consecutive, non-overlapping windows of the appropriate length with correction for shorter end windows as above. Following randomisation, MRSfinder was used to identify all the MRS in the randomised sequence. The mean and standard deviation of the MRS counts across the 1000 randomised versions of each sequence were calculated.</p>
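            <p>The two randomisation strategies described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation used in this study: the function names (base_frequencies, roulette_randomise, shuffle_randomise) are hypothetical, and the 0.2% check is interpreted here as an absolute difference of at most 0.002 in the frequency of each base, with the sequence redrawn until the check passes.</p>

```python
import random
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Return the fraction of each base (A, C, G, T) in seq."""
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}

def roulette_randomise(seq, rng, tolerance=0.002):
    """Draw a new sequence base by base, choosing each base with probability
    equal to its frequency in seq (roulette wheel selection). Redraw until
    every base frequency is within `tolerance` of the original (assumed
    reading of the 0.2% check)."""
    target = base_frequencies(seq)
    weights = [target[b] for b in BASES]
    while True:
        candidate = "".join(rng.choices(BASES, weights=weights, k=len(seq)))
        observed = base_frequencies(candidate)
        if all(tolerance >= abs(observed[b] - target[b]) for b in BASES):
            return candidate

def shuffle_randomise(seq, rng):
    """Return a permutation of seq produced by a Fisher-Yates shuffle,
    which preserves the base composition exactly."""
    bases = list(seq)
    for i in range(len(bases) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        bases[i], bases[j] = bases[j], bases[i]
    return "".join(bases)
```

            <p>In this sketch a seeded generator such as random.Random(1) would be passed as rng so that a run can be repeated. The shuffle keeps the composition of short windows exact, whereas the roulette wheel only approximates it, which is why the frequency check is applied to the longer sequences.</p>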
         </sec>
         <sec>
            <st>
               <p>Randomisation of the genome using Markov chain processes</p>
            </st>
            <p>First, second and third order Markov chain processes were used to randomise the genome sequence following the algorithm of Workman and Krogh <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. In a first order Markov chain process, the first nucleotide is chosen by sampling from the mono-nucleotide frequency. Subsequent nucleotides are added by sampling the probability distribution derived from the frequency of the four di-nucleotides that start with the previous nucleotide. Higher order Markov chain processes are used to generate randomised sequence in a similar fashion.</p>
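            <p>The first order process described above can be sketched as follows. This is a hedged illustration in Python, not the Workman and Krogh implementation; the function name first_order_markov is hypothetical, and the fallback to mononucleotide frequencies when a base has no observed successor is an assumption added to keep the sketch self-contained.</p>

```python
import random
from collections import Counter, defaultdict

def first_order_markov(seq, rng):
    """Generate a random sequence of len(seq) bases whose dinucleotide
    transition probabilities are estimated from seq."""
    mono = Counter(seq)
    # transitions[x][y] = number of times base y follows base x in seq
    transitions = defaultdict(Counter)
    for prev, nxt in zip(seq, seq[1:]):
        transitions[prev][nxt] += 1

    def draw(counter):
        """Sample one base with probability proportional to its count."""
        bases = list(counter)
        return rng.choices(bases, weights=[counter[b] for b in bases])[0]

    # The first base is sampled from the mononucleotide frequencies.
    out = [draw(mono)]
    while len(seq) > len(out):
        successors = transitions.get(out[-1])
        # Assumption: fall back to mononucleotide frequencies if the
        # previous base was never followed by another base in seq.
        out.append(draw(successors if successors else mono))
    return "".join(out)
```

            <p>A second or third order version would condition each draw on the previous two or three bases instead of one, using tri- or tetra-nucleotide counts in place of the dinucleotide counts.</p>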
         </sec>
         <sec>
            <st>
               <p>Number of MRS in genome features</p>
            </st>
            <p>Genes, introns, exons, 3' UTR and 5' UTR were identified based on the GFF file for the appropriate <it>C. elegans </it>chromosome. Intergenic regions were defined as all sections of DNA not annotated as belonging to a gene. Where two or more instances of a single feature type overlapped, they were joined to form a single instance of that feature. The genomic coordinates of each feature were used to identify MRS that lay wholly within, or partially overlapped, a unit of that feature.</p>
            <p>The number of MRS expected to lie wholly within each feature type (i.e. complete overlap) was calculated using the formula:</p>
            <p>
               <display-formula>M(F((f-m)+1))/c</display-formula>
            </p>
            <p>The number of MRS expected to partially overlap a feature was calculated using the formula:</p>
            <p>
               <display-formula>M(F(2(m-w)))/c</display-formula>
            </p>
            <p>When the average size of the MRS exceeds that of the feature, a complete overlap is defined as a feature lying wholly within an MRS. The expected number was calculated using the formula:</p>
            <p>
               <display-formula>M(F((m-f)+1))/c</display-formula>
            </p>
            <p>The expected number of partial overlaps when the average size of the MRS exceeds that of the feature:</p>
            <p>
               <display-formula>M(F(2(f-w)))/c</display-formula>
            </p>
            <p>where M = number of MRS, F = number of features of a specific type, f = average length of feature, m = average length of MRS, w = minimum number of nucleotides required for a partial overlap and c = chromosome length.</p>
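            <p>The four formulae above can be collapsed into two functions, as in the following Python sketch. The function names are hypothetical, the use of the absolute difference and the minimum of f and m is simply a compact way of covering both cases (MRS shorter than the feature, and feature shorter than the MRS), and the numbers in the usage note are invented for illustration only.</p>

```python
def expected_complete(M, F, f, m, c):
    """Expected number of complete overlaps under a uniform MRS distribution.

    Uses |f - m| so that one expression covers both cases in the text:
    an MRS lying within a feature (f >= m) and a feature lying within
    an MRS (m > f).
    """
    return M * F * (abs(f - m) + 1) / c

def expected_partial(M, F, f, m, w, c):
    """Expected number of partial overlaps sharing at least w nucleotides.

    The shorter of the two elements (feature or MRS) sets the number of
    possible offsets, counted at both ends of the longer element.
    """
    return M * F * 2 * (min(f, m) - w) / c
```

            <p>For example, with made-up values M = 1000, F = 500, f = 200, m = 20, w = 12 and c = 1,000,000, these functions give 90.5 expected complete overlaps and 8.0 expected partial overlaps.</p>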
            <p>Three different scoring methods were used to combine the number of partial and complete overlaps to give an overall score. In method 1, complete overlaps scored 1 point and partial overlaps 0 points; in method 2, complete and partial overlaps each scored 1 point; in method 3, complete overlaps scored 1 point and partial overlaps 1/2 point. In all scoring methods, the minimum number of nucleotides required for a partial overlap was 12. An AT content correction factor was calculated based on the ratio of the number of MRS found in random sequence with the same AT content as each feature to the number of MRS found in random sequence with the same AT content as the genome. The number of MRS found in random sequence of specific AT content is shown in Additional File <supplr sid="S4">4</supplr>.</p>
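            <p>A minimal Python sketch of the three scoring methods and of the AT correction is given below. The names are hypothetical. The correction is applied here by scaling the expected score, which is equivalent to dividing the uncorrected ratio by the correction factor; this reading is an assumption, although it agrees with the values reported in Table <tblr tid="T1">1</tblr> (for example 4.22/0.48 is approximately 8.8).</p>

```python
# Points awarded per partial overlap under scoring methods 1, 2 and 3.
PARTIAL_WEIGHT = {1: 0.0, 2: 1.0, 3: 0.5}

def overlap_score(complete, partial, method=1):
    """Combine complete and partial overlap counts into a single score."""
    return complete + PARTIAL_WEIGHT[method] * partial

def at_corrected_ratio(actual, expected, correction):
    """Ratio of actual to expected score after scaling the expected score
    by the AT correction factor (assumed direction of the correction)."""
    return actual / (expected * correction)
```

            <p>The same overlap_score function would be applied to both the observed counts and the expected counts before the ratio is taken, so that the actual and expected scores are always computed under the same scoring method.</p>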
         </sec>
         <sec>
            <st>
               <p>MRS frequency across CDS</p>
            </st>
            <p>In this analysis, one CDS per gene was used: where a gene was annotated with multiple transcripts and/or gene models, a single transcript/model was randomly selected to represent the gene. The CDS were then subjected to quality filters to remove poor quality sequence (containing Ns), CDS with insufficient sequence upstream or downstream and CDS that did not start with ATG or end with a stop codon. Of the 20,052 <it>C. elegans </it>CDS originally identified, 20,032 passed these filters. The 19,528 <it>C. briggsae </it>CDS were reduced to 12,954 after filtering. Each successfully filtered CDS was then split into consecutive, non-overlapping 50 bp windows, starting 1000 bp upstream of the CDS start site and continuing to 1000 bp downstream of the CDS stop site. The total number of MRS mid-points occurring in each window across all CDS was divided by the number of CDS used to produce a frequency of MRS occurrence in that window.</p>
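            <p>The windowed frequency calculation can be sketched in Python as follows. The function name and the input layout (one list of MRS mid-point offsets per CDS, measured in bp from the start of the analysed region) are assumptions made for illustration.</p>

```python
def mrs_window_frequency(midpoints_per_cds, n_windows, window=50):
    """Mean number of MRS mid-points per CDS in each window.

    midpoints_per_cds: one list per CDS holding the offsets (bp) of MRS
    mid-points, measured from the start of the analysed region
    (hypothetical input layout).
    """
    counts = [0] * n_windows
    for midpoints in midpoints_per_cds:
        for pos in midpoints:
            index = int(pos // window)
            # Ignore mid-points that fall outside the analysed region.
            if n_windows > index >= 0:
                counts[index] += 1
    n_cds = len(midpoints_per_cds)
    return [count / n_cds for count in counts]
```

            <p>A CDS with no MRS in the region still contributes to the denominator, so the result is a per-CDS average rather than an average over MRS-containing CDS only.</p>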
         </sec>
         <sec>
            <st>
               <p>AT content across CDS</p>
            </st>
            <p>CDS were selected, filtered and split into consecutive, non-overlapping 50 and 10 bp windows as described above. For each window the AT content was calculated as a percentage of the window length. The mean AT% for each position across all CDS was calculated.</p>
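            <p>A short Python sketch of the AT% calculation follows. The function names are hypothetical, and the sketch assumes that all regions passed in have the same length, as they would after the filtering described above.</p>

```python
def at_percent_windows(seq, window):
    """AT% of each consecutive, non-overlapping window of seq.
    A trailing partial window is ignored."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        at_count = chunk.count("A") + chunk.count("T")
        result.append(100.0 * at_count / window)
    return result

def mean_at_profile(regions, window):
    """Mean AT% at each window position across equally long regions."""
    profiles = [at_percent_windows(region, window) for region in regions]
    return [sum(column) / len(column) for column in zip(*profiles)]
```

            <p>Calling mean_at_profile with window set to 50 and then to 10 would give the two resolutions plotted in Figure <figr fid="F5">5</figr>.</p>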
         </sec>
         <sec>
            <st>
               <p>MRS in <it>C. briggsae </it>orthologs</p>
            </st>
            <p>The cb25 version of the <it>C. briggsae </it>genome sequence and annotated orthologs to <it>C. elegans </it>were downloaded from WormBase. After subjecting the 11,953 orthologs to filtering for length (i.e. sufficient sequence upstream and downstream for further analysis), poor quality (sequence containing Ns), and CDS not starting with ATG or ending in a stop codon, 4,132 genes remained. MRSfinder was used to detect MRS within 200 bp of the CDS stop for each of these filtered genes in <it>C. elegans </it>and <it>C. briggsae</it>. To test for association between a <it>C. elegans </it>gene having an MRS and the <it>C. briggsae </it>ortholog having an MRS we calculated the log odds ratio, ln[(a &#215; d)/(b &#215; c)], where a is the number of orthologs with an MRS within 200 bp of the CDS stop codon in <it>C. elegans </it>and <it>C. briggsae</it>, b is the number of orthologs where an MRS is only found within 200 bp of the CDS stop codon in <it>C. briggsae</it>, c is the number of orthologs where an MRS is only found within 200 bp of the CDS stop codon in <it>C. elegans </it>and d is the number of orthologs where neither organism has an MRS within 200 bp of the CDS stop codon.</p>
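            <p>As a worked check, the log odds ratio can be recomputed from the counts in Table <tblr tid="T2">2</tblr>. The Python sketch below uses a hypothetical function name and assumes the natural logarithm, which reproduces the reported value of 0.641. It does not compute a <it>p </it>value, because the test used to obtain it is not described here.</p>

```python
import math

def log_odds_ratio(a, b, c, d):
    """Natural log of the odds ratio (a*d)/(b*c) for a 2x2 table."""
    return math.log((a * d) / (b * c))

# Counts from Table 2: MRS in both species, in C. briggsae only,
# in C. elegans only, and in neither species.
value = log_odds_ratio(18, 172, 206, 3736)
print(round(value, 3))  # 0.641
```

            <p>The four counts sum to 4,132, the number of filtered ortholog pairs.</p>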
         </sec>
         <sec>
            <st>
               <p>Functional classification of MRS associated genes</p>
            </st>
            <p>A set of <it>C. elegans </it>genes with an MRS within 200 bp of the CDS stop codon was analysed to identify over- or under-represented Gene Ontology (GO) terms. The Gene Ontology annotation file for <it>C. elegans </it>was downloaded from the Gene Ontology website <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Following Vavouri <it>et al</it>. <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, only GO terms inferred from electronic annotation (evidence code IEA) were used due to the bias of RNAi phenotypes on the GO annotations of <it>C. elegans </it>genes. The Perl script map2slim and version 1.2 of the generic GO slim ontology (both available from the GO website <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>) were used to obtain GO slim term association counts for the <it>C. elegans </it>gene set. Of the 1057 genes in the set, 509 were associated with a GO slim term. The GO slim term counts for this gene set were compared to a reference set, containing all the remaining <it>C. elegans </it>genes. For each GO slim term the log odds ratio, log[(a &#215; d)/(b &#215; c)], was calculated, where a is the number of genes in the MRS set associated with the term, b is the number of genes in the reference set associated with the term, c is the number of genes in the MRS set not associated with the term and d is the number of genes in the reference set not associated with the term. To account for multiple testing, the Benjamini and Hochberg method was used to calculate a <it>p </it>value threshold for a 5% false discovery rate <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
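            <p>The Benjamini and Hochberg step-up procedure used to set the <it>p </it>value threshold can be sketched in Python as follows. The function name is hypothetical and the <it>p </it>values in the usage note are invented for illustration; they are not the values obtained in this study.</p>

```python
def benjamini_hochberg_threshold(p_values, fdr=0.05):
    """Return the largest p value p(k) that is at most (k / n) * fdr,
    where k is its rank among the n sorted p values, or None if no
    test passes. Tests with p at or below the returned threshold are
    declared significant at the given false discovery rate."""
    n = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if rank / n * fdr >= p:
            threshold = p
    return threshold
```

            <p>For the made-up input [0.001, 0.02, 0.04, 0.5] the per-rank limits are 0.0125, 0.025, 0.0375 and 0.05, so the function returns 0.02 and the two smallest <it>p </it>values would be called significant.</p>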
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AA carried out the analyses and wrote the manuscript. MB assisted with the analyses and writing the manuscript and supervised the project. Both authors read and approved the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank Peter Keightley for useful comments on the manuscript. AA was supported by a NERC studentship. We also thank three anonymous referees for their useful comments.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Scaffold-associated regions: cis-acting determinants of chromatin structural loops and functional domains</p>
            </title>
            <aug>
               <au>
                  <snm>Laemmli</snm>
                  <fnm>UK</fnm>
               </au>
               <au>
                  <snm>K&#228;s</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Poljak</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adachi</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>1992</pubdate>
            <volume>2</volume>
            <fpage>275</fpage>
            <lpage>85</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(05)80285-0</pubid>
                  <pubid idtype="pmpid">1322207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Functional interaction between PML and SATB1 regulates chromatin-loop architecture and transcription of the MHC class I locus</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Bischof</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Purbey</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Notani</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Urlaub</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dejean</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Galande</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nat Cell Biol</source>
            <pubdate>2007</pubdate>
            <volume>9</volume>
            <fpage>45</fpage>
            <lpage>56</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ncb1516</pubid>
                  <pubid idtype="pmpid" link="fulltext">17173041</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Use of matrix attachment regions (MARs) to minimize transgene silencing</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Spiker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>43</volume>
            <fpage>361</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1006424621037</pubid>
                  <pubid idtype="pmpid" link="fulltext">10999416</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>In-silico prediction and observations of nuclear matrix attachment</p>
            </title>
            <aug>
               <au>
                  <snm>Platts</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Quayle</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Krawetz</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Cell Mol Biol Lett</source>
            <pubdate>2006</pubdate>
            <volume>11</volume>
            <fpage>191</fpage>
            <lpage>213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2478/s11658-006-0016-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">16847565</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A bipartite sequence element associated with matrix/scaffold attachment regions</p>
            </title>
            <aug>
               <au>
                  <snm>van Drunen</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Sewalt</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Oosterling</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Weisbeek</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Smeekens</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>van Driel</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>2924</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148508</pubid>
                  <pubid idtype="pmpid" link="fulltext">10390535</pubid>
                  <pubid idtype="doi">10.1093/nar/27.14.2924</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Boundary and insulator elements in chromosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Gerasimova</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Corces</snm>
                  <fnm>VG</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>185</fpage>
            <lpage>92</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(96)80049-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">8722175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Association of chromosome territories with the nuclear matrix. Disruption of human chromosome territories correlates with the release of a subset of nuclear matrix proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Siegel</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Berezney</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Cell Biol</source>
            <pubdate>1999</pubdate>
            <volume>146</volume>
            <fpage>531</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1083/jcb.146.3.531</pubid>
                  <pubid idtype="pmpid" link="fulltext">10444063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Active genes dynamically colocalize to shared sites of ongoing transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Osborne</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Chakalova</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Horton</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Debrand</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Goyenechea</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Lopes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Reik</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>1065</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1423</pubid>
                  <pubid idtype="pmpid" link="fulltext">15361872</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Does looping and clustering in the nucleus regulate gene expression?</p>
            </title>
            <aug>
               <au>
                  <snm>Chambeyron</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bickmore</snm>
                  <fnm>WA</fnm>
               </au>
            </aug>
            <source>Curr Opin Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <fpage>256</fpage>
            <lpage>62</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ceb.2004.03.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15145349</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Organization of the higher-order chromatin loop: specific DNA attachment sites on nuclear scaffold</p>
            </title>
            <aug>
               <au>
                  <snm>Mirkovitch</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mirault</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Laemmli</snm>
                  <fnm>UK</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1984</pubdate>
            <volume>39</volume>
            <fpage>223</fpage>
            <lpage>32</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(84)90208-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">6091913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Chromosomal loop anchorage of the kappa immunoglobulin gene occurs next to the enhancer in a region containing topoisomerase II sites</p>
            </title>
            <aug>
               <au>
                  <snm>Cockerill</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Garrard</snm>
                  <fnm>WT</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1986</pubdate>
            <volume>44</volume>
            <fpage>273</fpage>
            <lpage>82</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(86)90761-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">3002631</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Characterization of randomly-obtained matrix attachment regions (MARs) from higher plants</p>
            </title>
            <aug>
               <au>
                  <snm>Michalowski</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Spiker</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1999</pubdate>
            <volume>38</volume>
            <fpage>12795</fpage>
            <lpage>804</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi991142c</pubid>
                  <pubid idtype="pmpid" link="fulltext">10504249</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>S/MARt DB: a database on scaffold/matrix attached regions</p>
            </title>
            <aug>
               <au>
                  <snm>Liebich</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bode</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Frisch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>372</fpage>
            <lpage>4</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99064</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752340</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.372</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Stress-induced duplex DNA destabilization in scaffold/matrix attachment regions</p>
            </title>
            <aug>
               <au>
                  <snm>Benham</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kohwi-Shigematsu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bode</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>274</volume>
            <fpage>181</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1385</pubid>
                  <pubid idtype="pmpid" link="fulltext">9398526</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A matrix associated region localizes the human SOCS-1 gene to chromosome 16p13.13</p>
            </title>
            <aug>
               <au>
                  <snm>Kramer</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Singh</snm>
                  <fnm>GB</fnm>
               </au>
               <au>
                  <snm>Doggett</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Krawetz</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Somat Cell Mol Genet</source>
            <pubdate>1998</pubdate>
            <volume>24</volume>
            <fpage>131</fpage>
            <lpage>3</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/B:SCAM.0000007115.58601.87</pubid>
                  <pubid idtype="pmpid">9919312</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>MAR-Wiz</p>
            </title>
            <url>http://www.futuresoft.org/MAR-Wiz/</url>
         </bibl>
         <bibl id="B17">
            <title>
               <p>In silico prediction of scaffold/matrix attachment regions in large genomic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Frisch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frech</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Klingenhoff</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cartharius</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Liebich</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Werner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>349</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155272</pubid>
                  <pubid idtype="pmpid" link="fulltext">11827955</pubid>
                  <pubid idtype="doi">10.1101/gr.206602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Genome-wide in silico mapping of scaffold/matrix attachment regions in Arabidopsis suggests correlation of intragenic scaffold/matrix attachment regions with gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Rudd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Frisch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Grote</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Werner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>135</volume>
            <fpage>715</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514109</pubid>
                  <pubid idtype="pmpid" link="fulltext">15208419</pubid>
                  <pubid idtype="doi">10.1104/pp.103.037861</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Comparative study and prediction of DNA fragments associated with various elements of the nuclear matrix</p>
            </title>
            <aug>
               <au>
                  <snm>Glazko</snm>
                  <fnm>GV</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Glazkov</snm>
                  <fnm>MV</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>2001</pubdate>
            <volume>1517</volume>
            <fpage>351</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11342213</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A comparative study of S/MAR prediction tools</p>
            </title>
            <aug>
               <au>
                  <snm>Evans</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ott</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Koentges</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wernisch</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>71</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1847452</pubid>
                  <pubid idtype="pmpid" link="fulltext">17335576</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-71</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Recruitment of heterogeneous nuclear ribonucleoprotein A1 in vivo to the LMP/TAP region of the major histocompatibility complex</p>
            </title>
            <aug>
               <au>
                  <snm>Donev</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Horton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Beck</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Doneva</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vatcheva</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bowen</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Sheer</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <fpage>5214</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M206621200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12435746</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Molecular analysis of repetitive DNA elements from Entamoeba histolytica, which encode small RNAs and contain matrix/scaffold attachment recognition sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Banerjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lohia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>2003</pubdate>
            <volume>126</volume>
            <fpage>35</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0166-6851(02)00244-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12554082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identification and characterization of a silkgland-related matrix association region in Bombyx mori</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>CZ</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2001</pubdate>
            <volume>277</volume>
            <fpage>139</fpage>
            <lpage>44</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(01)00693-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">11602351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A map of nuclear matrix attachment regions within the breast cancer loss-of-heterozygosity region on human chromosome 16q22.1</p>
            </title>
            <aug>
               <au>
                  <snm>Shaposhnikov</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Akopov</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Chernov</snm>
                  <fnm>IP</fnm>
               </au>
               <au>
                  <snm>Thomsen</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Joergensen</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Frengen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nikolaev</snm>
                  <fnm>LG</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2007</pubdate>
            <volume>89</volume>
            <fpage>354</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2006.11.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">17188460</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Large-scale identification and mapping of nuclear matrix-attachment regions in the distal imprinted domain of mouse chromosome 7</p>
            </title>
            <aug>
               <au>
                  <snm>Purbowasito</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Suda</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yokomine</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zubair</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sado</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tsutsui</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sasaki</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>391</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/11.6.391</pubid>
                  <pubid idtype="pmpid" link="fulltext">15871462</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>EMBOSS marscan</p>
            </title>
            <url>http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/marscan.html</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Bao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Blasiar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chinwalla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Coghlan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coulson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>D'Eustachio</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>DHA</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Kamath</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kuwabara</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Miner</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Minx</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Plumb</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schein</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Sohrmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Willey</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <fpage>E45</fpage>
            <lpage>E45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">261899</pubid>
                  <pubid idtype="pmpid" link="fulltext">14624247</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0000045</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Genome sequence of the nematode <it>C. elegans</it>: a platform for investigating biology</p>
            </title>
            <aug>
               <au>
                  <cnm>C. elegans Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2012</fpage>
            <lpage>2018</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2012</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851916</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>GC/AT-content spikes as genomic punctuation marks</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cantor</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Broude</snm>
                  <fnm>NE</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>16855</fpage>
            <lpage>16860</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534751</pubid>
                  <pubid idtype="pmpid" link="fulltext">15548610</pubid>
                  <pubid idtype="doi">10.1073/pnas.0407821101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>BaNG Bioinformatics</p>
            </title>
            <url>http://www.nematodes.org/bioinformatics/index.shtml</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>WormBase</p>
            </title>
            <url>ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/sequences/dna</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>BioMart (MartView)</p>
            </title>
            <url>http://www.wormbase.org/biomart/martview/</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>C. elegans (current release)</p>
            </title>
            <url>http://www.wormbase.org/db/seq/gbrowse/wormbase/</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>WormBase</p>
            </title>
            <url>ftp://ftp.wormbase.org/pub/wormbase/genomes/briggsae/sequences/dna</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution</p>
            </title>
            <aug>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>4816</fpage>
            <lpage>4822</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148783</pubid>
                  <pubid idtype="pmpid" link="fulltext">10572183</pubid>
                  <pubid idtype="doi">10.1093/nar/27.24.4816</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The Gene Ontology</p>
            </title>
            <url>http://www.geneontology.org/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans</p>
            </title>
            <aug>
               <au>
                  <snm>Vavouri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Walter</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gilks</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Elgar</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R15</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1852409</pubid>
                  <pubid idtype="pmpid" link="fulltext">17274809</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-2-r15</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Controlling the false discovery rate: a practical and powerful approach to multiple testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J R Stat Soc Series B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
      </refgrp>
   </bm>
</art>
