<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-7-323</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Conservation of noncoding microsatellites in plants: implication for gene regulation</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Zhang</snm>
               <fnm>Lida</fnm>
               <insr iid="I1"/>
               <email>zhangld@sjtu.edu.cn</email>
            </au>
            <au id="A2">
               <snm>Zuo</snm>
               <fnm>Kaijing</fnm>
               <insr iid="I1"/>
               <email>kjzuo@sjtu.edu.cn</email>
            </au>
            <au id="A3">
               <snm>Zhang</snm>
               <fnm>Fei</fnm>
               <insr iid="I1"/>
               <email>gardener@sjtu.edu.cn</email>
            </au>
            <au id="A4">
               <snm>Cao</snm>
               <fnm>Youfang</fnm>
               <insr iid="I1"/>
               <email>yfcao@sjtu.edu.cn</email>
            </au>
            <au id="A5">
               <snm>Wang</snm>
               <fnm>Jiang</fnm>
               <insr iid="I1"/>
               <email>wangjiang@sjtu.edu.cn</email>
            </au>
            <au id="A6">
               <snm>Zhang</snm>
               <fnm>Yidong</fnm>
               <insr iid="I1"/>
               <email>zyd@sjtu.edu.cn</email>
            </au>
            <au id="A7">
               <snm>Sun</snm>
               <fnm>Xiaofen</fnm>
               <insr iid="I2"/>
               <email>xfsun1@163.com</email>
            </au>
            <au id="A8" ca="yes">
               <snm>Tang</snm>
               <fnm>Kexuan</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>kxtang@sjtu.edu.cn</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Plant Biotechnology Research Center, Fudan-SJTU-Nottingham Plant Biotechnology R&amp;D Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200030, China</p>
            </ins>
            <ins id="I2">
               <p>State Key Laboratory of Genetic Engineering, Fudan-SJTU-Nottingham Plant Biotechnology R&amp;D Center, School of Life Sciences, Morgan-Tan International Center for Life Sciences, Fudan University, Shanghai 200433, China</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>323</fpage>
         <url>http://www.biomedcentral.com/1471-2164/7/323</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17187690</pubid>
               <pubid idtype="doi">10.1186/1471-2164-7-323</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>21</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>25</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Zhang et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Microsatellites are extremely common in plant genomes and, in particular, are significantly enriched in 5' noncoding regions. Although some 5' noncoding microsatellites involved in gene regulation have been described, the general properties of microsatellites as regulatory elements remain unknown. To investigate whether microsatellites are associated with regulatory elements, we analyzed conserved noncoding microsatellite sequences (CNMSs) in the 5' noncoding regions of the <it>Arabidopsis </it>and <it>Brassica </it>genomes by inter- and intragenomic phylogenetic footprinting.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>We identified 247 <it>Arabidopsis-Brassica </it>orthologous and 122 <it>Arabidopsis </it>paralogous CNMSs, representing 491 CT/GA and CTT/GAA repeats, which account for 10.6% of all repeats of these types in the 500-bp regions upstream of coding sequences in the <it>Arabidopsis </it>genome. Among these CNMSs, 18 microsatellites are highly conserved in the regulatory regions of both orthologous and paralogous genes, and some of them also appear at corresponding positions in more distant homologs in <it>Arabidopsis</it>, as well as in other plants. A computational scan of CNMSs for known <it>cis</it>-regulatory elements showed that light-responsive elements were clustered in CT/GA repeat regions, and salicylic acid-responsive elements in (CTT)<sub>n</sub>/(GAA)<sub>n</sub> sequences. Gene expression patterns revealed that 70&#8211;80% of genes associated with (CTT)<sub>n</sub>/(GAA)<sub>n</sub> CNMSs were regulated by salicylic acid, consistent with the <it>in silico</it> prediction of regulatory elements.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
                <p>Our analyses show that some noncoding microsatellites are conserved in plants and seem to be ancient. These CNMSs appear to function as regulatory elements involved in light and salicylic acid responses. Our findings suggest that over-represented microsatellites may share common features in regulating genes of plant-specific pathways.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Microsatellites, one of the major classes of repeats, are extremely common in eukaryotic genomes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. They are generally thought to arise through replication slippage <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Whereas animal microsatellites originate from repetitive DNA <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, plant microsatellites are significantly associated with nonrepetitive DNA <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. They are abundant within or near genes in plant genomes, and some types are significantly enriched in the 5' noncoding regions of plant genes <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. In <it>Arabidopsis thaliana</it>, for example, this enrichment is mostly attributable to CT/GA and CTT/GAA repeats, which occur more frequently in 5'-flanks than in other genomic regions, suggesting that they may function in regulating gene expression <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
          <p>For a long time, microsatellites were regarded only as genetic markers for DNA fingerprinting and diversity studies because of their extensive length polymorphism. However, recent findings show that some of them act as <it>cis</it>-regulatory elements recognized by transcription factors <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. A well-known example is the so-called GAGA element, comprising the dinucleotide repeat sequence (GA)<sub>n</sub>, which is present in the promoters of numerous developmental genes in animals <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Similarly, (GA)<sub>n</sub> sequences in the regulatory regions of some plant genes can be recognized by GAGA-binding factors <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, and more generally, the GA-rich element, a more complex 9-base-pair (bp) (GA)<sub>n</sub>-based repeat, has been shown to bind proteins <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Another major plant microsatellite, the trinucleotide repeat sequence (GAA)<sub>n</sub> within the 5'UTR of <it>ntp303</it>, was found to be important in modulating transcription and translation efficiency <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Furthermore, some unusual phenotypic variations have been associated with the length of 5' noncoding microsatellites. In a typical example, Bao and colleagues reported that variation in the number of CT/GA repeats in the 5'UTR of the <it>waxy </it>gene was correlated with amylose content in rice <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Although the mechanism is still unclear, this microsatellite length polymorphism is thought to affect the expression of genes related to amylose synthesis.</p>
          <p>Regions of DNA involved in gene regulation are expected to remain conserved between related species over evolutionary time because of functional constraints. Comparative analysis of noncoding DNA sequences from multiple species, known as phylogenetic footprinting, can help identify conserved putative regulatory elements <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The successful identification of conserved noncoding sequences in comparisons among grass genomes and among cruciferous species, as well as between closely related genomic sequences from <it>Arabidopsis </it>and <it>Brassica </it>species, provides a useful basis for discovering conserved noncoding microsatellite sequences (CNMSs) by phylogenetic footprinting in plants <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
          <p>If microsatellites are important for regulating gene expression, they should be conserved in homologous promoters after gene duplication or speciation during plant evolution. To test whether microsatellites are associated with gene regulatory elements, we used inter- and intragenomic phylogenetic footprinting to search the dominant microsatellites in the 5' noncoding regions of <it>Arabidopsis </it>and <it>Brassica oleracea </it>genes for CNMSs. About 10% of 5' noncoding CT/GA and CTT/GAA repeats are conserved in the <it>Arabidopsis </it>genome, and they are preferentially involved in gene regulation in plant-specific pathways.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Distribution of microsatellites in different genomic regions</p>
            </st>
             <p>We surveyed microsatellite occurrence across different genomic regions of the <it>Arabidopsis </it>genome. Microsatellites were clearly most abundant in regulatory regions, and the over-representation of CT/GA and CTT/GAA repeats accounted for most of this increase (Figure <figr fid="F1">1A</figr>, <figr fid="F1">1B</figr>). This preferential occurrence of CT/GA and CTT/GAA repeats suggests that they may have a role in gene regulation.</p>
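             <p>A survey like the one in Figure 1 requires locating perfect di- and trinucleotide repeats and grouping them into complementary classes such as CT/GA and CTT/GAA. The Python sketch below illustrates one way to do this. The minimum copy numbers, the restriction to perfect repeats, and the function names are our own assumptions, not the parameters used in this study. Each repeat unit is reduced to a canonical class (the smallest rotation of the unit or of its reverse complement), so that CT, TC, GA and AG all fall into one class, labelled AG, and CTT, TTC, TCT, GAA, AAG and AGA into another, labelled AAG.</p>

```python
import re

# Reverse-complement table used to build canonical repeat classes.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(unit):
    """Smallest rotation of a repeat unit or of its reverse complement.

    CT, TC, GA and AG all map to "AG"; CTT, TTC, TCT, GAA, AAG and AGA map to "AAG".
    """
    rc = unit.translate(COMPLEMENT)[::-1]
    rotations = [u[i:] + u[:i] for u in (unit, rc) for i in range(len(u))]
    return min(rotations)


def find_repeats(seq, k, min_units):
    """Yield (class, start, end) for perfect tandem repeats with a k-bp unit.

    min_units is a hypothetical threshold on the number of tandem copies.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:  # skip mononucleotide runs such as AAAAAA
            continue
        yield canonical(unit), m.start(), m.end()


def repeats_per_kb(seqs, k, min_units, target):
    """Return (target class, all other classes) occurrences per kb of sequence."""
    total_bp = sum(len(s) for s in seqs)
    classes = [c for s in seqs for c, _, _ in find_repeats(s, k, min_units)]
    n_target = classes.count(target)
    return (1000.0 * n_target / total_bp,
            1000.0 * (len(classes) - n_target) / total_bp)
```

             <p>For a hypothetical list of 500-bp upstream sequences, repeats_per_kb(flanks, 2, 6, "AG") would give the CT/GA and other-dimer densities compared in Figure 1A, and calling it with k = 3 and the class "AAG" would give the CTT/GAA comparison in Figure 1B.</p>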
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution of microsatellites in different genomic regions in <it>Arabidopsis</it></p>
               </caption>
               <text>
                   <p><b>Distribution of microsatellites in different genomic regions in <it>Arabidopsis</it></b>. (A) Frequencies of dinucleotide repeats in different genomic fractions. "Other dimers" comprises all dinucleotide repeats except CT/GA repeats. (B) Frequencies of trinucleotide repeats in different genomic fractions. "Other triplets" comprises all trinucleotide repeats except CTT/GAA repeats. 5'-flanks correspond to the 500-bp sequences upstream of the initiation codon.</p>
               </text>
               <graphic file="1471-2164-7-323-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Conservation of microsatellites in <it>Arabidopsis</it></p>
            </st>
             <p>Regulatory sequence elements within promoter DNA are often short and orientation-independent, and frequently contain gaps of variable size. We therefore defined conserved noncoding microsatellite sequences, our candidate regulatory elements, using the following criterion: the corresponding microsatellites in the aligned sequences must overlap by at least 6 bp. Using this criterion, we identified 247 <it>Arabidopsis-Brassica </it>orthologous CNMSs and 122 <it>Arabidopsis </it>paralogous CNMSs [see <supplr sid="S1">Additional file 1</supplr>], involving 491 CT/GA and CTT/GAA repeats in total (Table <tblr tid="T1">1</tblr>), which account for 10.6% of repeats of these types in the 500-bp regions upstream of coding sequences in the <it>Arabidopsis </it>genome. These CNMSs are not randomly distributed within the 5' noncoding regions; they occur more frequently near the initiation codon (Figure <figr fid="F2">2A</figr>, <figr fid="F2">2B</figr>).</p>
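             <p>The 6-bp overlap criterion can be made concrete as follows. This Python sketch assumes that each homologous pair of 5' noncoding sequences is available as a pairwise alignment (two gapped strings of equal length, with '-' for gaps) and that repeat coordinates are half-open intervals in ungapped sequence coordinates. It counts the alignment columns in which bases of the two repeats are aligned with each other; this column-counting reading of "overlapping regions", and the function names, are our assumptions.</p>

```python
def residue_columns(aligned):
    """Alignment column of each residue (ungapped position) in a gapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def aligned_overlap(aln_a, aln_b, rep_a, rep_b):
    """Number of alignment columns in which repeat rep_a (sequence A) is aligned
    with repeat rep_b (sequence B).

    rep_a and rep_b are (start, end) intervals in ungapped coordinates, end exclusive.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols_a = set(residue_columns(aln_a)[rep_a[0]:rep_a[1]])
    cols_b = set(residue_columns(aln_b)[rep_b[0]:rep_b[1]])
    return len(cols_a.intersection(cols_b))


def is_cnms(aln_a, aln_b, rep_a, rep_b, min_overlap=6):
    """Apply the criterion: the two repeats overlap by at least min_overlap bp
    in the alignment. Both repeats are assumed to be of the same CT/GA or
    CTT/GAA type; that check is left to the caller."""
    return aligned_overlap(aln_a, aln_b, rep_a, rep_b) >= min_overlap
```

             <p>For example, with the toy alignment AACTCTCTCTGG over AACTCTCT--GG, an 8-bp CT repeat at positions 2 to 10 in the first sequence and a 6-bp CT repeat at positions 2 to 8 in the second share six aligned columns, so the pair just meets the 6-bp threshold.</p>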
             <p>To validate these results and to ensure that the CNMSs did not arise simply from the over-representation of microsatellites in plant genomes, we repeated the analysis on three control datasets: 1000 homologous pairs of 5' noncoding sequences in <it>Arabidopsis </it>(dataset 1), 1000 randomly shuffled pairs of 5' noncoding sequences (dataset 2) and 1000 random pairs of genomic DNA sequences (dataset 3), as well as three corresponding datasets of <it>Arabidopsis </it>and <it>Brassica </it>sequence pairs of the same size. Figure <figr fid="F3">3</figr> shows the frequencies of CNMS (CT/GA)<sub>n</sub> and (CTT/GAA)<sub>n</sub> in datasets 1, 2 and 3. CNMSs were very unlikely to be found by chance in pairs of 500-bp genomic DNA sequences. Compared with random pairs of noncoding sequences, homologous noncoding sequences showed a significantly higher frequency of CNMS occurrence. Taken together, these tests indicate that some microsatellites in regulatory regions have been conserved from common ancestors during plant evolution.</p>
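             <p>The control datasets can be generated with a few lines of code. The Python sketch below assumes that "shuffled pairs" means re-pairing each first sequence with the second sequence of a different homologous pair (a random derangement), and that random genomic pairs are two windows drawn uniformly from a genome string. Both interpretations and all function names are our assumptions. The sketch also shows how the mean and standard error plotted in Figure 3 can be computed over replicate random sets.</p>

```python
import random
import statistics


def shuffled_pairs(pairs, rng):
    """Assumed dataset 2: re-pair sequences so that no pair keeps its homologous partner."""
    n = len(pairs)
    if n in (0, 1):
        raise ValueError("need at least two pairs to re-pair")
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        # Rejection sampling: accept only permutations with no fixed point (derangements).
        if all(perm[i] != i for i in range(n)):
            return [(firsts[i], seconds[perm[i]]) for i in range(n)]


def random_genomic_pairs(genome, n, length, rng):
    """Assumed dataset 3: n pairs of windows of the given length drawn uniformly from a genome string."""
    def window():
        start = rng.randrange(len(genome) - length + 1)
        return genome[start:start + length]
    return [(window(), window()) for _ in range(n)]


def mean_and_se(counts):
    """Mean and standard error of CNMS counts across replicate random sets."""
    mean = statistics.mean(counts)
    se = statistics.stdev(counts) / len(counts) ** 0.5
    return mean, se
```

             <p>Each replicate set would then be scored with the same CNMS criterion as the real data, and the sampling repeated for the 10 random sets described in the Figure 3 legend before calling mean_and_se on the resulting counts.</p>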
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>The full list of pairs of conserved noncoding microsatellites identified in this study.</p>
               </text>
               <file name="1471-2164-7-323-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Summary of <it>Arabidopsis-Brassica </it>and <it>Arabidopsis-Arabidopsis </it>CNMSs</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4" ca="center">
                        <p>CNMSs</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p><it>Arabidopsis </it>CNMS genes</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>(GA)<sub>n</sub></p>
                     </c>
                     <c ca="center">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="center">
                        <p>(GAA)<sub>n</sub></p>
                     </c>
                     <c ca="center">
                        <p>(CTT)<sub>n</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis-Brassica</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>131</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>247</p>
                     </c>
                     <c ca="center">
                        <p>242</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis-Arabidopsis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>59</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>122</p>
                     </c>
                     <c ca="center">
                        <p>234</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                   <p>Distribution of (A) orthologous and (B) paralogous CNMSs in the 5' noncoding regions in <it>Arabidopsis</it></p>
               </caption>
               <text>
                   <p><b>Distribution of (A) orthologous and (B) paralogous CNMSs in the 5' noncoding regions in <it>Arabidopsis</it></b>. Positions are shown in 100-bp segments.</p>
               </text>
               <graphic file="1471-2164-7-323-2"/>
            </fig>
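             <p>The 100-bp segments in Figure 2 amount to a simple binning of CNMS positions within the 500-bp 5'-flank. A minimal Python sketch is shown below. It assumes that each CNMS is summarized by a single position given as its distance in bp upstream of the initiation codon (1 being the base immediately upstream); this convention and the function name are our assumptions.</p>

```python
def bin_positions(positions, width=100, flank=500):
    """Count CNMS positions per segment, nearest segment (1..width bp upstream) first.

    Positions outside 1..flank are ignored.
    """
    n_bins = -(-flank // width)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if pos in range(1, flank + 1):
            counts[(pos - 1) // width] += 1
    return counts
```

             <p>With the default settings, bin_positions returns five counts, from the segment adjacent to the initiation codon to the most distal segment of the 500-bp flank.</p>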
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Occurrences of CNMSs in the random datasets</p>
               </caption>
               <text>
                   <p><b>Occurrences of CNMSs in the random datasets</b>. Datasets 1, 2 and 3 correspond to the 1000 homologous pairs, the 1000 shuffled pairs of noncoding sequences and the 1000 random pairs of genomic sequences, respectively. Sequence length is 500 bp. CNMS occurrences were analyzed in an analogous manner for 10 different random sets of equal size. Means of CNMS occurrences are shown on the y axis, and error bars represent SEs.</p>
               </text>
               <graphic file="1471-2164-7-323-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Evolution of conserved microsatellites in <it>Arabidopsis</it></p>
            </st>
             <p>To gain insight into the evolutionary relationship between <it>Arabidopsis-Brassica </it>and <it>Arabidopsis-Arabidopsis </it>CNMSs, we calculated the synonymous substitution rate (<it>Ks</it>, the number of synonymous substitutions per synonymous site) for the corresponding gene pairs. For <it>Arabidopsis-Brassica </it>orthologous CNMS gene pairs, the frequency distribution showed a clear peak at <it>Ks </it>values of 0.4 to 0.5 (Figure <figr fid="F4">4A</figr>), suggesting that these CNMSs have been conserved from a common ancestor for about 15 million years (Myr), consistent with the divergence time of 14.5 to 20.4 Myr estimated from mitochondrial DNA data <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In contrast, the <it>Ks </it>distribution of <it>Arabidopsis </it>paralogous CNMS gene pairs showed two peaks, at <it>Ks </it>values of 0.8 to 0.9 and 1.2 to 1.3 (Figure <figr fid="F4">4B</figr>). The former group, which contains most of the paralogous CNMSs, originated from large-scale gene duplication about 28 Myr ago, consistent with the recent polyploidization event in the evolution of the <it>Arabidopsis </it>genome <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The latter group was duplicated from a common ancestor about 42 Myr ago, probably around the time of the divergence of the Brassicaceae family <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
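             <p>The dates above are consistent with the usual molecular-clock conversion T = Ks/(2λ), where the factor of 2 reflects substitutions accumulating independently in both lineages. The rate λ is not stated in the text. In the Python sketch below we assume λ = 1.5 × 10<sup>-8</sup> synonymous substitutions per site per year, a value commonly used for Arabidopsis and other Brassicaceae that reproduces the quoted figures (Ks = 0.45, 0.85 and 1.25 give 15, about 28 and about 42 Myr). The 0.1-wide bins match the Ks intervals quoted above; the function names are hypothetical.</p>

```python
from collections import Counter

# Assumed synonymous substitution rate (substitutions per synonymous site per year).
RATE = 1.5e-8


def divergence_myr(ks, rate=RATE):
    """Divergence time T = Ks / (2 * rate), returned in million years."""
    return ks / (2.0 * rate) / 1e6


def modal_ks_bin(ks_values, bins_per_unit=10):
    """Lower edge of the most populated Ks bin ([0.0, 0.1), [0.1, 0.2), ... by default)."""
    # The small epsilon guards against a floating-point product landing just
    # below a bin edge and being counted in the lower bin.
    counts = Counter(int(ks * bins_per_unit + 1e-9) for ks in ks_values)
    index, _ = counts.most_common(1)[0]
    return index / bins_per_unit
```

             <p>Under this assumed rate, divergence_myr(0.45) gives about 15 Myr; a different rate would rescale all three dates proportionally without changing their order.</p>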
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Distribution of synonymous substitution rate (<it>Ks</it>) for CNMS gene sets</p>
               </caption>
               <text>
                   <p><b>Distribution of synonymous substitution rate (<it>Ks</it>) for CNMS gene sets</b>. (A) Distribution of <it>Ks </it>values for the <it>Arabidopsis-Brassica </it>orthologous CNMS gene set, showing a clear peak at <it>Ks </it>values of 0.4 to 0.5. (B) Distribution of <it>Ks </it>values for the paralogous CNMS gene set in <it>Arabidopsis</it>, showing two clear peaks at <it>Ks </it>values of 0.8 to 0.9 and 1.2 to 1.3.</p>
               </text>
               <graphic file="1471-2164-7-323-4"/>
            </fig>
             <p>The evolutionary relationships of <it>Arabidopsis-Brassica </it>and <it>Arabidopsis-Arabidopsis </it>CNMSs suggest that most paralogous CNMSs pre-date the divergence of the two species; hence, many paralogous CNMSs in <it>Arabidopsis </it>should have counterparts in <it>Brassica</it>. We therefore compared paralogous and orthologous genes from <it>Arabidopsis </it>and <it>Brassica </it>for shared CNMSs (Figure <figr fid="F5">5A</figr>, <figr fid="F5">5B</figr>). Using the same criterion, we identified 18 CNMSs in <it>Arabidopsis </it>paralogous pairs that also coincided with CNMSs in at least one <it>Brassica </it>ortholog (Table <tblr tid="T2">2</tblr>). We call these conserved elements, shared among paralogous and orthologous genes, Ultra-CNMSs. An example is shown in Figure <figr fid="F5">5C</figr>, in which three homologous CT repeats have been highly conserved from a common ancestor for some 48 Myr.</p>
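             <p>Identifying Ultra-CNMSs amounts to intersecting the paralogous and orthologous CNMS sets on the shared <it>Arabidopsis</it> repeat. In the Python sketch below, each <it>Arabidopsis</it> repeat is identified by a hypothetical key made of a gene identifier and a repeat start position, paralogous CNMSs are pairs of such keys, and orthologous CNMSs map a key to the identifier of the matching <it>Brassica</it> sequence. This data layout and the names used are our assumptions.</p>

```python
def ultra_cnms(paralog_cnms, ortholog_cnms):
    """Return paralogous CNMSs in which at least one Arabidopsis repeat is also
    part of an Arabidopsis-Brassica orthologous CNMS.

    paralog_cnms:  list of (key_a, key_b) pairs, each key = (gene_id, repeat_start)
    ortholog_cnms: dict mapping key -> Brassica sequence identifier
    """
    result = []
    for key_a, key_b in paralog_cnms:
        # Brassica counterparts found for either member of the paralogous pair.
        matches = {k: ortholog_cnms[k] for k in (key_a, key_b) if k in ortholog_cnms}
        if matches:
            result.append((key_a, key_b, matches))
    return result
```

             <p>Each returned entry keeps both paralogous keys together with every <it>Brassica</it> match, mirroring the layout of Table 2, where some paralog pairs have a <it>Brassica</it> counterpart for only one member.</p>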
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>An example of microsatellites conserved among paralogous and orthologous genes</p>
               </caption>
               <text>
                   <p><b>An example of microsatellites conserved among paralogous and orthologous genes</b>. Sequence alignments and positions for the microsatellites conserved among (A) <it>Arabidopsis </it>paralogous genes, (B) <it>Arabidopsis-Brassica </it>orthologous genes and (C) homologous genes from <it>Arabidopsis </it>and <it>Brassica</it>. Boxed regions indicate CNMSs. Dots indicate omitted parts of the alignment. Nucleotide positions are given relative to the initiation codon.</p>
               </text>
               <graphic file="1471-2164-7-323-5"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Ultra-CNMSs in <it>Arabidopsis-Brassica </it>orthologs and <it>Arabidopsis </it>paralogs</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>CNMSs</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Orthologs</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Orthologs</p>
                     </c>
                     <c ca="left">
                        <p>Function description</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Brassica</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Brassica</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At2g16780</p>
                     </c>
                     <c ca="left">
                        <p>BOMMT72TR</p>
                     </c>
                     <c ca="left">
                        <p>At4g35050</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WD-40 repeat protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At4g25620</p>
                     </c>
                     <c ca="left">
                         <p>gnl|ti|104045340</p>
                     </c>
                     <c ca="left">
                        <p>At5g52430</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>hydroxyproline-rich glycoprotein protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GAA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                         <p>At1g12420</p>
                     </c>
                     <c ca="left">
                        <p>BONGX39TF</p>
                     </c>
                     <c ca="left">
                        <p>At4g22780</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ACT domain protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GAA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At4g29000</p>
                     </c>
                     <c ca="left">
                        <p>BOHVH87TR</p>
                     </c>
                     <c ca="left">
                        <p>At2g20110</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TSO1-like CXC domain protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                         <p>At1g07870</p>
                     </c>
                     <c ca="left">
                        <p>BONGE63TR</p>
                     </c>
                     <c ca="left">
                        <p>At2g28590</p>
                     </c>
                     <c ca="left">
                        <p>BONRT22TR</p>
                     </c>
                     <c ca="left">
                        <p>protein kinase</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                         <p>At1g14870</p>
                     </c>
                     <c ca="left">
                        <p>BONHS20TF</p>
                     </c>
                     <c ca="left">
                        <p>At5g35525</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                         <p>At1g68360</p>
                     </c>
                     <c ca="left">
                        <p>BOIHI85TR</p>
                     </c>
                     <c ca="left">
                         <p>At1g67030</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>zinc finger</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At1g77660</p>
                     </c>
                     <c ca="left">
                        <p>BOOAX17TF</p>
                     </c>
                     <c ca="left">
                        <p>At1g21920</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>phosphatidylinositol-4-phosphate 5-kinase-related</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At2g28890</p>
                     </c>
                     <c ca="left">
                        <p>BOHQE28TR</p>
                     </c>
                     <c ca="left">
                        <p>At1g07630</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>protein phosphatase 2C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At2g45050</p>
                     </c>
                     <c ca="left">
                        <p>BONAX55TR</p>
                     </c>
                     <c ca="left">
                        <p>At3g60530</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>zinc finger</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At2g47440</p>
                     </c>
                     <c ca="left">
                        <p>BOHWG09TF</p>
                     </c>
                     <c ca="left">
                        <p>At3g62570</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>heat shock protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At3g06760</p>
                     </c>
                     <c ca="left">
                        <p>BOICE92TR</p>
                     </c>
                     <c ca="left">
                        <p>At5g49230</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>drought-responsive protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At2g47485</p>
                     </c>
                     <c ca="left">
                        <p>BOIFI37TR</p>
                     </c>
                     <c ca="left">
                        <p>At3g62650</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At3g22270</p>
                     </c>
                     <c ca="left">
                        <p>BOGAX66TF</p>
                     </c>
                     <c ca="left">
                        <p>At4g14990</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At4g34720</p>
                     </c>
                     <c ca="left">
                        <p>BOHJQ57TF</p>
                     </c>
                     <c ca="left">
                        <p>At2g16510</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>vacuolar ATP synthase</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At5g01240</p>
                     </c>
                     <c ca="left">
                        <p>BOHGK06TF</p>
                     </c>
                     <c ca="left">
                        <p>At2g38120</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>amino acid permease</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At5g11930</p>
                     </c>
                     <c ca="left">
                        <p>BOHPT64TF</p>
                     </c>
                     <c ca="left">
                        <p>At4g33040</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>glutaredoxin protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CTT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>At5g27430</p>
                     </c>
                     <c ca="left">
                        <p>gnl|ti|103985703</p>
                     </c>
                     <c ca="left">
                        <p>At3g05230</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>signal peptidase subunit</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Conservation of microsatellites in plants</p>
            </st>
            <p>As expected, analysis of the regulatory regions of related gene families revealed that many Ultra-CNMSs were conserved across more distantly homologous genes in Brassicaceae species and other plants. Figure <figr fid="F6">6A</figr> shows that CNMSs (CT)<sub>n </sub>are conserved among orthologous genes from <it>Arabidopsis, Brassica, Medicago </it>and rice, as well as among more distantly related paralogous genes in <it>Arabidopsis</it>. These genes represent a larger family of transmembrane receptor kinases and related non-transmembrane kinases in plant genomes, many of which arose from a common ancestor of dicots and monocots. Another striking CNMS was found in the regulatory regions of GATA transcription factor genes from <it>Brassica, Arabidopsis </it>and rice. Of the 14 members of subfamily I in the <it>Arabidopsis </it>genome <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, five carry the same CNMS in their regulatory regions (Figure <figr fid="F6">6B</figr>). These CNMS-associated transcription factor genes fall into two subgroups, indicating that they diverged before the split between dicotyledonous and monocotyledonous plants <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. These Ultra-CNMSs therefore appear to have been inherited from a common ancestor of dicots and monocots and maintained under extreme purifying selection for more than 170 Myr <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Conservation of microsatellites in plants</p>
               </caption>
               <text>
                  <p><b>Conservation of microsatellites in plants</b>. (A) Sequence alignments for the CNMS conserved among the homologous protein kinase genes from <it>Arabidopsis, Brassica, Medicago </it>and rice. At1g07870, At2g28590, BONGE63TR, BONRT22TR, AC135230_13 and Os07g49470 are the homologous genes from the four species. The <it>Medicago </it>(AC135230_13) and rice (Os07g49470) sequences were obtained from the TIGR plant genome sequence database. (B) Sequence alignments for the CNMS conserved among the homologous GATA transcription factor genes from <it>Arabidopsis, Brassica </it>and rice. BONAX55TR and At2g45050 are orthologs from <it>Brassica </it>and <it>Arabidopsis</it>, and Os05g44400 is the homologous gene from rice.</p>
               </text>
               <graphic file="1471-2164-7-323-6"/>
            </fig>
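            <p>A minimal Python sketch can illustrate what a conserved (CT)<sub>n</sub> or (CTT)<sub>n</sub> microsatellite means in practice. It assumes perfect (uninterrupted) repeats and a hypothetical minimum of four repeat units, and the study's own detection and alignment criteria may differ. The function names <it>find_repeats</it> and <it>shared_motifs</it> are illustrative only. The shared-motif check is a crude presence/absence stand-in for the alignments in Figure 6.</p>
            <p>
```python
import re
from typing import List, NamedTuple, Set

# Repeat units discussed in the text. (CT)n is complementary to (GA)n and
# (CTT)n to (GAA)n; this sketch scans only the strand it is given.
MOTIFS = ("CT", "GA", "CTT", "GAA")


class Repeat(NamedTuple):
    motif: str  # repeat unit, e.g. "CT"
    count: int  # number of tandem copies
    start: int  # 0-based start position in the input sequence


def find_repeats(seq: str, min_copies: int = 4) -> List[Repeat]:
    """Return perfect tandem repeats of MOTIFS with at least min_copies units."""
    seq = seq.upper()
    hits = []
    for unit in MOTIFS:
        # e.g. (?:CT){4,} matches four or more back-to-back copies of "CT"
        pattern = re.compile("(?:%s){%d,}" % (unit, min_copies))
        for m in pattern.finditer(seq):
            hits.append(Repeat(unit, len(m.group()) // len(unit), m.start()))
    return sorted(hits, key=lambda r: r.start)


def shared_motifs(region_a: str, region_b: str, min_copies: int = 4) -> Set[str]:
    """Repeat units present in both regions: a presence/absence check,
    not an alignment-based test of conservation."""
    units_a = {r.motif for r in find_repeats(region_a, min_copies)}
    units_b = {r.motif for r in find_repeats(region_b, min_copies)}
    return units_a.intersection(units_b)
```
            </p>
            <p>Because (CT)<sub>n</sub> and (GA)<sub>n</sub> are complementary, a fuller analysis would also treat a (CT)<sub>n</sub> run on one strand and a (GA)<sub>n</sub> run on the other as the same repeat.</p>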
         </sec>
         <sec>
            <st>
               <p>Annotation enrichment and depletion of CNMS associated genes</p>
            </st>
            <p>We tested whether the presence of CNMSs in genes was related to the function of the proteins they encode. There were 206 <it>Arabidopsis-Brassica </it>and 194 <it>Arabidopsis-Arabidopsis </it>CNMS-associated genes with known function in the <it>Arabidopsis </it>genome. We looked for Gene Ontology (GO) biological process and molecular function categories that were significantly enriched or depleted in these genes <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. These CNMS-associated genes showed significant functional enrichment for transcription factor activity (P &lt; 1.8 &#215; 10<sup>-7 </sup>for orthologous CNMS genes and P &lt; 7.7 &#215; 10<sup>-7 </sup>for paralogous CNMS genes, against all GO-annotated <it>Arabidopsis </it>genes) and transcription (P &lt; 4.5 &#215; 10<sup>-4 </sup>for orthologs and P &lt; 6.6 &#215; 10<sup>-5 </sup>for paralogs) (Figure <figr fid="F7">7</figr>). Genes with these two types of function accounted for about 23% of all genes with known function. In contrast, CNMS-associated genes were significantly depleted for DNA metabolism (P &lt; 2.2 &#215; 10<sup>-4 </sup>for orthologous CNMS genes and P &lt; 1.8 &#215; 10<sup>-2 </sup>for paralogous CNMS genes). These findings suggest that CNMSs might be specifically associated with the regulation of transcription at the DNA level rather than with DNA metabolism.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Annotation enrichment and depletion of CNMS associated genes</p>
               </caption>
               <text>
                  <p><b>Annotation enrichment and depletion of CNMS associated genes</b>. In the top half of the figure, the maroon bars ("observed") give the numbers of orthologous CNMS genes that are annotated in the <it>Arabidopsis </it>GO database with molecular function "transcription factor activity" or biological process "transcription regulation" and "DNA metabolism". The blue bars ("expected") give the number of genes that one would expect to obtain if the same number of genes were chosen at random among all genes annotated in the relevant database. The bottom half of the figure gives similar information for paralogous CNMS genes in <it>Arabidopsis</it>.</p>
               </text>
               <graphic file="1471-2164-7-323-7"/>
            </fig>
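            <p>A one-sided hypergeometric test is one way to compare the observed and expected counts in Figure 7. This section does not state which test the cited GO tools applied, so the sketch below is an assumption and not the study's method. The parameters are as follows: <it>N</it> is the background size, <it>K</it> is the number of annotated background genes, <it>n</it> is the size of the CNMS gene set, and <it>k</it> is the number of annotated CNMS genes.</p>
            <p>
```python
import math


def hypergeom_tails(k: int, n: int, K: int, N: int):
    """One-sided hypergeometric P values for one GO category.

    N: genes in the background (e.g. all GO-annotated genes)
    K: background genes carrying the annotation
    n: genes in the test set (e.g. CNMS associated genes)
    k: test-set genes carrying the annotation
    Returns (P of at least k annotated genes, P of at most k annotated genes),
    i.e. (enrichment P value, depletion P value).
    """
    total = math.comb(N, n)
    lowest = max(0, n - (N - K))  # smallest attainable overlap
    highest = min(n, K)           # largest attainable overlap

    def pmf(x: int) -> float:
        # probability that exactly x of the n drawn genes are annotated
        return math.comb(K, x) * math.comb(N - K, n - x) / total

    p_enrich = sum(pmf(x) for x in range(k, highest + 1))
    p_deplete = sum(pmf(x) for x in range(lowest, k + 1))
    return p_enrich, p_deplete


def expected_count(n: int, K: int, N: int) -> float:
    """Annotated genes expected among n genes drawn at random (the 'expected' bars)."""
    return n * K / N
```
            </p>
            <p>Exact tail sums are used here rather than a normal approximation because some categories, such as DNA metabolism, involve small counts. The actual category sizes behind Figure 7 are not listed in this section.</p>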
         </sec>
         <sec>
            <st>
               <p>CNMSs as regulatory elements in plants</p>
            </st>
            <p>To further investigate the regulatory nature of these CNMSs, we used a computational method to identify <it>cis</it>-elements similar to elements of known function in the PlantCARE <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp> and PLACE databases <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Several predicted binding sites were clustered in the conserved microsatellite regions, and these regulatory elements are involved in plant-specific responses to environmental stimuli (Table <tblr tid="T3">3</tblr>). The CNMS (CT)<sub>n </sub>includes TCTCtCT sequences similar to the TCCC-motif, which is part of the conserved DNA module array AtpCD-CMA involved in light responsiveness <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. The CNMS (CT)<sub>n </sub>may also act as an enhancer. The same motif (TCTCTCTCT) occurs in a 60-nt region downstream of the transcription start site of the CaMV 35S RNA, and that region can enhance gene translation in plant protoplasts <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. (GA)<sub>n</sub>, the complementary sequence of (CT)<sub>n</sub>, may serve as a regulatory element with similar functions: it contains a series of overlapping GAG motifs (AGAGAGa) involved in light regulation <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B37">37</abbr></abbrgrp>. In soybean, the 18-bp GAGA element within the <it>Gsa1 </it>promoter is recognized by GBP, which is encoded by a light-regulated gene <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The CNMS (CTT)<sub>n </sub>contains sequences similar to the TCA-element (TCATCTTCTT), a binding site for salicylic acid-inducible proteins <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. 
Similarly, the CNMS (GAA)<sub>n </sub>contains AGAA sequences resembling the core recognition sequence (tcAGAAgagg) of salicylic acid-responsive genes <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p><it>In silico </it>prediction of CNMSs serving as regulatory elements</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>CNMSs</p>
                     </c>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="left">
                        <p>Motif Name</p>
                     </c>
                     <c ca="left">
                        <p>Recognized Sequence</p>
                     </c>
                     <c ca="left">
                        <p>Motif Function</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>PlantCARE</p>
                     </c>
                     <c ca="left">
                        <p>GAG-motif</p>
                     </c>
                     <c ca="left">
                        <p>AGAGAGa</p>
                     </c>
                     <c ca="left">
                        <p>part of a light responsive element [35,37]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PLACE</p>
                     </c>
                     <c ca="left">
                        <p>GAGAGMGSA1</p>
                     </c>
                     <c ca="left">
                        <p>(GA)<sub>9</sub></p>
                     </c>
                     <c ca="left">
                        <p>Binding site for GAGA-binding factor; <it>Gbp </it>is a light-responsive gene [12].</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(GAA)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>PlantCARE</p>
                     </c>
                     <c ca="left">
                        <p>TCA-element</p>
                     </c>
                     <c ca="left">
                        <p>aAGAAgaaga</p>
                     </c>
                     <c ca="left">
                        <p>salicylic acid responsive element [39]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CT)<sub>n</sub></p>
                     </c>
                     <c ca="left">
                        <p>PlantCARE</p>
                     </c>
                     <c ca="left">
                        <p>TCCC-motif</p>
                     </c>
                     <c ca="left">
                        <p>TCTCtCT</p>
                     </c>
                     <c ca="left">
                        <p>part of a light responsive element [34,35].</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PLACE</p>
                     </c>
                     <c ca="left">
                        <p>CTRMCAMV35S</p>
                     </c>
                     <c ca="left">
                        <p>TCTCTCTCT</p>
                     </c>
                     <c ca="left">
                        <p>CT-rich motif found in a 60-nt region downstream of the transcription start site of the CaMV 35S RNA; can enhance gene expression [36].</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>(CTT)<sub>n</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TCA-element</p>
                     </c>
                     <c ca="left">
                        <p>TCtTCTTCTT</p>
                     </c>
                     <c ca="left">
                        <p>salicylic acid responsive element [38]</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
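            <p>The sketch below shows one way to locate the recognized sequences from Table 3 in a CNMS-containing promoter. It performs exact, case-insensitive matching on both strands, and the lowercase letters in the PlantCARE consensus are simply upper-cased. The dictionary keys for the two TCA-element entries are hypothetical labels. PlantCARE and PLACE apply their own matching rules, which this sketch does not reproduce.</p>
            <p>
```python
from typing import Dict, List, Tuple

# Recognized sequences copied from Table 3 (upper-cased).
TABLE3_MOTIFS: Dict[str, str] = {
    "GAG-motif": "AGAGAGA",
    "GAGAGMGSA1": "GA" * 9,
    "TCA-element (GAA)": "AAGAAGAAGA",
    "TCCC-motif": "TCTCTCT",
    "CTRMCAMV35S": "TCTCTCTCT",
    "TCA-element (CTT)": "TCTTCTTCTT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq: str) -> List[Tuple[str, int, str]]:
    """Exact-match scan of both strands; returns (motif name, position, strand).

    Positions are 0-based on the given strand; overlapping hits are reported.
    """
    seq = seq.upper()
    hits = []
    for name, motif in TABLE3_MOTIFS.items():
        for strand, site in (("+", motif), ("-", revcomp(motif))):
            start = seq.find(site)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(site, start + 1)
    return sorted(hits, key=lambda h: (h[1], h[0], h[2]))
```
            </p>
            <p>For example, scanning GGTCTCTCTGG reports the TCCC-motif on the forward strand and the GAG-motif on the reverse strand at the same position. This reflects the complementarity of (CT)<sub>n</sub> and (GA)<sub>n</sub> noted above.</p>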
            <p>Although CNMSs (CT)<sub>n</sub>/(GA)<sub>n </sub>and (CTT)<sub>n</sub>/(GAA)<sub>n </sub>are similar to known regulatory elements, the functions of most of them have not been verified experimentally. We therefore examined how the expression of all CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes changed after salicylic acid treatment. Transcript abundance estimated from MPSS data showed that these genes had distinct expression patterns under salicylic acid treatment (Figure <figr fid="F8">8A</figr>, <figr fid="F8">8B</figr>). About 70&#8211;80% of the CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes in <it>Arabidopsis </it>leaves were regulated by salicylic acid, while the rest were undetectable both with and without treatment. Among the salicylic acid-responsive genes, only about 15&#8211;23% were up-regulated; most were repressed after treatment. Seven CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes were further analyzed by RT-PCR after salicylic acid treatment. All except At2g05920 and At5g67360 were clearly down-regulated by salicylic acid (Figure <figr fid="F9">9</figr>), consistent with the expression patterns in the <it>Arabidopsis </it>MPSS database. The RT-PCR patterns revealed a preliminary correlation between the number of CTT/GAA repeats and the response of the associated gene to salicylic acid. The (CTT)<sub>4</sub>/(GAA)<sub>4 </sub>sequences were associated with down-regulation upon salicylic acid stimulus, whereas the (CTT)<sub>5 </sub>and (CTT)<sub>7 </sub>associated genes were not obviously regulated by salicylic acid. These findings imply that the regulation of CNMS-associated gene expression by salicylic acid might depend on the number of CTT/GAA repeats.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Expression patterns of (A) orthologous and (B) paralogous CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes in <it>Arabidopsis </it>leaves with salicylic acid treatment</p>
               </caption>
               <text>
                  <p><b>Expression patterns of (A) orthologous and (B) paralogous CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes in <it>Arabidopsis </it>leaves with salicylic acid treatment</b>. Genes repressed after 4 hours of salicylic acid treatment are shown in green. Genes induced after 4 hours are shown in red. Genes that remained inactive through 52 hours of treatment are shown in black. Gene expression levels were estimated from <it>Arabidopsis </it>MPSS data for three libraries, generated from untreated leaves and from leaves 4 hours (S04) and 52 hours (S52) after salicylic acid treatment. TPM is the normalized abundance of each signature in a library, in transcripts per million.</p>
               </text>
               <graphic file="1471-2164-7-323-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Expression patterns of <it>Arabidopsis </it>CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes and their related sequence information</p>
               </caption>
               <text>
                  <p><b>Expression patterns of <it>Arabidopsis </it>CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes and their related sequence information</b>. Gene expression was assayed by RT-PCR. Lanes 1&#8211;5: control (untreated), 1, 4, 12 and 48 hours after treatment with 1 mM salicylic acid, respectively. The <it>actin2 </it>gene was used as an internal control in the RT-PCR reaction.</p>
               </text>
               <graphic file="1471-2164-7-323-9"/>
            </fig>
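            <p>The classes in Figure 8 can be expressed as a simple rule on MPSS TPM values from the control, S04 and S52 libraries. The detection limit (<it>min_tpm</it>) and the two-fold change threshold (<it>fold</it>) below are hypothetical, and so are the gene labels in the example. The fourth class, "unchanged", is an addition of this sketch. The criteria actually applied are not stated in this section.</p>
            <p>
```python
from collections import Counter
from typing import Dict, Tuple

LABELS = ("repressed", "induced", "undetectable", "unchanged")


def tpm(count: int, library_total: int) -> float:
    """Transcripts per million: a signature count scaled to a library of 10^6."""
    return count * 1_000_000 / library_total


def classify(control: float, s04: float, s52: float,
             min_tpm: float = 1.0, fold: float = 2.0) -> str:
    """Assign one gene's TPM profile to a Figure 8-style class.

    min_tpm and fold are hypothetical thresholds, not values from the study.
    """
    if min_tpm > max(control, s04, s52):
        return "undetectable"  # below the detection limit in every library
    if s04 > control and s04 >= fold * control:
        return "induced"       # up at 4 h relative to the untreated control
    if control > s04 and control >= fold * s04:
        return "repressed"     # down at 4 h relative to the untreated control
    return "unchanged"         # expressed, but below the fold-change threshold


def summarize(profiles: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Fraction of genes per class; profiles map gene -> (control, S04, S52) TPM."""
    counts = Counter(classify(*p) for p in profiles.values())
    return {label: counts[label] / len(profiles) for label in LABELS}


if __name__ == "__main__":
    demo = {"g1": (10, 2, 3), "g2": (5, 20, 8), "g3": (0, 0, 0), "g4": (4, 1, 1)}
    print(summarize(demo))
    # {'repressed': 0.5, 'induced': 0.25, 'undetectable': 0.25, 'unchanged': 0.0}
```
            </p>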
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Microsatellites (CT)<sub>n</sub>/(GA)<sub>n </sub>and (CTT)<sub>n</sub>/(GAA)<sub>n </sub>are well represented in the <it>Arabidopsis </it>genome and are preferentially located within 5' noncoding regions. In this study, we used inter- and intragenomic phylogenetic footprinting to identify 491 conserved CT/GA and CTT/GAA repeats as candidate regulatory elements. These CNMSs tend to occur near the initiation codon within these regions, with a preference for CT and CTT motifs, consistent with the pyrimidine-rich repeat distribution reported for these regions <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>. Another striking feature of CNMS distribution is that they are rarely found in peri-centromeric regions; in contrast, their associated genes are always clustered on chromosome arms (data not shown). The reasons for the absence of CNMSs from peri-centromeric regions remain unclear, but the clustering of CNMS-associated genes on chromosome arms is probably attributable to co-expression.</p>
         <p>Microsatellites generally evolve rapidly, but about 10% of 5' noncoding CT/GA and CTT/GAA repeats are highly conserved in occurrence and appear to be ancient. In particular, the Ultra-CNMSs have been under purifying selection for more than 42 Myr, and some of them for at least 170 Myr. Functional constraint may explain this conservation: many homologous genes retain the corresponding microsatellite sequences in their regulatory regions. Most CT/GA and CTT/GAA microsatellites seem to have originated from recent mutations under positive selection <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr></abbrgrp>, leading to the significant over-representation of microsatellites in 5' noncoding regions compared with other genomic fractions. The reasons for positive selection of some repeat occurrences are still unknown. At the least, such repeats may provide opportunities for rapid adaptive changes in these regulatory regions or play specific roles in gene regulation.</p>
         <p>It is well known that intergenomic phylogenetic footprinting is an effective method for discovering regulatory elements in a set of orthologous noncoding regions from multiple species <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. In plant genomes, intragenomic phylogenetic footprinting is another powerful strategy for detecting regulatory elements, because most plant genomes are rich in duplicated genes and large fractions of these gene pairs share transcriptional characteristics <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Individual regulatory elements may be gained or lost between duplicated promoters, so this approach cannot detect the full complement of <it>cis</it>-elements. It can, however, readily identify specific regulatory elements that are highly conserved in duplicated genes. Using this approach, we identified 122 <it>Arabidopsis </it>CNMSs as candidate regulatory elements with plant-specific functions. Most paralogous CNMSs originated from the recent polyploidization event that preceded the divergence of <it>Arabidopsis </it>and <it>Brassica </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, implying that they might also be conserved in their <it>Brassica </it>counterparts. Comparing the results of inter- and intragenomic phylogenetic footprinting, we found 18 CNMSs highly conserved in both orthologous and paralogous sequences. The number of identified Ultra-CNMSs may be underestimated for two reasons: the <it>Brassica </it>reference genome sequence is incomplete, and some orthologous relationships may have been assigned incorrectly. The occurrence of these conserved microsatellites in three or more homologous genes provides stronger evidence that these CNMSs are significant in gene regulation.</p>
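         <p>The comparison of inter- and intragenomic footprinting results described above can be sketched as a set intersection. If an <it>Arabidopsis </it>CNMS is found both in an ortholog pair and in a paralog pair with the same repeat unit, then at least three homologous genes share it. The record layout (gene, partner gene, repeat unit) is hypothetical, as are the gene labels AtA, AtB and BrA in the test. The study's own matching criteria are not given in this section.</p>
         <p>
```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (gene, partner gene, repeat unit); hypothetical layout


def ultra_cnms(ortho: Iterable[Record], para: Iterable[Record]) -> Dict[Tuple[str, str], Set[str]]:
    """CNMSs found in both an ortholog pair and a paralog pair.

    Keys are (Arabidopsis gene, repeat unit); values are all homologous genes
    sharing that repeat, so each value holds at least three genes.
    """
    ortho_partners = defaultdict(set)
    for at_gene, brassica_gene, unit in ortho:
        ortho_partners[(at_gene, unit)].add(brassica_gene)

    para_partners = defaultdict(set)
    for gene_a, gene_b, unit in para:
        # paralog pairs are symmetric, so index both members
        para_partners[(gene_a, unit)].add(gene_b)
        para_partners[(gene_b, unit)].add(gene_a)

    shared = set(ortho_partners).intersection(para_partners)
    return {key: {key[0]} | ortho_partners[key] | para_partners[key] for key in shared}
```
         </p>
         <p>Requiring the same repeat unit on both sides prevents a (CTT)<sub>n</sub> ortholog match and a (CT)<sub>n</sub> paralog match of the same gene from being merged into one conserved element.</p>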
         <p>Functional annotation showed that CNMS-associated genes were significantly depleted for DNA metabolism, such as DNA replication, DNA recombination and DNA repair. Genes essential for survival may lack CNMSs within their 5' noncoding regions because they do not require such specific regulatory elements. In contrast, CNMS genes are preferentially associated with the regulation of transcription in plants. CNMSs may serve as regulatory elements through which their associated genes respond to one or more environmental stimuli (Table <tblr tid="T3">3</tblr>). These functional biases imply that the proteins encoded by CNMS-associated genes (e.g. transmembrane receptor kinases and transcription factors) are involved in upstream pathways of defense responses in plants.</p>
         <p>GAGA elements are known to be involved in the regulation of numerous developmental genes in animals <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. In plants, however, we believe that CNMSs (CT)<sub>n</sub>/(GA)<sub>n </sub>are likely to be associated with transcriptional regulation in light signaling pathways <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. These CNMSs are often found in many different light-regulated genes <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B41">41</abbr></abbrgrp>. The expression of most CNMS (CT)<sub>n</sub>/(GA)<sub>n </sub>associated genes did not change significantly with light/dark transitions. However, microarray expression data for 7800 unique <it>Arabidopsis </it>genes show that three Ultra-CNMS-associated genes (At5g52430, At1g21920 and At3g62650) were clearly induced by longer periods of darkness <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. This is consistent with a whole-genome expression analysis of <it>Arabidopsis </it>seedlings, in which about 9% of these CNMS genes were significantly down-regulated by light and only 2% were up-regulated <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. The expression of most CNMS (CT)<sub>n</sub>/(GA)<sub>n </sub>associated genes may change little under light because these genes act upstream in the related pathways. Nevertheless, at least some CNMSs (CT)<sub>n</sub>/(GA)<sub>n </sub>may be binding sites for <it>trans</it>-acting regulators involved in light signaling pathways, and their associated genes can be induced by darkness.</p>
         <p>Salicylic acid is well known as an important signaling molecule in both locally and systemically induced disease resistance responses <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Many salicylic acid-responsive genes have been found in plant defense pathways. The CNMS (CTT)<sub>n</sub>/(GAA)<sub>n </sub>associated genes show distinct expression patterns under salicylic acid treatment, implying that they may be associated with a range of different stresses <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The regulatory effect of CNMSs (CTT)<sub>n</sub>/(GAA)<sub>n </sub>in salicylic acid signaling pathways appears to depend on repeat number. These repeats may not act as isolated transcription factor binding sites; instead, they are likely to cooperate with other elements to perform complex regulatory functions in transcription. Some of them may also play roles in RNA interference by forming RNA duplexes with complementary antisense microsatellite sequences. This could explain why the transcripts of quite a few CNMS genes are undetectable in <it>Arabidopsis </it>leaves.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Microsatellites (CT)<sub>n</sub>/(GA)<sub>n </sub>and (CTT)<sub>n</sub>/(GAA)<sub>n </sub>are preferentially associated with the 5' noncoding regions of the <it>Arabidopsis </it>genome. Some of them are conserved among homologous genes and appear to be ancient. Computational prediction and gene expression analysis suggest that CNMSs (CT)<sub>n</sub>/(GA)<sub>n </sub>and (CTT)<sub>n</sub>/(GAA)<sub>n </sub>act as regulatory elements involved in light and salicylic acid responses. Our analysis suggests that the presence of CT/GA and CTT/GAA repeats in regulatory regions may be a particularly useful guide for further experiments on plant regulatory networks that respond to environmental stimuli.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Plant materials</p>
            </st>
             <p>The <it>Arabidopsis </it>plants were grown in soil in a growth chamber at 20&#176;C with 8 hours of light per day for 40 days. Plants were sprayed to run-off with 1 mM salicylic acid in 0.5% dimethyl sulfoxide (DMSO). Leaves were harvested 1, 4, 12 and 48 hours after treatment, quick-frozen in liquid nitrogen, and stored at -80&#176;C. Total RNA was later extracted using the Plant RNA Mini Kit (Watson Biotechnologies Inc., China).</p>
         </sec>
         <sec>
            <st>
               <p>Sequence data sources</p>
            </st>
             <p>The annotated sequences of the five <it>Arabidopsis </it>chromosomes (accession numbers: <ext-link ext-link-type="gen" ext-link-id="NC_003070">NC_003070</ext-link>, <ext-link ext-link-type="gen" ext-link-id="NC_003071">NC_003071</ext-link>, <ext-link ext-link-type="gen" ext-link-id="NC_003074">NC_003074</ext-link>, <ext-link ext-link-type="gen" ext-link-id="NC_003075">NC_003075</ext-link>, and <ext-link ext-link-type="gen" ext-link-id="NC_003076">NC_003076</ext-link>, updated 25-JAN-2005) were downloaded from the Genomes Division of GenBank <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. An intergenic region was defined as the stretch of DNA between the end of the last exon of one gene and the beginning of the first exon of the following gene. A set of 16223 full-length <it>Arabidopsis </it>cDNA sequences containing both 5' and 3' UTRs was extracted from the TAIR database <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Preliminary sequences of the <it>Brassica </it>genome were obtained from The Institute for Genomic Research website <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
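             <p>The intergenic-region definition above can be sketched in Python as follows. This is a minimal illustration and not the pipeline used in this study. It assumes that each gene is supplied as a list of (start, end) exon coordinates, 1-based and inclusive, on a single chromosome. The function name intergenic_regions and the toy gene models are hypothetical. Genes that overlap or abut yield no region.</p>
             <p>
```python
def intergenic_regions(genes):
    """Return (start, end) spans lying between consecutive genes.

    Hypothetical input: each gene is a list of (start, end) exon coordinates,
    1-based and inclusive, all on one chromosome. A region runs from the base
    after the last exon of one gene to the base before the first exon of the
    next gene; overlapping or abutting genes produce no region.
    """
    # Order genes by the start of their first exon.
    ordered = sorted(genes, key=lambda exons: min(s for s, _ in exons))
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        start = max(e for _, e in left) + 1   # just past the last exon of the left gene
        end = min(s for s, _ in right) - 1    # just before the first exon of the right gene
        if max(0, end - start + 1):           # keep only non-empty spans
            regions.append((start, end))
    return regions


genes = [
    [(100, 200), (300, 400)],  # gene A: two exons
    [(1000, 1500)],            # gene B: one exon
    [(1450, 1600)],            # gene C overlaps gene B, so no region B-C
]
print(intergenic_regions(genes))  # [(401, 999)]
```
             </p>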
         </sec>
         <sec>
            <st>
               <p>Identification of orthologous and paralogous gene pairs</p>
            </st>
             <p>To identify putative <it>Arabidopsis-Brassica </it>orthologous gene sets, each preliminary <it>Brassica </it>sequence was searched with BLASTN <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> against 1-kb sequences of all <it>Arabidopsis </it>genes. Each 1-kb sequence spanned positions -500 to +500 relative to the translation start site. The <it>Brassica </it>fragments were then clustered according to their best-matching <it>Arabidopsis </it>gene. Conversely, each 1-kb <it>Arabidopsis </it>gene sequence was searched against the <it>Brassica </it>contigs. Two sequences were defined as orthologs if each was the best hit of the other in the aligned regions and the expect value (E) was &lt;1e-10. A list of the <it>Arabidopsis-Brassica </it>orthologs identified in this study is provided as supplementary data [see <supplr sid="S2">Additional file 2</supplr>].</p>
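             <p>The reciprocal-best-hit rule for orthologs can be sketched as follows. The sketch assumes that the single best hit and its E-value have already been extracted from the BLASTN results in both search directions, for example from tabular output. It does not model the restriction to aligned regions. The function name and the sequence IDs are hypothetical. The helper operator.lt from the Python standard library performs the less-than test against the E-value cutoff.</p>
             <p>
```python
from operator import lt

E_CUTOFF = 1e-10  # expect-value threshold used for ortholog calls

def reciprocal_best_hits(a_to_b, b_to_a, cutoff=E_CUTOFF):
    """Return (a, b) pairs that are each other's best hit below the E cutoff.

    a_to_b and b_to_a map a query ID to (best subject ID, E-value); both are
    assumed to be precomputed from BLASTN searches in each direction.
    """
    pairs = []
    for a, (b, e_ab) in a_to_b.items():
        hit_back = b_to_a.get(b)
        if hit_back is None:
            continue                      # b has no hit in the reverse search
        best_back, e_ba = hit_back
        # lt(x, y) tests x less than y; both directions must pass the cutoff.
        if best_back == a and lt(max(e_ab, e_ba), cutoff):
            pairs.append((a, b))
    return pairs


# Hypothetical best-hit tables for three Arabidopsis and three Brassica sequences.
a_to_b = {"ath_1": ("bol_7", 1e-40), "ath_2": ("bol_9", 1e-5), "ath_3": ("bol_3", 1e-30)}
b_to_a = {"bol_7": ("ath_1", 1e-42), "bol_9": ("ath_2", 1e-6), "bol_3": ("ath_8", 1e-35)}
print(reciprocal_best_hits(a_to_b, b_to_a))  # [('ath_1', 'bol_7')]
```
             </p>
             <p>In this example, ath_2 and bol_9 are reciprocal best hits but fail the E-value cutoff. ath_3 is rejected because its best hit, bol_3, points back to a different gene.</p>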
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>List of the orthologs between <it>Arabidopsis </it>and <it>Brassica</it>.</p>
               </text>
               <file name="1471-2164-7-323-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
             <p>To identify paralogous gene pairs derived from a recent common ancestor in the <it>Arabidopsis </it>genome, each annotated coding sequence was searched against all other coding sequences using BLASTN. A pair was considered significant if each sequence was the best hit of the other and the expect value was &lt;1e-10. The list of paralogous gene pairs is provided as supplementary data [see <supplr sid="S3">Additional file 3</supplr>]. Insufficient randomizing mutations can produce spurious microsatellite conservation. To avoid this, tandemly repeated gene pairs separated by fewer than 25 intervening genes were excluded from further analysis.</p>
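             <p>The tandem-pair filter can be sketched as follows. The sketch assumes that each gene's chromosome and its index in the chromosomal gene order are known. It counts the genes lying strictly between the two members of a pair. Pairs on different chromosomes are always kept. The mapping layout and the function name drop_tandem_pairs are hypothetical. The helper operator.ge performs the at-least test.</p>
             <p>
```python
from operator import ge

MIN_INTERVENING = 25  # pairs with fewer intervening genes are discarded

def drop_tandem_pairs(pairs, position):
    """Remove same-chromosome pairs with fewer than MIN_INTERVENING genes between them.

    position maps a gene ID to (chromosome, index in the chromosomal gene order).
    """
    kept = []
    for a, b in pairs:
        chrom_a, idx_a = position[a]
        chrom_b, idx_b = position[b]
        if chrom_a != chrom_b:
            kept.append((a, b))                # different chromosomes: not tandem
            continue
        intervening = abs(idx_a - idx_b) - 1   # genes strictly between the pair
        if ge(intervening, MIN_INTERVENING):   # ge(x, y) tests x at least y
            kept.append((a, b))
    return kept


# Hypothetical gene positions.
position = {"g1": ("chr1", 10), "g2": ("chr1", 20), "g3": ("chr1", 50), "g4": ("chr2", 5)}
print(drop_tandem_pairs([("g1", "g2"), ("g1", "g3"), ("g2", "g4")], position))
# [('g1', 'g3'), ('g2', 'g4')]
```
             </p>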
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>List of the paralogous gene pairs in the <it>Arabidopsis </it>genome.</p>
               </text>
               <file name="1471-2164-7-323-S3.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Microsatellite detection</p>
            </st>
             <p>Microsatellites were identified using a modified version of the Sputnik repeat finder <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Di- and trinucleotide repeats were reported when their total length was at least 12 bp, allowing up to about 10% deviation from a perfect repeat. Repeat motifs differing only in frame (e.g. GAA, AGA and AAG) were regarded as the same repeat type.</p>
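             <p>The two detection rules above are the 12-bp minimum and the merging of motif frames. A simplified Python sketch of both follows. It is not the modified Sputnik algorithm. It finds only perfect repeats with regular expressions and does not model the roughly 10% mismatch tolerance. Frames are merged by taking the alphabetically smallest rotation of a motif, so GAA, AGA and AAG all map to AAG. Reverse complements are not merged. The function names are hypothetical, and the input is assumed to be an uppercase DNA string.</p>
             <p>
```python
import re

MIN_LENGTH = 12  # minimum total repeat length in bp

def canonical_motif(motif):
    """Treat all rotations (frames) of a motif as one repeat type."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))

def find_perfect_repeats(seq):
    """Return sorted (start, end, canonical motif) tuples for perfect
    di- and trinucleotide repeats of at least MIN_LENGTH bp.

    start is 0-based and end is exclusive. Imperfect repeats are not detected.
    """
    hits = []
    for period in (2, 3):
        copies = MIN_LENGTH // period  # 6 dinucleotide or 4 trinucleotide copies
        # One motif followed by at least (copies - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:   # skip mononucleotide runs such as AAAA...
                continue
            hits.append((m.start(), m.end(), canonical_motif(motif)))
    return sorted(hits)


seq = "TTTT" + "GA" * 7 + "CCC" + "AAG" * 4 + "T"
print(find_perfect_repeats(seq))  # [(4, 18, 'AG'), (21, 33, 'AAG')]
```
             </p>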
         </sec>
         <sec>
            <st>
               <p>Identification of CNMSs</p>
            </st>
             <p>Because the <it>Brassica </it>gene fragments were derived from preliminary contigs without annotated open reading frames, each <it>Arabidopsis-Brassica </it>sequence pair was aligned using DiAlign2 with the translation option to identify the 5' noncoding and coding regions of the <it>Brassica </it>orthologs <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. The 5' noncoding sequence pairs were then aligned with DiAlign2 to find conserved microsatellites. To exclude nonspecific alignments, a stringent threshold parameter of 3 was used. A CNMS was identified when the corresponding microsatellite loci overlapped by at least 6 bp in the alignment.</p>
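             <p>The 6-bp overlap criterion can be illustrated by projecting each microsatellite onto the alignment columns and counting the columns the two loci share. This sketch makes several assumptions. The pairwise alignment is given as two gapped strings with '-' marking gaps; DiAlign2 output parsing is not shown. Microsatellite coordinates refer to the ungapped sequences, 0-based with an exclusive end. The function names are hypothetical.</p>
             <p>
```python
from operator import ge

MIN_OVERLAP = 6  # aligned columns the two microsatellites must share

def residue_columns(aligned):
    """Map each ungapped position of an aligned sequence to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]

def is_conserved(aligned_a, aligned_b, ms_a, ms_b, min_overlap=MIN_OVERLAP):
    """Test whether two microsatellites overlap by at least min_overlap columns.

    ms_a and ms_b are (start, end) in ungapped coordinates, 0-based, end exclusive.
    """
    cols_a = set(residue_columns(aligned_a)[ms_a[0]:ms_a[1]])
    cols_b = set(residue_columns(aligned_b)[ms_b[0]:ms_b[1]])
    return ge(len(cols_a.intersection(cols_b)), min_overlap)


# Hypothetical alignment: a 12-bp GA repeat aligned against a shorter, gapped one.
aligned_a = "TTGAGAGAGAGAGACC"
aligned_b = "TTGA----GAGAGACC"
print(is_conserved(aligned_a, aligned_b, (2, 14), (2, 10)))  # True (8 shared columns)
```
             </p>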
         </sec>
         <sec>
            <st>
               <p>Selection of random data sets</p>
            </st>
             <p>To ensure that CNMSs did not occur by chance, we used two different datasets of random pairs as negative controls. One control dataset contained 1000 random pairs of 500-bp upstream noncoding sequences from the <it>Arabidopsis </it>genome. The other contained 1000 pairs of 500-bp fragments randomly drawn from <it>Arabidopsis </it>genomic DNA. A reference dataset of the same size was randomly selected from the 500-bp paralogous noncoding sequence pairs in <it>Arabidopsis</it>. The 1000 paralogous pairs, the 1000 shuffled pairs of noncoding sequences and the 1000 random pairs of genomic sequences are referred to as dataset 1, dataset 2 and dataset 3, respectively.</p>
             <p>Three corresponding datasets of <it>Arabidopsis </it>and <it>Brassica </it>sequence pairs were generated in the same way and with the same size. Dataset 1 consisted of 1000 <it>Arabidopsis-Brassica </it>orthologous pairs of 5' noncoding sequences. Dataset 2 contained 1000 random pairs of <it>Arabidopsis </it>and <it>Brassica </it>upstream noncoding sequences. Dataset 3 contained 1000 pairs randomly generated from <it>Arabidopsis </it>and <it>Brassica </it>genomic DNA sequences. The same CNMS detection criteria were applied to all datasets, and CNMS occurrences were analyzed in the same manner for 10 different random sets of equal size.</p>
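             <p>The random control datasets and the 10 replicate sets can be sketched as follows. The caller supplies has_cnms, a hypothetical stand-in for the alignment-based CNMS test described above. With one sequence pool, each pair holds two different sequences from that pool. With two pools, as in the <it>Arabidopsis-Brassica </it>controls, each pair takes one sequence from each pool. random_fragment mimics the 500-bp genomic controls. All names and defaults are illustrative assumptions, not the authors' code.</p>
             <p>
```python
import random
from statistics import mean

def random_fragment(rng, chromosome, length=500):
    """Draw one fixed-length fragment from a random position in a genomic sequence."""
    start = rng.randrange(len(chromosome) - length + 1)
    return chromosome[start:start + length]

def random_pairs(rng, pool_a, pool_b=None, n_pairs=1000):
    """Build random sequence pairs for a negative-control dataset.

    With one pool, each pair holds two sequences drawn from different positions
    in it; with two pools (e.g. Arabidopsis and Brassica), one comes from each.
    """
    if pool_b is None:
        return [tuple(rng.sample(pool_a, 2)) for _ in range(n_pairs)]
    return [(rng.choice(pool_a), rng.choice(pool_b)) for _ in range(n_pairs)]

def replicate_counts(has_cnms, pool_a, pool_b=None, n_sets=10, n_pairs=1000, seed=0):
    """Count the pairs containing a CNMS in each of n_sets random datasets."""
    rng = random.Random(seed)  # fixed seed makes the control sets reproducible
    counts = []
    for _ in range(n_sets):
        pairs = random_pairs(rng, pool_a, pool_b, n_pairs)
        counts.append(sum(1 for a, b in pairs if has_cnms(a, b)))
    return counts, mean(counts)
```
             </p>
             <p>Comparing the mean count from the random sets with the count in the paralogous or orthologous dataset 1 gives the chance expectation that the CNMS frequencies are judged against.</p>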
         </sec>
         <sec>
            <st>
               <p>Estimation of duplication and speciation time</p>
            </st>
             <p>The <it>Ks </it>of CNMSs was estimated from the level of synonymous substitution in the associated coding sequences. For each pair of CNMS-associated genes, the two protein sequences were aligned with ClustalW, and the resulting alignment was used as a guide to align the nucleotide sequences <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. After removing gaps, the level of synonymous substitution was estimated using the yn00 program in PAML <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The divergence time (T) between two sequences was then calculated as T = <it>Ks</it>/(2&#955;), where <it>Ks </it>is the number of synonymous substitutions per synonymous site and &#955; is the mean rate of synonymous substitution. The estimated value of &#955; for dicots is 1.5 synonymous substitutions per synonymous site per 10<sup>8 </sup>years <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
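             <p>As a worked example of T = <it>Ks</it>/(2&#955;) with &#955; = 1.5 &#215; 10<sup>-8 </sup>per synonymous site per year: the factor 2 is there because both diverging lineages accumulate substitutions. The <it>Ks </it>value of 0.3 below is purely illustrative and not a result of this study. The yn00 estimation step is not reproduced.</p>
             <p>
```python
LAMBDA_DICOTS = 1.5e-8  # synonymous substitutions per synonymous site per year

def divergence_time_years(ks, rate=LAMBDA_DICOTS):
    """T = Ks / (2 * lambda); the factor 2 accounts for both diverging lineages."""
    return ks / (2 * rate)


t = divergence_time_years(0.3)          # illustrative Ks, not a value from this study
print(f"{t / 1e6:.1f} million years")   # 10.0 million years
```
             </p>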
         </sec>
         <sec>
            <st>
               <p>Estimation of gene expression level</p>
            </st>
             <p>Gene expression levels were estimated using data from the <it>Arabidopsis </it>Massively Parallel Signature Sequencing (MPSS) database <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr></abbrgrp>. MPSS data from three libraries were generated from untreated leaves and from leaves harvested 4 and 52 hours after salicylic acid treatment. Across the three libraries, a total of 9,081,200 17-bp signatures were obtained in multiple sequencing runs and in two sequencing frames. The abundance of each signature was normalized to transcripts per million (TPM) to facilitate comparisons across libraries.</p>
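             <p>The TPM normalization scales each signature count by the total number of signatures in its library. This makes libraries sequenced to different depths comparable. The sketch assumes that raw counts per signature are available as a dictionary for each library. The signature names and counts are hypothetical.</p>
             <p>
```python
def to_tpm(counts):
    """Scale raw signature counts in one MPSS library to transcripts per million."""
    total = sum(counts.values())
    return {sig: 1e6 * n / total for sig, n in counts.items()}


# Hypothetical libraries of different sequencing depth.
untreated = {"sig1": 50, "sig2": 150, "sig3": 800}      # 1,000 signatures in total
treated_4h = {"sig1": 100, "sig2": 900, "sig3": 1000}   # 2,000 signatures in total
print(to_tpm(untreated)["sig1"], to_tpm(treated_4h)["sig1"])  # 50000.0 50000.0
```
             </p>
             <p>Here sig1 has twice as many raw reads in the treated library, but its TPM is unchanged because that library is twice as large.</p>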
             <p>RT-PCR of <it>Arabidopsis </it>CNMS-associated genes was conducted using the one-step RNA PCR kit (TaKaRa) with gene-specific primers [see <supplr sid="S4">Additional file 4</supplr>]. Total RNA (0.5 &#956;g) was used as the template and amplified with the following program: 50&#176;C for 30 min and 94&#176;C for 2 min, followed by 25 cycles of 94&#176;C for 30 s, 54&#176;C for 30 s and 72&#176;C for 1 min. The housekeeping gene <it>actin</it>2 (At3g18780) was used as an internal control in the RT-PCR reactions.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>The primer sequences used for RT-PCR.</p>
               </text>
               <file name="1471-2164-7-323-S4.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>ZL designed and conducted the analyses of microsatellite detection, CNMS identification, comparative genomics and evolution, and drafted the manuscript. ZK and ZF provided the plant materials and participated in gene expression analysis. CY performed database searches. WJ and ZY participated in data analysis and manuscript revision. SX and TK participated in research design and in the drafting of the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
             <p>This research was supported by the National Natural Science Foundation of China (No. 30600348) and the China '973' project. Preliminary sequence data were obtained from The Institute for Genomic Research website. Sequencing of <it>Brassica oleracea </it>was funded by the National Science Foundation.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Microsatellites in different eukaryotic genomes: survey and analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Toth</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gaspari</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>967</fpage>
            <lpage>981</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310925</pubid>
                  <pubid idtype="pmpid" link="fulltext">10899146</pubid>
                  <pubid idtype="doi">10.1101/gr.10.7.967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Slipped-strand mispairing: a major mechanism for DNA sequence evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Levinson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gutman</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1987</pubdate>
            <volume>4</volume>
            <fpage>203</fpage>
            <lpage>221</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3328815</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications</p>
            </title>
            <aug>
               <au>
                  <snm>Nadir</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gallily</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ben-Sasson</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>6470</fpage>
            <lpage>6475</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">39047</pubid>
                  <pubid idtype="pmpid" link="fulltext">8692839</pubid>
                  <pubid idtype="doi">10.1073/pnas.93.13.6470</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Morgante</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hanafey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>194</fpage>
            <lpage>200</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng822</pubid>
                  <pubid idtype="pmpid" link="fulltext">11799393</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Fujimori</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Washio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ohtomo</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Murakami</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Matsubara</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kikuchi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tomita</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2003</pubdate>
            <volume>554</volume>
            <fpage>17</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(03)01041-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">14596907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Distinct patterns of SSR distribution in the <it>Arabidopsis thaliana </it>and rice genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Lawson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>R14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1431726</pubid>
                  <pubid idtype="pmpid" link="fulltext">16507170</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-2-r14</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Preference of simple sequence repeats in coding and non-coding regions of <it>Arabidopsis thaliana</it></p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>ZG</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>YF</fnm>
               </au>
               <au>
                  <snm>Miao</snm>
                  <fnm>ZQ</fnm>
               </au>
               <au>
                  <snm>Qian</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>KX</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1081</fpage>
            <lpage>1086</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth043</pubid>
                  <pubid idtype="pmpid" link="fulltext">14764542</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Some microsatellites may act as novel polymorphic <it>cis</it>-regulatory elements through transcription factor binding</p>
            </title>
            <aug>
               <au>
                  <snm>Iglesias</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Kindlund</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tammi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wadelius</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2004</pubdate>
            <volume>341</volume>
            <fpage>149</fpage>
            <lpage>165</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.06.035</pubid>
                  <pubid idtype="pmpid" link="fulltext">15474298</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Microsatellite instability regulates transcription factor binding and gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Martin</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Makepeace</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Hood</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>ER</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>102</volume>
            <fpage>3800</fpage>
            <lpage>3804</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.0406805102</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A developmentally regulated GAGA box-binding factor and Sp1 are required for transcription of the hsp70.1 gene at the onset of mouse zygotic genome activation</p>
            </title>
            <aug>
               <au>
                  <snm>Bevilacqua</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fiorenza</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Mangia</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2000</pubdate>
            <volume>127</volume>
            <fpage>1541</fpage>
            <lpage>1551</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10704399</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The MCP silencer of the <it>Drosophila </it>Abd-B gene requires both pleiohomeotic and GAGA factor for the maintenance of repression</p>
            </title>
            <aug>
               <au>
                  <snm>Busturia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bejarano</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zavortink</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Xin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sakonju</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2001</pubdate>
            <volume>128</volume>
            <fpage>2163</fpage>
            <lpage>2173</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11493537</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Identification of a soybean protein that interacts with GAGA element dinucleotide repeat DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Sangwan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>O'Brian</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2002</pubdate>
            <volume>129</volume>
            <fpage>1788</fpage>
            <lpage>1794</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">166767</pubid>
                  <pubid idtype="pmpid" link="fulltext">12177492</pubid>
                  <pubid idtype="doi">10.1104/pp.002618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The GA octodinucleotide repeat binding factor BBR participates in the transcriptional regulation of the homeobox gene Bkn3</p>
            </title>
            <aug>
               <au>
                  <snm>Santi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Stile</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Berendzen</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wanke</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Roig</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pozzi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rohde</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Salamini</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <fpage>813</fpage>
            <lpage>826</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.2003.01767.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12795701</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Definition and interactions of a positive regulatory element of the <it>Arabidopsis </it>INNER NO OUTER promoter</p>
            </title>
            <aug>
               <au>
                  <snm>Meister</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Monfared</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Gallagher</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Gasser</snm>
                  <fnm>CS</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2004</pubdate>
            <volume>37</volume>
            <fpage>426</fpage>
            <lpage>438</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.2003.01971.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">14731261</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>BASIC PENTACYSTEINE1, a GA binding protein that induces conformational changes in the regulatory region of the homeotic <it>Arabidopsis </it>gene SEEDSTICK</p>
            </title>
            <aug>
               <au>
                  <snm>Kooiker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Airoldi</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Losa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Manzotti</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Finzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kater</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Colombo</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2005</pubdate>
            <volume>17</volume>
            <fpage>722</fpage>
            <lpage>729</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1069694</pubid>
                  <pubid idtype="pmpid" link="fulltext">15722463</pubid>
                  <pubid idtype="doi">10.1105/tpc.104.030130</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The 5'-untranslated region of the ntp303 gene strongly enhances translation during pollen tube growth, but not during pollen maturation</p>
            </title>
            <aug>
               <au>
                  <snm>Hulzink</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>de Groot</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Croes</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Quaedvlieg</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Twell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wullems</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Van Herpen</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2002</pubdate>
            <volume>129</volume>
            <fpage>342</fpage>
            <lpage>353</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155897</pubid>
                  <pubid idtype="pmpid" link="fulltext">12011364</pubid>
                  <pubid idtype="doi">10.1104/pp.001701</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Microsatellites in starch-synthesizing genes in relation to starch physicochemical properties in waxy rice (<it>Oryza sativa </it>L.)</p>
            </title>
            <aug>
               <au>
                  <snm>Bao</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Corke</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Theor Appl Genet</source>
            <pubdate>2002</pubdate>
            <volume>105</volume>
            <fpage>898</fpage>
            <lpage>905</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00122-002-1049-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12582915</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Conserved noncoding sequences are reliable guides to regulatory elements</p>
            </title>
            <aug>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>369</fpage>
            <lpage>372</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02081-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Guo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Moose</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2003</pubdate>
            <volume>15</volume>
            <fpage>1143</fpage>
            <lpage>1158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153722</pubid>
                  <pubid idtype="pmpid" link="fulltext">12724540</pubid>
                  <pubid idtype="doi">10.1105/tpc.010181</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Conserved noncoding sequences in the grasses</p>
            </title>
            <aug>
               <au>
                  <snm>Inada</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Bashir</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Ko</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Goff</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Freeling</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2030</fpage>
            <lpage>2041</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403677</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952874</pubid>
                  <pubid idtype="doi">10.1101/gr.1280703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing</p>
            </title>
            <aug>
               <au>
                  <snm>Hong</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Hamaguchi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Busch</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Weigel</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2003</pubdate>
            <volume>15</volume>
            <fpage>1296</fpage>
            <lpage>1309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">156367</pubid>
                  <pubid idtype="pmpid" link="fulltext">12782724</pubid>
                  <pubid idtype="doi">10.1105/tpc.009548</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Using cauliflower to find conserved non-coding regions in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Colinas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Birnbaum</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Benfey</snm>
                  <fnm>PN</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2002</pubdate>
            <volume>129</volume>
            <fpage>451</fpage>
            <lpage>454</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1540231</pubid>
                  <pubid idtype="pmpid" link="fulltext">12068091</pubid>
                  <pubid idtype="doi">10.1104/pp.002501</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Tai</snm>
                  <fnm>PY</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1999</pubdate>
            <volume>48</volume>
            <fpage>597</fpage>
            <lpage>604</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006502</pubid>
                  <pubid idtype="pmpid" link="fulltext">10198125</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome</p>
            </title>
            <aug>
               <au>
                  <snm>Blanc</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hokamp</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420368</pubid>
                  <pubid idtype="pmpid" link="fulltext">12566392</pubid>
                  <pubid idtype="doi">10.1101/gr.751803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae)</p>
            </title>
            <aug>
               <au>
                  <snm>Koch</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Haubold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mitchell-Olds</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>1483</fpage>
            <lpage>1498</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11018155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The GATA family of transcription factors in Arabidopsis and rice</p>
            </title>
            <aug>
               <au>
                  <snm>Reyes</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Muro-Pastor</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Florencio</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>134</volume>
            <fpage>1718</fpage>
            <lpage>1732</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419845</pubid>
                  <pubid idtype="pmpid" link="fulltext">15084732</pubid>
                  <pubid idtype="doi">10.1104/pp.103.037788</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology</p>
            </title>
            <aug>
               <au>
                  <snm>Soltis</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Soltis</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Chase</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>402</fpage>
            <lpage>404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/46528</pubid>
                  <pubid idtype="pmpid" link="fulltext">10586878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Gene Ontology: tool for the unification of biology</p>
            </title>
            <aug>
               <au>
                  <cnm>The Gene Ontology Consortium</cnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>GOToolBox: functional investigation of gene datasets based on Gene Ontology</p>
            </title>
            <aug>
               <au>
                  <snm>Martin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brun</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Remy</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mouren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thieffry</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jacq</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R101</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545796</pubid>
                  <pubid idtype="pmpid" link="fulltext">15575967</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-12-r101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>PlantCARE, a database of plant <it>cis</it>-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Lescot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dehais</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Marchal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>325</fpage>
            <lpage>327</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99092</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752327</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.325</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>PlantCARE database</p>
            </title>
            <url>http://bioinformatics.psb.ugent.be/webtools/plantcare/html/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Plant <it>cis</it>-acting regulatory DNA elements (PLACE) database: 1999</p>
            </title>
            <aug>
               <au>
                  <snm>Higo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ugawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Iwamoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Korenaga</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>297</fpage>
            <lpage>300</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148163</pubid>
                  <pubid idtype="pmpid" link="fulltext">9847208</pubid>
                  <pubid idtype="doi">10.1093/nar/27.1.297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>PLACE database</p>
            </title>
            <url>http://www.dna.affrc.go.jp/PLACE/signalscan.html</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The spinach AtpC and AtpD genes contain elements for light-regulated, plastid-dependent and organ-specific expression in the vicinity of the transcription start sites</p>
            </title>
            <aug>
               <au>
                  <snm>Bolle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kusnetsov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Herrmann</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Oelmuller</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>1996</pubdate>
            <volume>9</volume>
            <fpage>21</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.1996.09010021.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">8580971</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Ancestral multipartite units in light-responsive plant promoters have structural features correlating with specific phototransduction pathways</p>
            </title>
            <aug>
               <au>
                  <snm>Arguello-Astorga</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Herrera-Estrella</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>1996</pubdate>
            <volume>112</volume>
            <fpage>1151</fpage>
            <lpage>1166</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">158042</pubid>
                  <pubid idtype="pmpid" link="fulltext">8938415</pubid>
                  <pubid idtype="doi">10.1104/pp.112.3.1151</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The cauliflower mosaic virus 35S promoter extends into the transcribed region</p>
            </title>
            <aug>
               <au>
                  <snm>Pauli</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rothnie</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hohn</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2004</pubdate>
            <volume>78</volume>
            <fpage>12120</fpage>
            <lpage>12128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">525061</pubid>
                  <pubid idtype="pmpid" link="fulltext">15507598</pubid>
                  <pubid idtype="doi">10.1128/JVI.78.22.12120-12128.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Localization of light-inducible and tissue-specific regions of the spinach ribulose bisphosphate carboxylase/oxygenase (rubisco) activase promoter in transgenic tobacco plants</p>
            </title>
            <aug>
               <au>
                  <snm>Orozco</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Ogren</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>23</volume>
            <fpage>1129</fpage>
            <lpage>1138</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00042347</pubid>
                  <pubid idtype="pmpid">8292778</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Salicylic acid-inducible binding of a tobacco nuclear protein to a 10 bp sequence which is highly conserved amongst stress-inducible genes</p>
            </title>
            <aug>
               <au>
                  <snm>Goldsbrough</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Albrecht</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Stratford</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>1993</pubdate>
            <volume>3</volume>
            <fpage>563</fpage>
            <lpage>571</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.1993.03040563.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">8220463</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Rapid induction by wounding and bacterial infection of an S gene family receptor-like kinase gene in Brassica oleracea</p>
            </title>
            <aug>
               <au>
                  <snm>Pastuglia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roby</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dumas</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cock</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>1997</pubdate>
            <volume>9</volume>
            <fpage>49</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">156900</pubid>
                  <pubid idtype="pmpid" link="fulltext">9014364</pubid>
                  <pubid idtype="doi">10.1105/tpc.9.1.49</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Transcriptional similarities, dissimilarities, and conservation of <it>cis</it>-elements in duplicated genes of Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Haberer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hindemitt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>KF</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>136</volume>
            <fpage>3009</fpage>
            <lpage>3022</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523363</pubid>
                  <pubid idtype="pmpid" link="fulltext">15489284</pubid>
                  <pubid idtype="doi">10.1104/pp.104.046466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Arabidopsis thaliana GATA factors: organisation, expression and DNA-binding characteristics</p>
            </title>
            <aug>
               <au>
                  <snm>Teakle</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Manfield</snm>
                  <fnm>IW</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Gilmartin</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>50</volume>
            <fpage>43</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1016062325584</pubid>
                  <pubid idtype="pmpid" link="fulltext">12139008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Microarray analysis of diurnal and circadian-regulated genes in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Accerbi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Larson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wisman</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2001</pubdate>
            <volume>13</volume>
            <fpage>113</fpage>
            <lpage>123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102203</pubid>
                  <pubid idtype="pmpid" link="fulltext">11158533</pubid>
                  <pubid idtype="doi">10.1105/tpc.13.1.113</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Organ-specific expression of Arabidopsis genome during development</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Jiao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>XW</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2005</pubdate>
            <volume>138</volume>
            <fpage>80</fpage>
            <lpage>91</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1104164</pubid>
                  <pubid idtype="pmpid" link="fulltext">15888681</pubid>
                  <pubid idtype="doi">10.1104/pp.104.054783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Systemic Acquired Resistance</p>
            </title>
            <aug>
               <au>
                  <snm>Ryals</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Neuenschwander</snm>
                  <fnm>UH</fnm>
               </au>
               <au>
                  <snm>Willits</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Molina</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Steiner</snm>
                  <fnm>HY</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>MD</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>1996</pubdate>
            <volume>8</volume>
            <fpage>1809</fpage>
            <lpage>1819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">161316</pubid>
                  <pubid idtype="pmpid" link="fulltext">12239363</pubid>
                  <pubid idtype="doi">10.1105/tpc.8.10.1809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Analysis of the genome sequence of the flowering plant Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <cnm>The <it>Arabidopsis</it> Genome Initiative</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>408</volume>
            <fpage>796</fpage>
            <lpage>815</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35048692</pubid>
                  <pubid idtype="pmpid" link="fulltext">11130711</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Arabidopsis genome in GenBank</p>
            </title>
            <url>ftp://ftp.ncbi.nih.gov/genomes/Arabidopsis_thaliana</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>TAIR database</p>
            </title>
            <url>ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>TIGR website</p>
            </title>
            <url>ftp://ftp.tigr.org/pub/data/b_oleracea/wgs_seq/</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The modified sputnik repeat-finder</p>
            </title>
            <url>http://capb.dbi.udel.edu/main/tools.htm</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>211</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.3.211</pubid>
                  <pubid idtype="pmpid" link="fulltext">10222408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
                  <pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Molecular systematics of the Brassicaceae: Evidence from coding plastidic matK and nuclear Chs sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Koch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haubold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mitchell-Olds</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Am J Bot</source>
            <pubdate>2001</pubdate>
            <volume>88</volume>
            <fpage>534</fpage>
            <lpage>544</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11250830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Arabidopsis MPSS: an online resource for quantitative expression analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Vu</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Tej</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Edberg</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Matvienko</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tindell</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>135</volume>
            <fpage>801</fpage>
            <lpage>813</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514116</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173564</pubid>
                  <pubid idtype="doi">10.1104/pp.104.039495</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Arabidopsis MPSS database</p>
            </title>
            <url>http://mpss.udel.edu/at</url>
         </bibl>
      </refgrp>
   </bm>
</art>
