<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-5-30</ui>
   <ji>1471-2148</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p><it>Diaspora</it>, a large family of <it>Ty3</it>-<it>gypsy </it>retrotransposons in <it>Glycine max</it>, is an envelope-less member of an endogenous plant retrovirus lineage</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Yano</snm>
               <mi>T</mi>
               <fnm>Sho</fnm>
               <insr iid="I1"/>
               <email>syano@uchicago.edu</email>
            </au>
            <au id="A2">
               <snm>Panbehi</snm>
               <fnm>Bahman</fnm>
               <insr iid="I2"/>
               <email>bpanbehi@wisc.edu</email>
            </au>
            <au id="A3">
               <snm>Das</snm>
               <fnm>Arpita</fnm>
               <insr iid="I3"/>
               <email>arpitadas@netscape.net</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Laten</snm>
               <mi>M</mi>
               <fnm>Howard</fnm>
               <insr iid="I4"/>
               <email>hlaten@luc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL 60637 USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biomolecular Chemistry, University of Wisconsin, Madison, WI 53706 USA</p>
            </ins>
            <ins id="I3">
               <p>Neuronautics, Inc., Evanston, IL 60201 USA</p>
            </ins>
            <ins id="I4">
               <p>Department of Biology, Loyola University Chicago, Chicago, IL 60626 USA</p>
            </ins>
         </insg>
         <source>BMC Evolutionary Biology</source>
         <issn>1471-2148</issn>
         <pubdate>2005</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>30</fpage>
         <url>http://www.biomedcentral.com/1471-2148/5/30</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15876351</pubid>
               <pubid idtype="doi">10.1186/1471-2148-5-30</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>23</day>
               <month>12</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>05</day>
               <month>5</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>5</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Yano et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The chromosomes of higher plants are littered with retrotransposons that, in many cases, constitute as much as 80% of plant genomes. Long terminal repeat retrotransposons have been especially successful colonizers of the chromosomes of higher plants and examinations of their function, evolution, and dispersal are essential to understanding the evolution of eukaryotic genomes. In soybean, several families of retrotransposons have been identified, including at least two that, by virtue of the presence of an envelope-like gene, may constitute endogenous retroviruses. However, most elements are highly degenerate and are often sequestered in regions of the genome that sequencing projects initially shun. In addition, potentially functional copies are rarely recovered from genomic DNA. This study provides a strategy that surmounts these issues by generating a consensus sequence that can then be functionally and phylogenetically evaluated.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p><it>Diaspora </it>is a multicopy member of the <it>Ty3</it>-<it>gypsy</it>-like family of LTR retrotransposons and comprises at least 0.5% of the soybean genome. Although the <it>Diaspora </it>family is highly degenerate and, with the exception of this report, is not represented in the GenBank nr database, a full-length consensus sequence was generated from short overlapping sequences using a combination of experimental and <it>in silico </it>methods. <it>Diaspora </it>is 11,737 bp in length and contains a single 1892-codon ORF that encodes a gag-pol polyprotein. Phylogenetic analysis indicates that it is closely related to <it>Athila </it>and <it>Calypso </it>retroelements from <it>Arabidopsis </it>and soybean, respectively. These in turn form the framework of an endogenous retrovirus lineage whose members possess an envelope-like gene. <it>Diaspora </it>appears to lack any trace of this coding region.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>A combination of empirical sequencing and retrieval of unannotated Genome Survey Sequence database entries was successfully used to construct a full-length representative of the <it>Diaspora </it>family in <it>Glycine max</it>. <it>Diaspora </it>is presently the only fully characterized member of a lineage of putative plant endogenous retroviruses that contains virtually no trace of an extra coding region. The loss of an envelope-like coding domain suggests that non-infectious retrotransposons could swiftly evolve from infectious retroviruses, possibly by anomalous splicing of genomic RNA.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="refman"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Eukaryotic genomes are littered with dozens to tens of thousands of copies of reverse transcriptase (RT)-based retroelements <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Among these is a diverse collection of elements characterized by long terminal repeats (LTR) that includes the <it>Ty1-copia</it>-like and <it>Ty3</it>-<it>gypsy</it>-like retrotransposon families, endogenous retroviruses, and mammalian lentiviruses <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. LTR retrotransposons have been especially successful colonizers of the chromosomes of higher plants where they constitute as much as 80% of these genomes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. In soybean, several families of LTR retrotransposons have been identified <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, including at least two that possess an <it>env</it>-like ORF and resemble mammalian endogenous retroviruses <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>.</p>
         <p>The evolutionary relationship between retrotransposons and retroviruses has been well established by phylogenetic tree constructions. However, the branches linking these groups are, not unexpectedly, long ones <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. The major structural difference between retrotransposon and retrovirus genomes is the presence of an envelope gene (<it>env</it>) in the latter. Retroviral envelope proteins mediate receptor binding, cell fusion, and particle budding, and contain transmembrane and coiled-coil domains <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. While the <it>de novo </it>acquisition of an <it>env</it>-like coding region by transduction could conceivably occur in a single step, the functional evolution of such a coding domain might be expected to occur over considerable stretches of evolutionary time <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>. But could the loss of such a coding domain occur in a single step? This possibility is far from implausible, considering that all retroelement genomes are RNA transcripts and many are substrates for splicing reactions. A single event of anomalous packaging of an improperly spliced subgenomic RNA, followed by reverse transcription, could lead to an <it>env</it>-less element in an evolutionary blink of an eye.</p>
         <p>In the present study, the characterization of the soybean retrotransposon, <it>Diaspora</it>, provides evidence for a relatively rapid transition between enveloped retroelements and non-enveloped retrotransposons. Our phylogenetic analysis suggests that the <it>Diaspora </it>retrotransposon emerged from a lineage of plant endogenous retroviruses that possesses an <it>env</it>-like gene <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p><it>Diaspora </it>was initially encountered in a genomic clone as a 5'- and 3'-truncated copy nested between copies of another LTR retroelement (Laten, unpublished). Using both direct sequencing and <it>in silico </it>analysis, we generated a full-length consensus copy of <it>Diaspora </it>and confirmed 1) its membership in the <it>Ty3-gypsy</it>-like family of LTR retrotransposons and 2) its status as the only member of an endogenous retrovirus lineage lacking an <it>env</it>-like gene. The <it>in silico </it>procedure can be extended to construct consensus sequences for other repetitive DNA families from degenerate elements and from single-pass-read genome survey sequences, provided the copy numbers are sufficiently high and constitute a robust collection of overlapping sequences.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>AF095730 is related to <it>gypsy </it>group LTR retrotransposons</p>
            </st>
            <p>Sequencing of subclone pAMH3C [GenBank: U96295] initially led to the characterization of an <it>env</it>-like gene and the 3' LTR of the <it>SIRE</it>1 endogenous retrovirus belonging to the <it>Ty1-copia </it>group of retroelements <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The DNA adjacent to the <it>SIRE</it>1 LTR constituted an initially unidentified 544 bp ORF that gave no hits in BLASTn or BLASTx searches. However, when the sequences of the adjacent subclones, pAMH3G and pAMH3D, were appended to pAMH3C and assembled into a contig [GenBank: AF095730], a single 1383-codon ORF with a nonsense mutation at position 2604 and frameshifts at positions 2813 and 3139 was generated (Laten and Das, unpublished). The frameshifts occurred in runs of six thymidines and five adenosines, respectively. When the frameshifts were adjusted and the conceptual translation was used to query the GenBank protein database, numerous high-scoring hits to retrotransposon reverse transcriptases and integrases were obtained (Laten and Das, unpublished).</p>
            <p>The large collection of reverse transcriptase and integrase sequences that was retrieved, most as contiguous polyproteins, belonged entirely to the <it>Ty3-gypsy </it>group of LTR retrotransposons. While the BLASTp search identified AF095730 homology to numerous accessions from residue 135 to the carboxyl terminus, no sequences with similarity to the first 134 amino acids were found. The highest scoring hits were identified as <it>Athila</it>-like, related to the <it>Ty3</it>-<it>gypsy </it>group element from <it>A. thaliana </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp> that has subsequently been shown to be present in a wide range of plant genomes, including soybean, other dicots, and monocots <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. We have named the new soybean element family <it>Diaspora</it>.</p>
         </sec>
         <sec>
            <st>
               <p><it>Diaspora </it>is a multi-copy family</p>
            </st>
            <p>Because the <it>Diaspora </it>DNA in the original genomic clone was truncated at both ends, we initially probed the &#955;FIXII genomic library for additional <it>Diaspora </it>copies. Hybridization detected a few thousand positive plaques, confirming the moderately high copy number of this family. DNAs from a random sample of twenty positive clones were amplified using primers derived from the ends of AF095730 (PDIA01F/PDIA02R and PDIA03F/PDIA04R). No clone produced amplicons with both primer pairs, suggesting that all copies of <it>Diaspora </it>in these clones were either 5'- and/or 3'- truncated, or polymorphic at the primer sites (data not shown). We inferred that the library would not readily yield full-length elements. Retrospectively, this finding would have been anticipated had the unusual length of <it>Diaspora</it>, approaching that of the average insert size in &#955;FIXII, been known (see below). With the availability of BAC soybean genomic libraries with inserts in excess of 100 kb <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, the possibility of isolating full-length <it>Diaspora </it>copies became a virtual certainty and the &#955; clones were abandoned in favor of BACs. Filters containing microarrays of BAC clones derived from <it>G. max </it>cv. Forest <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> were probed for the presence of <it>Diaspora</it>. Hundreds of clones hybridized to a <it>prot-rt </it>probe (pAMH3D) and based on the number of hybridizing clones in the library, we estimated that <it>Diaspora </it>represents at least 0.5% of the <it>G. max </it>genome.</p>
         </sec>
         <sec>
            <st>
               <p><it>Diaspora </it>family members are truncated and heterogeneous</p>
            </st>
            <p>DNA was recovered from twenty randomly chosen BAC clones that hybridized to pAMH3D, and PCR-amplified using the primer pairs derived from the ends of AF095730 (PDIA01F/PDIA02R and PDIA03F/PDIA04R). Surprisingly, only five clones were amplified by both primer pairs, suggesting that many copies were either 5' or 3' truncated or markedly polymorphic. Truncation would be consistent with the characterization of disrupted and nested retrotransposons first reported in maize <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Of those containing both termini, none successfully served as templates for more than two additional PCR-amplifications using the complete set of AF095730-based primer pairs (PDIA5F through 13R). These findings suggested that <it>Diaspora </it>is a relatively heterogeneous family. This was confirmed by limited sequencing of &#955;FIXII [GenBank: AF095730 and AY656632-AY656653] and BAC [GenBank: AY656654-AY656662] clones using the primers listed in Table <tblr tid="T1">1</tblr>. A total of 15,433 nucleotides were sequenced, of which 7,293 were non-overlapping. It appeared that sequencing individual members of the <it>Diaspora </it>family directly from genomic clones would not lead to satisfactory descriptions of functional coding or regulatory regions, so an <it>in silico </it>strategy was pursued to determine these characteristics.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Primers used for PCR amplification and DNA sequencing.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Oligomer</p>
                     </c>
                     <c ca="center">
                        <p>Sequence</p>
                     </c>
                     <c ca="center">
                        <p>Oligomer</p>
                     </c>
                     <c ca="center">
                        <p>Sequence</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA01F</p>
                     </c>
                     <c ca="left">
                        <p>AACCTCAACAGCAAAATCAACCA</p>
                     </c>
                     <c ca="left">
                        <p>PDIA12R</p>
                     </c>
                     <c ca="left">
                        <p>CACTTTGCGAGCTGTCCTTTGA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA02R</p>
                     </c>
                     <c ca="left">
                        <p>GAGGGCTGGACCATCTGAGGT</p>
                     </c>
                     <c ca="left">
                        <p>PDIA13F</p>
                     </c>
                     <c ca="left">
                        <p>TGCGGATTCACCCATTC</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA03F</p>
                     </c>
                     <c ca="left">
                        <p>TGGGCACATCGGACTGCTTAC</p>
                     </c>
                     <c ca="left">
                        <p>PDIA14R</p>
                     </c>
                     <c ca="left">
                        <p>CCAAAGACAACCCGATAAGGAG</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA04R</p>
                     </c>
                     <c ca="left">
                        <p>GACATGCCTTTCCAAAGACAACC</p>
                     </c>
                     <c ca="left">
                        <p>PDIA15F</p>
                     </c>
                     <c ca="left">
                        <p>TTCCTATCTCCTTCTTTGCTTT</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA05F</p>
                     </c>
                     <c ca="left">
                        <p>GGCCCAAGCAGACCATACA</p>
                     </c>
                     <c ca="left">
                        <p>PDIA16F</p>
                     </c>
                     <c ca="left">
                        <p>TTGCCCCATTGATTGCTTG</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA06R</p>
                     </c>
                     <c ca="left">
                        <p>TAAAAATCAACAGGGAAAATCAGT</p>
                     </c>
                     <c ca="left">
                        <p>PDIA17R</p>
                     </c>
                     <c ca="left">
                        <p>TTTCAAATCACAAAATGTCAAG</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA07F</p>
                     </c>
                     <c ca="left">
                        <p>TGTCTCCGCATTGATTGGTAAA</p>
                     </c>
                     <c ca="left">
                        <p>PDIA18R</p>
                     </c>
                     <c ca="left">
                        <p>TGTAAGTCAGATGGATTGCCA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA08R</p>
                     </c>
                     <c ca="left">
                        <p>ATTGGCTGTCGGAGATAGGATAAA</p>
                     </c>
                     <c ca="left">
                        <p>PDIA19R</p>
                     </c>
                     <c ca="left">
                        <p>GCTCCAAGGTCCATCACGA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA09F</p>
                     </c>
                     <c ca="left">
                        <p>AAACCAGTAAGACAGCCACAGAGA</p>
                     </c>
                     <c ca="left">
                        <p>PDIA20R</p>
                     </c>
                     <c ca="left">
                        <p>GGACATCCTCATCAGGGTATTG</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA10R</p>
                     </c>
                     <c ca="left">
                        <p>CAAGGACAGCCCCCAATG</p>
                     </c>
                     <c ca="left">
                        <p>PDIA21F</p>
                     </c>
                     <c ca="left">
                        <p>CATGGGTGCTTTGAGGGTAA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDIA11F</p>
                     </c>
                     <c ca="left">
                        <p>GAGGTGCGATCTTTTCTTGGTC</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p><it>Diaspora </it>sequences recovered by BLASTn queries</p>
            </st>
            <p>Prior to the initiation of plant genome sequencing projects, AF095730 was used to search GenBank for related sequences. At that time, BLASTp searches returned a sizeable collection of previously characterized pol polyproteins from <it>Ty3</it>-<it>gypsy</it>-like retrotransposons (Laten, unpublished). As more and more soybean BAC-end sequences were deposited in the GSS database [<abbrgrp><abbr bid="B22">22</abbr></abbrgrp>; J. Shultz, K. Meksem, J. Shetty, C. Town, H. Koo, J. Potter, K. Wakefield, H. Zhang, C. Wu and D. Lightfoot, unpublished], the growing robustness of our BLASTn results made it clear that <it>Diaspora </it>was a high copy-number retrotransposon and that the database hits derived exclusively from BAC ends might be assembled into a contiguous, full-length, consensus <it>Diaspora </it>sequence.</p>
            <p>GenBank sequences retrieved using sequentially selected segments of AF095730 as queries were assembled into an expanding contig. The BAC-end sequences ranged from 400 to 900 nucleotides in length. Hits with bit scores &#8805;200 were added to the contig; this cutoff value generated a manageable collection of sequences. Sequences anchored to the ends of AF095730 were then used to extend the consensus beyond its 5' and 3' ends. Primers generated from these flanking regions (PDIA15F through 21F) were also used to amplify and sequence additional regions from two of the full-length <it>Diaspora </it>candidates in BAC clones.</p>
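            <p>The selection step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in this study: the four-column record layout, the accession names, and the helper functions <it>parse_hits</it>, <it>select_hits</it>, and <it>contig_span</it> are hypothetical, and only the bit score cutoff of 200 is taken from the text.</p>

```python
# Sketch only: record layout, accession names, and helper names are hypothetical.
from typing import NamedTuple

BIT_SCORE_CUTOFF = 200.0  # cutoff stated in the text (bit score of at least 200)


class Hit(NamedTuple):
    accession: str
    start: int        # first aligned position on the growing contig
    end: int          # last aligned position on the growing contig
    bit_score: float


def parse_hits(lines):
    """Parse whitespace-separated 'accession start end bit_score' records."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed records
        acc, start, end, score = fields
        hits.append(Hit(acc, int(start), int(end), float(score)))
    return hits


def select_hits(hits, cutoff=BIT_SCORE_CUTOFF):
    """Keep the best-scoring hit per accession at or above the cutoff."""
    best = {}
    for hit in hits:
        if hit.bit_score >= cutoff:
            prev = best.get(hit.accession)
            if prev is None or hit.bit_score > prev.bit_score:
                best[hit.accession] = hit
    return sorted(best.values(), key=lambda h: h.start)


def contig_span(hits):
    """Return the leftmost start and rightmost end covered by the hits."""
    return min(h.start for h in hits), max(h.end for h in hits)
```

            <p>In such a scheme, each round of searching would add the retained hits to the contig, and hits that reach past the current contig ends would supply the query segment for the next round.</p>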
            <p>Two hundred seven Genbank accessions, including thirty submissions from the present study, totaling 141,423 nucleotides were collected to generate the contig. To avoid bias, duplicate sequences from different accessions were purged from the alignment. There were only three positions for which a strict consensus nucleotide could not be assigned.</p>
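            <p>As an illustration of how a consensus and its average coverage can be derived from aligned reads, the following Python sketch applies a simple plurality rule to each alignment column and reports a tie as an ambiguous position. The plurality rule, the gap symbol, and the function names are assumptions made for this example; the criterion used for the strict consensus in this study is not specified here.</p>

```python
# Sketch only: the plurality rule and all names are assumptions for illustration.
from collections import Counter

GAP = "-"  # padding used where a read does not cover a column


def column_consensus(column, ambiguous="N"):
    """Return the single most frequent base in a column, or `ambiguous` on a tie."""
    counts = Counter(base for base in column if base != GAP)
    if not counts:
        return ambiguous  # no read covers this column
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ambiguous  # two bases are equally frequent
    return ranked[0][0]


def consensus(aligned_reads):
    """Build a consensus from equal-length, gap-padded aligned reads."""
    lengths = {len(read) for read in aligned_reads}
    if len(lengths) != 1:
        raise ValueError("aligned reads must all have the same length")
    return "".join(column_consensus(col) for col in zip(*aligned_reads))


def mean_coverage(aligned_reads):
    """Average number of reads covering each alignment column."""
    width = len(aligned_reads[0])
    covered = sum(1 for read in aligned_reads for base in read if base != GAP)
    return covered / width
```

            <p>Positions returned as ambiguous by such a rule correspond to columns for which no single consensus nucleotide can be assigned.</p>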
            <p>Fig. <figr fid="F1">1</figr> is a histogram of the density distribution of sequences used to generate the contig. The average coverage across the length of the contig was 14.8 accessions, although inclusion of many additional sequences that met the scoring criterion in regions of high sequence conservation was not pursued. Because the initial soybean BAC libraries were created by <it>Eco</it>RI, <it>Hin</it>dIII, or <it>Bam</it>HI digestion, the local robustness of the assembly was dependent on the density of these sites in <it>Diaspora</it>. Sub-families that lacked a particular cleavage site would be under-represented. In contrast, the assembly of regions far from these sites was made possible by sub-families with additional recognition sites for one of these enzymes.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Histogram of local densities of Genbank Accessions used to construct a <it>Diaspora </it>consensus contig</p>
               </caption>
               <text>
                  <p>Histogram of local densities of Genbank Accessions used to construct a <it>Diaspora </it>consensus contig. Recognition sites of restriction enzymes used to generate BAC libraries are indicated. Restriction sites in () are found in &lt; 50% of sequences. The right LTR is not shown. The contig was assembled from the following Genbank Accessions: AF095730 (this study), AY656632-AY656656 (this study), AY656659-AY656662 (this study), AQ989187, AQ989208, AQ989232, AQ989271, AQ989295, AZ044709, AZ045083, AZ221405, AZ301361, AZ302029, AZ536637, AZ933330, AZ936131, BE611677, BH000863, BH000924, BH001187, BH023628, BH023632, BH023632, BH173556, BH405523, BH405626, BH405659, BH405669, BH610143, BH610157, BH610193, BH840834, BH854486, BH888573, BH897988, BH912698, BI974271, BU546431, CC062189, CC062259, CC062269, CC062279, CC062321, CC062333, CC062399, CC062412, CC062425, CC062501, CC062524, CC062576, CC062745, CC062865, CG811196, CG812831, CG813036, CG813244, CG813336, CG813336, CG813447, CG813495, CG813591, CG813669, CG813710, CG813854, CG813944, CG814001, CG814027, CG814297, CG814428, CG814537, CG814691, CG814705, CG814739, CG814773, CG814814, CG814837, CG814944, CG814960, CG815296, CG815349, CG815376, CG815566, CG815593, CG815931, CG815990, CG816077, CG816195, CG816437, CG816499, CG816820, CG816902, CG816924, CG816965, CG817175, CG817237, CG817248, CG817294, CG817426, CG817444, CG817647, CG817665, CG817749, CG817754, CG817777, CG817807, CG817873, CG817996, CG818405, CG818428, CG818443, CG818626, CG818673, CG818711, CG819087, CG819204, CG819222, CG819552, CG819604, CG819672, CG819766, CG819790, CG819813, CG819936, CG819977, CG820067, CG820103, CG820158, CG820299, CG820411, CG820560, CG820627, CG820654, CG820656, CG820670, CG820673, CG820702, CG820718, CG820816, CG820848, CG820850, CG820868, CG821026, CG821085, CG821093, CG821150, CG821179, CG821206, CG821219, CG821294, CG821311, CG821532, CG821597, CG821693, CG821710, CG821772, CG821963, CG822140, 
CG822195, CG822264, CG822361, CG822369, CG822426, CG822466, CG822466, CG822582, CG823113, CG823202, CG823294, CG823320, CG823499, CG823505, CG823511, CG823713, CG824266, CG824332, CG824372, CG824380, CG824407, CG824533, CG825062, CG825163, CG825591, CG825777, CG825811, CG825933, CG826013, CL867862, CL8811208, CL881708, CL882298, CL886562, CL891285, CL899081</p>
               </text>
               <graphic file="1471-2148-5-30-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Structure of <it>Diaspora</it></p>
            </st>
            <p>The length of the <it>Diaspora </it>consensus is 11,737 bp (Fig. <figr fid="F2">2</figr>), far longer than all but a handful of retrotransposons. The exceptional length of <it>Diaspora </it>is due primarily to the unusual length of its LTRs and the long gap between the upstream LTR and the <it>gag </it>start codon (Fig. <figr fid="F2">2</figr>). Like nearly all other retroelements, the LTRs terminate in TG...CA. The element is characterized by a contiguous 1892-codon ORF whose conceptual translation yields a single gag-pol polyprotein (Fig. <figr fid="F3">3</figr>) characteristic of <it>Ty3-gypsy</it>-like retrotransposons <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Not surprisingly, the consensus contains neither nonsense codons nor frameshifts. This translated ORF possesses core domains for gag (CDD17379), reverse transcriptase (CDD16610) and integrase (CDD25582). There is also a CX<sub>2</sub>CX<sub>4</sub>HX<sub>4</sub>C zinc finger motif in gag and a conserved protease catalytic domain motif, AMLDLGAS (Fig. <figr fid="F3">3</figr>). Interestingly, the first thirty amino acids of the translated ORF are not similar to the amino termini of any gag proteins in GenBank. Similarities to several gag proteins begin at position 31. Translation of the other five reading frames yielded neither lengthy ORFs nor similarities to any sequences in BLASTp searches.</p>
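            <p>The sequence signatures mentioned above lend themselves to simple pattern searches. The Python sketch below shows one way to scan a protein sequence for the CX2CX4HX4C zinc finger pattern and to test a nucleotide sequence for TG...CA termini. The function names are hypothetical and any input sequence must be supplied by the user; this is an illustration, not the method used to annotate <it>Diaspora</it>.</p>

```python
# Sketch only: function names are hypothetical; motifs are those quoted in the text.
import re

# CX2CX4HX4C: Cys, any 2 residues, Cys, any 4 residues, His, any 4 residues, Cys
ZINC_FINGER = re.compile(r"C.{2}C.{4}H.{4}C")
PROTEASE_MOTIF = "AMLDLGAS"  # protease catalytic domain motif quoted in the text


def find_zinc_fingers(protein):
    """Return (0-based start, matched text) for each CX2CX4HX4C occurrence."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(protein)]


def has_protease_motif(protein):
    """True if the protease catalytic domain motif occurs in the protein."""
    return PROTEASE_MOTIF in protein


def has_ltr_termini(ltr):
    """True if a nucleotide sequence begins with TG and ends with CA."""
    seq = ltr.upper()
    return len(seq) >= 4 and seq.startswith("TG") and seq.endswith("CA")
```

            <p>Because the regular expression search is non-overlapping, overlapping occurrences of the zinc finger pattern would be reported only once.</p>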
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>A. Consensus nucleotide sequence of <it>Diaspora</it></p>
               </caption>
               <text>
                  <p>A. Consensus nucleotide sequence of <it>Diaspora</it>. LTR in red, PBS in green, ORF in blue, PPT in maroon. B. Structural organization of <it>Diaspora</it>. PBS: tRNA primer binding site; Gag: Gag core domain (CDD17379); Z: CCHC Zn finger domain; P: protease catalytic core; RT: reverse transcriptase core domain (CDD16610); Int: integrase core domain (CDD25582); PPT: polypurine tract. (&#8680;) ORF. Consensus restriction sites as in Fig. 1. H: <it>Hin</it>dIII; E: <it>Eco</it>RI; B: <it>Bam</it>HI.</p>
               </text>
               <graphic file="1471-2148-5-30-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Conceptual translation of the <it>Diaspora </it>ORF</p>
               </caption>
               <text>
                  <p>Conceptual translation of the <it>Diaspora </it>ORF. Teal: Gag core domain; blue: Zn finger domain; red: protease catalytic core; green: RT core domain; violet: integrase core domain</p>
               </text>
               <graphic file="1471-2148-5-30-3"/>
            </fig>
            <p>As in the <it>Calypso </it>group <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, the tRNA primer binding site (PBS) begins 5 bp beyond the 3' end of the LTR and is perfectly complementary to the 3' terminal 18 bases of tRNA<sup>Asp </sup>from <it>Glycine max </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp> (Fig. <figr fid="F4">4</figr>). At 873 bp, the distance between the LTR and the putative <it>gag </it>start codon is unusually long and not shared by related elements. This region contains no extended ORFs and neither BLASTn nor tBLASTn searches of the nr database retrieved significant hits.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Base pairing of PBS (upper case) from <it>Diaspora </it>and <it>Calypso </it>with the 3' end of tRNA<sup>Asp</sup></p>
               </caption>
               <text>
                  <p>Base pairing of PBS (upper case) from <it>Diaspora </it>and <it>Calypso </it>with the 3' end of tRNA<sup>Asp</sup>. LTR terminus underlined.</p>
               </text>
               <graphic file="1471-2148-5-30-4"/>
            </fig>
            <p>Five potential splice donor sites, all in the LTR between 1400 and 2200 bp upstream of the <it>gag-pol </it>ORF, were predicted with medium confidence, and two potential acceptor sites flanked the start codon. Without splicing, the 5'UTR of any <it>Diaspora </it>transcript would be exceptionally long; however, the biological relevance of these sites is not known, and there are no reported examples of introns upstream of <it>gag </it>for any retrovirus or LTR retrotransposon.</p>
            <p>The <it>pol </it>stop codon is 128 bp upstream of the polypurine tract (PPT) that abuts the 3' LTR. Thus, <it>Diaspora </it>contains no envelope-like coding sequence beyond <it>pol</it>, unlike those reported for its closest relatives, including members of the <it>Athila </it>and <it>Calypso </it>families (Wright and Voytas, 2002), <it>BAGY-2 </it>(Vicient et al., 2001), and <it>Cyclops-1 </it><abbrgrp><abbr bid="B8">8</abbr><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr></abbrgrp>. When this short region was used in BLASTn and tBLASTx searches, no additional sequences with significant probabilities were recovered. Interestingly, translation of this short region yields a strongly predicted transmembrane domain, although it is interrupted by two stop codons (data not shown).</p>
            <p>The <it>Diaspora </it>LTR is 2524 bp in length (Fig. <figr fid="F2">2</figr>), making it one of the longest among retrotransposons and contributing to the unusual length of the element. By comparison, the <it>RIRE3 </it>LTR is 2316 bp <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, that of <it>BARE-</it>1 is 1829 bp <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, that of <it>Athila</it>1-1 is 1539 bp, and that of <it>Cyclops</it>-1 is 1504 bp <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Only the LTRs from <it>Ogre </it>and <it>BAGY-1</it>, at over 5000 and 4200 bp, respectively <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, are longer.</p>
            <p>The length of the <it>Diaspora </it>LTR made it impossible to construct unique 5' or 3' LTRs by the <it>in silico </it>method employed. In addition, the absence of a contiguous element prohibited characterization of target site duplications. However, we identified eight accessions from the database that contained the tRNA PBS and part of the adjacent upstream LTR. The longest of these extended 491 bp upstream of the PBS. Twenty-two sequences contained the PPT and part of the adjacent downstream LTR. The longest of these extended 596 bp into the LTR. Thus, the central 1437 bp could not be uniquely assigned to either LTR. As a consequence, the available LTR sequences were merged to generate a single, consensus LTR that was affixed to both <it>Diaspora </it>ends.</p>
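            <p>The size of the unassigned central region follows directly from the figures given above. As a minimal sketch of that arithmetic (the variable and function names are illustrative and are not part of the original analysis):</p>

```python
# Lengths in bp, taken from the text above; names are illustrative only.
LTR_LENGTH = 2524          # length of the consensus LTR
PBS_SIDE_EXTENSION = 491   # longest read extending upstream of the PBS into the LTR
PPT_SIDE_EXTENSION = 596   # longest read extending downstream of the PPT into the LTR


def unassigned_central_region(ltr_length, pbs_side, ppt_side):
    """Return the number of LTR bases not anchored to either end of the element."""
    return ltr_length - pbs_side - ppt_side


central = unassigned_central_region(LTR_LENGTH, PBS_SIDE_EXTENSION, PPT_SIDE_EXTENSION)
print(central)  # 1437
```

            <p>Only the bases covered by reads that cross a PBS or PPT junction can be attributed to a specific LTR; the remainder is shared by the single consensus LTR.</p>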
            <p>Thirteen LTR sequences were 5' junctions and sixteen were 3', based on the complete absence of sequence similarity beyond the 5' or 3' ends, respectively, of the aligned LTR sequences. When the flanking DNAs of these 29 sequences were used in BLASTn searches to query the GSS database, all but three generated dozens of hits (data not shown), and thus constituted repetitive elements themselves. Of the repetitive flanking DNAs, 75% represented <it>Diaspora </it>insertions into the coding regions of other retrotransposons. In addition, insertions into the coding regions of transposons related to En/Spm and Tam3 were found. The identity of the three low- or single-copy sequences could not be ascertained. The <it>Diaspora </it>family therefore appears to be embedded in retrotransposon- and transposon-rich regions. We have made similar observations for the <it>SIRE1 </it>retroelement (unpublished). Searches focused on the region upstream of the PPT failed to uncover any <it>Diaspora </it>copies with additional DNA between <it>pol </it>and the PPT.</p>
            <p>Among the sequences used to assemble the contig, non-coding regions contained a variety of short indels, especially in homonucleotide runs and dinucleotide repeats, presumably from replication slippage. The sequences in the GSS collection from which the consensus was built represented unedited submissions, and excluding single base indels that might have been the result of unedited miscalls, most of the indels in the ORF retained the correct reading frame. Eight accessions (BH023632, CG813336, CG820702, CG821179, CG822466, AY656639, AY656648, and AY656656) were chimeric and probably represented truncated copies. All but two of these sequences were within <it>gag </it>or the putative 5'UTR. Since chimeric sequences in GSS would invariably produce lower bit scores, they were generally excluded from the contig. Consequently, many of the slightly lower scoring sequences initially retrieved in our BLASTn search were also chimeric, but were not retained for the assembly and were not further characterized.</p>
         </sec>
         <sec>
            <st>
               <p><it>Diaspora </it>is phylogenetically related to plant endogenous retroviruses</p>
            </st>
            <p>The conserved region of <it>RT </it>was translated and the region representing peptide domains 2 through 7 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> was used in a BLASTp search to retrieve closely related accessions from Genbank. All of the sequences retrieved were from higher plants and their distribution among species reflected, to a large extent, the current progress of genome sequencing projects. The sequences were aligned (see <supplr sid="S1">Additional file 1</supplr>), and a neighbor joining tree was generated (Fig. <figr fid="F5">5</figr>) and rooted to the RT from <it>gypsy</it>.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Neighbor-joining phylogenetic tree using p-distances based on conserved RT domains 2 through 7 [29] of <it>gypsy</it>-like LTR retroelements from higher plants</p>
               </caption>
               <text>
                  <p>Neighbor-joining phylogenetic tree using p-distances based on conserved RT domains 2 through 7 [29] of <it>gypsy</it>-like LTR retroelements from higher plants. The tree is rooted to <it>gypsy</it>. Bootstrap values from 1000 pseudo-replicates shown as percentages only at nodes with > 50% support. Vertical line indicates genus; key below. Named elements followed by Genbank Accession numbers; unnamed elements designated by Accession Number and, for translated nucleotide sequences, first nucleotide position. Wilma 634M12: AY494981 [51]; Wilma 426K20: AY146588 [51]; Wilma 107M9: AY368673 [51]; BAGY-2: AJ279072 [13]; Tmt1-1: AC146683 (115510-125622); Athila4-1: AC007209 [10]; Athila6-1: AF104920 [10]; Athila1-1: AB005248 [52]; Athila5-1: AF147260 [10]; AP005726: 133249; AC136972: 155124; <it>Calypso</it>2-1: AF186183 [10]; <it>Calypso</it>3-1: AF186185 [10]; <it>Calypso</it>5-1: AF186186 [10]; <it>Calypso</it>4-1: AF186185 [10]; Cyclops-1: AJ000639 [8]; AP004896: 78828; Tlc1-1: AP006432 (23839-35862); Tlc1-2: AP006350 (29612-19200); BBRE1: T12085; TfcII sr1: AF219199; TfcII sr25: AF219208; TfcII sr18: AF219207; cot8-6: AF378037 [10]; cot5-3: AF378037 [10]; cot8-7: AAL06412 [10]; Tpb1-1: AC149297 (90224-102138); syc2-3: AF378052 [10]; syc4-2: AF378053 [10]; <it>Diaspora</it>Lc: AP007806: 43868; Tat4-1: AB005247 [44]; Cinful-1: AF049110 [45]; Grande1-4: X97604 [46]; RIRE2: AB030283 [47]; Reina: U69258 [48]; Cereba: AY040832 [49]; RIRE7: BAA89466 [50]; RIRE7-2: AL731604 (96205-102279); Dea1: T07863 [51]; del1-46: X13886 [52]; BAGY-1: Y14573 [27]; Tekay: AAL59229 [53]; RIRE3-2: AC123974 (48149-59938); RIRE3: AB014738 [50]; Retrosat2: AAM74400; Retrosat2-2: AL662955 (58578-70224); Gypsy: P10401 [54]. 
<sup>a</sup><it>Triticum</it>; <sup>b</sup><it>Hordeum</it>; <sup>c</sup><it>Medicago</it>; <sup>d</sup><it>Arabidopsis</it>; <sup>e</sup><it>Oryza</it>; <sup>f</sup><it>Glycine</it>; <sup>g</sup><it>Pisum</it>; <sup>h</sup><it>Lotus</it>; <sup>i</sup><it>Vicia</it>; <sup>j</sup><it>Fritillaria</it>; <sup>k</sup><it>Gossypium</it>; <sup>l</sup><it>Populus</it>; <sup>m</sup><it>Platanus</it>; <sup>n</sup><it>Zea</it>; <sup>o</sup><it>Ananas</it>; <sup>p</sup><it>Lilium</it>; <sup>q</sup><it>Sorghum</it>; <sup>r</sup><it>Drosophila</it></p>
               </text>
               <graphic file="1471-2148-5-30-5"/>
            </fig>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>ClustalW alignment of conserved reverse transcriptase domains for selected plant Ty3-gypsy family retroelements. Amino acid sequence alignments generated by ClustalW using the Megalign program from Lasergene 5 were imported into MEGA2 for phylogenetic analysis.</p>
               </text>
               <file name="1471-2148-5-30-S1.meg">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The tree resolves two major clades, designated A and B (Fig. <figr fid="F5">5</figr>). With respect to coding potential beyond <it>pol</it>, clade B members have none, and in all cases, the <it>pol </it>stop codon is closely followed by a PPT and the LTR. In contrast, with the exception of <it>Diaspora</it>, all members of clade A for which sequences downstream of <it>pol </it>are available contain a putative <it>env</it>-like pseudogene. The major structural difference between <it>Diaspora </it>and other members of this group is illustrated in Fig. <figr fid="F6">6</figr>. Clade A is further partitioned into sister clades AI and AII with 94% bootstrap support. Clade AI is further divided, with 100% bootstrap support, into AIa/b and AIc. The bifurcation of AIa and AIb in Fig. <figr fid="F5">5</figr> is only weakly supported (40%) and may not be significant. Clades AIa and AIb are populated exclusively with <it>env</it>-containing members, including the Athila and <it>Calypso </it>families. The only full length members of AIc are <it>Diaspora</it>, <it>DiasporaLc</it>, and Tpb1-1. The other members of the AIc lineage are sequences from PCR-amplified <it>rt </it>fragments from sycamore and cotton <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and three representative <it>rt</it>-containing genomic clones from <it>Fritillaria</it>. DNA downstream of the <it>Fritillaria rt </it>has not been characterized (C. Baysdorfer, personal communication). In contrast to <it>Diaspora </it>in <it>G. max </it>and its close relative in <it>L. corniculatus</it>, 1479 bp separate the <it>pol </it>stop codon from the LTR in Tpb1-1. While this region contains no identifiable or extended ORFs, there is a proximal 19-codon ORF whose conceptual translation is predicted with very high confidence to be a transmembrane domain (data not shown). There is also a 47-base polyA segment in the middle of this region, suggesting the interval contains an integrated cDNA. 
Thus, the AIc lineage is not monophyletic for the absence of a long <it>pol</it>-LTR interval, but whether this region in Tpb1-1 represents a degenerate <it>env </it>cannot be determined.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Structural organization of <it>Athila</it>, <it>Calypso</it>, and <it>Diaspora </it>consensus elements</p>
               </caption>
               <text>
                  <p>Structural organization of <it>Athila</it>, <it>Calypso</it>, and <it>Diaspora </it>consensus elements.</p>
               </text>
               <graphic file="1471-2148-5-30-6"/>
            </fig>
            <p>In conclusion, the nesting of clade AIc within a much larger group of elements with an <it>env</it>-like gene suggests that at least <it>Diaspora </it>suffered a complete and nearly precise loss of a coding region, rather than failed to acquire one. To search more exhaustively for other members of this group, the RT sequences from members of lineage AIc were used in tBLASTn searches. All hits, however, were already in the tree.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The DNA sequence of the previously unreported <it>Diaspora </it>retrotransposon was created by a combination of experimental and <it>in silico </it>methods utilizing <it>Glycine max </it>sequences currently available as single-pass-read accessions in public databases. To date, the publicly available <it>G. max </it>sequence collections, including the NR and HTGS databases, contain no full-length copies of this element. Consensus sequences for transposons and retroelements have frequently been generated from alignments of multiple family members <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>, but the construction of a full-length consensus sequence of a new element from large numbers of short overlapping fragments has not. While appropriate for early stage genome projects like that of soybean, resorting to such a strategy is neither required nor efficacious in genomes that have been extensively sequenced, like those of <it>Arabidopsis</it>, rice, <it>Drosophila </it>and humans.</p>
         <p><it>Diaspora </it>has a single uninterrupted ORF encoding gag, protease, RT, and integrase as a single polyprotein. Consensus assemblies for <it>Athila </it>and <it>Calypso </it>elements also contain a single ORF for these proteins <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. While we cannot infer that functional copies of <it>Diaspora </it>still exist in the <it>G. max </it>genome, the density of the contig assembly and the presence of a strongly conserved consensus nucleotide at virtually every position of the assembly support the argument that a reasonable facsimile of a past functional element is depicted.</p>
         <p>The <it>Diaspora </it>family is also present in the <it>Lotus corniculatus </it>genome, where we discovered an apparently 5'truncated copy on a Phase I HTGS clone, AP007806. Excluding indels, the lotus sequence shares approximately 80% nucleotide identity with the <it>Diaspora </it>consensus sequence over a length of 7 kb. Most of the indels in the coding region are in-frame. Like <it>Diaspora</it>, the lotus element lacks an <it>env</it>-like region. With the exception of two 7-bp and one 15-bp indels, the short intervals between the <it>pol </it>termination codon and the LTR are 88% identical between the two. Additional truncated copies of <it>Diaspora </it>family members are present on ten other Phase I HTGS clones from <it>L. corniculatus</it>.</p>
         <p><it>Diaspora </it>is unusual in several respects. 1) It has unusually long LTRs. 2) At 873 bp, the distance between the LTR, which should contain the promoter and transcriptional start sites, and the <it>gag </it>start codon is far longer than in every other characterized retroelement except one. 3) <it>Diaspora </it>is the only characterized envelope-less member of a lineage of plant <it>gypsy</it>-like endogenous retroviruses.</p>
         <p>The significance of the extended length of the LTRs found in <it>BAGY-1 </it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, <it>Ogre </it><abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, <it>RIRE3 </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and <it>Diaspora </it>is difficult to ascertain since virtually nothing is known about the biology of these elements. The same is true for the unusually long regions between the LTR and the <it>gag </it>start codon in <it>Diaspora </it>and <it>Ogre</it>, although in the case of <it>Ogre</it>, this region contains a 550-codon ORF whose conceptual translation yields a polypeptide of unknown function. Since transcriptional start sites are always found at the U3-R junction of the LTR, an exceptionally long 5'UTR would result unless splicing occurred. However, there have been no introns reported upstream of <it>gag </it>in any LTR retroelement, including <it>Ogre </it><abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, for which transcripts have been characterized.</p>
         <p>Unlike all other characterized members of an apparent plant endogenous retrovirus lineage <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, <it>Diaspora </it>lacks an envelope-like coding domain downstream of <it>pol</it>. Few members of this lineage contain functional <it>gag-pol </it>genes, based on the presence of nonsense and frameshift mutations, and none contain a functional <it>env</it>-like gene based on these same criteria. The only other fully sequenced member of clade AIc, Tpb1-1, contains a strongly predicted transmembrane domain just downstream of the <it>pol </it>stop codon, but is contaminated by an apparent retrogene, and the region could not be characterized as <it>env</it>-like based on amino acid similarity.</p>
         <p>The hypothetical env-like proteins exhibit little primary sequence similarity, and only those found in <it>Calypso </it>and Cyclops-2, which share 29% amino acid identity, appear to be homologous <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr></abbrgrp>. Without significant sequence similarity, we are reluctant to speculate whether the predicted transmembrane domain for the translated 128 bp fragment between <it>pol </it>and the PPT in <it>Diaspora </it>reflects a vestige of an <it>env</it>-like gene.</p>
         <p>The tree in Fig. <figr fid="F5">5</figr> is similar to that generated by Wright and Voytas <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In their study, many members of the endogenous retrovirus lineage were derived from <it>rt</it>-delimited PCR amplicons, few of which are included in our tree because the presence of an <it>env</it>-like region was not empirically determined. Our analysis, however, includes several additional full-length elements whose <it>env</it>-like status has been determined. We infer from this analysis that an ancestral <it>Diaspora </it>element suffered a deletion of this region. Whether Tpb1-1 suffered a similar fate but subsequently acquired a retrogene is open to speculation. Using this region to query Genbank in BLASTn and tBLASTx searches yielded no hits. While envelope capture by LTR retrotransposons has been credited with the creation of infectious retroviruses <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>, only the <it>env </it>genes of invertebrate elements have been phylogenetically linked to unrelated viruses <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The failure to uncover an analogous linkage in retroviruses has been attributed, in part, to accelerated divergence promoted by host-induced immune responses that fuel positive selection for envelope variants <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>Studies focused on envelope loss have not been reported, although phylogenetic relatedness between mammalian retroviruses and endogenous retroviruses with <it>env </it>pseudogenes is recognized <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Although the <it>env </it>genes of most human endogenous retrovirus (HERV) families are marked by frameshifts, nonsense mutations, and deletions <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, in only one family, HERV-L, are all vestiges of the <it>env </it>gene lost <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. In the case of HERV-L, the region between <it>pol </it>and the LTR is occupied by a dUTPase coding domain <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Other members of the class III HERV clade that contains HERV-L, including HERV-S, contain <it>env </it>pseudogenes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Interestingly, with a copy number of 575, the HERV-L family is second only to the HERV-H family in abundance <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <p><it>Diaspora </it>and <it>Diaspora</it>Lc possess no trace of the <it>env</it>-like genes that are present in all members of clades AIa, AIb, and AII (Fig. <figr fid="F5">5</figr>). Whether other members of clade AIc, from cotton (<it>Gossypium</it>), sycamore (<it>Platanus</it>) and lily (<it>Fritillaria</it>), also lack an <it>env </it>region is not known, and the precise node within this clade that represents envelope loss cannot be assessed.</p>
         <p>One explanation for an abrupt and complete loss of <it>env </it>is anomalous splicing of a genomic transcript containing <it>gag-pol-env</it>. Retroelement genomes are packaged as genomic RNA transcripts and retroviral transcripts destined for translation are often substrates for a complex pattern of splicing <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Alternatively, illegitimate recombination could lead to DNA loss and has been proposed as a major component of element elimination from plant genomes <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. Many of the individual sequences that made up the contig contained short deletions not shared by others. This was especially true in non-coding regions (data not shown). Whatever the explanation, <it>Diaspora </it>appears to be an example of a retrotransposon that evolved from an endogenous retrovirus.</p>
         <p>The nature of selective forces, if any, that might drive the loss of an <it>env </it>gene is open to speculation. <it>Env </it>genes, required for retroviral infectivity, are not thought to be required for retrotransposition, and it is possible that for some retroelements the gene or its protein product might attenuate the process, promoting selection for their inactivation, but with concomitant loss of infectivity. As noted above, in one of the largest families of HERV, the <it>env </it>gene has been replaced with a dUTPase. In plant genomes, however, the copy numbers of both putative endogenous retroviruses like <it>SIRE1 </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, <it>Calypso </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and BAGY-2 <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and retrotransposons like BARE-1 <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, Opie-2 <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, and <it>Diaspora </it>reach into the thousands. On the other hand, <it>env </it>genes in both mammalian and plant endogenous retroviruses are far more degenerate than <it>pol </it>genes, suggesting they are far less sensitive to purifying selection. The proliferation of one retroelement form or the other may be the result of random mutation and genetic drift. Nonetheless, retrotransposition, with or without an <it>env </it>gene, has been a far more successful long-term reproductive strategy than retroviral infection.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>DNA isolation, amplification, and sequencing</p>
            </st>
            <p>DNA containing the <it>SIRE1 </it>endogenous retrovirus was recovered from a &#955;FIXII soybean genomic library (Stratagene) by standard plaque hybridization <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, and <it>Hin</it>dIII-digested fragments were sub-cloned into pSPORT1 (Life Technologies) as described <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. DNAs from three contiguous subclones, pAMH3C, pAMH3G, and pAMH3D, were isolated and sequenced as described <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The junctions and contiguity of these subclones were confirmed by direct sequencing of the intact &#955;FIXII genomic clone across the <it>Hin</it>dIII junctions. These sequences were previously deposited [Genbank: U96295 and AF095730]. A BLASTp search with the conceptual translation of AF095730 (see below) indicated that this accession contained the <it>pol </it>region of an uncharacterized retrotransposon. Several additional positive clones from this library were recovered and segments of the isolated DNAs were sequenced directly or amplified using Taq DNA Polymerase (Promega). For amplifications, reactions were preheated for 3 min. at 94&#176;C, then 30 cycles were run at 94&#176;C for 30 sec., 54&#176;C for 30 sec., and 72&#176;C for 1 to 2 min. Amplicons were spin column-purified (Qiagen) and sequenced as described <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Sequences were deposited [GenBank: AY656632-AY656653].</p>
            <p>pAMH3D, comprising the protease and RT coding domains, was used to probe a soybean BAC library <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> (generously provided by K. Meksem) under moderate stringency <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> for the presence of sequences related to AF095730. Ten clones were chosen arbitrarily for amplification and sequencing. DNAs from BAC clones were recovered using Procipitate (Ligochem) and selected regions were amplified using Taq DNA Polymerase (Promega). After preheating reactions for 3 min. at 94&#176;C, 30 cycles were run at 94&#176;C for 30 sec., 54&#176;C for 30 sec., and 72&#176;C for 1 min. Regions within pAMH3D were first PCR-amplified using primer pairs PDIA01F-02R, PDIA03F-04R, PDIA05F-06R, PDIA07F-08R, PDIA09F-10R, PDIA11F-12R, and PDIA13F-14R (see Table <tblr tid="T1">1</tblr>). The amplicons were purified on Qiagen spin columns and sequenced directly without cloning as described <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Sequences for regions beyond the ends of AF095730 were generated directly from BAC DNA using outward facing primers (PDIA02R and 03F) derived from the termini of AF095730, followed by additional outward extensions with primers PDIA16F, 17R, 18R, 19R, 20R, 21F (Table <tblr tid="T1">1</tblr>). BAC clone sequences have been deposited [GenBank: AY656654-AY656662].</p>
         </sec>
         <sec>
            <st>
               <p><it>In silico </it>methods</p>
            </st>
            <p>Selected regions of AF095730 and their conceptual translations were used to query all relevant Genbank databases, including nr, Genome Survey Sequence (GSS) and Expressed Sequence Tag (EST), with BLASTn and BLASTp searches <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Conserved protein domains were identified with CDD <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
            <p>For assembly of the consensus nucleotide sequence, <it>Glycine max </it>accessions from the GSS and EST databases with bit scores greater than 200 (E values &lt; 10<sup>-52</sup>) were added to the consensus construct. These criteria generally reflect >90% DNA sequence identity over at least 200 bp of overlap. The nr nucleotide database contained no significant hits. New additions to the ends of the expanding consensus were used to re-query the databases until the LTR redundancy was recognized and the contig formed a circle. Contigs were assembled using the Seqman program from Lasergene 5 (DNAStar). To locate the LTR, direct repeats greater than 25 bp were first identified using Lasergene GeneQuest (DNAStar) and the termini of the LTRs were confirmed by manual inspection, as were other non-coding features of the sequence. Because LTR junction sequences at both termini contained either internal element DNA or external flanking DNAs, these were carefully examined for consensus DNA (internal) or unique DNAs (external). External DNAs were trimmed from the contig. Potential splice junctions were evaluated using GeneSplicer <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and NetGene2 <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. Transmembrane domains were predicted using TMPred <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>.</p>
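            <p>The inclusion criteria described above can be sketched as a simple filter. The following is only an illustration under stated assumptions: the <it>Hit </it>record, the function names, and the example accessions and scores are hypothetical and do not reproduce the software or data actually used.</p>

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One BLASTn hit; the field names are illustrative, not a real parser's schema."""
    accession: str
    bit_score: float
    evalue: float


MIN_BIT_SCORE = 200.0   # hits must score above this value
MAX_EVALUE = 1e-52      # hits must have an E value below this value


def passes_threshold(hit: Hit) -> bool:
    """True when the hit satisfies both inclusion criteria."""
    return hit.bit_score > MIN_BIT_SCORE and MAX_EVALUE > hit.evalue


def select_accessions(hits: List[Hit]) -> List[str]:
    """Return the accessions of qualifying hits, highest bit score first."""
    kept = [h for h in hits if passes_threshold(h)]
    kept.sort(key=lambda h: h.bit_score, reverse=True)
    return [h.accession for h in kept]


# Hypothetical hits (made-up accessions and scores) for demonstration only.
example_hits = [
    Hit("HYPO_A", 412.0, 1e-110),
    Hit("HYPO_B", 180.0, 2e-45),
    Hit("HYPO_C", 201.0, 5e-53),
    Hit("HYPO_D", 250.0, 1e-40),
]

selected = select_accessions(example_hits)
print(selected)  # ['HYPO_A', 'HYPO_C']
```

            <p>In an iterative assembly, the sequences selected in each round would extend the contig, and the new ends would then serve as queries for the next round.</p>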
            <p>The <it>pol </it>region of the conceptually translated consensus sequence was used to query the Genbank protein database for related sequences. A ClustalW alignment (see <supplr sid="S1">Additional file 1</supplr>) was generated from a contiguous region of RT representing conserved domains two through seven <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> using Lasergene 5 (DNAStar), and a neighbor joining tree using p distances with 1000 bootstrap pseudo-replicates was constructed using MEGA2 <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
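            <p>For readers unfamiliar with p-distances, the measure is the proportion of compared alignment positions at which two sequences differ. The sketch below is illustrative only: the toy sequences are invented, the function names are ours, and the pairwise exclusion of gapped positions is an assumption, since the gap treatment chosen in MEGA2 is not stated here.</p>

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap in either sequence are skipped (pairwise deletion,
    an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_p_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Compute p-distances for every pair of named sequences in an alignment."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(names, 2)}


# Invented toy alignment; these are not sequences from Additional file 1.
toy_alignment = {
    "elem1": "MKVLA-TG",
    "elem2": "MRVLAQTA",
    "elem3": "MKVLAQTG",
}

distances = pairwise_p_distances(toy_alignment)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

            <p>A distance matrix of this kind is the input from which a neighbor joining algorithm builds the tree; bootstrap support is obtained by repeating the calculation on resampled alignment columns.</p>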
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>LTR: long terminal repeat; RT: reverse transcriptase; prot: protease; env: envelope; PBS: tRNA primer binding site; PPT: polypurine tract; GSS: genome survey sequence; HTGS: high throughput genomic sequence</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SY identified, recovered, amplified, and sequenced DNA from BAC library clones and sequenced DNA from the &#955; library clones. PB identified and recovered DNA from the &#955; library clones. HL isolated and sequenced DNA from &#955; library sub-clones and performed all the <it>in silico </it>and phylogenetic analyses. HL prepared the manuscript for review by the authors and all authors approved the final draft.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Supported in part with Loyola Mulcahy Undergraduate Research Fellowships to STY and PB. Thanks to H. Mears, J. Smith, and J. Damergis for providing extra hands and spiritual support, and to K. Meksem, A. Jamai, and J. Shultz for BAC filters. This work was supported in part by U.S. Dept. of Defense Advanced Research Projects Grant N66001-03-1-8941.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Mobile elements: Drivers of genome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Kazazian</snm>
                  <fnm>HH</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <fpage>1626</fpage>
            <lpage>1632</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089670</pubid>
                  <pubid idtype="pmpid" link="fulltext">15016989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Mobile Elements in Animal and Plant Genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Deininger</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Roy-Engel</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Mobile DNA II</source>
            <publisher>Washington, D.C., ASM Press</publisher>
            <editor>Craig NL, Craigie R, Gellert M and Lambowitz AM</editor>
            <pubdate>2002</pubdate>
            <fpage>1074</fpage>
            <lpage>1092</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Plant transposable elements: Where genetics meets genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Feschotte</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wessler</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>329</fpage>
            <lpage>341</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg793</pubid>
                  <pubid idtype="pmpid" link="fulltext">11988759</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Origins and Evolution of Retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
            </aug>
            <source>Mobile DNA II</source>
            <publisher>Washington, D.C., ASM Press</publisher>
            <editor>Craig NL, Craigie R, Gellert M and Lambowitz AM</editor>
            <pubdate>2002</pubdate>
            <fpage>1111</fpage>
            <lpage>1144</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Evolutionary history of Oryza sativa LTR retrotransposons: a preliminary survey of the rice genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Gao</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Ganko</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>18</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373447</pubid>
                  <pubid idtype="pmpid" link="fulltext">15040813</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-5-18</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Legume genomes: more than peas in a pod</p>
            </title>
            <aug>
               <au>
                  <snm>Young</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Mudge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>Curr Opin Plant Biol</source>
            <pubdate>2003</pubdate>
            <volume>6</volume>
            <fpage>199</fpage>
            <lpage>204</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1369-5266(03)00006-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12667879</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Transposable elements, genes and recombination in a 215-kb contig from wheat chromosome 5A(m)</p>
            </title>
            <aug>
               <au>
                  <snm>SanMiguel</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Ramakrishna</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Busso</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Dubcovsky</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Funct Integr Genomics</source>
            <pubdate>2002</pubdate>
            <volume>2</volume>
            <fpage>70</fpage>
            <lpage>80</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s10142-002-0056-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">12021852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Structure and evolution of Cyclops: a novel giant retrotransposon of the Ty3/Gypsy family highly amplified in pea and other legume species</p>
            </title>
            <aug>
               <au>
                  <snm>Chavanne</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>DX</fnm>
               </au>
               <au>
                  <snm>Liaud</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Cerff</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>363</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1005969626142</pubid>
                  <pubid idtype="pmpid" link="fulltext">9617807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A copia-like retrotransposon Tgmr closely linked to the Rps1-k allele that confers race-specific resistance of soybean to Phytophthora sojae</p>
            </title>
            <aug>
               <au>
                  <snm>Bhattacharyya</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Gonzales</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Buzzell</snm>
                  <fnm>RI</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>34</volume>
            <fpage>255</fpage>
            <lpage>264</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1005851623493</pubid>
                  <pubid idtype="pmpid" link="fulltext">9207841</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Athila4 of Arabidopsis and Calypso of soybean define a lineage of endogenous plant retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>122</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155253</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779837</pubid>
                  <pubid idtype="doi">10.1101/gr.196001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>SIRE-1, a copia/Ty1-like retroelement from soybean, encodes a retroviral envelope-like protein</p>
            </title>
            <aug>
               <au>
                  <snm>Laten</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Majumdar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gaucher</snm>
                  <fnm>EA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>6897</fpage>
            <lpage>6902</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">22677</pubid>
                  <pubid idtype="pmpid" link="fulltext">9618510</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.12.6897</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Retroviruses in plants?</p>
            </title>
            <aug>
               <au>
                  <snm>Peterson-Burch</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Laten</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>151</fpage>
            <lpage>152</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)01981-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">10729827</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Envelope-class retrovirus-like elements are widespread, transcribed and spliced, and insertionally polymorphic in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Vicient</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kalendar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schulman</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>2041</fpage>
            <lpage>2049</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311225</pubid>
                  <pubid idtype="pmpid" link="fulltext">11731494</pubid>
                  <pubid idtype="doi">10.1101/gr.193301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Phylogenetic evidence for Ty1-copia-like endogenous retroviruses in plant genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Laten</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Genetica</source>
            <pubdate>1999</pubdate>
            <volume>107</volume>
            <fpage>87</fpage>
            <lpage>93</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1003901009861</pubid>
                  <pubid idtype="pmpid">10952201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Poised for contagion: Evolutionary origins of the infectious abilities of invertebrate retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1307</fpage>
            <lpage>1318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.145000</pubid>
                  <pubid idtype="pmpid" link="fulltext">10984449</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The HIV Env-mediated fusion reaction</p>
            </title>
            <aug>
               <au>
                  <snm>Gallo</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Finnegan</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Viard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Raviv</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Dimitrov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rawat</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Puri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Durell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta - Biomembranes</source>
            <pubdate>2003</pubdate>
            <volume>1614</volume>
            <fpage>36</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0005-2736(03)00161-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Emergence of vertebrate retroviruses and envelope capture</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Battini</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Manel</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sitbon</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Virology</source>
            <pubdate>2004</pubdate>
            <volume>318</volume>
            <fpage>183</fpage>
            <lpage>191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.virol.2003.09.026</pubid>
                  <pubid idtype="pmpid" link="fulltext">14972546</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Athila, a new retroelement from Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Pelissier</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tutois</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Deragon</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Tourmente</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Genestier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Picard</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>29</volume>
            <fpage>441</fpage>
            <lpage>452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00020976</pubid>
                  <pubid idtype="pmpid">8534844</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>BAC contig development by fingerprint analysis in soybean</p>
            </title>
            <aug>
               <au>
                  <snm>Marek</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome</source>
            <pubdate>1997</pubdate>
            <volume>40</volume>
            <fpage>420</fpage>
            <lpage>427</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Two large-insert soybean genomic libraries constructed in a binary vector: applications in chromosome walking and genome wide physical mapping</p>
            </title>
            <aug>
               <au>
                  <snm>Meksem</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Zobrist</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ruben</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hyten</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Quanzhou</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Lightfoot</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Theor Appl Genet</source>
            <pubdate>2000</pubdate>
            <volume>101</volume>
            <fpage>747</fpage>
            <lpage>755</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s001220051540</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Nested retrotransposons in the intergenic regions of the maize genome</p>
            </title>
            <aug>
               <au>
                  <snm>SanMiguel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tikhonov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>YK</fnm>
               </au>
               <au>
                  <snm>Motchoulskaia</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zakharov</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Melake-Berhan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Springer</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Avramova</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>274</volume>
            <fpage>765</fpage>
            <lpage>768</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.274.5288.765</pubid>
                  <pubid idtype="pmpid" link="fulltext">8864112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Soybean genomic survey: BAC-end sequences near RFLP and SSR markers</p>
            </title>
            <aug>
               <au>
                  <snm>Marek</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Mudge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Darnielle</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grant</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hanson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Paz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>HH</fnm>
               </au>
               <au>
                  <snm>Denny</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Larson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Foster-Hartnett</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Danesh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Staggs</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Crow</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Retzel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome</source>
            <pubdate>2001</pubdate>
            <volume>44</volume>
            <fpage>572</fpage>
            <lpage>581</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1139/gen-44-4-572</pubid>
                  <pubid idtype="pmpid" link="fulltext">11550890</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Plant retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>1999</pubdate>
            <volume>33</volume>
            <fpage>479</fpage>
            <lpage>532</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.33.1.479</pubid>
                  <pubid idtype="pmpid" link="fulltext">10690416</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Plant tRNA genes: putative soybean genes for tRNAasp and tRNAmet</p>
            </title>
            <aug>
               <au>
                  <snm>Waldron</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wills</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gesteland</snm>
                  <fnm>RF</fnm>
               </au>
            </aug>
            <source>J Mol Appl Genet</source>
            <pubdate>1985</pubdate>
            <volume>3</volume>
            <fpage>7</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4040149</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Identification and characterization of novel retrotransposons of the gypsy type in rice</p>
            </title>
            <aug>
               <au>
                  <snm>Kumekawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Ohtsubo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Horiuchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ohtsubo</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1999</pubdate>
            <volume>260</volume>
            <fpage>593</fpage>
            <lpage>602</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004380050933</pubid>
                  <pubid idtype="pmpid">9928939</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>BARE-1, a copia-like retroelement in barley (Hordeum vulgare L.)</p>
            </title>
            <aug>
               <au>
                  <snm>Manninen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Schulman</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>22</volume>
            <fpage>829</fpage>
            <lpage>846</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00027369</pubid>
                  <pubid idtype="pmpid">7689350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Highly abundant pea LTR retrotransposon Ogre is constitutively transcribed and partially spliced</p>
            </title>
            <aug>
               <au>
                  <snm>Neumann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pozarkova</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Macas</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>53</volume>
            <fpage>399</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/B:PLAN.0000006945.77043.ce</pubid>
                  <pubid idtype="pmpid" link="fulltext">14750527</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A contiguous 60 kb genomic stretch from barley reveals molecular evidence for gene islands in a monocot genome</p>
            </title>
            <aug>
               <au>
                  <snm>Panstruga</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Buschges</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Piffanelli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Schulze-Lefert</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>1056</fpage>
            <lpage>1062</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147355</pubid>
                  <pubid idtype="pmpid" link="fulltext">9461468</pubid>
                  <pubid idtype="doi">10.1093/nar/26.4.1056</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Origin and evolution of retroelements based upon their reverse transcriptase sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Xiong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1990</pubdate>
            <volume>9</volume>
            <fpage>3353</fpage>
            <lpage>3362</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1698615</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Molecular resurrection of an extinct ancestral promoter for mouse L1</p>
            </title>
            <aug>
               <au>
                  <snm>Adey</snm>
                  <fnm>NB</fnm>
               </au>
               <au>
                  <snm>Tollefsbol</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Sparks</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Edgell</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>CA III</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <fpage>1569</fpage>
            <lpage>1573</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">43201</pubid>
                  <pubid idtype="pmpid" link="fulltext">8108446</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells</p>
            </title>
            <aug>
               <au>
                  <snm>Ivics</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Hackett</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Izsvak</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1997</pubdate>
            <volume>91</volume>
            <fpage>501</fpage>
            <lpage>510</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)80436-5</pubid>
                  <pubid idtype="pmpid">9390559</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Repbase update: a database and an electronic journal of repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>418</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02093-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>MER53, a non-autonomous DNA transposon associated with a variety of functionally related defense genes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Kapitonov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>DNA Seq</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>277</fpage>
            <lpage>288</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10993599</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Bmmar6, a second mori subfamily mariner transposon from the silkworm moth Bombyx mori</p>
            </title>
            <aug>
               <au>
                  <snm>Robertson</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Walden</snm>
                  <fnm>KK</fnm>
               </au>
            </aug>
            <source>Insect Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>12</volume>
            <fpage>167</fpage>
            <lpage>171</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2583.2003.00398.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12653938</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Members of the pogo superfamily of DNA-mediated transposons in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Robertson</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1996</pubdate>
            <volume>252</volume>
            <fpage>761</fpage>
            <lpage>766</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004380050288</pubid>
                  <pubid idtype="pmpid">8917322</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The evolution, distribution and diversity of endogenous retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Gifford</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tristem</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Virus Genes</source>
            <pubdate>2003</pubdate>
            <volume>26</volume>
            <fpage>291</fpage>
            <lpage>316</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1024455415443</pubid>
                  <pubid idtype="pmpid" link="fulltext">12876457</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the Human Genome Mapping Project database</p>
            </title>
            <aug>
               <au>
                  <snm>Tristem</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2000</pubdate>
            <volume>74</volume>
            <fpage>3715</fpage>
            <lpage>3730</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">111881</pubid>
                  <pubid idtype="pmpid" link="fulltext">10729147</pubid>
                  <pubid idtype="doi">10.1128/JVI.74.8.3715-3730.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A second exon splicing silencer within human immunodeficiency virus type 1 tat exon 2 represses splicing of Tat mRNA and binds protein hnRNP H</p>
            </title>
            <aug>
               <au>
                  <snm>Jacquenet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mereau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bilodeau</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Damier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stoltzfus</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Branlant</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>40464</fpage>
            <lpage>40475</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M104070200</pubid>
                  <pubid idtype="pmpid" link="fulltext">11526107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Rapid genome divergence at orthologous low molecular weight glutenin loci of the A and A(m) genomes of wheat</p>
            </title>
            <aug>
               <au>
                  <snm>Wicker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yahiaoui</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Guyot</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schlagenhauf</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ZD</fnm>
               </au>
               <au>
                  <snm>Dubcovsky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2003</pubdate>
            <volume>15</volume>
            <fpage>1186</fpage>
            <lpage>1197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153725</pubid>
                  <pubid idtype="pmpid" link="fulltext">12724543</pubid>
                  <pubid idtype="doi">10.1105/tpc.011023</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Devos</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1075</fpage>
            <lpage>1079</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186626</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097344</pubid>
                  <pubid idtype="doi">10.1101/gr.132102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>JX</fnm>
               </au>
               <au>
                  <snm>Devos</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>860</fpage>
            <lpage>869</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479113</pubid>
                  <pubid idtype="pmpid" link="fulltext">15078861</pubid>
                  <pubid idtype="doi">10.1101/gr.1466204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Retrotransposon BARE-1 is a major, dispersed component of the barley (Hordeum vulgare L.) genome</p>
            </title>
            <aug>
               <au>
                  <snm>Suoniemi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Anamthawat-Jonsson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Arna</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schulman</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>30</volume>
            <fpage>1321</fpage>
            <lpage>1329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00019563</pubid>
                  <pubid idtype="pmpid">8704140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <aug>
               <au>
                  <snm>Sambrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fritsch</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Molecular Cloning</source>
            <publisher>Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press</publisher>
            <pubdate>1989</pubdate>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>CDD: a curated Entrez database of conserved domain alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Marchler-Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>DeWeese-Scott</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Geer</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>SQ</fnm>
               </au>
               <au>
                  <snm>Hurwitz</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Jacobs</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Lanczycki</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Liebert</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Madej</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Marchler</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Mazumder</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nikolskaya</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Simonyan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Thiessen</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Vasudevan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>383</fpage>
            <lpage>387</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165534</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520028</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>GeneSplicer: a new computational method for splice site prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>1185</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29713</pubid>
                  <pubid idtype="pmpid" link="fulltext">11222768</pubid>
                  <pubid idtype="doi">10.1093/nar/29.5.1185</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information</p>
            </title>
            <aug>
               <au>
                  <snm>Hebsgaard</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Korning</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Tolstrup</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Engelbrecht</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>3439</fpage>
            <lpage>3452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146109</pubid>
                  <pubid idtype="pmpid" link="fulltext">8811101</pubid>
                  <pubid idtype="doi">10.1093/nar/24.17.3439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>TMbase - A database of membrane spanning protein segments</p>
            </title>
            <aug>
               <au>
                  <snm>Hofmann</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Stoffel</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Biol Chem Hoppe Seyler</source>
            <pubdate>1993</pubdate>
            <volume>374</volume>
            <fpage>166</fpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>MEGA2: molecular evolutionary genetics analysis software</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jakobsen</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>1244</fpage>
            <lpage>1245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.12.1244</pubid>
                  <pubid idtype="pmpid" link="fulltext">11751241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four Triticeae genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>YQ</fnm>
               </au>
               <au>
                  <snm>Coleman-Derr</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kong</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>OD</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>135</volume>
            <fpage>459</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">429398</pubid>
                  <pubid idtype="pmpid" link="fulltext">15122014</pubid>
                  <pubid idtype="doi">10.1104/pp.103.038083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1998</pubdate>
            <volume>149</volume>
            <fpage>703</fpage>
            <lpage>715</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9611185</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
