<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-99</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>
<it>De novo </it>characterization of the gametophyte transcriptome in bracken fern, <it>Pteridium aquilinum</it>
</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Der</snm><mi>P</mi><fnm>Joshua</fnm><insr iid="I1"/><insr iid="I3"/><email>jpd18@psu.edu</email></au>
<au id="A2"><snm>Barker</snm><mi>S</mi><fnm>Michael</fnm><insr iid="I2"/><email>msbarker@email.arizona.edu</email></au>
<au id="A3"><snm>Wickett</snm><mi>J</mi><fnm>Norman</fnm><insr iid="I3"/><email>njw17@psu.edu</email></au>
<au id="A4"><snm>dePamphilis</snm><mi>W</mi><fnm>Claude</fnm><insr iid="I3"/><email>cwd3@psu.edu</email></au>
<au id="A5"><snm>Wolf</snm><mi>G</mi><fnm>Paul</fnm><insr iid="I1"/><email>paul.wolf@usu.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Biology and Center for Integrated Biosystems, Utah State University, Logan, UT 84322-5305, USA</p></ins>
<ins id="I2"><p>Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA</p></ins>
<ins id="I3"><p>Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>99</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/99</url>
<xrefbib><pubidlist><pubid idtype="pmpid">21303537</pubid><pubid idtype="doi">10.1186/1471-2164-12-99</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>24</day><month>5</month><year>2010</year></date></rec><acc><date><day>8</day><month>2</month><year>2011</year></date></acc><pub><date><day>8</day><month>2</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Der et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (<it>Pteridium aquilinum</it>) to develop genomic resources for evolutionary studies.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>In total, 681,722 quality- and adapter-trimmed reads (254 Mbp) were assembled <it>de novo </it>into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp, a total assembly size of 30.8 Mbp, and an average read-depth coverage of 7.0&#215;. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. Of the unigenes, 61.8% had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned a GO functional annotation to 46.2% of the unigenes and an enzyme code annotation to 16.0%. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of <it>Arabidopsis</it>, <it>Selaginella </it>and <it>Physcomitrella</it>, and identified a substantial number of potentially novel fern genes. By comparing the list of <it>Arabidopsis </it>genes identified by blast with a list of gametophyte-specific <it>Arabidopsis </it>genes taken from the literature, we identified a set of potentially conserved gametophyte-specific genes. We screened unigenes for repetitive sequences to identify 548 potentially amplifiable simple sequence repeat loci and 689 expressed transposable elements.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for <it>de novo </it>transcriptome characterization and gene discovery in a non-model plant.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>As the sister lineage to seed plants, ferns represent a critical clade for comparative evolutionary studies in land plants <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>. In contrast to seed plants, ferns typically retain the ancestral condition for a suite of life history traits (e.g. the lack of secondary growth, homospory, motile sperm, and independent free-living gametophyte and sporophyte generations). Ferns are thus an important outgroup for studying the evolution of wood, seeds, pollen, flowers, and fruit among other economically important characteristics found in seed plants, as well as the evolution of development in these complex structures and the expansion of gene families associated with seed plant evolution (e.g. transcription associated proteins). For reasons not yet fully understood, ferns typically have much higher chromosome numbers and larger genomes than seed plants <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>.</p>
<p>Understanding the factors that influence these differences and their evolutionary consequences will require developing genomic resources in ferns to provide the necessary comparative context to understand the evolution of these traits <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
</abbrgrp>. Additionally, because ferns have evolved and maintained free-living and photosynthetic gametophyte and sporophyte life stages, they are an ideal group for studies of life-cycle evolution in land plants and genome function in independent haploid and diploid phases.</p>
<p>Among the genomic tools available for non-model organisms, expressed sequence tags (ESTs) have proven to be a rapid and cost-effective strategy to develop sequence markers for comparative evolutionary and functional studies. While taxonomic sampling of plants in genome-scale projects has expanded substantially with dramatic decreases in sequencing cost and increases in throughput, the development of genomic resources in ferns has lagged far behind that of other plant groups. This deficit has primarily been attributed to technical and economic barriers associated with the large and complex genomes of ferns, but is compounded by the limited agronomic value of most ferns <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>. To date (December 2010), genomic information in ferns is limited to a genetic linkage map <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp> and a modest EST data set comprised of about 5,000 Sanger cDNA sequences <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> for <it>Ceratopteris richardii</it>, just over 30,500 ESTs for <it>Adiantum capillus-veneris </it>
<abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>, and over 2,600 ESTs in <it>Osmunda lancea </it>
<abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. Fewer than 500 ESTs for <it>Pteridium aquilinum </it>have previously been sequenced and deposited in GenBank <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<p>With the introduction of cost-efficient, massively parallel high-throughput sequencing technologies, genome-scale studies in non-model organisms are being actively pursued for gene discovery, expression profiling, SNP and SSR marker development, and studies in functional, comparative, and evolutionary genomics in taxa where little or no previous genomic information exists <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
</abbrgrp>. We chose the Roche 454 GS-FLX Titanium pyrosequencing technology to sequence a full-length-enriched, normalized cDNA library from the gametophyte generation of the bracken fern, <it>Pteridium aquilinum </it>(L.) Kuhn.</p>
<p>
<it>Pteridium </it>(family: Dennstaedtiaceae) is a cosmopolitan fern genus comprised of several closely related species that are well differentiated from other genera in the family. <it>Pteridium aquilinum </it>is the most widespread of the bracken species and is distributed throughout the northern hemisphere and Africa <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. Bracken is notorious as a weed in open fields and is toxic to people and livestock. Despite its toxicity, bracken is eaten as a delicacy in several parts of the world, and due to its often high local abundance and large coarse stature, is sometimes used as thatching or packing material. Because bracken is common, easily cultured and manipulated, and can have a major economic impact, it has become one of the most intensively studied fern species.</p>
<p>Bracken has been used as a model system for the study of the fern life cycle <abbrgrp>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
<abbr bid="B34">34</abbr>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
<abbr bid="B37">37</abbr>
</abbrgrp>, gametophyte development and the pheromonal mechanism of sex determination <abbrgrp>
<abbr bid="B38">38</abbr>
<abbr bid="B39">39</abbr>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
<abbr bid="B42">42</abbr>
<abbr bid="B43">43</abbr>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
</abbrgrp>, cyanogenesis <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, carcinogenesis <abbrgrp>
<abbr bid="B47">47</abbr>
<abbr bid="B48">48</abbr>
<abbr bid="B49">49</abbr>
</abbrgrp>, invasion ecology <abbrgrp>
<abbr bid="B50">50</abbr>
<abbr bid="B51">51</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>, and climate change <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>. <it>Pteridium aquilinum </it>has a diploid chromosome count of 2n = 104 and a total genome size of about 9.8 Gbp <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>.</p>
<p>This study was conceived to develop an extensive expressed gene sequence resource in ferns for evolutionary and functional genomics. We present the first comprehensive transcriptome characterization for a fern gametophyte, including an assessment of transcriptome coverage, gene family and functional representation, SSR identification, and a comparative analysis of gene sets across land plants.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Sequencing and <it>de novo </it>assembly</p>
</st>
<p>Raw Roche 454 GS-FLX Titanium reads were quality and adapter trimmed and size selected to yield 681,722 cleaned reads with a mean length of 372.6 bp and 254 Mbp of total sequence data (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F1">1A</figr>). Reads were first assembled in MIRA v3.0rc4 <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp> and the resulting assembly was passed through a second assembly step in CAP3 <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp> to join additional contigs (Table <tblr tid="T2">2</tblr>). The resulting final assembly consisted of 56,256 unique sequences (i.e. retained singletons plus primary and secondary contigs, hereafter referred to as unigenes; Additional file <supplr sid="S1">1</supplr>). Unigenes had a mean length of 547.2 bp and summed to a total assembly length of 30.79 Mbp (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F1">1B</figr>). The average read-depth coverage for the final unigene assembly was 7.0&#215; (Table <tblr tid="T2">2</tblr>). The distribution of unigene coverage was highly right-skewed, with most unigenes at low coverage and an extremely long tail toward high coverage (maximum coverage was 2,078&#215;; Figure <figr fid="F1">1C, D</figr>). The steep decline in read-depth coverage is typical for a normalized library and suggests that cDNA normalization was effective <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Sequence statistics</p></caption><tblbdy cols="4">
      <r>
         <c>
            <p/>
         </c>
         <c ca="right">
            <p>
               <b>Raw reads</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Cleaned reads</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Unigenes</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of sequences</p>
         </c>
         <c ca="right">
            <p>730,577</p>
         </c>
         <c ca="right">
            <p>681,722</p>
         </c>
         <c ca="right">
            <p>56,256</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean length (bp)</p>
         </c>
         <c ca="right">
            <p>363.55</p>
         </c>
         <c ca="right">
            <p>372.60</p>
         </c>
         <c ca="right">
            <p>547.23</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Standard deviation on length (bp)</p>
         </c>
         <c ca="right">
            <p>118.60</p>
         </c>
         <c ca="right">
            <p>96.36</p>
         </c>
         <c ca="right">
            <p>276.06</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mode length (bp)</p>
         </c>
         <c ca="right">
            <p>416</p>
         </c>
         <c ca="right">
            <p>383</p>
         </c>
         <c ca="right">
            <p>466</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Median length (bp)</p>
         </c>
         <c ca="right">
            <p>398</p>
         </c>
         <c ca="right">
            <p>394</p>
         </c>
         <c ca="right">
            <p>479</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Range in length (bp)</p>
         </c>
         <c ca="right">
            <p>2 - 624</p>
         </c>
         <c ca="right">
            <p>78 - 624</p>
         </c>
         <c ca="right">
            <p>86 - 3229</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total length (Mbp)</p>
         </c>
         <c ca="right">
            <p>265.60</p>
         </c>
         <c ca="right">
            <p>254.01</p>
         </c>
         <c ca="right">
            <p>30.79</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Summary statistics for sequence data at different stages of processing.</p>
   </tblfn></tbl>
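<p>The length statistics reported in Table 1 can be recomputed from any FASTA file of reads or unigenes. The following is a minimal Python sketch, not the procedure used in this study: the function names and the toy input are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.</p>

```python
import statistics
from io import StringIO


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA-formatted text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize_lengths(lengths):
    """Return Table 1 style summary statistics for sequence lengths."""
    lengths = list(lengths)
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        # Assumption: sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "mode": statistics.mode(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "total_mbp": sum(lengths) / 1e6,
    }


# Hypothetical toy input; real input would be the unigene FASTA file.
toy_fasta = StringIO(">u1\nACGT\nAC\n>u2\nACGT\n>u3\nACGT\n")
summary = summarize_lengths(read_fasta_lengths(toy_fasta))
print(summary)
```

<p>Applied to the unigene sequences in Additional file 1, a script of this kind should give values close to the Unigenes column of Table 1, although small differences could arise from the choice of estimator or from rounding.</p>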
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Overview of <it>P. aquilinum </it>transcriptome sequencing and assembly</p></caption><text>
   <p><b>Overview of <it>P. aquilinum </it>transcriptome sequencing and assembly</b>. (A) A histogram of the filter passed and adapter/quality trimmed Roche 454 GS-FLX Titanium read lengths. (B) A histogram of unigene lengths for the final unigene set after the 2-step assembly. Note that the longest unigene is 4,489 bp and the x-axis has been truncated at 3 kb. (C) A histogram of the average read-depth coverage for unigenes. The steep decline in coverage observed here is typical of normalized libraries <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Coverage values between 30&#215; and 1800&#215; have been binned (see the vertical axis in Figure 1D). (D) A density scatterplot showing the relationship between unigene length and coverage. Points with a higher local density are darker.</p>
</text><graphic file="1471-2164-12-99-1" hint_layout="double"/></fig>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Assembly summary statistics</p></caption><tblbdy cols="3">
      <r>
         <c>
            <p/>
         </c>
         <c ca="right">
            <p>
               <b>Primary assembly (MIRA)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Secondary assembly (MIRA+CAP3)</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of reads assembled into contigs</p>
         </c>
         <c ca="right">
            <p>574,134</p>
         </c>
         <c ca="right">
            <p>640,285</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of reads discarded during assembly</p>
         </c>
         <c ca="right">
            <p>31,723</p>
         </c>
         <c ca="right">
            <p>0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of 454 reads retained as singletons</p>
         </c>
         <c ca="right">
            <p>75,865</p>
         </c>
         <c ca="right">
            <p>9,714</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of primary contigs (MIRA)</p>
         </c>
         <c ca="right">
            <p>91,100</p>
         </c>
         <c ca="right">
            <p>24,775</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of secondary contigs (CAP3)</p>
         </c>
         <c ca="right">
            <p>0</p>
         </c>
         <c ca="right">
            <p>21,767</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total number of unique sequences (unigenes)</p>
         </c>
         <c ca="right">
            <p>166,965</p>
         </c>
         <c ca="right">
            <p>56,256</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean unigene length (bp)</p>
         </c>
         <c ca="right">
            <p>423.11</p>
         </c>
         <c ca="right">
            <p>547.23</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Largest unigene length (bp)</p>
         </c>
         <c ca="right">
            <p>1,746</p>
         </c>
         <c ca="right">
            <p>3,229</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total assembly length (Mbp)</p>
         </c>
         <c ca="right">
            <p>70.65</p>
         </c>
         <c ca="right">
            <p>30.79</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean read depth coverage</p>
         </c>
         <c ca="right">
            <p>3.03</p>
         </c>
         <c ca="right">
            <p>6.96</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>A comparison of the primary and secondary assemblies. Secondary assembly was used to condense and join contigs and singletons from the primary assembly to reduce sequence level redundancy in the final unigene set.</p>
   </tblfn></tbl>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Unigene builds</b>. Unigene sequences in FASTA format, compressed zip file.</p>
</text>
<file name="1471-2164-12-99-S1.ZIP">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Transcriptome coverage and data quality</p>
</st>
<p>Because information about the actual size and composition of the transcriptome is often unknown, we utilized a simulation-based tool, ESTcalc <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>, to estimate the expected depth and breadth of transcriptome coverage for this data set. The model for transcriptome coverage backing ESTcalc was parameterized using the well-characterized <it>Arabidopsis thaliana </it>transcriptome and several "next-generation" sequencing runs using normalized and non-normalized cDNA libraries <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>. Using the results from these simulations (retrieved using ESTcalc), our dataset is expected to cover 87% of the nucleotide positions in the transcriptome (Table <tblr tid="T3">3</tblr>), with every gene represented by at least one read (i.e. percent of genes tagged).</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Transcriptome coverage estimates: ESTcalc</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Input Parameters</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>ESTcalc estimate</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Actual</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of technologies</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Technology</p>
         </c>
         <c ca="center">
            <p>454 GS-FLX</p>
         </c>
         <c ca="center">
            <p>454 GS-FLX (Titanium)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Library type</p>
         </c>
         <c ca="center">
            <p>normalized</p>
         </c>
         <c ca="center">
            <p>normalized</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MB/Plate</p>
         </c>
         <c ca="center">
            <p>254</p>
         </c>
         <c ca="center">
            <p>254.0076</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Reads/Plate</p>
         </c>
         <c ca="center">
            <p>681,722</p>
         </c>
         <c ca="center">
            <p>681,722</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>BP/Read (mean)</p>
         </c>
         <c ca="center">
            <p>372.6</p>
         </c>
         <c ca="center">
            <p>372.6</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Predicted assembly statistics</b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total Assembled Sequence (MB)</p>
         </c>
         <c ca="center">
            <p>26.2</p>
         </c>
         <c ca="center">
            <p>30.79</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Unigene count</p>
         </c>
         <c ca="center">
            <p>32,044</p>
         </c>
         <c ca="center">
            <p>56,256</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean unigene length (bp)</p>
         </c>
         <c ca="center">
            <p>819</p>
         </c>
         <c ca="center">
            <p>547.23</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean unigene length (longest unigene per gene, bp)</p>
         </c>
         <c ca="center">
            <p>1,143</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Singleton yield (%)</p>
         </c>
         <c ca="center">
            <p>19</p>
         </c>
         <c ca="center">
            <p>17.2400</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent transcriptome (%)</p>
         </c>
         <c ca="center">
            <p>87</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent of genes tagged (%)</p>
         </c>
         <c ca="center">
            <p>100</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent of genes with 90% coverage (%)</p>
         </c>
         <c ca="center">
            <p>69.8</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent of genes with 90% coverage by largest unigene (%)</p>
         </c>
         <c ca="center">
            <p>56.4</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent of genes with 100% coverage (%)</p>
         </c>
         <c ca="center">
            <p>23.7</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percent of genes with 100% coverage by largest unigene (%)</p>
         </c>
         <c ca="center">
            <p>22.2</p>
         </c>
         <c ca="center">
            <p>--</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Estimates of transcriptome coverage based on simulations modeled using the <it>Arabidopsis thaliana </it>floral transcriptome <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>.</p>
   </tblfn></tbl>
<p>Additionally, 70% of the genes are predicted to be sequenced to 90% of their length. Consistent with these estimates, we were able to identify 333 of 357 (93.3%) <it>Arabidopsis </it>genes that are conserved as single copy genes across all Eukaryotes (i.e. ultra-conserved orthologs; UCOs <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>). Similarly, we detected 754 of 959 (78.6%) shared single copy tribes from <it>Arabidopsis </it>
<it>thaliana</it>, <it>Populus trichocarpa</it>, <it>Vitis vinifera</it>, and <it>Oryza sativa </it>in our classification of unigenes in the PlantTribes database <abbrgrp>
<abbr bid="B59">59</abbr>
<abbr bid="B60">60</abbr>
</abbrgrp>. These two gene sets (UCOs and shared single copy tribes) represent a highly conserved subset of genes expected to be present in eukaryotic and plant genomes, respectively, and can be used as a proxy for gene detection and sampling breadth. As a final measure of gene detection in this data set, we utilized a bootstrapped data resampling approach using the distribution of reads in our final assembly (see Methods) to generate a unigene accumulation curve, which plots the number of unigenes detected as a function of sequencing effort (Figure <figr fid="F2">2</figr>). Using this method, we estimate that on average 90%, 95% and 99% of the unigenes were tagged after approximately 378,683; 455,145; and 543,727 reads were sampled, respectively (Figure <figr fid="F2">2</figr>). On average, it took 59 reads to detect each of the last ten unigenes.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Unigene accumulation curve</p></caption><text>
   <p><b>Unigene accumulation curve</b>. The mean number of unigenes detected as a function of the number of reads sampled. The complete set of reads in the 2-step assembly were shuffled and drawn at random for 1,000 bootstrap replicates.</p>
</text><graphic file="1471-2164-12-99-2" hint_layout="double"/></fig>
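<p>The resampling idea behind the unigene accumulation curve can be sketched as follows. This is an illustrative Python example only: it assumes that each read is labelled with the unigene it was assembled into and that reads are shuffled and drawn without replacement in each replicate. The function names, the toy data, and the replicate count are hypothetical and do not reproduce the scripts used in this study.</p>

```python
import random


def accumulation_curve(read_to_unigene, replicates=100, seed=0):
    """Mean number of distinct unigenes seen after each read is drawn.

    read_to_unigene lists, for every read, the unigene it belongs to.
    In each replicate the reads are shuffled and then drawn in order.
    """
    rng = random.Random(seed)
    totals = [0] * len(read_to_unigene)
    for _ in range(replicates):
        order = read_to_unigene[:]
        rng.shuffle(order)
        seen = set()
        for i, unigene in enumerate(order):
            seen.add(unigene)
            totals[i] += len(seen)
    return [total / replicates for total in totals]


def reads_to_reach(curve, fraction):
    """Smallest read count at which the mean curve reaches the fraction."""
    target = fraction * curve[-1]
    for i, value in enumerate(curve):
        if value >= target:
            return i + 1
    return len(curve)


# Hypothetical toy assembly: unigene "a" has 6 reads, "b" has 3, "c" has 1.
toy_reads = ["a"] * 6 + ["b"] * 3 + ["c"]
curve = accumulation_curve(toy_reads, replicates=200, seed=1)
print(curve)
print(reads_to_reach(curve, 0.9))
```

<p>Averaging over shuffled replicates smooths the curve, so thresholds such as the number of reads needed to tag 90% of unigenes depend less on any single read order. Because unigenes with few reads are found late, the curve flattens as sampling approaches the full read set.</p>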
<p>To identify potential contaminant sequences in the sample or sequencing library, we examined the taxonomic distribution of blastx hits for each unigene searched in the NCBI nr database. We examined both the taxonomic classification of the best hit as well as the lowest common ancestor (LCA) assignment for each unigene using MEGAN v.3.7.2 <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. A total of 34,740 unigenes had a positive blast hit, of which only 1.8% had a best hit to an organism outside of the green plants and 1.1% received an LCA-assigned taxon that is neither within nor a superset of land plants (Table <tblr tid="T4">4</tblr>). We also examined the unigene set for potential genomic DNA contamination by screening unigenes for blastn hits to the complete chloroplast genome sequence of <it>Pteridium aquilinum </it>(HM535629 <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>). None of the chloroplast sequences identified in the transcriptome were longer than 3.5 kb or contained more than five adjacent genes (most spanned only a single gene) and thus can reasonably be considered putative transcripts <abbrgrp>
<abbr bid="B63">63</abbr>
<abbr bid="B64">64</abbr>
</abbrgrp>. That we did not detect any long fragments of chloroplast DNA in the transcriptome assembly gives us confidence that our DNase treatment during RNA extraction was effective and the resulting cDNA library used in sequencing is free of contaminant genomic DNA.</p>
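<p>The lowest common ancestor screen described above can be illustrated with a naive Python sketch. It assumes that each blast hit has already been mapped to a root-to-tip taxonomic lineage, and it takes the LCA as the longest shared prefix of those lineages. MEGAN applies additional score-based filtering of hits before assigning an LCA, which is not modelled here. The lineages, unigene names, and function names below are hypothetical simplifications.</p>

```python
def common_prefix(lineages):
    """Return the root-to-LCA path shared by all lineages (root-to-tip lists)."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return prefix


def consistent_with_clade(lca_path, clade_path):
    """True if the LCA lies within the clade or is one of its ancestors."""
    shorter = min(len(lca_path), len(clade_path))
    return lca_path[:shorter] == clade_path[:shorter]


# Hypothetical, heavily simplified lineages for illustration only.
LAND_PLANTS = ["root", "Eukaryota", "Viridiplantae", "Embryophyta"]
FUNGI = ["root", "Eukaryota", "Fungi"]
hits = {
    "unigene_1": [LAND_PLANTS + ["Arabidopsis"], LAND_PLANTS + ["Physcomitrella"]],
    "unigene_2": [FUNGI + ["Aspergillus"], FUNGI + ["Neurospora"]],
    "unigene_3": [LAND_PLANTS + ["Selaginella"], FUNGI + ["Aspergillus"]],
}

# Flag unigenes whose LCA is neither within nor a superset of land plants.
flagged = [
    name
    for name, lineages in hits.items()
    if not consistent_with_clade(common_prefix(lineages), LAND_PLANTS)
]
print(flagged)
```

<p>In this toy example only the unigene whose hits are all fungal is flagged. A unigene with both plant and fungal hits receives an LCA above land plants and is therefore treated as uninformative rather than as a contaminant.</p>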
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Taxonomic distribution of unigene blastx hits in the nr database</p></caption><tblbdy cols="5">
      <r>
         <c>
            <p/>
         </c>
         <c ca="right" cspan="2">
            <p>
               <b>Best blastx hit</b>
            </p>
         </c>
         <c ca="right" cspan="2">
            <p>
               <b>Lowest common ancestor for blastx hits</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Taxonomic category</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Number of unigenes</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Percent of unigenes with hits</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Number of unigenes</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Percent of unigenes with hits</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Eukaryotes</p>
         </c>
         <c ca="right">
            <p>33,776</p>
         </c>
         <c ca="right">
            <p>97.2%</p>
         </c>
         <c ca="right">
            <p>32,059</p>
         </c>
         <c ca="right">
            <p>92.3%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Green plants</p>
         </c>
         <c ca="right">
            <p>33,406</p>
         </c>
         <c ca="right">
            <p>96.2%</p>
         </c>
         <c ca="right">
            <p>31,373</p>
         </c>
         <c ca="right">
            <p>90.3%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>"Green algae"</p>
         </c>
         <c ca="right">
            <p>175</p>
         </c>
         <c ca="right">
            <p>0.5%</p>
         </c>
         <c ca="right">
            <p>78</p>
         </c>
         <c ca="right">
            <p>0.2%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Land plants</p>
         </c>
         <c ca="right">
            <p>33,231</p>
         </c>
         <c ca="right">
            <p>95.7%</p>
         </c>
         <c ca="right">
            <p>30,822</p>
         </c>
         <c ca="right">
            <p>88.7%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>"Bryophytes"</p>
         </c>
         <c ca="right">
            <p>394</p>
         </c>
         <c ca="right">
            <p>1.1%</p>
         </c>
         <c ca="right">
            <p>2,197</p>
         </c>
         <c ca="right">
            <p>6.3%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Vascular plants</p>
         </c>
         <c ca="right">
            <p>32,837</p>
         </c>
         <c ca="right">
            <p>94.5%</p>
         </c>
         <c ca="right">
            <p>16,731</p>
         </c>
         <c ca="right">
            <p>48.2%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Lycophytes</p>
         </c>
         <c ca="right">
            <p>74</p>
         </c>
         <c ca="right">
            <p>0.2%</p>
         </c>
         <c ca="right">
            <p>13</p>
         </c>
         <c ca="right">
            <p>0.0%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Ferns</p>
         </c>
         <c ca="right">
            <p>928</p>
         </c>
         <c ca="right">
            <p>2.7%</p>
         </c>
         <c ca="right">
            <p>435</p>
         </c>
         <c ca="right">
            <p>1.3%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Seed plants</p>
         </c>
         <c ca="right">
            <p>31,835</p>
         </c>
         <c ca="right">
            <p>91.6%</p>
         </c>
         <c ca="right">
            <p>16,015</p>
         </c>
         <c ca="right">
            <p>46.1%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="3">
            <p>Gymnosperms</p>
         </c>
         <c ca="right">
            <p>8,000</p>
         </c>
         <c ca="right">
            <p>23.0%</p>
         </c>
         <c ca="right">
            <p>866</p>
         </c>
         <c ca="right">
            <p>2.5%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="3">
            <p>Angiosperms</p>
         </c>
         <c ca="right">
            <p>23,835</p>
         </c>
         <c ca="right">
            <p>68.6%</p>
         </c>
         <c ca="right">
            <p>10,572</p>
         </c>
         <c ca="right">
            <p>30.4%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Animals</p>
         </c>
         <c ca="right">
            <p>288</p>
         </c>
         <c ca="right">
            <p>0.8%</p>
         </c>
         <c ca="right">
            <p>63</p>
         </c>
         <c ca="right">
            <p>0.2%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Fungi</p>
         </c>
         <c ca="right">
            <p>0</p>
         </c>
         <c ca="right">
            <p>0.0%</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>0.0%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Other eukaryotes</p>
         </c>
         <c ca="right">
            <p>77</p>
         </c>
         <c ca="right">
            <p>0.2%</p>
         </c>
         <c ca="right">
            <p>12</p>
         </c>
         <c ca="right">
            <p>0.0%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Bacteria</p>
         </c>
         <c ca="right">
            <p>22</p>
         </c>
         <c ca="right">
            <p>0.1%</p>
         </c>
         <c ca="right">
            <p>91</p>
         </c>
         <c ca="right">
            <p>0.3%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Artificial sequences, hits don't pass threshold, or taxon not assigned</p>
         </c>
         <c ca="right">
            <p>20</p>
         </c>
         <c ca="right">
            <p>0.1%</p>
         </c>
         <c ca="right">
            <p>216</p>
         </c>
         <c ca="right">
            <p>0.6%</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Unigenes were searched against the NCBI nr protein database using blastx with an e-value threshold of 1e-10, keeping the best ten hits. Of the 56,256 unigenes, 34,740 (61.8%) had a positive hit. The lowest common ancestor (LCA) assignment for a sequence was calculated using the LCA algorithm implemented in MEGAN v3.9 <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> based on at least three blastx hits with a bitscore greater than 75 and within 10% of the best bitscore. Note: the predicted proteins from <it>Selaginella moellendorffii </it>are not currently included in the nr database and thus are not reflected in these results.</p>
   </tblfn></tbl>
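<p>The LCA filtering rule described in the Table 4 footnote can be illustrated with a minimal sketch. This is not the MEGAN implementation; the function name, the lineage tuples, and the parameter names are hypothetical, and "within 10% of the best bitscore" is assumed here to mean a bitscore of at least 90% of the best hit.</p>

```python
from typing import List, Optional, Sequence, Tuple

# A hit is (lineage from root to species, bitscore); this layout is an assumption.
Hit = Tuple[Sequence[str], float]


def lowest_common_ancestor(hits: List[Hit],
                           min_score: float = 75.0,
                           top_fraction: float = 0.10,
                           min_support: int = 3) -> Optional[str]:
    """Return the deepest taxon shared by all retained hits, or None."""
    if not hits:
        return None
    best = max(score for _, score in hits)
    # Keep hits above the bitscore floor and close to the best bitscore.
    kept = [lineage for lineage, score in hits
            if score > min_score and score >= best * (1.0 - top_fraction)]
    # Require a minimum number of supporting hits before assigning a taxon.
    if min_support > len(kept):
        return None
    shared = []
    # Walk the lineages rank by rank until they diverge.
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None


example_hits = [
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Tracheophyta", "Arabidopsis thaliana"), 200.0),
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Tracheophyta", "Picea sitchensis"), 195.0),
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Physcomitrella patens"), 185.0),
    (("Bacteria", "Proteobacteria", "Escherichia coli"), 100.0),
]
assigned = lowest_common_ancestor(example_hits)
```

<p>In this made-up example the bacterial hit falls outside the top 10% of bitscores and is ignored, so the unigene is placed at the node shared by the three remaining plant hits. A unigene with fewer than three qualifying hits receives no assignment under these assumed settings.</p>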
</sec>
<sec>
<st>
<p>Functional annotation</p>
</st>
<p>Unigenes were annotated with gene ontology (GO) terms, enzyme codes, and conserved protein domain functions using the Blast2GO suite <abbrgrp>
<abbr bid="B65">65</abbr>
<abbr bid="B66">66</abbr>
<abbr bid="B67">67</abbr>
</abbrgrp>. Unigenes were first interrogated against the NCBI nr protein database using a blastx e-value threshold of 1e-10, keeping the top 10 high-scoring alignments, resulting in 34,740 unigenes (61.8%) with positive blast hits. The best blastx hits for these unigenes corresponded to 22,596 unique protein accessions in the nr database. The longest open reading frame (uncorrected six-frame translations automated in Blast2GO) from 29,357 unigenes (52.2%) had positive matches to conserved protein domains using InterProScan (IPS) searches implemented in Blast2GO. These results (nr blastx and IPS) were used to assign 87,137 GO terms to 25,999 unigenes (Additional file <supplr sid="S2">2</supplr>). These GO terms were used to map 11,243 enzyme codes to 8,993 unigenes. Enzyme codes were then used to retrieve and color 144 KEGG pathway maps. To assess whether the frequencies of functional categories present in the <it>Pteridium </it>transcriptome differ significantly from the suite of gene functions present in other plants, we compared the GO terms assigned to <it>Pteridium </it>unigenes with the GO classification for all genes in the <it>Arabidopsis thaliana </it>genome (TAIR9 GO annotation downloaded on 13 September 2010) using a two-tailed FDR-corrected Fisher's exact test. When using the full GO classification, none of the GO terms in <it>Pteridium </it>are significantly enriched or underrepresented relative to the full GO classification for <it>Arabidopsis </it>(FDR-corrected alpha = 0.05). To examine broad-level classification of gene functions in the bracken transcriptome, we mapped GO terms to the GO-slim vocabulary (Figure <figr fid="F3">3</figr>) and repeated the Fisher's exact test. Forty-two GO-slim categories were overrepresented and 88 categories were underrepresented in the <it>Pteridium </it>transcriptome relative to the <it>Arabidopsis thaliana </it>GO-slim annotation (FDR-corrected alpha = 0.05; Additional file <supplr sid="S3">3</supplr>).</p>
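<p>The enrichment comparison can be sketched as a per-term two-tailed Fisher's exact test followed by a false discovery rate adjustment. The sketch below is illustrative only and does not reproduce the Blast2GO implementation: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR correction, and the counts in the example are invented rather than taken from this study.</p>

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p_obs * (1 + 1e-9) >= p)
    return min(1.0, tail)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank 1 is the smallest p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: (with term in set A, without in A, with in B, without in B)
tables = {"term_x": (30, 970, 10, 990), "term_y": (50, 950, 48, 952)}
raw = {term: fisher_exact_two_tailed(*cells) for term, cells in tables.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [term for term, q in adjusted.items() if 0.05 >= q]
```

<p>Each GO or GO-slim term contributes one 2x2 table (sequences with and without the term in each data set), and a term would be reported as over- or underrepresented only if its adjusted p-value is at most 0.05. Exact binomial coefficients keep the sketch dependency-free, at the cost of speed for genome-scale counts.</p>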
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Unigene functional annotations from Blast2GO</b>. GO and EC functional classification for unigenes.</p>
</text>
<file name="1471-2164-12-99-S2.CSV">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Distribution of plant GO-slim functional categories</p></caption><text>
   <p><b>Distribution of plant GO-slim functional categories</b>. The relative proportion of plant GO-slim terms represented by more than 150 unigenes for the three major categories in the GO vocabulary (biological process, cellular component, and molecular function).</p>
</text><graphic file="1471-2164-12-99-3" hint_layout="double"/></fig>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Functional annotation enrichment for GO-slim categories relative to the complete <it>Arabidopsis </it>genome</b>. FDR-corrected Fisher's exact test for GO-slim categories represented in the <it>Pteridium </it>unigene set and the <it>Arabidopsis </it>genome.</p>
</text>
<file name="1471-2164-12-99-S3.CSV">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Comparative genomics</p>
</st>
<p>Unigenes were classified into 6,987 tribe (inflation level 3) and 9,395 orthogroup MCL clusters (Additional file <supplr sid="S4">4</supplr>) in the PlantTribes gene family database on the basis of the best blastx hit to the inferred protein models of ten complete plant genomes included in an updated version of the PlantTribes database (<abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp> and CWD, unpublished). To evaluate the level of gene overlap between the <it>Pteridium </it>gametophyte transcriptome and other land plants, we examined overlap in both PlantTribes orthogroup cluster membership and blastx hits for predicted proteins in <it>Physcomitrella patens</it>, <it>Selaginella moellendorffii</it>, and <it>Arabidopsis thaliana </it>(Figure <figr fid="F4">4</figr>). Among genes in the <it>Arabidopsis </it>genome with positive blastx hits to <it>Pteridium </it>unigenes, we screened for "gametophyte genes" previously identified in the literature. Honys and Twell <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp> used microarrays to identify 1,355 genes specifically expressed in haploid male gametophyte tissues in <it>Arabidopsis</it>, that is, genes consistently expressed in at least one of four male gametophyte developmental stages and absent in six sporophytic tissue gene expression profiles. Similarly, Yu et al. <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp> and Wuest et al. <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp> identified 911 genes (combined) that were significantly over-expressed in female gametophytic cells relative to sporophytic tissues. In total, we identified 1,156 known <it>Arabidopsis </it>gametophyte genes that produced significant alignments with <it>Pteridium </it>unigenes in our blastx search (Figure <figr fid="F5">5</figr>).</p>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>PlantTribes2.0 gene family classification</b>. Tribe and orthogroup assignments for each unigene and cluster membership for the ten proteomes included in PlantTribes2.0. Functional and gene family descriptors for clusters are primarily inherited from the <it>Arabidopsis thaliana </it>genes included in the cluster.</p>
</text>
<file name="1471-2164-12-99-S4.CSV">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Homologous gene detection in diverse plant proteomes</p></caption><text>
   <p><b>Homologous gene detection in diverse plant proteomes</b>. (A) Blastx: The complete unigene set was queried against the complete set of predicted proteins in the genomes of <it>Arabidopsis thaliana</it>, <it>Physcomitrella patens</it>, and <it>Selaginella moellendorffii </it>using an e-value cutoff of 1e-5. Unigenes with positive hits in more than one proteome are shown in the intersect for those species. Of 56,256 total unigenes, 21,425 unigenes did not have a positive blast hit. (B) PlantTribes OrthoGroups: Unigenes were assigned to Tribe- and OrthoMCL clusters derived from the updated PlantTribes classification based on the best blast hit for each unigene. The presence of genes from <it>Arabidopsis thaliana</it>, <it>Physcomitrella patens</it>, and <it>Selaginella moellendorffii </it>in each OrthoGroup was evaluated. Of 56,256 total unigenes, 18,368 unigenes were not assigned to an OrthoGroup cluster and an additional 2,309 were assigned to clusters having no homologs from <it>Arabidopsis</it>, <it>Physcomitrella</it>, or <it>Selaginella</it>.</p>
</text><graphic file="1471-2164-12-99-4" hint_layout="double"/></fig>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Detection of homologs to <it>Arabidopsis </it>gametophyte genes</p></caption><text>
   <p><b>Detection of homologs to <it>Arabidopsis </it>gametophyte genes</b>. To screen for the presence of potential gametophyte genes, <it>Arabidopsis </it>genes producing significant alignments with <it>Pteridium </it>unigenes in a blastx search (e-value cutoff = 1e-10) were compared with the list of male gametophyte specific and female gametophyte enriched genes identified from the literature <abbrgrp><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>.</p>
</text><graphic file="1471-2164-12-99-5" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Repetitive sequence characterization</p>
</st>
<p>A total of 2,679 perfect di-, tri-, tetra-, and pentanucleotide simple sequence repeats (SSRs) longer than 9, 8, 6, and 5 repeats, respectively, were identified in 2,285 unigenes (Additional file <supplr sid="S5">5</supplr>) using msatCommander <abbrgrp>
<abbr bid="B71">71</abbr>
</abbrgrp>. Sufficient flanking sequence existed to design high-quality primers for 548 potentially amplifiable SSR loci. PCR primers were chosen using Primer3 <abbrgrp>
<abbr bid="B72">72</abbr>
</abbrgrp> as implemented in msatCommander <abbrgrp>
<abbr bid="B71">71</abbr>
</abbrgrp> (Additional file <supplr sid="S6">6</supplr>). Since this RNA was extracted from gametophytes derived from spores collected from a single diploid sporophyte, we are unable to determine the level of variation present at these SSR loci in natural populations.</p>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>SSR loci identified in msatCOMMANDER</b>. Repeat sequence information (repeat motif, location, and length) for SSR loci identified by msatCOMMANDER.</p>
</text>
<file name="1471-2164-12-99-S5.CSV">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>Primer sequences and details for SSR loci</b>. Primer sequences for potentially amplifiable SSR loci selected using msatCOMMANDER.</p>
</text>
<file name="1471-2164-12-99-S6.CSV">
   <p>Click here for file</p>
</file>
</suppl>
<p>To identify and classify expressed repeat sequences, we screened the <it>Pteridium </it>unigenes with RepeatMasker, using RepBase sequences belonging to land plants (Embryophyta). In total, 416 retrotransposons were identified, representing 0.17% of the total unigene sequence length (Table <tblr tid="T5">5</tblr>). Additionally, 269 DNA transposons were identified, representing 0.07% of the total sequence length (Table <tblr tid="T5">5</tblr>).</p>
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Repetitive transposon classification</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Transposon class</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Number of elements</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Total length</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Percentage of sequence</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Retroelements</p>
         </c>
         <c ca="right">
            <p>416</p>
         </c>
         <c ca="right">
            <p>51,070</p>
         </c>
         <c ca="right">
            <p>0.17%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>LINE/L1</p>
         </c>
         <c ca="right">
            <p>38</p>
         </c>
         <c ca="right">
            <p>2,458</p>
         </c>
         <c ca="right">
            <p>0.01%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>LTR</p>
         </c>
         <c ca="right">
            <p>378</p>
         </c>
         <c ca="right">
            <p>48,612</p>
         </c>
         <c ca="right">
            <p>0.16%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Copia</p>
         </c>
         <c ca="right">
            <p>183</p>
         </c>
         <c ca="right">
            <p>21,750</p>
         </c>
         <c ca="right">
            <p>0.07%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="2">
            <p>Gypsy</p>
         </c>
         <c ca="right">
            <p>195</p>
         </c>
         <c ca="right">
            <p>26,862</p>
         </c>
         <c ca="right">
            <p>0.09%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DNA transposons</p>
         </c>
         <c ca="right">
            <p>269</p>
         </c>
         <c ca="right">
            <p>22,699</p>
         </c>
         <c ca="right">
            <p>0.07%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>hobo-Activator</p>
         </c>
         <c ca="right">
            <p>20</p>
         </c>
         <c ca="right">
            <p>2,022</p>
         </c>
         <c ca="right">
            <p>0.01%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Tc1-IS630-Pogo</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>46</p>
         </c>
         <c ca="right">
            <p>0.00%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>En-Spm</p>
         </c>
         <c ca="right">
            <p>180</p>
         </c>
         <c ca="right">
            <p>13,395</p>
         </c>
         <c ca="right">
            <p>0.04%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>MuDR-IS905</p>
         </c>
         <c ca="right">
            <p>33</p>
         </c>
         <c ca="right">
            <p>1,834</p>
         </c>
         <c ca="right">
            <p>0.00%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Harbinger</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
         <c ca="right">
            <p>368</p>
         </c>
         <c ca="right">
            <p>0.00%</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>Rolling circle Helitrons</p>
         </c>
         <c ca="right">
            <p>29</p>
         </c>
         <c ca="right">
            <p>5,034</p>
         </c>
         <c ca="right">
            <p>0.02%</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Classification and frequency statistics for repetitive elements identified by RepeatMasker. The database used to screen unigenes was built with repeat sequences identified in RepBase as belonging to land plants (Embryophyta).</p>
   </tblfn></tbl>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>We have used high-throughput sequencing data to characterize the gametophyte transcriptome of <it>Pteridium aquilinum</it>, a species for which very little genomic data are available. These data represent an 865-fold increase over the expressed sequence data previously available for <it>Pteridium </it>in GenBank <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<sec>
<st>
<p>Assembly quality</p>
</st>
<p>Because contaminant adapter/primer sequences, polyA/T repeats, and low complexity end sequences can substantially compromise <it>de novo </it>assembly and can be difficult to completely remove (KM Dlugosch, personal communication), we aggressively filtered and trimmed the reads beyond the default instrument-level processing routines at the cost of sequence information loss (approximately 11.6 Mbp were removed, representing 4.4% of the filter-passed data).</p>
<p>Considering the sheer quantity and depth of sequencing produced by next-generation sequencing platforms, we deemed this an acceptable level of loss to improve accuracy in the assembly. We also used a two-step assembly strategy to minimize redundancy in our final unigene sequence set. We adopted this approach because MIRA is able to handle the large number of reads produced by 454 sequencing and utilizes a multi-pass strategy to identify and correct sequencing and assembly errors to produce a highly accurate assembly, but is sensitive to uneven sequencing depth of coverage and allelic diversity, resulting in a high number of redundant contigs. CAP3 is a proven and efficient DNA sequence assembler that can be used to join highly similar overlapping sequences, but is unable to handle the large number of reads produced by new high-throughput sequencing platforms. By combining these two assembly tools, we were able to join contigs and singletons that failed to assemble in MIRA to reduce sequence-level redundancy in our final unigene sequence set.</p>
<p>In examining the taxonomic distribution of nr blastx hits for the unigenes, we identified only a small proportion of sequences with best blast hits or LCA assignments outside of the green plants. On closer examination, many of these hits align only to short conserved domains, match hypothetical proteins of unknown function from model organisms, or correspond to genes that are conserved across broad taxonomic levels, such as cytochrome P450, alpha-tubulin, and dynein proteins. Additionally, because no other fern genomes have been sequenced, some of these sequences may represent novel fern genes. Thus, the evidence indicates that there is very little heterospecific sequence contamination in these data.</p>
</sec>
<sec>
<st>
<p>Transcriptome coverage</p>
</st>
<p>While the simulations that underlie ESTcalc are based on the well characterized <it>Arabidopsis thaliana </it>floral transcriptome (approximately 18,000 genes with transcripts averaging 1,500 bp long) and assume perfect cDNA normalization and sequence assembly, Wall et al. <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp> show that their results were highly predictive for empirical datasets from diverse eukaryotic species and tissues, making their simulations useful as a null model for predicting transcriptome coverage in other organisms. The predictions for transcriptome coverage produced by ESTcalc are largely consistent with the coverage observed in our two-step assembly. However, the larger assembly size, greater number of unigenes, and shorter unigene lengths observed in our data set relative to the ESTcalc prediction may be explained by imperfect cDNA normalization or inefficient <it>de novo </it>assembly. It is also becoming evident that, with increased transcriptome sequencing throughput, it is possible to capture a richer, more nuanced picture of transcriptome complexity (e.g., partially processed transcripts and alternative splice forms) <abbrgrp>
<abbr bid="B73">73</abbr>
<abbr bid="B74">74</abbr>
<abbr bid="B75">75</abbr>
</abbrgrp>. This increased information content, however, presents significant challenges for <it>de novo </it>assembly and often results in a fragmented or partially redundant assembly <abbrgrp>
<abbr bid="B76">76</abbr>
<abbr bid="B77">77</abbr>
</abbrgrp>. Also consistent with the ESTcalc estimate that we have tagged all of the transcripts present in this sample, our unigene accumulation curve shows that the rate of new unigene detection for this cDNA library has declined to the point that additional sequencing is unlikely to detect new genes, although it may serve to condense and join non-overlapping contigs in our assembly. Similar approaches to evaluating sampling sufficiency in transcriptome projects have been used by other researchers when other information about the transcriptome is absent <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B78">78</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Functional annotation</p>
</st>
<p>When the full GO classification is considered, the GO functional categories represented in the <it>Pteridium </it>gametophyte transcriptome are not significantly different from the suite of functional categories present in the full <it>Arabidopsis </it>genome GO annotation. Most of the unigenes annotated with a cellular component are localized to plastids or mitochondria, but a large number of them are also targeted to ribosomes or the plasma membrane (Figure <figr fid="F3">3</figr>). The molecular functions of unigenes are heavily dominated by binding of nucleic acids or proteins and by metabolic activity, including hydrolase and kinase activity (Figure <figr fid="F3">3</figr>). The biological processes represented include all of the major cellular processes from transport and cellular organization to transcription, translation, and metabolism (Figure <figr fid="F3">3</figr>).</p>
<p>Visual examination of annotated/colored KEGG maps (not shown) indicates that we have captured all of the genes required for glycolysis, the citrate cycle, and plant hormone biosynthesis including gibberellin, abscisic acid, strigolactone, cytokinin, brassinosteroid, and auxin. We also detected enzyme code signatures for most of the genes involved in nucleic and amino acid metabolism and chlorophyll biosynthesis.</p>
</sec>
<sec>
<st>
<p>Comparative genomics</p>
</st>
<p>The PlantTribes database contains an objective MCL cluster-based classification system for plant genes and gene families <abbrgrp>
<abbr bid="B60">60</abbr>
<abbr bid="B79">79</abbr>
<abbr bid="B80">80</abbr>
</abbrgrp>. By identifying similar sequences in this classification system, we assigned unigene sequences with putative gene family identities. The most abundant of these gene families present in the unigene set was the pentatricopeptide repeat protein (PPR) family, with over 600 unigenes classified as PPR proteins. We were also able to identify 65 unigenes classified in the MADS-box transcription factor family. Using this classification, gene sequences from <it>Pteridium </it>can be extracted for gene families of interest for use in studies of gene family evolution or phylogenomics. The overlap in orthogroup membership and blast hits for proteins in <it>Arabidopsis thaliana</it>, <it>Selaginella moellendorffii</it>, and <it>Physcomitrella patens </it>is similar (Figure <figr fid="F4">4A/B</figr>), but some striking differences can be observed. In both the PlantTribes and blast-based Venn diagrams, most of the unigenes which were identified in <it>Arabidopsis</it>, <it>Selaginella</it>, or <it>Physcomitrella </it>are also shared across all three species. In the PlantTribes classification, most of the genes are shared with <it>Arabidopsis </it>(21,649 unigenes), the species in this comparison that shares the most recent common ancestor with <it>Pteridium</it>, while slightly fewer and approximately equal gene representation is shared with <it>Selaginella </it>and <it>Physcomitrella </it>(19,649 and 19,485 unigenes, respectively). This is in contrast to the blast-based examination of gene set overlap in which <it>Arabidopsis </it>again has the greatest number of unigenes with hits (23,148 unigenes), but <it>Physcomitrella </it>has hits with 6,122 more unigenes than <it>Selaginella </it>(Figure <figr fid="F4">4</figr>). At first this seems counterintuitive because <it>Pteridium </it>shares a more recent common ancestor with <it>Selaginella </it>than with <it>Physcomitrella</it>. 
This pattern may be explained by the expression of "gametophyte genes" in <it>Pteridium </it>that are conserved with genes in the <it>Physcomitrella </it>genome; however, little is known about how gene expression and evolution differ between sporophyte and gametophyte stages. Both <it>Physcomitrella </it>and <it>Pteridium </it>have maintained a homosporous life cycle with a large independent gametophyte stage, unlike the heterosporous <it>Selaginella</it>. These life history differences may also influence the selective pressures and/or constraints acting on gene evolution, and more work is needed to address these hypotheses.</p>
<p>In our examination of <it>Arabidopsis </it>gametophyte genes, we identified gametophyte-expressed homologs in <it>Pteridium </it>for over half (52.7%) of the previously characterized <it>Arabidopsis </it>gametophyte-specific or gametophyte-enriched genes. This finding suggests that a highly specific suite of genes required for gamete production and syngamy may be conserved over long periods of evolutionary time, despite substantial differences in life cycle and reproductive strategies between <it>Pteridium </it>and <it>Arabidopsis</it>. Note also that these conserved genes are not the genes required for meiosis, because the tissues sampled in this study and those used to identify gametophyte-specific genes in <it>Arabidopsis </it>were all post-meiotic. An in-depth study of sporophytic genes in <it>Pteridium </it>is needed to better understand the evolution and expression of genes between the sporophyte and gametophyte stages.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This study is the first comprehensive sequencing effort and analysis of gene function in the transcriptome of a fern and represents the most extensive expressed sequence resource available in ferns to date, nearly 16 times more data than exists for <it>Adiantum capillus-veneris</it>. These data are an important new scientific resource for comparative evolutionary studies in land plants and will be of great value for studies of genome structure and function in ferns. These data can be used to develop microarrays for gene expression assays or serve as a reference transcriptome for future RNA-seq experiments in <it>Pteridium</it>. As additional genome-scale projects in diverse plants are undertaken, these data will be of immense value in representing ferns, the sister clade to seed plants, in comparative genomic analyses.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Gametophyte culture, library preparation, and sequencing</p>
</st>
<p>
<it>Pteridium aquilinum </it>ssp. <it>aquilinum </it>spores (collection number: Wolf 83; sourced from a single sporophyte individual collected in Norwich, UK) were sown onto sterile agar nutrient media containing Bold's macronutrients and Nitsch's micronutrients (prepared as described by <abbrgrp>
<abbr bid="B81">81</abbr>
</abbrgrp>) and grown under white light. Whole gametophytes including both vegetative and sexually mature male, female, and hermaphroditic individuals of various ages (up to 9 months from germination) were flash frozen in liquid nitrogen and ground to a fine powder. Total RNA was isolated using the Sigma Spectrum Plant Total RNA Kit, incorporating on-column DNase I (Qiagen) digestion during extraction to remove traces of genomic DNA. Total RNA was concentrated by precipitating in 2.5 M ammonium acetate and 70% ethanol, then resolubilizing the RNA pellet in RNase-free water to approximately 500 ng/&#956;L. Total RNA was quantified and its quality verified using an Agilent Bioanalyzer 2100. Total RNA was sent to the Center for Genomics and Bioinformatics at Indiana University, Bloomington (IU CGB), where a normalized transcriptome (cDNA) library optimized for Roche 454 GS-FLX Titanium sequencing was prepared <abbrgrp>
<abbr bid="B82">82</abbr>
</abbrgrp>.</p>
<p>Briefly, full-length enriched cDNA was synthesized with the Clontech SMART cDNA synthesis kit using modified 454-ready adapter/primer oligos (K Mockaitis, unpublished). The frequency of abundant cDNA species was reduced using the Evrogen Direct Trimmer normalization kit. Normalized cDNA was fragmented by sonication, blunt-end repaired, and ligated to custom 454 sequencing adapters. Amplification of the sequencing library incorporated an adapter-mediated PCR suppression effect to preferentially amplify ligation products suitable for 454 sequencing <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. The final transcriptome library was size selected, and 454 sequencing proceeded according to the manufacturer's recommended protocol on three regions of a four-region PicoTiter plate.</p>
</sec>
<sec>
<st>
<p>Sequence preprocessing, transcriptome assembly, and coverage assessment</p>
</st>
<p>Sequence reads generated in this study were deposited in the NCBI Sequence Read Archive (SRA012887). Raw sequence reads that passed instrument software quality filters were trimmed of custom oligonucleotide adapter sequences (Justin Choi, unpublished, IU CGB). The resulting sequences were further processed with SeqClean <abbrgrp>
<abbr bid="B83">83</abbr>
</abbrgrp> and SnoWhite v1.0.3 <abbrgrp>
<abbr bid="B84">84</abbr>
</abbrgrp> to remove low quality, short, and contaminant sequences, and to aggressively trim polyA/T sequences. Cleaned reads were assembled <it>de novo </it>in MIRA v3rc4 <abbrgrp>
<abbr bid="B55">55</abbr>
<abbr bid="B85">85</abbr>
</abbrgrp> using a minimum percent identity of 94% to align reads, retaining singleton reads in the assembly (-OUT:sssip=yes). This primary assembly was passed through a secondary assembly step in CAP3 (95% identity, 25 bp overlap) <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp> to reduce redundancy in the final assembly and join additional contigs. Custom Perl scripts were used to extract summary information about the reads and assemblies (Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>; Additional file <supplr sid="S7">7</supplr>).</p>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>Secondary coverage perl script</b>. Script used to identify singleton reads and primary and secondary contigs and calculate assembly statistics.</p>
</text>
<file name="1471-2164-12-99-S7.PL">
   <p>Click here for file</p>
</file>
</suppl>
<p>We used a web-based tool, ESTcalc <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>, to estimate the predicted level of transcriptome coverage for our data set. ESTcalc requires as input the sequencing technology used (or a combination of technologies) and either the total sequencing level (Mbp) or the number of reads together with an estimate of read length. We used the closest available sequencing technology (454 GS-FLX) and the empirical values observed for the cleaned sequence data (254 Mbp, or 681,722 reads with an average of 372.6 bp/read) to obtain our estimates. The estimates were identical whether we parameterized on total sequence or on read number and read length.</p>
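<p>As a quick consistency check on the two ways of parameterizing ESTcalc, the total sequencing level can be recomputed from the read count and mean read length reported above. The Python sketch below is illustrative only; the function name is hypothetical and is not part of ESTcalc.</p>

```python
def total_megabases(n_reads: int, mean_read_length_bp: float) -> float:
    """Return total sequence in Mbp given a read count and mean read length."""
    return n_reads * mean_read_length_bp / 1e6


# Values reported for the cleaned read set in this study.
mbp = total_megabases(681_722, 372.6)
print(round(mbp))  # 254
```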
<p>To determine the number of eukaryotic ultra conserved orthologs (UCOs <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>) we captured in the <it>Pteridium </it>transcriptome data set, we queried a list of 357 UCO coding sequences from <it>Arabidopsis </it>(sequences available at: <url>http://compgenomics.ucdavis.edu/compositae_reference.php</url>) against the unigene set with an e-value threshold of 1e-10 using NCBI tblastx. These blast results were then parsed to determine the number of UCOs with a positive hit that returned an amino acid alignment greater than 30 residues long.</p>
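<p>The parsing step can be sketched as follows. This is a minimal Python illustration, not the script used in the study. It assumes the standard 12-column tabular BLAST output, one query identifier per UCO, and an alignment-length column reported in amino acid residues; the function name and the toy records are hypothetical.</p>

```python
import csv
import io


def count_ucos_detected(tabular_blast, max_evalue=1e-10, min_aln_len=30):
    """Count distinct queries with at least one hit passing both filters.

    Assumes 12-column tabular BLAST output (query, subject, percent identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end,
    e-value, bit score), with one UCO sequence per query identifier.
    """
    detected = set()
    for row in csv.reader(tabular_blast, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, aln_len, evalue = row[0], int(row[3]), float(row[10])
        # Keep hits at or below the e-value threshold and longer than 30 residues.
        if max_evalue >= evalue and aln_len > min_aln_len:
            detected.add(query)
    return len(detected)


# Toy input (hypothetical identifiers and scores).
example = io.StringIO(
    "UCO_001\tunigene_12\t78.0\t120\t26\t0\t1\t360\t10\t369\t1e-40\t160\n"
    "UCO_002\tunigene_90\t65.0\t25\t9\t0\t1\t75\t4\t78\t1e-12\t60\n"
    "UCO_003\tunigene_07\t55.0\t80\t36\t0\t1\t240\t1\t240\t1e-05\t48\n"
)
print(count_ucos_detected(example))
```

<p>In the toy input only the first record passes both filters: the second alignment is too short and the third exceeds the e-value threshold.</p>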
<p>We assessed the changing rate of new gene detection as a function of sampling effort (unigene accumulation curve, Figure <figr fid="F2">2</figr>) using a bootstrapped random sampling protocol implemented in a custom Perl script (Additional file <supplr sid="S8">8</supplr>). This script uses the empirical distribution of read number per unigene in our final assembly to randomly sample reads one at a time and tracks the total number of unigenes detected at each step. Because the order of sampling can impact the shape of this curve, we computed 1,000 replicate random sample orders and calculated the mean number of unigenes detected after each draw. To evaluate the level of variation in the number of unigenes detected, we also calculated the 95% confidence interval on the number of unigenes. Using this curve, we then estimated the number of reads it took to capture an average of 90%, 95%, and 99% of the unigenes and the average number of additional reads required to detect each of the last 10 unigenes.</p>
<suppl id="S8">
<title>
<p>Additional file 8</p>
</title>
<text>
<p>
<b>Unigene accumulation curve perl script</b>. Script to randomly select reads from the assembly and calculate the number of contigs detected to produce the unigene accumulation curve.</p>
</text>
<file name="1471-2164-12-99-S8.PL">
   <p>Click here for file</p>
</file>
</suppl>
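<p>The sampling protocol behind the unigene accumulation curve can be sketched in Python as shown below. This is not the Perl script in Additional file 8. It assumes that reads are drawn without replacement in a random order for each replicate, it omits the confidence interval calculation, and all function names are hypothetical.</p>

```python
import random


def accumulation_curve(reads_per_unigene, n_replicates=1000, seed=0):
    """Mean number of unigenes detected after each read draw.

    reads_per_unigene: list of read counts, one entry per unigene.
    Reads are drawn without replacement in a random order per replicate.
    """
    rng = random.Random(seed)
    # One label per read, naming the unigene the read belongs to.
    reads = [u for u, n in enumerate(reads_per_unigene) for _ in range(n)]
    totals = [0] * len(reads)
    for _ in range(n_replicates):
        rng.shuffle(reads)
        seen = set()
        for i, unigene in enumerate(reads):
            seen.add(unigene)
            totals[i] += len(seen)
    return [t / n_replicates for t in totals]


def reads_to_reach(curve, fraction, n_unigenes):
    """Smallest number of reads whose mean detection reaches the fraction."""
    target = fraction * n_unigenes
    for i, mean_detected in enumerate(curve, start=1):
        if mean_detected >= target:
            return i
    return None


# Toy assembly: four unigenes supported by 5, 3, 1, and 1 reads.
curve = accumulation_curve([5, 3, 1, 1], n_replicates=200, seed=1)
print(len(curve), curve[0], curve[-1])
```

<p>Averaging over many random read orders smooths out the dependence on any single sampling order, which is why the curve is reported as a mean over replicates.</p>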
<p>To screen for potential contaminating sequences, we examined the taxonomic distribution of blastx hits for the unigene set in the NCBI nr protein database using an e-value threshold of 1e-10. The top 10 blast hits for each unigene were kept and examined in MEGAN v.3.7.2 <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. MEGAN is a tool built for the examination of metagenomic data sets and provides a number of useful functions to explore the information content of large blast results. The blast results for each unigene were mapped onto the NCBI taxonomy tree by examining just the best hit (lowest e-value) or by using the lowest common ancestor (LCA) algorithm <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. LCA was determined using at least three blast hits with a bitscore greater than 75 and within 10% of the top bitscore for that unigene.</p>
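<p>The hit filtering and lowest common ancestor assignment described above can be illustrated with the following Python sketch. It reflects one reading of the stated thresholds and is not MEGAN's implementation; the function names, the simplified lineages, and the bit scores are hypothetical.</p>

```python
def select_hits(hits, min_bitscore=75.0, top_fraction=0.10):
    """Keep hits above the bit-score floor and within 10% of the best score.

    hits: list of (lineage, bitscore) pairs, where lineage is a tuple of
    taxon names ordered from root to tip.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best * (1.0 - top_fraction)
    return [(lin, s) for lin, s in hits if s > min_bitscore and s >= cutoff]


def lowest_common_ancestor(hits, min_support=3):
    """Return the deepest taxon shared by all retained hits, or None."""
    kept = select_hits(hits)
    if min_support > len(kept):
        return None  # too few qualifying hits to place this unigene
    shared = None
    # Walk the lineages rank by rank from the root until they diverge.
    for ranks in zip(*(lin for lin, _ in kept)):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


# Toy hits for one unigene (simplified lineages, hypothetical scores).
hits = [
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Physcomitrella patens"), 210.0),
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Arabidopsis thaliana"), 205.0),
    (("Eukaryota", "Viridiplantae", "Embryophyta", "Oryza sativa"), 198.0),
    (("Eukaryota", "Fungi", "Saccharomyces cerevisiae"), 90.0),
]
print(lowest_common_ancestor(hits))
```

<p>In this toy case the fungal hit falls outside the 10% window of the top bit score, so the unigene is placed at the deepest node shared by the three plant hits.</p>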
</sec>
<sec>
<st>
<p>Functional annotation</p>
</st>
<p>The same blast search used to examine the taxonomic distribution of blast hits was used to identify putative homologous proteins and annotate each sequence with gene ontology (GO) terms using Blast2GO <abbrgrp>
<abbr bid="B65">65</abbr>
<abbr bid="B66">66</abbr>
<abbr bid="B67">67</abbr>
</abbrgrp>. Blast2GO was also used to run InterProScan (IPS) searches to identify conserved protein domains in translations of the longest ORF in each unigene. Any GO terms associated with IPS hits were merged into the blast-based GO annotation. GO terms were then used to map enzyme codes to each sequence, and the enzyme codes were used to retrieve and automatically color KEGG pathway maps <abbrgrp>
<abbr bid="B86">86</abbr>
<abbr bid="B87">87</abbr>
</abbrgrp>. As a final step in examining a broad functional representation of the gametophyte transcriptome, GO terms were mapped to the reduced GO-slim ontology, visualized and explored with directed acyclic graphs (not shown), and summarized with filtered pie charts including GO categories represented by at least 150 sequences (Figure <figr fid="F3">3</figr>).</p>
</sec>
<sec>
<st>
<p>Comparative genomics</p>
</st>
<p>Unigenes were classified into tribe- and orthoMCL clusters in the PlantTribes2.0 database using a custom Perl pipeline (dePamphilis lab, unpublished), which queries each unigene against the complete inferred protein set from ten plant species that have completely sequenced genomes, using a blastx e-value threshold of 1e-10. Unigenes were assigned to MCL clusters based on the best blast hit. Species used for blast searches and gene clustering in the PlantTribes2.0 database include: <it>Chlamydomonas reinhardtii v3.0</it>, <it>Physcomitrella patens v1.1</it>, <it>Selaginella moellendorffii v1.0</it>, <it>Oryza sativa v5.0</it>, <it>Sorghum bicolor v1.0</it>, <it>Vitis vinifera v1.0</it>, <it>Populus trichocarpa v1.0</it>, <it>Medicago truncatula v1.0</it>, <it>Carica papaya v1.0</it>, and <it>Arabidopsis thaliana </it>(TAIR7). Meta-information about each assigned cluster was extracted from the database for each unigene and written to a file. Simple text-based searches of this information were used to retrieve gene family names and putative gene family functional data. The shared single copy tribes <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp> for <it>Arabidopsis</it>, <it>Vitis</it>, <it>Populus</it>, and <it>Oryza </it>were identified in the PlantTribes2.0 database and the number of these tribes detected in the unigene set was determined by examining the pipeline output file. Orthogroup assignments for <it>Pteridium </it>unigenes were examined for cluster membership by <it>Selaginella</it>, <it>Physcomitrella</it>, and <it>Arabidopsis </it>to generate a Venn diagram showing putative gene level overlap (Figure <figr fid="F4">4A</figr>). Unigenes were also directly queried against each of these proteomes using a blastx e-value threshold of 1e-10 to examine the distribution of similar proteins in these three species. Venn diagrams were generated to graphically illustrate the overlap of unigenes for each proteome (Figure <figr fid="F4">4B</figr>). To screen for the presence of putative gametophyte genes, a list of male gametophyte specific genes in <it>Arabidopsis </it>was extracted from the microarray study of Honys and Twell <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp> and a combined list of significantly enriched female gametophyte genes was compiled from the studies of Wuest et al. <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp> and Yu et al. <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp>. These <it>Arabidopsis </it>gametophyte gene lists compiled from the literature were examined for overlap with the list of genes producing significant alignments with <it>Pteridium </it>unigenes in the blastx search against the <it>Arabidopsis </it>proteome to produce a Venn diagram of gametophyte genes (Figure <figr fid="F5">5</figr>).</p>
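<p>The overlap counts that underlie a Venn diagram can be computed with simple set operations. The Python sketch below is illustrative only; the function name and the toy identifiers are hypothetical and do not correspond to data from this study.</p>

```python
from itertools import combinations


def venn_regions(named_sets):
    """Count items falling in each exclusive region of a Venn diagram.

    named_sets: dict mapping a label to a set of identifiers.
    Returns a dict mapping a sorted tuple of labels to the number of items
    found in exactly those sets and no others.
    """
    labels = sorted(named_sets)
    regions = {}
    for k in range(1, len(labels) + 1):
        for combo in combinations(labels, k):
            inside = set.intersection(*(named_sets[name] for name in combo))
            others = [named_sets[name] for name in labels if name not in combo]
            outside = set().union(*others)
            regions[combo] = len(inside - outside)
    return regions


# Toy example: unigenes with a significant hit in each proteome.
hits = {
    "Arabidopsis": {"u1", "u2", "u3"},
    "Physcomitrella": {"u2", "u3", "u4"},
    "Selaginella": {"u3", "u5"},
}
for region, count in venn_regions(hits).items():
    print(region, count)
```

<p>Each item is counted in exactly one region, so the region counts sum to the size of the union of all sets.</p>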
</sec>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>JPD conceived and designed the study, cultured the plant tissue, isolated total RNA, performed the data analyses, and drafted the manuscript. MSB advised on sequencing strategies and assisted with bioinformatic analyses. CWD and NJW assisted with bioinformatic analyses, with summarizing and interpreting the results, and with planning the manuscript. PGW collected the original material used to initiate gametophyte tissue culture and provided input on all aspects of the study from experimental design and analyses to manuscript preparation. All authors have read and approved the manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>Funding for this research was provided by the Vice President for Research and the Center for Integrated Biosystems at Utah State University. The Department of Biology and the Ecology Center at Utah State University provided funds for publication processing charges. Keithanne Mockaitis directed the cDNA library preparation and sequencing at the Center for Genomics and Bioinformatics at Indiana University, and Justin Choi pre-processed the primary sequencing data. Michael Pfrender, Aaron Duffy, Katrina Dlugosch, Eric Wafula, Paul Cliften, Amanda Grusz, Jacob Davidson, and Kristal Watrous assisted with various aspects of this project or manuscript preparation.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Unfurling fern biology in the genomics age</p></title><aug><au><snm>Barker</snm><fnm>MS</fnm></au><au><snm>Wolf</snm><fnm>PG</fnm></au></aug><source>Bioscience</source><pubdate>2010</pubdate><volume>60</volume><fpage>177</fpage><lpage>185</lpage><xrefbib><pubid idtype="doi">10.1525/bio.2010.60.3.4</pubid></xrefbib></bibl><bibl id="B2"><title><p>Deciding among green plants for whole genome studies</p></title><aug><au><snm>Pryer</snm><fnm>KM</fnm></au><au><snm>Schneider</snm><fnm>H</fnm></au><au><snm>Zimmer</snm><fnm>EA</fnm></au><au><snm>Ann Banks</snm><fnm>J</fnm></au></aug><source>Trends in Plant Science</source><pubdate>2002</pubdate><volume>7</volume><fpage>550</fpage><lpage>554</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1360-1385(02)02375-0</pubid><pubid idtype="pmpid" link="fulltext">12475497</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Evolutionary genomic analyses of ferns reveal that high chromosome numbers are a product of high retention and fewer rounds of polyploidy relative to angiosperms</p></title><aug><au><snm>Barker</snm><fnm>MS</fnm></au></aug><source>American Fern Journal</source><pubdate>2009</pubdate><volume>99</volume><fpage>136</fpage><lpage>137</lpage></bibl><bibl id="B4"><title><p>Evolution of the nuclear genome of ferns and lycophytes</p></title><aug><au><snm>Nakazato</snm><fnm>T</fnm></au><au><snm>Barker</snm><fnm>MS</fnm></au><au><snm>Rieseberg</snm><fnm>LH</fnm></au><au><snm>Gastony</snm><fnm>GJ</fnm></au></aug><source>Biology and evolution of ferns and lycophytes</source><publisher>Cambridge University Press</publisher><editor>Ranker TA, Haufler CH</editor><pubdate>2008</pubdate><fpage>175</fpage><lpage>198</lpage><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B5"><title><p>Fern genome structure and evolution</p></title><aug><au><snm>Nakazato</snm><fnm>T</fnm></au></aug><source>American Fern 
Journal</source><pubdate>2009</pubdate><volume>99</volume><fpage>134</fpage><lpage>136</lpage></bibl><bibl id="B6"><title><p>Genetic map-based analysis of genome structure in the homosporous fern <it>Ceratopteris richardii</it></p></title><aug><au><snm>Nakazato</snm><fnm>T</fnm></au><au><snm>Jung</snm><fnm>MK</fnm></au><au><snm>Housworth</snm><fnm>EA</fnm></au><au><snm>Rieseberg</snm><fnm>LH</fnm></au><au><snm>Gastony</snm><fnm>GJ</fnm></au></aug><source>Genetics</source><pubdate>2006</pubdate><volume>173</volume><fpage>1585</fpage><lpage>1597</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1534/genetics.106.055624</pubid><pubid idtype="pmcid">1526675</pubid><pubid idtype="pmpid">16648591</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Profile and analysis of gene expression changes during early development in germinating spores of <it>Ceratopteris richardii</it></p></title><aug><au><snm>Salmi</snm><fnm>ML</fnm></au><au><snm>Bushart</snm><fnm>TJ</fnm></au><au><snm>Stout</snm><fnm>SC</fnm></au><au><snm>Roux</snm><fnm>SJ</fnm></au></aug><source>Plant Physiology</source><pubdate>2005</pubdate><volume>138</volume><fpage>1734</fpage><lpage>1745</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.105.062851</pubid><pubid idtype="pmcid">1176442</pubid><pubid idtype="pmpid">15965014</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Analysis of expressed sequence tags in prothallia of <it>Adiantum capillus-veneris</it></p></title><aug><au><snm>Yamauchi</snm><fnm>D</fnm></au><au><snm>Sutoh</snm><fnm>K</fnm></au><au><snm>Kanegae</snm><fnm>H</fnm></au><au><snm>Horiguchi</snm><fnm>T</fnm></au><au><snm>Matsuoka</snm><fnm>K</fnm></au><au><snm>Fukuda</snm><fnm>H</fnm></au><au><snm>Wada</snm><fnm>M</fnm></au></aug><source>Journal of Plant Research</source><pubdate>2005</pubdate><volume>118</volume><fpage>223</fpage><lpage>227</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s10265-005-0209-3</pubid><pubid idtype="pmpid" 
link="fulltext">15940394</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>NCBI GenBank dbEST</p></title><url>http://www.ncbi.nlm.nih.gov/nucest</url></bibl><bibl id="B10"><title><p>Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development</p></title><aug><au><snm>Alagna</snm><fnm>F</fnm></au><au><snm>D&apos;agostino</snm><fnm>N</fnm></au><au><snm>Torchia</snm><fnm>L</fnm></au><au><snm>Servili</snm><fnm>M</fnm></au><au><snm>Rao</snm><fnm>R</fnm></au><au><snm>Pietrella</snm><fnm>M</fnm></au><au><snm>Giuliano</snm><fnm>G</fnm></au><au><snm>Chiusano</snm><fnm>ML</fnm></au><au><snm>Baldoni</snm><fnm>L</fnm></au><au><snm>Perrotta</snm><fnm>G</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>399</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-399</pubid><pubid idtype="pmcid">2748093</pubid><pubid idtype="pmpid">19709400</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Comparison of the transcriptomes of American chestnut (<it>Castanea dentata</it>) and Chinese chestnut (<it>Castanea mollissima</it>) in response to the chestnut blight infection</p></title><aug><au><snm>Barakat</snm><fnm>A</fnm></au><au><snm>DiLoreto</snm><fnm>DS</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Smith</snm><fnm>C</fnm></au><au><snm>Baier</snm><fnm>K</fnm></au><au><snm>Powell</snm><fnm>WA</fnm></au><au><snm>Wheeler</snm><fnm>N</fnm></au><au><snm>Sederoff</snm><fnm>R</fnm></au><au><snm>Carlson</snm><fnm>JE</fnm></au></aug><source>BMC Plant Biology</source><pubdate>2009</pubdate><volume>9</volume><fpage>51</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2229-9-51</pubid><pubid idtype="pmcid">2688492</pubid><pubid idtype="pmpid">19426529</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Combining next-generation pyrosequencing with microarray for large scale expression analysis in non-model 
species</p></title><aug><au><snm>Bellin</snm><fnm>D</fnm></au><au><snm>Ferrarini</snm><fnm>A</fnm></au><au><snm>Chimento</snm><fnm>A</fnm></au><au><snm>Kaiser</snm><fnm>O</fnm></au><au><snm>Levenkova</snm><fnm>N</fnm></au><au><snm>Bouffard</snm><fnm>P</fnm></au><au><snm>Delledonne</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>555</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-555</pubid><pubid idtype="pmcid">2790472</pubid><pubid idtype="pmpid">19930683</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>An approach to transcriptome analysis of non-model organisms using short-read sequences</p></title><aug><au><snm>Collins</snm><fnm>L</fnm></au><au><snm>Biggs</snm><fnm>P</fnm></au><au><snm>Voelckel</snm><fnm>C</fnm></au><au><snm>Joly</snm><fnm>S</fnm></au></aug><source>Genome Informatics</source><pubdate>2008</pubdate><volume>21</volume><fpage>3</fpage><lpage>14</lpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid idtype="pmpid">19425143</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly <it>Sarcophaga crassipalpis</it></p></title><aug><au><snm>Hahn</snm><fnm>DA</fnm></au><au><snm>Ragland</snm><fnm>GJ</fnm></au><au><snm>Shoemaker</snm><fnm>DD</fnm></au><au><snm>Denlinger</snm><fnm>DL</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>234</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-234</pubid><pubid idtype="pmcid">2700817</pubid><pubid idtype="pmpid">19454017</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Next-generation pyrosequencing of gonad transcriptomes in the polyploid lake sturgeon (<it>Acipenser fulvescens</it>): the relative merits of normalization and rarefaction in gene 
discovery</p></title><aug><au><snm>Hale</snm><fnm>MC</fnm></au><au><snm>McCormick</snm><fnm>CR</fnm></au><au><snm>Jackson</snm><fnm>JR</fnm></au><au><snm>Dewoody</snm><fnm>JA</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>203</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-203</pubid><pubid idtype="pmcid">2688523</pubid><pubid idtype="pmpid">19402907</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Characterization of the <it>Zoarces viviparus </it>liver transcriptome using massively parallel pyrosequencing</p></title><aug><au><snm>Kristiansson</snm><fnm>E</fnm></au><au><snm>Asker</snm><fnm>N</fnm></au><au><snm>F&#246;rlin</snm><fnm>L</fnm></au><au><snm>Larsson</snm><fnm>DG</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>345</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-345</pubid><pubid idtype="pmcid">2725146</pubid><pubid idtype="pmpid">19646242</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx</p></title><aug><au><snm>Meyer</snm><fnm>E</fnm></au><au><snm>Aglyamova</snm><fnm>GV</fnm></au><au><snm>Wang</snm><fnm>S</fnm></au><au><snm>Buchanan-Carter</snm><fnm>J</fnm></au><au><snm>Abrego</snm><fnm>D</fnm></au><au><snm>Colbourne</snm><fnm>JK</fnm></au><au><snm>Willis</snm><fnm>BL</fnm></au><au><snm>Matz</snm><fnm>MV</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>219</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-219</pubid><pubid idtype="pmcid">2689275</pubid><pubid idtype="pmpid">19435504</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>High-throughput gene and SNP discovery in <it>Eucalyptus grandis</it>, an uncharacterized 
genome</p></title><aug><au><snm>Novaes</snm><fnm>E</fnm></au><au><snm>Drost</snm><fnm>DR</fnm></au><au><snm>Farmerie</snm><fnm>WG</fnm></au><au><snm>Pappas</snm><fnm>GJ</fnm></au><au><snm>Grattapaglia</snm><fnm>D</fnm></au><au><snm>Sederoff</snm><fnm>RR</fnm></au><au><snm>Kirst</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>312</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-312</pubid><pubid idtype="pmcid">2483731</pubid><pubid idtype="pmpid">18590545</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery</p></title><aug><au><snm>Parchman</snm><fnm>TL</fnm></au><au><snm>Geist</snm><fnm>KS</fnm></au><au><snm>Grahnen</snm><fnm>JA</fnm></au><au><snm>Benkman</snm><fnm>CW</fnm></au><au><snm>Buerkle</snm><fnm>CA</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>180</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-180</pubid><pubid idtype="pmcid">2851599</pubid><pubid idtype="pmpid">20233449</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>A 454 sequencing approach for large scale phylogenomic analysis of the common emperor scorpion (<it>Pandinus imperator</it>)</p></title><aug><au><snm>Roeding</snm><fnm>F</fnm></au><au><snm>Borner</snm><fnm>J</fnm></au><au><snm>Kube</snm><fnm>M</fnm></au><au><snm>Klages</snm><fnm>S</fnm></au><au><snm>Reinhardt</snm><fnm>R</fnm></au><au><snm>Burmester</snm><fnm>T</fnm></au></aug><source>Molecular Phylogenetics and Evolution</source><pubdate>2009</pubdate><volume>53</volume><fpage>826</fpage><lpage>834</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ympev.2009.08.014</pubid><pubid idtype="pmpid" link="fulltext">19695333</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Harnessing genomics for evolutionary 
insights</p></title><aug><au><snm>Rokas</snm><fnm>A</fnm></au><au><snm>Abbot</snm><fnm>P</fnm></au></aug><source>Trends in Ecology and Evolution</source><pubdate>2009</pubdate><volume>24</volume><fpage>192</fpage><lpage>200</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tree.2008.11.004</pubid><pubid idtype="pmpid" link="fulltext">19201503</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing</p></title><aug><au><snm>Vera</snm><fnm>JC</fnm></au><au><snm>Wheat</snm><fnm>CW</fnm></au><au><snm>Fescemyer</snm><fnm>HW</fnm></au><au><snm>Frilander</snm><fnm>MJ</fnm></au><au><snm>Crawford</snm><fnm>DL</fnm></au><au><snm>Hanski</snm><fnm>I</fnm></au><au><snm>Marden</snm><fnm>JH</fnm></au></aug><source>Molecular Ecology</source><pubdate>2008</pubdate><volume>17</volume><fpage>1636</fpage><lpage>1647</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-294X.2008.03666.x</pubid><pubid idtype="pmpid" link="fulltext">18266620</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Global characterization of <it>Artemisia annua </it>glandular trichome transcriptome using 454 pyrosequencing</p></title><aug><au><snm>Wang</snm><fnm>W</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Qi</snm><fnm>Y</fnm></au><au><snm>Guo</snm><fnm>D</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>465</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-465</pubid><pubid idtype="pmcid">2763888</pubid><pubid idtype="pmpid">19818120</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>454 pyrosequencing based transcriptome analysis of <it>Zygaena filipendulae </it>with focus on genes involved in biosynthesis of cyanogenic 
glucosides</p></title><aug><au><snm>Zagrobelny</snm><fnm>M</fnm></au><au><snm>Scheibye-Alsing</snm><fnm>K</fnm></au><au><snm>Jensen</snm><fnm>NB</fnm></au><au><snm>M&#248;ller</snm><fnm>BL</fnm></au><au><snm>Gorodkin</snm><fnm>J</fnm></au><au><snm>Bak</snm><fnm>S</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>574</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-574</pubid><pubid idtype="pmcid">2791780</pubid><pubid idtype="pmpid">19954531</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel <it>Bathymodiolus azoricus</it></p></title><aug><au><snm>Bettencourt</snm><fnm>R</fnm></au><au><snm>Pinheiro</snm><fnm>M</fnm></au><au><snm>Egas</snm><fnm>C</fnm></au><au><snm>Gomes</snm><fnm>P</fnm></au><au><snm>Afonso</snm><fnm>M</fnm></au><au><snm>Shank</snm><fnm>T</fnm></au><au><snm>Serrao Santos</snm><fnm>R</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>559</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-559</pubid><pubid idtype="pmpid" link="fulltext">20937131</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p><it>De Novo </it>Transcriptome Sequencing in <it>Anopheles funestus </it>using Illumina RNA-Seq Technology</p></title><aug><au><snm>Crawford</snm><fnm>JE</fnm></au><au><snm>Guelbeogo</snm><fnm>WM</fnm></au><au><snm>Sanou</snm><fnm>A</fnm></au><au><snm>Traor&#233;</snm><fnm>A</fnm></au><au><snm>Vernick</snm><fnm>KD</fnm></au><au><snm>Sagnon</snm><fnm>N</fnm></au><au><snm>Lazzaro</snm><fnm>BP</fnm></au></aug><source>PLoS ONE</source><pubdate>2010</pubdate><volume>5</volume><fpage>e14202</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0014202</pubid><pubid idtype="pmcid">2996306</pubid><pubid idtype="pmpid">21151993</pubid></pubidlist></xrefbib></bibl><bibl 
id="B27"><title><p>Population-level transcriptome sequencing of nonmodel organisms <it>Erynnis propertius </it>and <it>Papilio zelicaon</it></p></title><aug><au><snm>O&apos;Neil</snm><fnm>ST</fnm></au><au><snm>Dzurisin</snm><fnm>JD</fnm></au><au><snm>Carmichael</snm><fnm>RD</fnm></au><au><snm>Lobo</snm><fnm>NF</fnm></au><au><snm>Emrich</snm><fnm>SJ</fnm></au><au><snm>Hellmann</snm><fnm>JJ</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>310</fpage><xrefbib><pubidlist><pubid idtype="pmcid">2887415</pubid><pubid idtype="pmpid">20478048</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Global chloroplast phylogeny and biogeography of bracken (<it>Pteridium</it>; Dennstaedtiaceae)</p></title><aug><au><snm>Der</snm><fnm>JP</fnm></au><au><snm>Thomson</snm><fnm>JA</fnm></au><au><snm>Stratford</snm><fnm>JK</fnm></au><au><snm>Wolf</snm><fnm>PG</fnm></au></aug><source>American Journal of Botany</source><pubdate>2009</pubdate><volume>96</volume><fpage>1041</fpage><lpage>1049</lpage><xrefbib><pubid idtype="doi">10.3732/ajb.0800333</pubid></xrefbib></bibl><bibl id="B29"><title><p>Germination of bracken fern spore</p></title><aug><au><snm>DeMaggo</snm><fnm>AE</fnm></au><au><snm>Raghavan</snm><fnm>V</fnm></au></aug><source>Experimental Cell Research</source><pubdate>1972</pubdate><volume>73</volume><fpage>182</fpage><lpage>186</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0014-4827(72)90117-6</pubid><pubid idtype="pmpid" link="fulltext">5036988</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Development of the bracken fern, <it>Pteridium aquilinum </it>(L.) Kuhn.-- II. 
stelar ontogeny of the sporeling</p></title><aug><au><snm>Gottlieb</snm><fnm>JE</fnm></au></aug><source>Phytomorphology</source><pubdate>1959</pubdate><volume>9</volume><fpage>91</fpage><lpage>105</lpage></bibl><bibl id="B31"><title><p>Apospory in the fern <it>Pteridium aquilinum </it>(L) Kuhn 1: low temperature scanning electron microscopy</p></title><aug><au><snm>Sheffield</snm><fnm>E</fnm></au></aug><source>Cytobios</source><pubdate>1984</pubdate><volume>39</volume><fpage>171</fpage><lpage>176</lpage></bibl><bibl id="B32"><title><p>Cellular aspects of the initiation of asporous outgrowths in ferns</p></title><aug><au><snm>Sheffield</snm><fnm>E</fnm></au></aug><source>Proceedings of the Royal Society B</source><pubdate>1985</pubdate><volume>86</volume><fpage>45</fpage><lpage>50</lpage></bibl><bibl id="B33"><title><p>Alternation of generations</p></title><aug><au><snm>Sheffield</snm><fnm>E</fnm></au></aug><source>Biology and evolution of ferns and lycophytes</source><publisher>Cambridge University Press</publisher><editor>Ranker TA, Haufler CH</editor><pubdate>2008</pubdate><fpage>49</fpage><lpage>74</lpage><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B34"><title><p>Cessation of vascular activity correlated with asporous development in <it>Pteridium aquilinum </it>(L.) Kuhn</p></title><aug><au><snm>Sheffield</snm><fnm>E</fnm></au><au><snm>Bell</snm><fnm>PR</fnm></au></aug><source>New Phytologist</source><pubdate>1981</pubdate><volume>88</volume><fpage>533</fpage><lpage>538</lpage><xrefbib><pubid idtype="doi">10.1111/j.1469-8137.1981.tb04096.x</pubid></xrefbib></bibl><bibl id="B35"><title><p>Chromosome study on induced apospory in the bracken fern</p></title><aug><au><snm>Takahashi</snm><fnm>C</fnm></au></aug><source>La Kromosomo</source><pubdate>1961</pubdate><volume>48</volume><fpage>1602</fpage><lpage>1605</lpage></bibl><bibl id="B36"><title><p>Morphogenesis in <it>Pteridium aquilinum </it>(L.) 
Kuhn - general morphology and growth habit</p></title><aug><au><snm>Webster</snm><fnm>BD</fnm></au><au><snm>Steeves</snm><fnm>TA</fnm></au></aug><source>Phytomorphology</source><pubdate>1958</pubdate><volume>8</volume><fpage>30</fpage><lpage>41</lpage></bibl><bibl id="B37"><title><p>An assessment of genetic and environmental effects on sporangial development in bracken [<it>Pteridium aquilinum </it>(L.) Kuhn] using a novel quantitative method</p></title><aug><au><snm>Wynn</snm><fnm>JM</fnm></au><au><snm>Small</snm><fnm>JL</fnm></au><au><snm>Pakeman</snm><fnm>RJ</fnm></au><au><snm>Sheffield</snm><fnm>E</fnm></au></aug><source>Annals of Botany</source><pubdate>2000</pubdate><volume>85</volume><fpage>113</fpage><lpage>115</lpage><xrefbib><pubid idtype="doi">10.1006/anbo.1999.1068</pubid></xrefbib></bibl><bibl id="B38"><title><p>The effect of spore density on germination and development in <it>Pteridium</it>, monitored using a novel culture technique</p></title><aug><au><snm>Ashcroft</snm><fnm>CJ</fnm></au><au><snm>Sheffield</snm><fnm>E</fnm></au></aug><source>American Fern Journal</source><pubdate>2000</pubdate><volume>90</volume><fpage>91</fpage><lpage>99</lpage><xrefbib><pubid idtype="doi">10.2307/1547324</pubid></xrefbib></bibl><bibl id="B39"><title><p>Gametogenesis and fertilization in <it>Pteridium</it></p></title><aug><au><snm>Bell</snm><fnm>PR</fnm></au><au><snm>Duckett</snm><fnm>JG</fnm></au></aug><source>Botanical Journal of the Linnean Society</source><pubdate>1976</pubdate><volume>73</volume><fpage>47</fpage><lpage>78</lpage><xrefbib><pubid idtype="doi">10.1111/j.1095-8339.1976.tb02012.x</pubid></xrefbib></bibl><bibl id="B40"><title><p>On the physiology of antheridium formation in the bracken fern, <it>Pteridium aquilinum </it>(L.) 
Kuhn</p></title><aug><au><snm>N&#228;f</snm><fnm>U</fnm></au></aug><source>Physiologia Plantarum</source><pubdate>1958</pubdate><volume>11</volume><fpage>728</fpage><lpage>746</lpage></bibl><bibl id="B41"><title><p>Reproductive behavior of cloned gametophytes of <it>Pteridium aquilinum </it>(L.) Kuhn</p></title><aug><au><snm>Robertson</snm><fnm>FW</fnm></au></aug><source>American Fern Journal</source><pubdate>2002</pubdate><volume>92</volume><fpage>270</fpage><lpage>287</lpage><xrefbib><pubid idtype="doi">10.1640/0002-8444(2002)092[0270:RBOCGO]2.0.CO;2</pubid></xrefbib></bibl><bibl id="B42"><title><p>Antheridiogens</p></title><aug><au><snm>Schneller</snm><fnm>JJ</fnm></au></aug><source>Biology and evolution of ferns and lycophytes</source><publisher>Cambridge University Press</publisher><editor>Ranker TA, Haufler CH</editor><pubdate>2008</pubdate><fpage>134</fpage><lpage>158</lpage><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B43"><title><p>The growth and division of cells in relation to morphogenesis in fern gametophytes I. 
Photomorphogenetic studies in <it>Pteridium aquilinum</it></p></title><aug><au><snm>Sobota</snm><fnm>AE</fnm></au><au><snm>Partanen</snm><fnm>CR</fnm></au></aug><source>Canadian Journal of Botany</source><pubdate>1966</pubdate><volume>44</volume><fpage>497</fpage><lpage>506</lpage><xrefbib><pubid idtype="doi">10.1139/b66-059</pubid></xrefbib></bibl><bibl id="B44"><title><p>In vitro studies on abnormal growth of prothalli of the bracken fern</p></title><aug><au><snm>Steeves</snm><fnm>TA</fnm></au><au><snm>Sussex</snm><fnm>IM</fnm></au><au><snm>Partanen</snm><fnm>CR</fnm></au></aug><source>American Journal of Botany</source><pubdate>1955</pubdate><volume>42</volume><fpage>232</fpage><lpage>245</lpage><xrefbib><pubid idtype="doi">10.2307/2438559</pubid></xrefbib></bibl><bibl id="B45"><title><p>The origin and development of apogamous structures in the gametophyte of <it>Pteridium </it>in sterile culture</p></title><aug><au><snm>Whittier</snm><fnm>DP</fnm></au></aug><source>Phytomorphology</source><pubdate>1962</pubdate><volume>12</volume><fpage>10</fpage><lpage>20</lpage></bibl><bibl id="B46"><title><p>Cyanogenic polymorphism in bracken in relation to herbivore predation</p></title><aug><au><snm>Cooper-Driver</snm><fnm>GA</fnm></au><au><snm>Swain</snm><fnm>T</fnm></au></aug><source>Nature</source><pubdate>1976</pubdate><volume>260</volume><fpage>604</fpage><xrefbib><pubid idtype="doi">10.1038/260604a0</pubid></xrefbib></bibl><bibl id="B47"><title><p>Human carcinogenesis and bracken fern: a review of the evidence</p></title><aug><au><snm>Alonso-Amelot</snm><fnm>ME</fnm></au><au><snm>Avendano</snm><fnm>M</fnm></au></aug><source>Current Medicinal Chemistry</source><pubdate>2002</pubdate><volume>9</volume><fpage>675</fpage><lpage>686</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11945131</pubid></xrefbib></bibl><bibl id="B48"><title><p>Quercetin elevates p27(Kip1) and arrests both primary and HPV16 E6/E7 transformed human keratinocytes in 
G1</p></title><aug><au><snm>Beniston</snm><fnm>RG</fnm></au><au><snm>Campo</snm><fnm>MS</fnm></au></aug><source>Oncogene</source><pubdate>2003</pubdate><volume>22</volume><fpage>5504</fpage><lpage>5514</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.onc.1206848</pubid><pubid idtype="pmpid" link="fulltext">12934110</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Carcinogenic effects of ptaquiloside in bracken fern and related compounds</p></title><aug><au><snm>Potter</snm><fnm>DM</fnm></au><au><snm>Baird</snm><fnm>MS</fnm></au></aug><source>British Journal of Cancer</source><pubdate>2000</pubdate><volume>83</volume><fpage>914</fpage><lpage>920</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1054/bjoc.2000.1368</pubid><pubid idtype="pmcid">2374682</pubid><pubid idtype="pmpid">10970694</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Effects of experimental restoration on the diaspore bank of an upland moor degraded by <it>Pteridium aquilinum </it>invasion</p></title><aug><au><snm>Ghorbani</snm><fnm>J</fnm></au><au><snm>Le Duc</snm><fnm>MG</fnm></au><au><snm>McAllister</snm><fnm>HA</fnm></au><au><snm>Pakeman</snm><fnm>RJ</fnm></au><au><snm>Marrs</snm><fnm>RH</fnm></au></aug><source>Land Degradation and Development</source><pubdate>2007</pubdate><volume>18</volume><fpage>659</fpage><lpage>669</lpage><xrefbib><pubid idtype="doi">10.1002/ldr.804</pubid></xrefbib></bibl><bibl id="B51"><title><p>The invasion of <it>Pteridium aquilinum </it>and the impoverishment of the seed bank in fire prone areas of Brazilian Atlantic forest</p></title><aug><au><snm>Rodrigues Da Silva</snm><fnm>UDS</fnm></au><au><snm>Matos</snm><fnm>DMDS</fnm></au></aug><source>Biodiversity and Conservation</source><pubdate>2006</pubdate><volume>15</volume><fpage>3035</fpage><lpage>3043</lpage><xrefbib><pubid idtype="doi">10.1007/s10531-005-4877-z</pubid></xrefbib></bibl><bibl id="B52"><title><p>Bracken fern invasion in southern Yucatan: a case for land-change 
science</p></title><aug><au><snm>Schneider</snm><fnm>LC</fnm></au></aug><source>Geographical Review</source><pubdate>2004</pubdate><volume>94</volume><fpage>229</fpage><lpage>241</lpage><xrefbib><pubid idtype="doi">10.1111/j.1931-0846.2004.tb00169.x</pubid></xrefbib></bibl><bibl id="B53"><title><p>Modelling the effects of climate change on the growth of bracken (<it>Pteridium aquilinum</it>) in Britain</p></title><aug><au><snm>Pakeman</snm><fnm>RJ</fnm></au><au><snm>Marrs</snm><fnm>RH</fnm></au></aug><source>Journal of Applied Ecology</source><pubdate>1996</pubdate><volume>33</volume><fpage>561</fpage><lpage>575</lpage><xrefbib><pubid idtype="doi">10.2307/2404985</pubid></xrefbib></bibl><bibl id="B54"><title><p>Variation in genome size in <it>Pteridium</it></p></title><aug><au><snm>Tan</snm><fnm>MK</fnm></au><au><snm>Thompson</snm><fnm>JA</fnm></au></aug><source>Bracken 89: Bracken Biology and Management</source><publisher>Sydney, Australia: Australian Institute of Agricultural Science</publisher><editor>Thompson JA, Smith RT</editor><pubdate>1990</pubdate><fpage>87</fpage><lpage>93</lpage></bibl><bibl id="B55"><title><p>Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs</p></title><aug><au><snm>Chevreux</snm><fnm>B</fnm></au><au><snm>Pfisterer</snm><fnm>T</fnm></au><au><snm>Drescher</snm><fnm>B</fnm></au><au><snm>Driesel</snm><fnm>AJ</fnm></au><au><snm>M&#252;ller</snm><fnm>WE</fnm></au><au><snm>Wetter</snm><fnm>T</fnm></au><au><snm>Suhai</snm><fnm>S</fnm></au></aug><source>Genome Research</source><pubdate>2004</pubdate><volume>14</volume><fpage>1147</fpage><lpage>1159</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.1917404</pubid><pubid idtype="pmcid">419793</pubid><pubid idtype="pmpid">15140833</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>CAP3: A DNA sequence assembly 
program</p></title><aug><au><snm>Huang</snm><fnm>X</fnm></au><au><snm>Madan</snm><fnm>A</fnm></au></aug><source>Genome Research</source><pubdate>1999</pubdate><volume>9</volume><fpage>868</fpage><lpage>877</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.9.9.868</pubid><pubid idtype="pmcid">310812</pubid><pubid idtype="pmpid">10508846</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>Comparison of next generation sequencing technologies for transcriptome characterization</p></title><aug><au><snm>Wall</snm><fnm>PK</fnm></au><au><snm>Leebens-Mack</snm><fnm>J</fnm></au><au><snm>Chanderbali</snm><fnm>AS</fnm></au><au><snm>Barakat</snm><fnm>A</fnm></au><au><snm>Wolcott</snm><fnm>E</fnm></au><au><snm>Liang</snm><fnm>H</fnm></au><au><snm>Landherr</snm><fnm>L</fnm></au><au><snm>Tomsho</snm><fnm>LP</fnm></au><au><snm>Hu</snm><fnm>Y</fnm></au><au><snm>Carlson</snm><fnm>JE</fnm></au><au><snm>Ma</snm><fnm>H</fnm></au><au><snm>Schuster</snm><fnm>SC</fnm></au><au><snm>Soltis</snm><fnm>DE</fnm></au><au><snm>Soltis</snm><fnm>PS</fnm></au><au><snm>Altman</snm><fnm>N</fnm></au><au><snm>dePamphilis</snm><fnm>CW</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>347</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-347</pubid><pubid idtype="pmcid">2907694</pubid><pubid idtype="pmpid">19646272</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Eukaryotic ultra conserved orthologs and estimation of gene capture in EST libraries [abstract]</p></title><aug><au><snm>Kozik</snm><fnm>A</fnm></au><au><snm>Matvienko</snm><fnm>M</fnm></au><au><snm>Kozik</snm><fnm>I</fnm></au><au><snm>Van Leeuwen</snm><fnm>H</fnm></au><au><snm>Van Deynze</snm><fnm>A</fnm></au><au><snm>Michelmore</snm><fnm>R</fnm></au></aug><source>Plant and Animal Genomes Conference</source><pubdate>2008</pubdate><volume>16</volume><fpage>P6</fpage></bibl><bibl id="B59"><title><p>Identification of shared single copy nuclear genes in 
<it>Arabidopsis</it>, <it>Populus</it>, <it>Vitis </it>and <it>Oryza </it>and their phylogenetic utility across various taxonomic levels</p></title><aug><au><snm>Duarte</snm><fnm>JM</fnm></au><au><snm>Wall</snm><fnm>PK</fnm></au><au><snm>Edger</snm><fnm>PP</fnm></au><au><snm>Landherr</snm><fnm>LL</fnm></au><au><snm>Ma</snm><fnm>H</fnm></au><au><snm>Pires</snm><fnm>JC</fnm></au><au><snm>Leebens-Mack</snm><fnm>J</fnm></au><au><snm>dePamphilis</snm><fnm>C</fnm></au></aug><source>BMC Evolutionary Biology</source><pubdate>2010</pubdate><volume>10</volume><fpage>61</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2148-10-61</pubid><pubid idtype="pmcid">2848037</pubid><pubid idtype="pmpid">20181251</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>PlantTribes: a gene and gene family resource for comparative genomics in plants</p></title><aug><au><snm>Wall</snm><fnm>PK</fnm></au><au><snm>Leebens-Mack</snm><fnm>J</fnm></au><au><snm>M&#252;ller</snm><fnm>KF</fnm></au><au><snm>Field</snm><fnm>D</fnm></au><au><snm>Altman</snm><fnm>NS</fnm></au><au><snm>dePamphilis</snm><fnm>CW</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D970</fpage><lpage>6</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm972</pubid><pubid idtype="pmcid">2238917</pubid><pubid idtype="pmpid">18073194</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>MEGAN analysis of metagenomic data</p></title><aug><au><snm>Huson</snm><fnm>DH</fnm></au><au><snm>Auch</snm><fnm>AF</fnm></au><au><snm>Qi</snm><fnm>J</fnm></au><au><snm>Schuster</snm><fnm>SC</fnm></au></aug><source>Genome Res</source><pubdate>2007</pubdate><volume>17</volume><fpage>377</fpage><lpage>386</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.5969107</pubid><pubid idtype="pmcid">1800929</pubid><pubid idtype="pmpid">17255551</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Genomic Perspectives on Evolution in Bracken 
Fern</p></title><aug><au><snm>Der</snm><fnm>JP</fnm></au></aug><source>Dissertation</source><publisher>Utah State University, Department of Biology</publisher><pubdate>2010</pubdate></bibl><bibl id="B63"><title><p>Chloroplast RNA metabolism</p></title><aug><au><snm>Stern</snm><fnm>DB</fnm></au><au><snm>Goldschmidt-Clermont</snm><fnm>M</fnm></au><au><snm>Hanson</snm><fnm>MR</fnm></au></aug><source>Annual Review of Plant Biology</source><pubdate>2010</pubdate><volume>61</volume><fpage>125</fpage><lpage>155</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev-arplant-042809-112242</pubid><pubid idtype="pmpid" link="fulltext">20192740</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Sampling the <it>Arabidopsis </it>transcriptome with massively parallel pyrosequencing</p></title><aug><au><snm>Weber</snm><fnm>AP</fnm></au><au><snm>Weber</snm><fnm>KL</fnm></au><au><snm>Carr</snm><fnm>K</fnm></au><au><snm>Wilkerson</snm><fnm>C</fnm></au><au><snm>Ohlrogge</snm><fnm>JB</fnm></au></aug><source>Plant Physiology</source><pubdate>2007</pubdate><volume>144</volume><fpage>32</fpage><lpage>42</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.107.096677</pubid><pubid idtype="pmcid">1913805</pubid><pubid idtype="pmpid">17351049</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>Blast2GO: A comprehensive suite for functional analysis in plant genomics</p></title><aug><au><snm>Conesa</snm><fnm>A</fnm></au><au><snm>G&#246;tz</snm><fnm>S</fnm></au></aug><source>International Journal of Plant Genomics</source><pubdate>2008</pubdate><volume>2008</volume><fpage>619832</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1155/2008/619832</pubid><pubid idtype="pmcid">2375974</pubid><pubid idtype="pmpid">18483572</pubid></pubidlist></xrefbib></bibl><bibl id="B66"><title><p>Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics 
research</p></title><aug><au><snm>Conesa</snm><fnm>A</fnm></au><au><snm>G&#246;tz</snm><fnm>S</fnm></au><au><snm>Garc&#237;a-G&#243;mez</snm><fnm>JM</fnm></au><au><snm>Terol</snm><fnm>J</fnm></au><au><snm>Tal&#243;n</snm><fnm>M</fnm></au><au><snm>Robles</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>3674</fpage><lpage>3676</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti610</pubid><pubid idtype="pmpid" link="fulltext">16081474</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>High-throughput functional annotation and data mining with the Blast2GO suite</p></title><aug><au><snm>G&#246;tz</snm><fnm>S</fnm></au><au><snm>Garc&#237;a-G&#243;mez</snm><fnm>JM</fnm></au><au><snm>Terol</snm><fnm>J</fnm></au><au><snm>Williams</snm><fnm>TD</fnm></au><au><snm>Nagaraj</snm><fnm>SH</fnm></au><au><snm>Nueda</snm><fnm>MJ</fnm></au><au><snm>Robles</snm><fnm>M</fnm></au><au><snm>Tal&#243;n</snm><fnm>M</fnm></au><au><snm>Dopazo</snm><fnm>J</fnm></au><au><snm>Conesa</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>3420</fpage><lpage>3435</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2425479</pubid><pubid idtype="pmpid">18445632</pubid></pubidlist></xrefbib></bibl><bibl id="B68"><title><p>Transcriptome analysis of haploid male gametophyte development in <it>Arabidopsis</it></p></title><aug><au><snm>Honys</snm><fnm>D</fnm></au><au><snm>Twell</snm><fnm>D</fnm></au></aug><source>Genome Biology</source><pubdate>2004</pubdate><volume>5</volume><fpage>R85</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-11-r85</pubid><pubid idtype="pmcid">545776</pubid><pubid idtype="pmpid">15535861</pubid></pubidlist></xrefbib></bibl><bibl id="B69"><title><p>Analysis of the female gametophyte transcriptome of <it>Arabidopsis </it>by comparative expression 
profiling</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Hogan</snm><fnm>P</fnm></au><au><snm>Sundaresan</snm><fnm>V</fnm></au></aug><source>Plant Physiology</source><pubdate>2005</pubdate><volume>139</volume><fpage>1853</fpage><lpage>1869</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.105.067314</pubid><pubid idtype="pmcid">1310564</pubid><pubid idtype="pmpid">16299181</pubid></pubidlist></xrefbib></bibl><bibl id="B70"><title><p><it>Arabidopsis </it>female gametophyte gene expression map reveals similarities between plant and animal gametes</p></title><aug><au><snm>Wuest</snm><fnm>S</fnm></au><au><snm>Vijverberg</snm><fnm>K</fnm></au><au><snm>Schmidt</snm><fnm>A</fnm></au><au><snm>Weiss</snm><fnm>M</fnm></au><au><snm>Gheyselinck</snm><fnm>J</fnm></au><au><snm>Lohr</snm><fnm>M</fnm></au><au><snm>Wellmer</snm><fnm>F</fnm></au><au><snm>Rahnenf&#252;hrer</snm><fnm>J</fnm></au><au><snm>von Mering</snm><fnm>C</fnm></au><au><snm>Grossniklaus</snm><fnm>U</fnm></au></aug><source>Curr Biol</source><pubdate>2010</pubdate><volume>20</volume><fpage>506</fpage><lpage>512</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cub.2010.01.051</pubid><pubid idtype="pmpid" link="fulltext">20226671</pubid></pubidlist></xrefbib></bibl><bibl id="B71"><title><p>MSATCOMMANDER: detection of microsatellite repeat arrays and automated, locus-specific primer design</p></title><aug><au><snm>Faircloth</snm><fnm>BC</fnm></au></aug><source>Molecular Ecology Resources</source><pubdate>2008</pubdate><volume>8</volume><fpage>92</fpage><lpage>94</lpage><xrefbib><pubid idtype="doi">10.1111/j.1471-8286.2007.01884.x</pubid></xrefbib></bibl><bibl id="B72"><title><p>Primer3 on the WWW for general users and for biologist programmers</p></title><aug><au><snm>Rozen</snm><fnm>S</fnm></au><au><snm>Skaletsky</snm><fnm>H</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2000</pubdate><volume>132</volume><fpage>365</fpage><lpage>386</lpage><xrefbib><pubid 
idtype="pmpid">10547847</pubid></xrefbib></bibl><bibl id="B73"><title><p>Genome-wide mapping of alternative splicing in <it>Arabidopsis thaliana</it></p></title><aug><au><snm>Filichkin</snm><fnm>SA</fnm></au><au><snm>Priest</snm><fnm>HD</fnm></au><au><snm>Givan</snm><fnm>SA</fnm></au><au><snm>Shen</snm><fnm>R</fnm></au><au><snm>Bryant</snm><fnm>DW</fnm></au><au><snm>Fox</snm><fnm>SE</fnm></au><au><snm>Wong</snm><fnm>WK</fnm></au><au><snm>Mockler</snm><fnm>TC</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>45</fpage><lpage>58</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.093302.109</pubid><pubid idtype="pmcid">2798830</pubid><pubid idtype="pmpid">19858364</pubid></pubidlist></xrefbib></bibl><bibl id="B74"><title><p>The developmental dynamics of the maize leaf transcriptome</p></title><aug><au><snm>Li</snm><fnm>P</fnm></au><au><snm>Ponnala</snm><fnm>L</fnm></au><au><snm>Gandotra</snm><fnm>N</fnm></au><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Si</snm><fnm>Y</fnm></au><au><snm>Tausta</snm><fnm>SL</fnm></au><au><snm>Kebrom</snm><fnm>TH</fnm></au><au><snm>Provart</snm><fnm>N</fnm></au><au><snm>Patel</snm><fnm>R</fnm></au><au><snm>Myers</snm><fnm>CR</fnm></au><au><snm>Reidel</snm><fnm>EJ</fnm></au><au><snm>Turgeon</snm><fnm>R</fnm></au><au><snm>Liu</snm><fnm>P</fnm></au><au><snm>Sun</snm><fnm>Q</fnm></au><au><snm>Nelson</snm><fnm>T</fnm></au><au><snm>Brutnell</snm><fnm>TP</fnm></au></aug><source>Nature Genetics</source><pubdate>2010</pubdate><volume>42</volume><fpage>1060</fpage><lpage>1067</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.703</pubid><pubid idtype="pmpid" link="fulltext">21037569</pubid></pubidlist></xrefbib></bibl><bibl id="B75"><title><p>Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice 
transcriptome</p></title><aug><au><snm>Zhang</snm><fnm>G</fnm></au><au><snm>Guo</snm><fnm>G</fnm></au><au><snm>Hu</snm><fnm>X</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>Q</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Zhuang</snm><fnm>R</fnm></au><au><snm>Lu</snm><fnm>Z</fnm></au><au><snm>He</snm><fnm>Z</fnm></au><au><snm>Fang</snm><fnm>X</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Tian</snm><fnm>W</fnm></au><au><snm>Tao</snm><fnm>Y</fnm></au><au><snm>Kristiansen</snm><fnm>K</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Li</snm><fnm>S</fnm></au><au><snm>Yang</snm><fnm>H</fnm></au><au><snm>Wang</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>J</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>646</fpage><lpage>654</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.100677.109</pubid><pubid idtype="pmcid">2860166</pubid><pubid idtype="pmpid">20305017</pubid></pubidlist></xrefbib></bibl><bibl id="B76"><title><p>Comparing <it>de novo </it>assemblers for 454 transcriptome data</p></title><aug><au><snm>Kumar</snm><fnm>S</fnm></au><au><snm>Blaxter</snm><fnm>ML</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>571</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-571</pubid><pubid idtype="pmpid" link="fulltext">20950480</pubid></pubidlist></xrefbib></bibl><bibl id="B77"><title><p>Optimization of <it>de novo </it>transcriptome assembly from next-generation sequencing data</p></title><aug><au><snm>Surget-Groba</snm><fnm>Y</fnm></au><au><snm>Montoya-Burgos</snm><fnm>JI</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>1432</fpage><lpage>1440</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.103846.109</pubid><pubid idtype="pmpid" link="fulltext">20693479</pubid></pubidlist></xrefbib></bibl><bibl id="B78"><title><p>Sequencing <it>Medicago truncatula </it>expressed 
sequenced tags using 454 Life Sciences technology</p></title><aug><au><snm>Cheung</snm><fnm>F</fnm></au><au><snm>Haas</snm><fnm>B</fnm></au><au><snm>Goldberg</snm><fnm>S</fnm></au><au><snm>May</snm><fnm>G</fnm></au><au><snm>Xiao</snm><fnm>Y</fnm></au><au><snm>Town</snm><fnm>C</fnm></au></aug><source>BMC Genomics</source><pubdate>2006</pubdate><volume>7</volume><fpage>272</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-7-272</pubid><pubid idtype="pmcid">1635983</pubid><pubid idtype="pmpid">17062153</pubid></pubidlist></xrefbib></bibl><bibl id="B79"><title><p>OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups</p></title><aug><au><snm>Chen</snm><fnm>F</fnm></au><au><snm>Mackey</snm><fnm>AJ</fnm></au><au><snm>Stoeckert</snm><fnm>CJ</fnm></au><au><snm>Roos</snm><fnm>DS</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><fpage>D363</fpage><lpage>8</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj123</pubid><pubid idtype="pmcid">1347485</pubid><pubid idtype="pmpid">16381887</pubid></pubidlist></xrefbib></bibl><bibl id="B80"><title><p>Protein families and TRIBES in genome sequence space</p></title><aug><au><snm>Enright</snm><fnm>AJ</fnm></au><au><snm>Kunin</snm><fnm>V</fnm></au><au><snm>Ouzounis</snm><fnm>CA</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>4632</fpage><lpage>4638</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg495</pubid><pubid idtype="pmcid">169885</pubid><pubid idtype="pmpid">12888524</pubid></pubidlist></xrefbib></bibl><bibl id="B81"><title><p>Gemmiferous fern gametophytes-Vittariaceae</p></title><aug><au><snm>Farrar</snm><fnm>DR</fnm></au></aug><source>American Journal of Botany</source><pubdate>1974</pubdate><volume>61</volume><fpage>146</fpage><lpage>155</lpage><xrefbib><pubid idtype="doi">10.2307/2441184</pubid></xrefbib></bibl><bibl id="B82"><title><p>A garter snake transcriptome: 
pyrosequencing, <it>de novo </it>assembly, and sex-specific differences</p></title><aug><au><snm>Schwartz</snm><fnm>T</fnm></au><au><snm>Tae</snm><fnm>H</fnm></au><au><snm>Yang</snm><fnm>Y</fnm></au><au><snm>Mockaitis</snm><fnm>K</fnm></au><au><snm>Vanhemert</snm><fnm>J</fnm></au><au><snm>Proulx</snm><fnm>S</fnm></au><au><snm>Choi</snm><fnm>J</fnm></au><au><snm>Bronikowski</snm><fnm>A</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>694</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-694</pubid><pubid idtype="pmcid">3014983</pubid><pubid idtype="pmpid">21138572</pubid></pubidlist></xrefbib></bibl><bibl id="B83"><title><p>The Gene Indices Project software tools: SeqClean</p></title><url>http://compbio.dfci.harvard.edu/tgi/software/</url></bibl><bibl id="B84"><title><p>SnoWhite: A cleaning pipeline for cDNA sequences</p></title><url>http://kdlugosch.net/software/</url></bibl><bibl id="B85"><title><p>Genome sequence assembly using trace signals and additional sequence information</p></title><aug><au><snm>Chevreux</snm><fnm>B</fnm></au><au><snm>Wetter</snm><fnm>T</fnm></au><au><snm>Suhai</snm><fnm>S</fnm></au></aug><source>Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)</source><pubdate>1999</pubdate><volume>99</volume><fpage>45</fpage><lpage>56</lpage></bibl><bibl id="B86"><title><p>KEGG: Kyoto encyclopedia of genes and genomes</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2000</pubdate><volume>28</volume><fpage>27</fpage><lpage>30</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.27</pubid><pubid idtype="pmcid">102409</pubid><pubid idtype="pmpid">10592173</pubid></pubidlist></xrefbib></bibl><bibl id="B87"><title><p>From genomics to chemical genomics: new developments in 
KEGG</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Hattori</snm><fnm>M</fnm></au><au><snm>Aoki-Kinoshita</snm><fnm>KF</fnm></au><au><snm>Itoh</snm><fnm>M</fnm></au><au><snm>Kawashima</snm><fnm>S</fnm></au><au><snm>Katayama</snm><fnm>T</fnm></au><au><snm>Araki</snm><fnm>M</fnm></au><au><snm>Hirakawa</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><fpage>D354</fpage><lpage>7</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj102</pubid><pubid idtype="pmcid">1347464</pubid><pubid idtype="pmpid">16381885</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>