<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-61</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>The maternal and early embryonic transcriptome of the milkweed bug <it>Oncopeltus fasciatus</it>
</p>
</title>
<aug>
<au id="A1"><snm>Ewen-Campen</snm><fnm>Ben</fnm><insr iid="I1"/><email>bewencampen@oeb.harvard.edu</email></au>
<au id="A2"><snm>Shaner</snm><fnm>Nathan</fnm><insr iid="I2"/><email>nshaner@mbari.org</email></au>
<au id="A3"><snm>Panfilio</snm><mi>A</mi><fnm>Kristen</fnm><insr iid="I3"/><email>kpanfili@uni-koeln.de</email></au>
<au id="A4"><snm>Suzuki</snm><fnm>Yuichiro</fnm><insr iid="I4"/><email>ysuzuki@wellesley.edu</email></au>
<au id="A5"><snm>Roth</snm><fnm>Siegfried</fnm><insr iid="I3"/><email>Siegfried.Roth@uni-koeln.de</email></au>
<au ca="yes" id="A6"><snm>Extavour</snm><mi>G</mi><fnm>Cassandra</fnm><insr iid="I1"/><email>extavour@oeb.harvard.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA</p></ins>
<ins id="I2"><p>Monterey Bay Aquarium Research Institute, 7700 Sandholdt Road, Moss Landing, CA 95039, USA</p></ins>
<ins id="I3"><p>Institute for Developmental Biology, University of Cologne, Cologne Biocenter, Z&#252;lpicher Stra&#223;e 47b, 50674, Cologne, Germany</p></ins>
<ins id="I4"><p>Department of Biological Sciences, Wellesley College, 106 Central Street, Wellesley MA 02481, USA</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>61</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/61</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-61</pubid><pubid idtype="pmpid">21266083</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>7</day><month>10</month><year>2010</year></date></rec><acc><date><day>25</day><month>1</month><year>2011</year></date></acc><pub><date><day>25</day><month>1</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Ewen-Campen et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug <it>Oncopeltus fasciatus</it>, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. A total of 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing <it>O. fasciatus </it>accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in <it>de novo </it>transcriptome analyses.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms.</p>
<p>[The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (<url>http://www.ncbi.nlm.nih.gov/sra?term=SRP002610</url>). Custom scripts generated are available at <url>http://www.extavourlab.com/protocols/index.html</url>. Seven Additional files are available.]</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>New and emerging model organisms occupy an increasingly important part of the developmental biology and developmental genetics research landscape. While studying a huge diversity of animals has long been the norm in the classical fields of experimental embryology and functional morphology [see for example <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
</abbrgrp>], the molecular biology revolution and the advent of the "model system" concept <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp> created demand for a small number of highly genetically manipulable organisms that could be intensively studied <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. Research on these "big six" [sensu 6] genetic model organisms has led to enormous advances in our understanding of general principles of embryogenesis. However, placing these general principles in an evolutionary context requires broader taxonomic sampling. Many researchers have highlighted the need for developing new model organisms for specific comparative, evolutionary and ecological questions <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp>. It has also been suggested, however, that the single gene expression approach of the last several decades of evolutionary developmental biology ("evo-devo") has outlived its usefulness, and that what are needed are not more model organisms, but rather a smaller number of groups chosen for the ability to functionally manipulate genes <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. Sophisticated gene expression techniques and even stable germline transgenesis have been developed in a large array of models outside of the "big six" [see for example <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
</abbrgrp>]. The ancient history of the small RNA processing machinery <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp> means that gene knockdown is a feasible goal for most organisms, as long as the sequences of genes of interest are available.</p>
<p>While whole genome sequencing is an increasingly viable option for some organisms, many new models, particularly within the arthropods, lack the large community resources necessary to finance and maintain annotation of a genome. For these reasons, many researchers studying non-traditional model organisms have turned to Sanger-sequenced EST libraries [see for example <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>]. In principle this method of gene discovery can lead to high-throughput expression and functional genetic analyses of multiple genes [see for example <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>]. In practice, however, most non-traditional organism studies are still subject to a gene discovery bottleneck. This is largely because at the scale needed to uncover rare developmental transcripts, Sanger-based EST sequencing quickly becomes technically and financially prohibitive for many labs working on organisms with smaller research communities. In addition, those smaller-scale EST projects that have been carried out are often not publicly available in easily searchable formats, and their potential contribution to the developmental and evolutionary biology fields is thus limited.</p>
<p>Next-generation sequencing (NGS) offers comparative and evolutionary developmental biologists a way to obtain orders of magnitude more developmental gene data than ever before, at a fraction of its former cost. Several studies have demonstrated the feasibility of NGS for identifying SNPs for population studies and gene sequences for use as phylogenetic markers <abbrgrp>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
<abbr bid="B34">34</abbr>
<abbr bid="B35">35</abbr>
</abbrgrp>. Unfortunately, the lack of suitable protocols for cDNA preparation and of established pipelines for analysis has left this tool under-utilized by many evo-devo researchers. Furthermore, according to some estimates <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>, few of these studies have been carried out at a scale large enough to provide significant recovery of rare transcripts, and therefore of developmental genes. Here we present an optimized protocol for synthesizing cDNA for 454 Titanium pyrosequencing, as well as a simple workflow for <it>de novo </it>assembly of the data without a reference genome, annotation and analysis of the dataset, and a demonstration of its utility for comparative developmental genetics.</p>
<p>A large body of literature is dedicated to the development and genomics of holometabolous insects (insects undergoing complete metamorphosis between embryonic and adult stages). Tens of holometabolous insect genomes are now available, thanks largely to work on <it>Drosophila melanogaster</it>, other drosophilids, and dipteran disease vectors <abbrgrp>
<abbr bid="B36">36</abbr>
<abbr bid="B37">37</abbr>
</abbrgrp>. In contrast, relatively little is known about the development of hemimetabolous insects, which undergo incomplete metamorphosis. Although several of these insects are amenable to laboratory culture and a variety of experimental manipulations, molecular developmental studies are scarce, and gene discovery rates remain low. Notable exceptions among the Hemiptera are the aphid <it>Acyrthosiphon pisum </it>and the Chagas' disease vector <it>Rhodnius prolixus</it>, whose genomes are completed and in progress respectively <abbrgrp>
<abbr bid="B38">38</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp>. However, the aphid genome has undergone extensive duplications and gene loss, possibly due to its unusual reproductive and ecological characteristics <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. The mammalian blood feeding needs of <it>R. prolixus </it>make it a sub-optimal organism for developmental studies.</p>
<p>The milkweed bug <it>Oncopeltus </it>
<it>fasciatus </it>(Figure <figr fid="F1">1A-D</figr>) has emerged as a promising hemipteran system for studying the molecular development of hemimetabolous insects <abbrgrp>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
<abbr bid="B42">42</abbr>
</abbrgrp>. It can be reared easily and cheaply in the laboratory, and has a long history as a laboratory animal for classical embryology and pattern formation studies <abbrgrp>
<abbr bid="B43">43</abbr>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
</abbrgrp>. More recently, robust protocols for <it>in situ </it>hybridization, live imaging of embryogenesis, and RNAi-mediated gene knockdown have been developed and successfully applied to the study of the evolution of development [see for example <abbrgrp>
<abbr bid="B46">46</abbr>
<abbr bid="B47">47</abbr>
</abbrgrp>].</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Introduction to <it>Oncopeltus fasciatus </it>and the workflow for producing a <it>de novo </it>transcriptome assembly</p></caption><text>
   <p><b>Introduction to <it>Oncopeltus fasciatus </it>and the workflow for producing a <it>de novo </it>transcriptome assembly</b>. (<it>A</it>) An adult milkweed bug, <it>Oncopeltus fasciatus</it>. (<it>B</it>) Ovaries of adult female. Anterior is up. Oocytes (O) are visible in progressive stages of growth before reaching a common oviduct (Od). Oocytes are cytoplasmically connected to nurse cells (Nc) in the anterior of each ovariole. Scale bar = 1.0 mm. (<it>C</it>-<it>D</it>) The stages of <it>O. fasciatus </it>embryogenesis represented in this transcriptome. Embryos are stained with Sytox Green (Invitrogen) to visualize nuclei. Scale bars = 0.5 mm. (<it>C</it>) Development proceeds from left to right. Anterior is to the left. The cellularized blastoderm forms during the first ~20% of development (~0-24 hours at 28&#176;C), as nuclei reach the surface of the yolk and repeatedly divide. <it>(D</it>) Germ band extension and segmentation occur from ~20-60% of development (~24-72 hours at 28&#176;C). Development proceeds from left to right. Anterior is up. Mn = mandibular segment; Mx = maxillary segment; Lb = labial segment; T1-T3 = leg-bearing thoracic segments 1-3; Ab = abdomen. (<it>E</it>) The flow of information during this <it>de novo </it>transcriptome assembly project. Data files are represented as white boxes within grey boxes that indicate the computer programs used to generate these files. All of the computer programs used are freely available. Ortholog_best_hit_calculator.py and transcriptome_blast_summarizer.py are custom python scripts available at <url>http://www.extavourlab.com/protocols/index.html</url> (see text for details). Photograph in (<it>A</it>) courtesy of David Behl.</p>
</text><graphic file="1471-2164-12-61-1" hint_layout="double"/></fig>
<p>Here we present the results of the sequencing and <it>de novo </it>assembly of the <it>Oncopeltus </it>ovarian and early embryonic transcriptome. We outline an assembly and analysis framework using a combination of existing tools and freely available custom-made command line computational tools, which we hope will make this approach to gene discovery accessible to comparative developmental biologists. We identify homologues of genes involved in all major signaling pathways and developmental processes, including biologically verified splicing isoforms for some genes. We also address the need for library normalization in these studies, and show that at large enough scales of NGS, large numbers of developmental genes can be discovered even with omission of a normalization step.</p>
</sec>
<sec>
<st>
<p>Results and Discussion</p>
</st>
<sec>
<st>
<p>
<it>Assembling the ovarian and embryonic transcriptome of </it>O. fasciatus</p>
</st>
<p>We prepared cDNA from ovaries and early to mid-staged embryos of <it>O. fasciatus</it>, covering oogenesis and all major stages of embryonic patterning (Figure <figr fid="F1">1B-D</figr>). These cDNA samples were prepared using a protocol optimized for preparation of small or limiting samples for 454 pyrosequencing (see Materials and Methods). From these libraries, we generated a total of 2,087,410 sequence reads (Table <tblr tid="T1">1</tblr>). This includes reads generated using GS-FLX technology as well as both normalized (N) and non-normalized (NN) cDNA sequenced using the GS-FLX Titanium platform. As expected, the reads generated using GS-FLX Titanium technology were substantially longer than those generated using GS-FLX technology (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F2">2A</figr>). However, the N sample gave an unexpectedly low number of reads, which were on average shorter than those generated by the NN sample (Table <tblr tid="T1">1</tblr>; Figure <figr fid="F2">2A</figr>). Given that a pilot run of one lane (1/8 plate) of this same normalized cDNA sample generated reads roughly equal in number and size distribution to those of an NN pilot run (Additional file <supplr sid="S1">1</supplr>), we suspect that a technical error reduced the sequencing efficiency of this plate. Despite the comparatively low yield of this normalized cDNA, it still generated more than 600,000 high-quality reads, which we therefore included in subsequent analyses.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Sources of <it>O. fasciatus </it>sequence reads.</p></caption><tblbdy cols="8">
      <r>
         <c ca="left">
            <p>
               <b>Tissue</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normalized?</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>cDNA prep</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>454 Platform</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>No. Plates</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>No. Reads</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Median Read Length</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Accession #</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Ovary</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="left">
            <p>SMART</p>
         </c>
         <c ca="left">
            <p>GS-FLX</p>
         </c>
         <c ca="left">
            <p>&#188;</p>
         </c>
         <c ca="left">
            <p>65,394</p>
         </c>
         <c ca="center">
            <p>225</p>
         </c>
         <c ca="left">
            <p>SRR057570.2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Embryonic</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="left">
            <p>SMART</p>
         </c>
         <c ca="left">
            <p>GS-FLX</p>
         </c>
         <c ca="left">
            <p>&#188;</p>
         </c>
         <c ca="left">
            <p>71,911</p>
         </c>
         <c ca="center">
            <p>230</p>
         </c>
         <c ca="left">
            <p>SRR057571.1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Ovarian and Embryonic</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="left">
            <p>Modified SMART</p>
         </c>
         <c ca="left">
            <p>GS-FLX Titanium</p>
         </c>
         <c ca="left">
            <p>1 + &#188;</p>
         </c>
         <c ca="left">
            <p>656,782</p>
         </c>
         <c ca="center">
            <p>244</p>
         </c>
         <c ca="left">
            <p>SRR057572.1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Ovarian and Embryonic</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="left">
            <p>Modified SMART</p>
         </c>
         <c ca="left">
            <p>GS-FLX Titanium</p>
         </c>
         <c ca="left">
            <p>1 + 1/8</p>
         </c>
         <c ca="left">
            <p>1,293,323</p>
         </c>
         <c ca="center">
            <p>313</p>
         </c>
         <c ca="left">
            <p>SRR057573.1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Total</b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>2 + 7/8</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>2,087,410</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>301</b>
            </p>
         </c>
         <c ca="left">
            <p>SRP002610.1</p>
         </c>
      </r>
   </tblbdy></tbl>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Effects of normalization and 454 sequencing chemistry on read length and isotig length</p></caption><text>
   <p><b>Effects of normalization and 454 sequencing chemistry on read length and isotig length</b>. (<it>A</it>) Titanium sequencing chemistry (grey, black) generally results in longer read lengths when compared with FLX chemistry (white). However, the normalized sample run with Titanium chemistry (black) had shorter read lengths than the non-normalized sample (grey). This result is likely due to a technical error in that particular sequencing run, since a 1/8 plate run of the same sample showed a read length distribution comparable to that of the non-normalized sample (Additional file <supplr sid="S1">1</supplr>). (<it>B</it>) Isotig length distributions from assemblies of Titanium-sequenced data. The longest isotig per isogroup is shown. The number of bases in the non-normalized (grey) and normalized (black) samples has been equalized to eliminate possible bias due to the greater number and length of reads obtained from the run of the non-normalized sample (see (<it>A</it>)). The isotigs generated from the normalized cDNA tended to be shorter than those produced by the non-normalized cDNA (see also Table 2). Pooling all FLX and Titanium reads generates an assembly with more, longer isotigs (blue).</p>
</text><graphic file="1471-2164-12-61-2" hint_layout="single"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Normalized sample did not perform equally in pilot and full sequencing runs</b>. (<it>A</it>) For the normalized sample, the read lengths of the full plate sequencing runs (white) were shorter than those obtained by the 1/8 plate run (grey). (<it>B</it>) The read length distribution of the non-normalized sample was comparable for both 1/8 plate (grey) and full plate (white) sequencing runs.</p>
</text>
<file name="1471-2164-12-61-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>We used the cDNA assembly algorithm of Newbler v2.3 (Roche) to screen the reads for adaptor sequence and then assemble the cleaned reads (see Note Added in Proof for a comparison with Newbler v2.5). After quality trimming and adaptor screening, 2,041,966 reads (97.8%) were used in the assembly. Of these, 1,773,450 (86.9%) assembled either wholly or partially into contigs, and 178,770 (8.8%) remained as singletons. The remaining reads were excluded because they originated from repeat regions (9,875 reads; 0.5%), were outliers (26,943 reads; 1.3%), or were too short (&lt;50 base pairs; 52,928 reads; 2.6%).</p>
<p>To our knowledge, Newbler v2.3 and higher are the only assembly programs that address alternative splicing and can output multiple isoforms per gene. Newbler v2.3 explicitly accounts for alternative splicing by creating a hierarchical assembly composed of three elements: contigs, isotigs, and isogroups. For consistency, we follow Newbler's terminology. Contigs are stretches of assembled reads that are free of branching conflicts. In other words, contigs can be thought of as exons or sets of exons that are always co-transcribed. Isotigs represent a particular continuous path through a set of contigs, i.e. a transcript. An isogroup is the set of isotigs arising from the same set of contigs, i.e. a gene. Different isotigs within an isogroup are thought to represent alternative isoforms of the same gene. Note that it is possible for an isogroup to contain only one isotig, and it is also possible for an isotig to be composed of only one contig.</p>
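<p>The contig, isotig and isogroup hierarchy described above can be pictured as a small data structure. The following is a minimal sketch in Python, not Newbler's internal representation: the class names, field names and the example gene are hypothetical and chosen only for illustration.</p>

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Isotig:
    """One continuous path through a set of contigs, i.e. a putative transcript."""
    name: str
    contig_path: List[str]  # ordered contig names along the path


@dataclass
class Isogroup:
    """All isotigs built from the same set of contigs, i.e. a putative gene."""
    name: str
    isotigs: List[Isotig] = field(default_factory=list)

    def contigs(self) -> Set[str]:
        """Return every contig used by at least one isotig in the group."""
        return {c for isotig in self.isotigs for c in isotig.contig_path}

    def has_alternative_isoforms(self) -> bool:
        """True when more than one path (isotig) is present."""
        return len(self.isotigs) > 1


# Hypothetical gene with three contigs; the middle contig is skipped in one isoform.
gene = Isogroup(
    name="isogroup_example",
    isotigs=[
        Isotig("isotig_long", ["contig_a", "contig_b", "contig_c"]),
        Isotig("isotig_short", ["contig_a", "contig_c"]),
    ],
)

print(sorted(gene.contigs()))
print(gene.has_alternative_isoforms())
```

<p>In this sketch an isogroup holding a single isotig, or an isotig built from a single contig, is simply a list of length one, which mirrors the two special cases noted in the text.</p>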
<p>After the initial Newbler assembly, we noticed substantial redundancy among the singletons. We therefore subjected the 178,770 unassembled singletons to a secondary assembly with CAP3 <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. This secondary assembly reduced the number of singletons from 178,770 to 112,531 (28,143 cap3_contigs and 84,388 cap3_singlets). Thus, our assembly generated a total of 133,628 sequences, including isotigs, cap3_contigs and cap3_singlets (Table <tblr tid="T2">2</tblr>).</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p><it>O. fasciatus </it>transcriptome assembly statistics.</p></caption><tblbdy cols="4">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Full Assembly</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Normalized Assembly</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Non-Normalized Assembly</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Assembled reads (base pairs)</p>
         </c>
         <c ca="left">
            <p>1,773,450 (508,738,047)</p>
         </c>
         <c ca="left">
            <p>389,605 (84,353,140)</p>
         </c>
         <c ca="left">
            <p>336,568 (108,372,883)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Isogroups ("genes")</p>
         </c>
         <c ca="left">
            <p>16,617</p>
         </c>
         <c ca="left">
            <p>10,581</p>
         </c>
         <c ca="left">
            <p>7,591</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Isotigs ("transcripts")</p>
         </c>
         <c ca="left">
            <p>21,097</p>
         </c>
         <c ca="left">
            <p>11,353</p>
         </c>
         <c ca="left">
            <p>8,346</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Isotig N50</p>
         </c>
         <c ca="left">
            <p>1,735</p>
         </c>
         <c ca="left">
            <p>846</p>
         </c>
         <c ca="left">
            <p>1,162</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean # isotigs per isogroup</p>
         </c>
         <c ca="left">
            <p>1.3</p>
         </c>
         <c ca="left">
            <p>1.1</p>
         </c>
         <c ca="left">
            <p>1.1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contigs ("exons")</p>
         </c>
         <c ca="left">
            <p>22,235</p>
         </c>
         <c ca="left">
            <p>11,839</p>
         </c>
         <c ca="left">
            <p>8,731</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mean # contigs per isotig</p>
         </c>
         <c ca="left">
            <p>1.9</p>
         </c>
         <c ca="left">
            <p>1.2</p>
         </c>
         <c ca="left">
            <p>1.3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Singletons (singletons after secondary CAP3 assembly)</p>
         </c>
         <c ca="left">
            <p>178,770 (112,531)</p>
         </c>
         <c ca="left">
            <p>110,265 (N/A)</p>
         </c>
         <c ca="left">
            <p>52,585 (N/A)</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>To enable comparison, we equalized individual assemblies of Normalized and Non-Normalized samples to contain the same number of base pairs before assembly.</p>
   </tblfn></tbl>
<p>Our data assembled into 22,235 contigs, organized among 21,097 isotigs (Figure <figr fid="F2">2B</figr>). The isotig N50 length was 1,735 bp (in other words, 50% of the bases are incorporated into isotigs &#8805; 1,735 bp), and 14,460 (68.5%) of the isotigs contained only one contig. The 21,097 isotigs fell into 16,617 isogroups, of which 14,562 (87.6%) contain only one isotig (average number of isotigs per isogroup = 1.3).</p>
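<p>For readers unfamiliar with the N50 statistic, the definition given above can be written as a short calculation. This is a minimal sketch in Python; the function name and the toy lengths are illustrative assumptions and are not taken from the assembly.</p>

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy isotig lengths in bp (illustrative only, not taken from the assembly).
toy_isotig_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_isotig_lengths))
```

<p>With the toy lengths, the two longest sequences together hold 3,500 of 5,500 bases, so the function returns 1,500.</p>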
<p>The average coverage among contigs was 23.2 reads/bp (median coverage = 6.9 reads/bp) (Additional file <supplr sid="S2">2</supplr>). This coverage value is more than twice as high as the highest reported value from a <it>de novo </it>transcriptome assembly to date [summarized in <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>]. Such deep coverage should be helpful for overcoming the presence of insertion/deletion errors in the individual raw reads <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>.</p>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Distribution of average coverage (reads/bp) within contigs in the <it>O. fasciatus </it>transcriptome</b>. The coverage within contigs is calculated by dividing the total number of base pairs contained in the reads used to construct a contig by the length of that contig. Note that Newbler v2.3 discards those contigs &lt;100 bp.</p>
</text>
<file name="1471-2164-12-61-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
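<p>The per-contig coverage used here (total base pairs in the reads assembled into a contig, divided by the length of that contig) is simple to compute. The following is a minimal sketch in Python with made-up contigs; the function name, contig names and read lengths are hypothetical.</p>

```python
from statistics import mean, median


def contig_coverage(read_lengths, contig_length):
    """Average coverage of one contig: total read bases divided by contig length."""
    if not contig_length > 0:
        raise ValueError("contig_length must be positive")
    return sum(read_lengths) / contig_length


# Hypothetical contigs: name -> (lengths of reads assembled into it, contig length in bp)
toy_contigs = {
    "contig_a": ([300, 250, 350, 300], 400),
    "contig_b": ([200, 200], 800),
    "contig_c": ([400] * 30, 600),
}

coverages = {name: contig_coverage(reads, length)
             for name, (reads, length) in toy_contigs.items()}
print(coverages)
print(mean(coverages.values()), median(coverages.values()))
```

<p>In the toy data one deeply covered contig pulls the mean well above the median, which is one possible reading of the gap between the mean (23.2 reads/bp) and median (6.9 reads/bp) coverage reported above.</p>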
<p>To test whether our assembly would have been aided by the inclusion of nucleotide sequence from <it>Rhodnius prolixus</it>, the most closely related hemipteran to <it>O. fasciatus </it>whose genome is currently being sequenced <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>, we used the BLASTN algorithm to compare our isotigs (the longest isotig per isogroup) with the published ESTs of <it>R. prolixus </it>with an e-value cut-off of 1e-6. Consistent with previous observations of extremely low levels of conservation between insect genomes <abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp>, we found that only 53 out of 16,617 isotigs had hits to <it>R. prolixus </it>ESTs at the nucleotide level. These results suggest that <it>de novo </it>sequencing and assembling efforts will be necessary for most insect species, even when sequence data are available for other members of the same order. We note, however, that a recent study <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp> has shown that it may be possible to incorporate EST data from different species into a <it>de novo </it>assembly by using amino acid sequence rather than nucleotide sequence.</p>
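<p>Counting the query sequences that obtain a hit below an e-value cut-off, as in the BLASTN comparison above, can be sketched as follows. This Python example is not the script used in this study: it assumes 12-column tab-separated BLAST output with the query identifier in the first column and the e-value in the eleventh, and the example rows are invented.</p>

```python
import csv
import io


def queries_with_hits(tabular_text, evalue_cutoff=1e-6):
    """Return the set of query IDs with at least one hit at or below the cutoff.

    Assumes 12-column tabular BLAST output, with the query ID in column 1
    and the e-value in column 11.
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue_cutoff >= evalue:
            hits.add(query_id)
    return hits


# Invented example rows; identifiers and scores are illustrative only.
rows = [
    ["isotig00001", "est_A", "88.0", "200", "24", "0", "1", "200", "5", "204", "1e-40", "170"],
    ["isotig00001", "est_B", "85.0", "150", "22", "0", "10", "159", "3", "152", "3e-12", "75.0"],
    ["isotig00002", "est_C", "90.0", "30", "3", "0", "1", "30", "7", "36", "0.002", "38.0"],
]
example = "\n".join("\t".join(r) for r in rows)
print(sorted(queries_with_hits(example)))
```

<p>Collecting query identifiers in a set counts each isotig once, however many ESTs it matches, which is the quantity reported in the text (isotigs with hits, not total hits).</p>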
</sec>
<sec>
<st>
<p>Validation of predicted alternate isoforms</p>
</st>
<p>To examine whether the alternative isoforms predicted by Newbler v2.3 are in fact present in developing embryos of <it>O. fasciatus</it>, we first focused on a gene of particular interest to developmental biologists, <it>nanos</it>. This conserved metazoan gene was first described as a loss of function mutation in <it>Drosophila melanogaster </it>
<abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>, and is necessary for germ cell and posterior somatic development [reviewed in <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>]. Newbler v2.3 predicted this gene to encode two alternative isotigs within a single isogroup (Figure <figr fid="F3">3B</figr>). The two isotigs differ in that the longer contains an additional 100-bp exon that is absent from the shorter (Figure <figr fid="F3">3B</figr>). We designed PCR primers against sequences present in both isotigs (Figure <figr fid="F3">3B</figr> arrows), which amplified two bands differing by ~100 bp from a pool of embryonic cDNA (Figure <figr fid="F3">3C</figr>). Sequencing of these two bands confirmed that they differ exactly as predicted by Newbler v2.3 (Figure <figr fid="F3">3D</figr>).</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Newbler v2.3 correctly identifies splicing isoforms of <it>nanos</it></p></caption><text>
   <p><b>Newbler v2.3 correctly identifies splicing isoforms of <it>nanos</it></b>. (<it>A</it>) Newbler v2.0 identified three separate contigs that map to an <it>O. fasciatus nanos </it>homologue that we had previously identified by degenerate PCR (Ewen-Campen &amp; Extavour, unpublished). Newbler v2.0 failed to identify these contigs as belonging to the same transcript because of branching conflicts amongst the reads joining these contigs. BLASTX against the RefSeq protein database identified only contig 31035 as being a putative <it>nanos </it>homologue; the other two contigs lie outside the conserved Nanos domain and obtain no BLAST hits. (<it>B</it>) Newbler v2.3 predicted that the same three contigs identified by Newbler v2.0 belonged to two isotigs, or splicing isoforms. (<it>C</it>) RT-PCR with specific primers F and R shown in (<it>B</it>) resulted in two bands of the predicted sizes of the isotigs predicted by Newbler v2.3. (<it>D</it>) Sequencing the bands from (<it>C</it>) revealed that they were identical to the sequences of the predicted isotigs from (<it>B</it>).</p>
</text><graphic file="1471-2164-12-61-3" hint_layout="single"/></fig>
<p>Importantly, a previous version of Newbler (v2.0), which does not account for alternative splicing, failed to join the three fragments that were linked by Newbler v2.3 (Figure <figr fid="F3">3A</figr>). Because of this, Newbler v2.0 (and presumably other assemblers that do not address branching within contigs) predicted three separate contigs, only one of which could be identified as <it>nanos </it>with BLASTX, as the others fall in poorly conserved regions of the gene. Thus, the ability of Newbler v2.3 to handle branching conflicts between reads allows this program to assemble longer continuous sequences, which are in turn more easily annotated using BLAST.</p>
<p>To further characterize the accuracy of Newbler's predictions of alternative transcript isoforms, we randomly selected 10 isogroups that contained exactly two alternative isotigs differing by the presence/absence of a single contig (Additional file <supplr sid="S3">3</supplr>). As we did for <it>nanos</it>, we designed primers to flank the region differing between the two predicted isoforms (Additional file <supplr sid="S3">3A</supplr>), and performed RT-PCR on <it>O. fasciatus </it>embryonic cDNA. In eight of ten instances, we observed bands of the predicted sizes following agarose gel electrophoresis (Additional file <supplr sid="S3">3B,C</supplr>). However, in four of the eight positive cases, additional, unpredicted bands were present (Additional file <supplr sid="S3">3</supplr>). In one of the ten cases, we observed two RT-PCR products, but only one of them was of the predicted size (Additional file <supplr sid="S3">3C</supplr>, lane 6). Taken together, these results suggest that Newbler v2.3 has a low rate of false positives in the prediction of multiple splicing isoforms. Including our investigation of <it>nanos</it>, only one of 11 test cases (9.1%) produced a single RT-PCR product where Newbler v2.3 had predicted multiple products. However, we observed that roughly half of the time, Newbler v2.3 failed to predict all of the isoforms identified via RT-PCR.</p>
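<p>The tallies above can be restated as simple arithmetic. The following Python sketch encodes the 11 test cases (<it>nanos </it>plus the ten randomly selected isogroups) as outcome classes. The class labels are illustrative, and assigning the remaining randomly selected isogroup to the single-product class is an inference from the one-of-11 statement rather than a separately reported observation.</p>

```python
from collections import Counter

# Outcome labels are illustrative; the counts follow the text.
outcomes = (
    ["predicted_only"] * 5                   # nanos plus four isogroups: predicted bands only
    + ["predicted_plus_extra"] * 4           # predicted bands plus unpredicted bands
    + ["one_predicted_one_unpredicted"] * 1  # two products, one of the predicted size
    + ["single_product"] * 1                 # one product where two were predicted
)

tally = Counter(outcomes)
total = len(outcomes)

# Share of test cases yielding a single product despite a two-isoform prediction
single_product_rate = 100 * tally["single_product"] / total

# Share of the eight positive random isogroups that also showed unpredicted bands
extra_among_positive = 100 * tally["predicted_plus_extra"] / 8

print(f"{single_product_rate:.1f}% of {total} cases gave a single product")
print(f"{extra_among_positive:.0f}% of positive cases showed extra bands")
```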
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>RT-PCR validation of bioinformatically predicted multiple isoforms</b>. (<it>A</it>) Schematic of experimental design. Ten isogroups were randomly selected, each containing exactly two isotigs that differed by the presence/absence of a single contig. PCR primers were designed to flank the differing region. (<it>B</it>) Band sizes predicted by Newbler v2.3 for ten randomly selected isogroups containing exactly two isotigs. <it>(C) </it>Agarose gel following RT-PCR using primers against the sequences described in (<it>B</it>). Ladder sizes are given in base pairs on the left. Blue arrowheads: bands of the sizes predicted by Newbler v2.3; red arrowheads: bands not predicted by Newbler v2.3.</p>
</text>
<file name="1471-2164-12-61-S3.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Transcriptome annotation</p>
</st>
<p>A BLASTN search of our dataset for the 93 existing GenBank accessions for <it>O. fasciatus </it>sequences yielded a hit for 56% of the accessions, with an e-value cut-off of 1e-10. This result may be due in part to the short length of some of the GenBank sequences. Accordingly, we found that accessions with hits in the database were significantly longer (mean length 729 bp) than accessions without hits (mean length 397 bp) (unpaired Student's <it>t</it>-test: <it>t </it>= 2.89, DF = 91, <it>p </it>= 0.0048). Of greater relevance to developmental applications of this dataset, however, was our finding that 85% of <it>O. fasciatus </it>developmental genes with existing GenBank accessions (n = 32) are represented in our transcriptome.</p>
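<p>For readers unfamiliar with the test statistic, the following Python sketch shows how an unpaired Student's <it>t</it>-test with pooled variance can be computed; its degrees of freedom equal the two sample sizes summed minus two, which gives 91 for 93 accessions. The function name and the sequence lengths in the example are hypothetical and are not the GenBank data analyzed here.</p>

```python
import math
from statistics import mean, variance


def students_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical accession lengths in bp, for illustration only
lengths_with_hits = [900, 650, 720, 810, 560]
lengths_without_hits = [300, 450, 380, 420]

t_stat, df = students_t(lengths_with_hits, lengths_without_hits)
print(f"t = {t_stat:.2f}, DF = {df}")
```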
<p>We then used BLASTX to map the 133,628 <it>O. fasciatus </it>sequences (isotigs, cap3_contigs and cap3_singletons) against the entire RefSeq Protein database with an e-value cut-off of 1e-10. To simplify these statistics, we report only the BLAST results for the longest isotig per isogroup, under the assumption that all isotigs within an isogroup share nearly identical BLAST results. Of 16,617 isotigs, 7,219 (43.4%) had at least one hit. Of the 28,143 cap3_contigs, 2,594 (9.2%) had hits, and of the 84,388 cap3_singletons, 2,367 (2.8%) had hits. These values are higher than comparable BLAST statistics of most other published studies of 454-generated <it>de novo </it>transcriptomes <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
</abbrgrp>, likely because deeper sequencing increases the length of assembled sequences and thereby makes these sequences more likely to be identified via BLAST. The unidentifiable sequences likely originate from UTRs or non-conserved portions of protein-coding sequences. Of the top BLAST hits, 89.3% were genes from arthropod sequences (Additional file <supplr sid="S4">4</supplr>). Of the 12,180 <it>O. fasciatus </it>sequences with BLAST hits, 1,455 hit non-overlapping segments of the same top BLAST hit (i.e. potentially unassembled portions of the same transcript), and 825 hit overlapping segments of the same top BLAST hit (i.e. potential paralogs). Excluding those 1,455 potentially double-counted BLAST hits, our transcriptome identified a total of 10,775 genes. The assembled sequences generated in this study, as well as pre-computed BLAST results, are available as flat files from the authors upon request.</p>
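<p>The per-category hit rates quoted above follow directly from the reported counts, and the three categories together account for the 12,180 sequences with BLAST hits. A short Python sketch of this arithmetic is given below; the variable names are illustrative.</p>

```python
# Counts as reported in the text: (sequences searched, sequences with at least one hit)
blast_counts = {
    "isotigs (longest per isogroup)": (16617, 7219),
    "cap3_contigs": (28143, 2594),
    "cap3_singletons": (84388, 2367),
}

# Percentage of sequences in each category with at least one hit
hit_rates = {
    name: round(100 * hits / searched, 1)
    for name, (searched, hits) in blast_counts.items()
}

# Total number of sequences with hits across the three categories
total_with_hits = sum(hits for _, hits in blast_counts.values())

for name, rate in hit_rates.items():
    print(f"{name}: {rate}%")
print("sequences with hits:", total_with_hits)
```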
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Identity of taxa with top BLAST hits</b>. "Isotigs" refers only to the longest isotig of each isogroup; "Singletons" refers to the Newbler-generated singletons after secondary CAP3 assembly. The category "other" is the summation of all those species obtaining very low numbers of BLAST hits.</p>
</text>
<file name="1471-2164-12-61-S4.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>To explore and summarize the functional categories of the genes sequenced in this study, we obtained the Gene Ontology (GO) terms associated with the top 20 BLAST hits of each sequence using Blast2GO <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. Among the 7,059 genes for which we obtained GO terms, we observed a wide diversity of functional categories represented on all levels of the Gene Ontology database (Figure <figr fid="F4">4</figr>). The <it>O. fasciatus </it>sequences fall into GO categories with a roughly similar distribution to that of the well-annotated <it>Drosophila melanogaster </it>genome, suggesting that our sequence data contain a large diversity of genes involved in a variety of biological processes, and do not contain any notable biases towards particular categories of genes.</p>
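<p>The GO annotation itself was performed with Blast2GO. The following Python sketch illustrates only the final summary step, namely expressing each GO category as a percentage of annotated sequences, as plotted for comparison between species. The function name, sequence IDs and category assignments are hypothetical.</p>

```python
from collections import Counter


def go_category_percentages(annotations):
    """Percentage of annotated sequences assigned to each GO category.

    annotations maps a sequence ID to a set of GO category names;
    sequences with an empty set are treated as unannotated and ignored.
    """
    annotated = {seq: cats for seq, cats in annotations.items() if cats}
    counts = Counter(cat for cats in annotated.values() for cat in cats)
    n_annotated = len(annotated)
    return {cat: 100 * count / n_annotated for cat, count in counts.items()}


# Hypothetical annotations, for illustration only
example_annotations = {
    "isotig00001": {"metabolic process", "developmental process"},
    "isotig00002": {"metabolic process"},
    "isotig00003": {"metabolic process", "binding"},
    "isotig00004": {"developmental process"},
    "isotig00005": set(),  # no GO terms obtained
}

percentages = go_category_percentages(example_annotations)
for category, pct in sorted(percentages.items()):
    print(f"{category}: {pct:.0f}%")
```

<p>Because a sequence can carry several GO terms, the percentages need not sum to 100.</p>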
<fig id="F4"><title><p>Figure 4</p></title><caption><p>GO term distribution of BLAST hits from the <it>O. fasciatus </it>transcriptome compared with those from the <it>D. melanogaster </it>genome</p></caption><text>
   <p><b>GO term distribution of BLAST hits from the <it>O. fasciatus </it>transcriptome compared with those from the <it>D. melanogaster </it>genome</b>. Several GO categories are shown within the top-level divisions of Biological Process, Molecular Function, and Cellular Component. Column heights reflect the percentage of annotated sequences in each assembly that mapped to a given Biological Process GO term. The relative percentages of genes falling into GO categories are comparable between our <it>O. fasciatus </it>transcriptome (black) and the <it>D. melanogaster </it>transcriptome (white).</p>
</text><graphic file="1471-2164-12-61-4" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>
<it>Assessing coverage of the </it>O. fasciatus <it>transcriptome</it>
</p>
</st>
<p>We wished to know how thoroughly our sequencing efforts sampled the true diversity of transcripts present in our cDNA samples. This is a two-part question: first, of the genes truly expressed during <it>O. fasciatus </it>oogenesis and embryogenesis, how many did we identify? And second, of these identified genes, how thoroughly had we assembled their full-length transcripts?</p>
<p>To address the first question, we created eight separate assemblies of progressively larger sub-samples of our total reads and tallied the total number of genes identified via BLASTX. The number of newly discovered genes began to plateau after ~1.5M reads (1 7/8 plates in our case) (Figure <figr fid="F5">5</figr> black line). However, the N50 isotig length continued to increase roughly linearly over this range of reads (Figure <figr fid="F5">5</figr> grey line). These results suggest that additional sequencing of this sample is unlikely to identify substantially more genes, but may continue to lengthen the existing sequences. Although in the absence of a sequenced genome it is not possible to accurately estimate how many genes are in fact present in the <it>O. fasciatus </it>transcriptome, we note that while several developmental genes of interest were identified in this study, others were not (Tables <tblr tid="T3">3</tblr>, <tblr tid="T4">4</tblr> and see below). Because these data suggest that we have sequenced these specific cDNA samples quite deeply, some form of specific target enrichment may be necessary for future attempts to discover additional genes not identified in this dataset.</p>
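<p>The two quantities tracked in this analysis can be sketched in Python as follows. The first function computes N50 under its usual definition (the length at which sequences of that length or longer contain at least half of the total assembled bases). The second produces a simplified gene discovery curve. Unlike the analysis described above, this sketch does not assemble or BLAST each sub-sample; it assumes that every read already carries a gene label and uses nested sub-samples, so it is a conceptual illustration only. All names and data are hypothetical.</p>

```python
import random


def n50(lengths):
    """Length L such that sequences of length L or longer hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def discovery_curve(read_gene_labels, n_steps=8, seed=0):
    """Return (sub-sample size, unique genes seen) for nested random sub-samples."""
    reads = list(read_gene_labels)
    random.Random(seed).shuffle(reads)  # fixed seed keeps the sketch reproducible
    curve = []
    for step in range(1, n_steps + 1):
        size = len(reads) * step // n_steps
        curve.append((size, len(set(reads[:size]))))
    return curve


# Hypothetical isotig lengths and read labels, for illustration only
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_reads = ["geneA"] * 50 + ["geneB"] * 30 + ["geneC"] * 20

print("N50:", n50(example_lengths))
for size, n_genes in discovery_curve(example_reads):
    print(size, n_genes)
```

<p>In such a curve, a plateau in the number of unique genes alongside a still-rising N50 corresponds to the pattern described above: few new genes, but longer assembled sequences.</p>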
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Assessing coverage of the <it>O. fasciatus </it>transcriptome</p></caption><text>
   <p><b>Assessing coverage of the <it>O. fasciatus </it>transcriptome</b>. Randomly chosen subsets of increasing numbers of Titanium reads were used to generate progressively larger sub-assemblies. The number of reads in each sub-assembly (X axis) is plotted against the number of unique BLAST hits in each sub-assembly (left Y axis: black), and against the N50 isotig length (right Y axis: grey). For this analysis BLAST was performed against the SwissProt database. The number of unique BLAST hits plateaus when the assembly is composed of approximately 1.5 million reads. However, the N50 isotig length maintains an approximately constant rate of increase.</p>
</text><graphic file="1471-2164-12-61-5" hint_layout="single"/></fig>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Selected signaling pathway genes identified in the <it>O. fasciatus </it>transcriptome.</p></caption><tblbdy cols="6">
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c cspan="2" ca="center">
            <p><b>Present in</b>:</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Pathway</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b># Hits</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Hit ID (I/C/S)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Length (range)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normalized</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Non-Normalized</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>HEDGEHOG</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>cubitus interruptus</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>225-906</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>fused</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>516-1582</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>patched</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C, S</p>
         </c>
         <c ca="center">
            <p>225-418</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>smoothened</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1270-1604</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>JAK/STAT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>domeless</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>4028</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>hopscotch (janus kinase)</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I, C</p>
         </c>
         <c ca="center">
            <p>473-2644</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Signal transducer and activator of transcription</it>
            </p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>444-3270</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NFKB/TOLL</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>cactus</it>
            </p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>I, C</p>
         </c>
         <c ca="center">
            <p>629-1748</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>dorsal (Nuclear factor NF-kappa-B)</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1308-3926</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>relish</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2650</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Toll</it>
            </p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>I, C, S</p>
         </c>
         <c ca="center">
            <p>215-4323</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NOTCH</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>fringe</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>877</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Hairless</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1053</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>hairy (Enhancer of split/HES-1)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2530</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>mind bomb</it>
            </p>
         </c>
         <c ca="center">
            <p>7 (6<sup>&#8224;</sup>)</p>
         </c>
         <c ca="center">
            <p>I,C,S</p>
         </c>
         <c ca="center">
            <p>335-1185</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Notch</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>S</p>
         </c>
         <c ca="center">
            <p>235</p>
         </c>
         <c ca="center">
            <p>Y*</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Notchless</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2035</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Presenilin</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1661</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Serrate/Jagged</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>S</p>
         </c>
         <c ca="center">
            <p>246-300</p>
         </c>
         <c ca="center">
            <p>Y*</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>strawberry notch</it>
            </p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>191-3519</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Suppressor of Hairless</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>375-697</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>WNT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>armadillo</it>
            </p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>348-3001</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>dishevelled</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>954-1321</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>frizzled</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>C,S</p>
         </c>
         <c ca="center">
            <p>194-500</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Wnt family (wingless, WNTs)</it>
            </p>
         </c>
         <c ca="center">
            <p>6</p>
         </c>
         <c ca="center">
            <p>C,S</p>
         </c>
         <c ca="center">
            <p>207-508</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TGF-BETA</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>decapentaplegic (BMP2/4)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>547</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>glass bottom boat (BMP5/7)</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>510-737</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>SMADs (Mad, Smad2/3, Smad4/Medea)</it>
            </p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>276-2276</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Type I Receptor (saxophone/thickveins/activin receptor type I)</it>
            </p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>236-2466</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Type II Receptor (punt, wishful thinking)</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>259-5038</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RECEPTOR TYROSINE KINASES</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Epidermal growth factor receptor</it>
            </p>
         </c>
         <c ca="center">
            <p>7 (5<sup>&#8224;</sup>)</p>
         </c>
         <c ca="center">
            <p>I,C,S</p>
         </c>
         <c ca="center">
            <p>229-715</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>rhomboid</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>229-602</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="6" ca="left">
            <p>HORMONE SIGNALING (ECDYSONE, NUCLEAR HORMONE)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>disembodied (ecdysteroidogenic P450)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1835</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Ecdysone receptor</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>231-1393</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>E75</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>257-649</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Ecdysone-induced protein 63E</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1479</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ecdysoneless</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>4158</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Nuclear hormone receptor E78</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>3150</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Nuclear hormone receptor HR3</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>529-737</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>phantom (cytochrome P450 306A1)</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>344-575</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>shade (cytochrome P450 314A1)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2125</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>shadow (cytochrome P450 315A1)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1650</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ultraspiracle nuclear receptor</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>245</p>
         </c>
         <c ca="center">
            <p>Y*</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>without children</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1155-1357</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Hit ID indicates whether gene hits were found among isotigs (I), Cap3-assembled contigs (C), or unassembled singletons (S). Sequence length (range) gives the shortest and longest I, C or S hit sequences for each gene. These results were generated by BLASTing the raw reads from the normalized (N) and non-normalized (NN) samples against the full assembly. When multiple sequences were obtained via name search, they were tested to determine whether they could be assembled into a contig with Sequencher or CLC Combined Workbench (see Methods). An asterisk indicates hits present only in the normalized GS-FLX reads. X (Y<sup>&#8224;</sup>) indicates that the X sequences with hits could be assembled into Y contigs.</p>
   </tblfn></tbl>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Selected developmental process genes identified in the <it>O. fasciatus</it> transcriptome.</p></caption><tblbdy cols="6">
      <r>
         <c ca="left">
            <p>
               <b>Process</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b># Hits</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Hit ID (I/C/S)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Length (range)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normalized</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Non-Normalized</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GERM PLASM</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Argonaute 3</it>
            </p>
         </c>
         <c ca="center">
            <p>2 (1<sup>&#8224;</sup>)</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2042-2231</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>germ cell-less</it>
            </p>
         </c>
         <c ca="center">
            <p>2 (1<sup>&#8224;</sup>)</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>630-1817</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>maelstrom</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>994</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>nanos</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1961</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>piwi/aubergine</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2888</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>pumilio</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>424-2574</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>staufen</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>599-2100</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Tudor</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2719-3299</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>vasa</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>330</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="6" ca="left">
            <p>ANTERIOR-POSTERIOR DETERMINATION</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GAP</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>hunchback</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1429</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Kr&#252;ppel</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>S</p>
         </c>
         <c ca="center">
            <p>250</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ocelliless (orthodenticle)</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>S</p>
         </c>
         <c ca="center">
            <p>207</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TERMINAL GROUP</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>huckebein</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>589</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>torso-like</it>
            </p>
         </c>
         <c ca="center">
            <p>2 (1<sup>&#8224;</sup>)</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>430-1868</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PAIR RULE</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>fushi tarazu</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>788</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>hairy (Enhancer of split/HES-1)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2530</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>odd skipped</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>346</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SEGMENT POLARITY</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>armadillo</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>348-3001</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>cubitus interruptus</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>225-906</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>engrailed</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>S</p>
         </c>
         <c ca="center">
            <p>227</p>
         </c>
         <c ca="center">
            <p>Y*</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>fused</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>516-1582</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>pangolin</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>492-544</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>patched</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C,S</p>
         </c>
         <c ca="center">
            <p>225-418</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Wnt family (wingless, Wnts)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>6</p>
         </c>
         <c ca="center">
            <p>C,S</p>
         </c>
         <c ca="center">
            <p>207-508</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DORSO-VENTRAL AXIS</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>cactus</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>629-1748</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>decapentaplegic (BMP2/4)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>547</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>gastrulation-defective</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1773</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>nudel</it>
            </p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>322-1458</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>pipe</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>266</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>short gastrulation</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>254-615</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>snake</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1789</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>sp&#228;tzle</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>993-3170</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Toll</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>I,C,S</p>
         </c>
         <c ca="center">
            <p>215-4323</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MOLTING/METAMORPHOSIS</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>cuticular proteins (including CP 49Ae and adult cuticle protein)</it>
            </p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>404-566</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>disembodied (ecdysteroidogenic P450)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1835</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Ecdysone receptor</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I,C</p>
         </c>
         <c ca="center">
            <p>231-1393</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>E75</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>257-649</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Ecdysone-induced protein 63E</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1479</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>ecdysoneless</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>4158</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ftz transcription factor 1</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>807</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>hormone receptor 4</it>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1003-2114</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>juvenile hormone acid methyltransferase</it>
            </p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>548-2871</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>juvenile hormone binding protein</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1099</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>juvenile hormone epoxide hydrolase</it>
            </p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>I,S</p>
         </c>
         <c ca="center">
            <p>255-2859</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>juvenile hormone esterase</it>
            </p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>850-2382</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>juvenile hormone esterase binding protein</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1057</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Juvenile hormone-inducible protein</it>
            </p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>456-2757</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Methoprene-tolerant</it>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>3415</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Nuclear hormone receptor E78</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>3150</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>Nuclear hormone receptor HR3</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>529-737</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>phantom (cytochrome P450 306A1)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>344-575</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>shade (cytochrome P450 314A1)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>2125</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>shadow (cytochrome P450 315A1)</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1650</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>takeout</it>
            </p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>591-1011</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>ultraspiracle nuclear receptor</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>C</p>
         </c>
         <c ca="center">
            <p>245</p>
         </c>
         <c ca="center">
            <p>Y*</p>
         </c>
         <c ca="center">
            <p>N</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>
                  <it>without children</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>I</p>
         </c>
         <c ca="center">
            <p>1155-1357</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
         <c ca="center">
            <p>Y</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Hit ID indicates if gene hits were found among isotigs (I), CAP3-assembled contigs (C), or unassembled singletons (S). Sequence length (range) indicates the shortest and longest S, C or I hit sequences for each gene. These results were generated by BLASTing the raw reads from the N and NN samples against the full assembly. When multiple sequences were obtained via name search, they were tested to see whether they could be made to form a contig with Sequencher or CLC Combined Workbench (see Methods). Asterisk indicates hits only present in normalized GS-FLX reads. X(Y<sup>&#8224;</sup>) indicates that the X sequences with hits could be assembled into Y contigs. <b>Boldface </b>indicates genes also present in Table 3.</p>
   </tblfn></tbl>
<p>To address the second question, namely how closely our sequences approached full-length transcripts, we employed a method proposed by O'Neil and colleagues <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Their metric, the "ortholog hit ratio," compares the length of the portion of a newly discovered sequence that obtains a BLAST hit with the full length of its top hit <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Thus, an ortholog hit ratio of one implies that a transcript has been assembled to its true full length, while values over one suggest insertions in the query sequence relative to its top BLAST hit. We note the caveat that many genes contain relatively poorly conserved regions that may fail to obtain a BLAST hit at all, causing the ortholog hit ratio to underestimate transcript completeness in these cases (Additional file <supplr sid="S5">5</supplr>). In our dataset, many of the <it>O. fasciatus </it>isotigs appear to be nearly fully assembled, while the singletons predictably tend to represent small portions of their top BLAST hit in RefSeq (Figure <figr fid="F6">6</figr>). In total, of the 7,219 isotigs with BLAST hits, 3,953 (54.8%) had ratios &gt; 0.5 and 2,689 (37.2%) had ratios &gt; 0.8.</p>
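<p>As an illustration only, the ortholog hit ratio described above can be sketched in Python. The function and variable names are hypothetical, and the division by three assumes a translated (BLASTx-style) search in which the query is a nucleotide sequence and the top hit is a protein; this is one reading of the metric, not the exact implementation of O'Neil and colleagues.</p>

```python
def ortholog_hit_ratio(hit_region_nt, top_hit_length_aa):
    """Length of the query region with a BLAST hit, relative to the top hit.

    hit_region_nt: aligned length of the query in nucleotides (assumed input).
    top_hit_length_aa: full length of the top-hit protein in amino acids.
    """
    if not top_hit_length_aa > 0:
        raise ValueError("top hit length must be positive")
    # Convert nucleotides to codons before comparing with the protein length.
    return (hit_region_nt / 3) / top_hit_length_aa


def fraction_above(ratios, threshold):
    """Fraction of sequences whose ratio exceeds the threshold."""
    return sum(1 for r in ratios if r > threshold) / len(ratios)


# Hypothetical example values, not data from this study.
example = [ortholog_hit_ratio(nt, aa) for nt, aa in [(900, 300), (450, 300), (120, 400)]]
print(example)
print(fraction_above(example, 0.5))
```

<p>Applied to a table of per-isotig ratios, a helper such as <it>fraction_above</it> would yield the kind of summary reported here (the proportion of isotigs with ratios above 0.5 or 0.8).</p>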
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>
<it>O. fasciatus </it>assembly isotigs have ortholog hit ratios similar to predictions from fully genome-sequenced databases</b>. When isotigs from the <it>O. fasciatus </it>transcriptome are BLASTed against the RefSeq protein database, ortholog hit ratios show a similar profile to those obtained when the complete <it>Acyrthosiphon pisum </it>gene prediction set (downloaded from <url>http://www.aphidbase.com/aphidbase/downloads/</url>) is BLASTed against the predicted gene set of <it>Drosophila melanogaster </it>(r5.28 downloaded from <url>ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/</url>) with an e-value cut-off of 1e-10.</p>
</text>
<file name="1471-2164-12-61-S5.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Ortholog hit ratio analysis of isotigs and CAP3-reassembled singletons</p></caption><text>
   <p><b>Ortholog hit ratio analysis of isotigs and CAP3-reassembled singletons</b>. An ortholog hit ratio of one implies that a transcript has been assembled to its true full length. For isotigs (black), a majority (54.8%) appear to contain at least 50% of the full length transcript sequence (arrow), while over one-third (37.2%) appear to represent at least 80% of the full length transcript sequence (arrowhead). Most singletons (grey) represent much smaller percentages of full-length transcripts.</p>
</text><graphic file="1471-2164-12-61-6" hint_layout="single"/></fig>
<p>We also asked, for those <it>O. fasciatus </it>sequences of developmental genes already present in GenBank that overlapped with transcriptome hits (n = 23), whether our transcriptome data provided any net gain in transcript sequence compared to the GenBank accession sequence. In 15/23 cases (65%), the transcriptome data extended the known sequence beyond that reported in GenBank by an average of 349 bp (range: 82-1,366 bp). In most cases, additional 3' sequence was obtained (Figure <figr fid="F7">7</figr>).</p>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>The <it>O. fasciatus </it>transcriptome adds sequence data to existing GenBank accessions, which in turn improves annotation of transcriptome sequences</p></caption><text>
   <p><b>The <it>O. fasciatus </it>transcriptome adds sequence data to existing GenBank accessions, which in turn improves annotation of transcriptome sequences</b>. <it>(A) </it>Extended contig for <it>Of-hunchback </it>(bottom), comprising the complete mRNA GenBank accession (top, light grey), two isotigs and one CAP3 contig from the transcriptome (middle, dark grey). The largest isotig provides an additional 252 bp of 3' UTR sequence to the GenBank sequence (black). Comparison with the GenBank sequence enabled isotig 08619 and cap3_contig 21314 to be assembled into the same contig. <it>(B) </it>Extended contig for <it>Of-homothorax </it>(bottom), with a partial mRNA GenBank accession (top, light grey) and two transcriptome isotigs (middle, dark grey). Both isotigs extend beyond the known GenBank sequence at the 3' and 5' ends, extending the known region by 449 bp in total (black). Both isotigs had been identified as <it>homothorax</it>, and because they did not overlap, they were classified as belonging to the same transcript rather than being paralogs. The GenBank sequence bridges an 87 bp gap between the isotigs, confirming that both sequences are fragments of a single gene.</p>
</text><graphic file="1471-2164-12-61-7" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Assessing the value of cDNA normalization</p>
</st>
<p>Reducing the representation of highly abundant transcripts (i.e. normalizing the cDNA) is often considered essential to capture sequence from genes expressed at lower levels, including many important developmental genes [see for example <abbrgrp>
<abbr bid="B55">55</abbr>
<abbr bid="B56">56</abbr>
<abbr bid="B57">57</abbr>
</abbrgrp>]. However, we hypothesized that current next-generation sequencing technologies could provide sufficiently deep sequence to render normalization largely unnecessary for construction of <it>de novo </it>transcriptomes for comparative developmental biologists. To address this question, we assessed the relative contribution of the N and NN cDNA to our final assembly using several strategies.</p>
<p>First, to test whether our normalization protocol successfully reduced the presence of highly abundant transcripts, we created separate assemblies from the N and NN cDNA samples (equalizing the total number of bases to reduce the contribution of additional sequence found in the NN sample). The N assembly contained a greater number of isotigs that were shorter on average than those in the NN assembly (Figure <figr fid="F2">2B</figr>). Additionally, more singletons were generated in the N assembly relative to the NN assembly (Table <tblr tid="T2">2</tblr>). Further, similar to the results obtained by Bellin and colleagues <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, we observed the predicted decrease in the maximum number of reads per contig in the N assembly compared to the NN assembly (Figure <figr fid="F8">8A, B</figr>), demonstrating that the normalization procedure successfully reduced the sequencing of highly abundant transcripts. These statistics, which could be interpreted to suggest that the N reads generated an inferior assembly, may result from the shorter average length of reads in the N sample (Figure <figr fid="F2">2A</figr>). Indeed, Newbler rejected 7.9% (30,780) of the N reads as too short, compared to only 1% (3,935) of the NN reads. However, these assembly statistics could also indicate greater heterogeneity in the N sample, which would suggest that normalization might increase the number of new genes identified.</p>
<fig id="F8"><title><p>Figure 8</p></title><caption><p>Normalization decreases coverage of highly abundant genes, but does not change the GO term distribution of contigs</p></caption><text>
   <p><b>Normalization decreases coverage of highly abundant genes, but does not change the GO term distribution of contigs</b>. In both samples, most contigs are composed of &lt;10<sup>2 </sup>reads. However, the non-normalized sample (<it>A</it>) contains contigs with many more reads per contig than the normalized sample (<it>B</it>). In other words, normalization preferentially decreases the number of reads of those contigs with the most reads. <it>(C) </it>GO term distributions do not differ dramatically between pyrosequenced libraries of N versus NN cDNA. However, see Additional file <supplr sid="S6">6</supplr> for exceptions. Column heights reflect the percentage of annotated sequences in each assembly that mapped to a given GO term. Note that the GO terms shown represent the results of mapping the N and NN reads against the complete assembly, rather than those obtained via independent assemblies of N and NN reads.</p>
</text><graphic file="1471-2164-12-61-8" hint_layout="double"/></fig>
<p>To discriminate between these possibilities, we explored the contribution of the N and NN reads to the genes discovered in our full assembly. We used BLASTN to map one plate's worth of raw reads from the N sample and from the NN sample (equalized to contain the same number of base pairs) against the complete assembled transcriptome, with an e-value cut-off of 1e-4. We then explored the GO annotation of those genes hit exclusively by one of these two samples. We observed similar overall GO term distributions between the N and NN samples (Figure <figr fid="F8">8C</figr>). We found that a small number of GO terms (n = 20) were significantly differentially represented in the two samples, albeit generally with very few sequences in each GO term (Additional file <supplr sid="S6">6</supplr>). For example, we were surprised to see that three of the four terms statistically over-represented in the N sample were related to ribosome function (14/750 (1.9%) of the N hits were annotated with 'ribosomal subunit', compared to 1/1124 (0.09%) of the NN hits; FDR-corrected <it>p</it>-value = 0.006). In contrast, several terms related to active transmembrane transport were over-represented in the NN sample (Additional file <supplr sid="S6">6</supplr>), possibly indicating that normalization reduced the representation of genes involved in certain basic metabolic processes.</p>
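<p>The 'ribosomal subunit' comparison above can be illustrated with a minimal Python sketch of a two-sided Fisher's Exact Test on a single 2x2 table. The function name is hypothetical, and the sketch makes two assumptions: it tests one GO term in isolation, and it applies no false discovery rate correction, so its output is an uncorrected <it>p</it>-value and is not expected to equal the FDR-corrected value reported here.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x annotated sequences in the first sample.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p_obs * (1 + 1e-9) >= p)


# Counts taken from the text: 14 of 750 N hits and 1 of 1124 NN hits
# annotated with 'ribosomal subunit'.
p_value = fisher_exact_two_sided(14, 750 - 14, 1, 1124 - 1)
print(p_value)
```

<p>In a full analysis, one such test would be run per GO term and the resulting <it>p</it>-values corrected for multiple testing.</p>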
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>GO terms enriched in Normalized (N) and Non-Normalized (NN) cDNA samples</b>. N (assembly generated from full plate of normalized cDNA) and NN (assembly generated from an equalized number of base pairs of non-normalized cDNA) reads were BLASTed against the full transcriptome assembly, and the results were used to generate "test" and "reference" sets for a Fisher's Exact Test. FDR: false discovery rate.</p>
</text>
<file name="1471-2164-12-61-S6.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>As an additional way to investigate the contribution of the N and NN samples to identifying specific genes of interest for our studies, we manually examined the results of mapping the N and NN samples to the fully assembled transcriptome. Of the 79 genes of interest that we investigated, four (5.1%) were uniquely present in the N sample, whereas nine (11.4%) were uniquely present in the NN sample, and the remaining 66 (83.5%) were present in reads of both the N and NN samples (Tables <tblr tid="T3">3</tblr>, <tblr tid="T4">4</tblr>). Although this may be an artifact of sequencing depth (i.e. low-abundance genes of interest may be present in only one of the two cDNA samples simply due to sampling effects rather than the normalization protocol <it>per se</it>), our data suggest that the normalized cDNA sample did not contribute disproportionately to gene discovery.</p>
</sec>
<sec>
<st>
<p>Gene discovery for developmental studies</p>
</st>
<p>The ultimate goal of this sequencing project was to identify a wide diversity of candidate genes involved in developmental processes. Traditionally, such gene discovery in "non-model" organisms has required degenerate PCR, which is labor-intensive, expensive, and prone to failure. The annotated transcriptome assembly we present here allows researchers to identify genes of interest via simple text searches, or via BLAST searches. To demonstrate the usefulness of these data for large-scale gene discovery, we report here the identification of several components from each of the seven widely studied metazoan signaling pathways (Table <tblr tid="T3">3</tblr>) as well as many genes involved in specific developmental processes (Table <tblr tid="T4">4</tblr>). We note that the majority of these gene fragments are of suitable length for immediate application of such widely used techniques as <it>in situ </it>hybridization and RNAi-based functional knockdown. In cases of functional experiments where full-length proteins are desirable, such as protein overexpression, RACE PCR will likely be required. Importantly, we note that many genes of interest were present among the singletons, many of which are long enough for immediate use as sequences for <it>in situ </it>hybridization probes or RNAi templates, emphasizing the importance of including these in NGS gene discovery studies.</p>
<p>Although we identified a diverse array of genes, some well-studied genes known to be expressed during embryogenesis were not easily identified in this study. For example, our BLAST results only contained three genes from the Hox cluster (<it>fushi tarazu, Antennapedia</it>, and <it>Abdominal-B</it>), although orthologs of all the canonical arthropod Hox genes are known to be present in <it>O. fasciatus </it>
<abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>. However, using the <it>O. fasciatus </it>Hox gene sequence fragments available from NCBI as a BLAST query against our transcriptome did reveal sequences for all Hox genes except <it>Sex combs reduced</it>. It is possible that these genes are expressed at very low levels during the developmental stages sampled here, suggesting that enrichment techniques may be necessary to more easily identify certain genes of interest. We do note, however, that <it>fushi tarazu</it>, the only Hox cluster gene not previously identified in <it>O. fasciatus</it>, was identified in both N and NN samples of this transcriptome dataset (Table <tblr tid="T4">4</tblr>).</p>
</sec>
<sec>
<st>
<p>Case study: gene discovery for endocrine regulation of development</p>
</st>
<p>In addition to surveying the transcriptome for genes involved in embryonic patterning and other developmental processes, we asked whether we could also identify genes known to be employed in biological processes during postembryonic development of holometabolous insects. Recent studies have suggested that many of the genes used during holometabolous insect metamorphosis may also play important roles during embryogenesis in hemimetabolous insects <abbrgrp>
<abbr bid="B59">59</abbr>
<abbr bid="B60">60</abbr>
</abbrgrp>. To investigate this, we searched the <it>O. fasciatus </it>transcriptome for expression of key ecdysteroid- and juvenile hormone (JH)-related genes. We identified transcripts for many of the known ecdysteroid biosynthesis genes, including cytochrome P450 genes of the <it>Drosophila </it>Halloween family, such as <it>shade </it>(CYP314A1), <it>shadow </it>(CYP315A1), <it>phantom </it>(CYP306A1) and <it>disembodied </it>(CYP302A1) (Table <tblr tid="T4">4</tblr>). We also detected expression of ecdysone response genes. In particular, we identified many of the ecdysone-regulated genes that play key roles during molting and metamorphosis, including <it>E75</it>, <it>HR3</it>, and <it>HR4 </it>(Table <tblr tid="T4">4</tblr>). The presence of these genes in the ovaries and early embryos of <it>O. fasciatus </it>corroborates recent studies that implicate ecdysone-response genes in key developmental processes during embryogenesis <abbrgrp>
<abbr bid="B59">59</abbr>
<abbr bid="B60">60</abbr>
<abbr bid="B61">61</abbr>
</abbrgrp>. As might be expected for a situation where ecdysone regulates embryonic development but not molting, transcripts encoding insect peptide hormones implicated in eclosion behavior, such as ecdysis-triggering hormone, eclosion hormone and crustacean cardioactive peptide, were not detected. JH biosynthesis and response genes were also isolated (Table <tblr tid="T4">4</tblr>). JH has been shown to play a role in promoting embryonic development and tissue maturation <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. The expression of these genes, together with that of JH esterase and JH binding proteins, is consistent with previous studies implicating tight control of JH during embryogenesis <abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp>.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We have used 454 pyrosequencing to create an early developmental transcriptome for the milkweed bug <it>O. fasciatus </it>in the absence of a reference genome. Although genomic sequence data will be necessary in the future for linkage or <it>cis</it>-regulatory analyses, at the early stages of establishing new model organisms, one of the most important goals is often gene discovery. In this regard, while no transcriptome generated in this way can realistically be "complete" in the sense of containing full-length transcripts for all expressed genes, we propose that for many evolutionary developmental biology studies, the approach described here is a useful one for fast, high-throughput gene discovery. A high priority for comparative developmental biology research is the analysis of gene expression and function. By sequencing at great depth and testing a variety of cDNA preparation methods (normalized, non-normalized, embryo- and ovary-specific), we have generated tens of thousands of gene sequences of sufficient length for the commonly used developmental techniques of <it>in situ </it>hybridization and RNAi-mediated gene knockdown. These data can also be used for phylogenetic, population genetic, and functional genomic applications, provide a starting point for identification of genomic regulatory sequences, and assist with assembly of hemipteran genomes sequenced in the future.</p>
<sec>
<st>
<p>Note added in Proof</p>
</st>
<p>While this article was in review, Kumar and Blaxter <abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp> published a comparison of <it>de novo </it>assemblers for 454 transcriptome data, and reported important shortcomings of Newbler v2.3 compared to other available assemblers. Specifically, the authors reported that Newbler v2.3 produced the smallest assembly (i.e. the smallest number of base pairs incorporated into contigs) of the assemblers tested. The authors argue that this poor performance is likely because Newbler v2.3 inexplicably discards portions of read overlap information. In contrast, a newer, currently unreleased version of Newbler, v2.5, produced the most complete assembly of all those tested. Kumar and Blaxter (2010) therefore strongly advise all <it>de novo </it>454 transcriptome assembly projects which have used Newbler v2.3 to recompute their assemblies with Newbler v2.5.</p>
<p>To address this concern, we obtained a pre-release version of Newbler v2.5 from Roche and reassembled the <it>O. fasciatus </it>data, again using the -nosplit flag. In contrast to Kumar and Blaxter (2010), we observed much less dramatic differences between the assemblies produced by Newbler v2.3 and Newbler v2.5 (Additional file <supplr sid="S7">7</supplr>). For example, Kumar and Blaxter (2010) report that Newbler v2.5 increased their total assembly size by 39% compared to Newbler v2.3. For the <it>O. fasciatus </it>data analyzed here, Newbler v2.5 increased the total assembly size by less than 1% (Additional file <supplr sid="S7">7</supplr>). Further, we observed very similar numbers of isogroups, isotigs, and singletons between the two assemblies (Additional file <supplr sid="S7">7</supplr>). We did observe a 16% increase in the number of contigs reported by Newbler v2.5, but this difference was markedly less than the 80% increase observed in the data analyzed by Kumar and Blaxter (2010). After BLASTing all of the assembled isotigs and CAP3-assembled singletons from the Newbler v2.5 assembly against the RefSeq database, we identified a total of 10,886 unique BLAST hits, compared to 10,775 genes identified using Newbler v2.3.</p>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>Comparison of <it>de novo </it>transcriptome assemblies produced by Newbler v2.3 and Newbler v2.5</b>. Number of BLASTx hits reflects a search against RefSeq Protein database with an e-value cut-off value of 1e-10.</p>
</text>
<file name="1471-2164-12-61-S7.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>These results suggest that, although we did observe a modest increase in assembly size using Newbler v2.5, the analyses presented in the current study are largely robust against differences between currently available versions of Newbler. One possible explanation for the difference between these results and those observed by Kumar and Blaxter (2010) is the greater sequencing depth achieved in the current study. If in fact the poor performance of Newbler v2.3 involves discarding information in regions of low coverage, the fact that our dataset includes ~2.4x more reads than that analyzed by Kumar and Blaxter (2010) may explain the smaller improvement that Newbler v2.5 provided for our dataset. We also suggest that the reduced number of genes identified via BLAST observed by Kumar and Blaxter (their Table 5) may result from the fact that the authors excluded singletons from their analyses. If Newbler v2.3 indeed fails to assemble regions of low coverage and instead retains those reads as singletons, many genes of interest may only be present as singletons. Indeed, we observed many genes of interest exclusively represented as singletons (Tables <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr>). Thus, for the purpose of gene discovery, we emphasize that future <it>de novo </it>transcriptome projects should analyze singletons as an important source of useful gene sequence.</p>
<p>Although our results do not appear to be greatly sensitive to which version of Newbler is used, we agree with Kumar and Blaxter (2010) that future transcriptome projects should use the most current available version of Newbler, or whichever assembler they find most useful for their data.</p>
</sec>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Animal culture</p>
</st>
<p>The <it>O. fasciatus </it>specimens sequenced in this study were originally purchased from the Carolina Biological Supply Company (Burlington, NC) and were maintained in the laboratory on sunflower seeds under a 12h:12h light/dark cycle at 28&#176;C.</p>
</sec>
<sec>
<st>
<p>cDNA Synthesis</p>
</st>
<p>For our pilot study using the GS-FLX platform, total RNA was isolated from mature ovaries (Figure <figr fid="F1">1B</figr>) and from mixed-stage embryos representing the first three days of development (roughly 60% of embryogenesis at 28&#176;C; Figure <figr fid="F1">1C, D</figr>) using TRIzol (Invitrogen), following the manufacturer's protocols. For each RNA sample, approximately 5 &#956;g of cDNA was prepared using the SMART cDNA library construction kit (Clontech, CA, USA). The cDNA was normalized using Evrogen's Trimmer-Direct cDNA Normalization kit (Evrogen, Moscow, Russia), and subsequently digested with SfiI to partially remove the SMART adapters. The size distributions of total RNA and cDNA were assessed on 1.0% agarose gels following each step of the protocol.</p>
<p>To prepare cDNA for sequencing on the GS-FLX Titanium platform, we followed a modified version of the SMART cDNA protocol <abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp> that has been optimized for cDNA quality and yield from small quantities of total RNA. A helpful guide that formed the initial basis for the optimization of this protocol was once available online from Evrogen, but has since been removed. At the time these libraries were prepared, Roche had not yet provided a specific protocol for cDNA library preparation for 454 pyrosequencing. Subsequently, the company has released a cDNA protocol that requires approximately 500 ng of purified mRNA (typically requiring isolation of 10 to 50 &#956;g of total RNA). While useful for larger tissue samples, the Roche cDNA preparation protocol is difficult to apply to samples in which RNA quantity is limiting, as is the case with many non-model organisms. The protocol we present here does not require the loss-prone step of mRNA purification, and we have found that it produces sufficient quantities of high-quality cDNA when 5 &#956;l of the RNA (18S and 28S bands) can be visualized on a 1% agarose gel stained with ethidium bromide. Compared with the original SMART protocol, we have optimized the primers, PCR conditions, and downstream purification steps to maximize the yield of double-stranded cDNA required for 454 pyrosequencing. We initially optimized this protocol for Roche's original 454 library preparation protocol (not specific to cDNA), which required input of double-stranded DNA amounts of 2.5-10 &#956;g (in our experience, typically 10-20 &#956;g prepared cDNA as measured by UV absorbance). However, newer protocols from Roche require only 500 ng double-stranded cDNA, limiting the need for a secondary amplification step, as described here, for samples with highly limiting quantities of total RNA.</p>
<p>After separately isolating total RNA from mature ovaries (Figure <figr fid="F1">1B</figr>) and from each of the first three days of embryogenesis (Figure <figr fid="F1">1C, D</figr>) as described above, each RNA sample was treated with DNase to remove potential genomic contamination. Equal amounts of each sample were then pooled for use as a template for first-strand cDNA synthesis. Due to concerns that the poly(T) primer used in the SMART kit could interfere with pyrosequencing, the 3'-primer used was modified in two ways: (1) the poly(T) was interrupted every fourth base by the inclusion of a cytosine [sensu 30]; and (2) the primer contained an <it>Mme</it>I site which allowed most of the poly(T) to be removed during digestion. This 3'-primer (PD243Mme-30TC, 5'-ATT CTA GAG CGC ACC TTG GCC TCC GAC TTT TCT TTT CTT TTT TTT TCT TTT TTT TTT VN-3') was used during first-strand synthesis and for all subsequent amplification steps. Because <it>Mme</it>I also cleaves relatively commonly within eukaryotic genes, it may not always be desirable to use this enzyme for library preparation. As an alternative, we have found that a similar 3' primer containing an <it>Sfi</it>I cleavage site (PD243-30TC, 5'-ATT CTA GAG GCC ACC TTG GCC GAC ATG TTT TCT TTT CTT TTT TTT TCT TTT TTT TTT VN-3') is also effective in producing cDNA that yields high-quality 454 data (data not shown).</p>
<p>For first-strand synthesis, 3 &#956;g of total RNA (in 6 &#956;l) and 2 &#956;l 3' primer (12 &#956;M) were mixed and denatured at 65&#176;C for 5 minutes, then placed on ice. Reverse transcription reactions using SuperScript II (Invitrogen) in the manufacturer's recommended buffer were performed for 50 minutes at 42&#176;C using twice the recommended concentration of enzyme, 1 &#956;l of Protector RNase inhibitor (Roche) to avoid RNA degradation, 2 &#956;l 5' primer (12 &#956;M), 2 &#956;l 10 mM DTT, and 1 &#956;l 10 mM dNTPs. Template-switching essential for the SMART technique was achieved using a 5' primer (PD242, 5'-AAG CAG TGG TAT CAA CGC AGA GTG GCC ACG AAG GCC rGrGrG-3') with three RNA nucleotides at its 3' end, which contains an <it>Sfi</it>I site. Reactions were then heat-inactivated for 15 minutes at 70&#176;C and diluted 1:5 in milliQ water in preparation for PCR amplification. Contrary to some expectations, SuperScript III reverse transcriptase (Invitrogen) may be substituted in this protocol with equivalent results (data not shown).</p>
<p>To maximize yield during cDNA amplification, the first round of amplification was conducted using a 2:2:1 mix (v:v:v) of Hemo KlenTaq (New England Biolabs), Phusion (New England Biolabs), and PfuTurbo (Stratagene) polymerases. This mixture of enzymes was determined empirically to provide the highest yield of cDNA with a range of input first-strand concentrations. Cesium KlenTaq AC (DNA Polymerase Technologies) and the hot start versions of Phusion and PfuTurbo polymerases in the same ratio may also be substituted at this step without sacrificing yield; this may produce fewer PCR artifacts in the final cDNA preparation. Buffer conditions (MgCl<sub>2 </sub>and DMSO) were also empirically optimized to maximize yield and minimize PCR artifacts. Reactions were performed in 100 &#956;L total volume in 1X Phusion HF buffer, 1.5 &#956;L polymerase mix, 5 &#956;L first-strand cDNA (previously diluted 1:5 in H<sub>2</sub>O), 1 &#956;L 3' primer (PD243Mme-30TC, 12 &#956;M), 1 &#956;L 5' primer (PCRIIA, 5'-AAG CAG TGG TAT CAA CGC AGA GT-3', 12 &#956;M), and a final concentration of 1% DMSO, 1.5 mM MgCl<sub>2 </sub>(in addition to the MgCl<sub>2 </sub>already present in the HF buffer), and 200 &#956;M dNTPs. Reactions were cycled with the following program: 1 minute at 95&#176;C, followed by 16-20 cycles of 30 seconds at 95&#176;C (see below for determining optimal number of cycles), 30 seconds at 66&#176;C, and 3 minutes at 72&#176;C, and a final 10 minutes at 72&#176;C. After cooling to room temperature, 10 &#956;L 3M NaOAc pH 5.5 was added to each 100 &#956;L secondary PCR reaction followed by purification with the QiaQuick PCR purification kit (Qiagen) using the manufacturer's recommended protocol. For all purification steps, samples were eluted with TM buffer (10 mM Tris-HCl pH 8.5, 1 mM MgCl<sub>2</sub>) to prevent strand separation of double-stranded cDNA.</p>
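<p>As a quick check on the reaction set-up above, the final concentration of each primer follows from the stock concentration and the volumes given. The short Python sketch below is only illustrative; the function name "final_concentration" is hypothetical, and it assumes that the 100 &#956;L total volume already includes the primer.</p>

```python
def final_concentration(stock_conc, added_volume, total_volume):
    """Concentration after dilution: stock x (volume added / final volume)."""
    if not total_volume > 0:
        raise ValueError("total_volume must be positive")
    return stock_conc * added_volume / total_volume


# 1 uL of 12 uM primer in a 100 uL reaction (values taken from the text above).
primer_uM = final_concentration(12.0, 1.0, 100.0)
print("each primer: %.2f uM" % primer_uM)
```

<p>Under these assumptions each primer is present at 0.12 &#956;M in the primary PCR.</p>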
<p>To produce sufficient cDNA for sequencing, Advantage 2 (Clontech) polymerase was used under the manufacturer's recommended conditions during the second round of amplification using the same primer concentrations and 1 &#956;l of undiluted primary PCR product. We recommend testing a range of dilutions of the primary PCR product to obtain the desired quantity of amplified cDNA in 9-10 PCR cycles. In cases of highly limiting RNA concentration, we have also found that a secondary PCR reaction using a 1:1:1 mix of Phusion, Cesium KlenTaq AC, and Deep Vent (exo-) (New England Biolabs) polymerase in ThermoPol reaction buffer supplemented with 1.5 mM MgSO<sub>4 </sub>and 1% DMSO produces the highest yield of secondary PCR product (note that this polymerase mix does not produce optimal results when used for first-round amplification). Secondary PCR reactions were cycled using the same parameters as the primary PCR but running for approximately 10 cycles.</p>
<p>To prevent overcycling during both rounds of PCR amplification, each reaction was prepared in duplicate, and one reaction was spiked with 1 &#956;l of 1:750 SybrGreen I (Invitrogen). The spiked reactions were monitored in real time on an Mx3005P QPCR machine (Stratagene Inc.), and the samples were removed when amplification began to plateau. To increase the representation of double-stranded cDNA, two cycles of "chase PCR" were conducted following each round of cDNA amplification after the optimal number of cycles had been reached. Excess primers were added (1.5 &#956;L of each, 12 &#956;M primer per 100 &#956;L reaction), and each reaction was subjected to two additional non-denaturing cycles of 1 minute at 77&#176;C, 1 minute at 65&#176;C, and 3 minutes at 72&#176;C, followed by a 10 minute extension at 72&#176;C.</p>
<p>Following the second round of amplification and PCR purification, the cDNA samples were double-digested with <it>Sfi</it>I and <it>Mme</it>I (40 and 26 units per 150 &#956;l reaction, respectively). cDNA species &lt;500 bp were then removed using Chroma Spin 400 columns (Clontech) which had been equilibrated with TM buffer following the manufacturer's protocol. It should be noted that the Chroma Spin column protocol suggested in the Clontech SMART cDNA kit is non-optimal, and that following the protocol provided with the separately purchased columns is less labor-intensive and produces a higher yield of size-selected cDNA. Equilibration of Chroma Spin columns is critical for maximizing the yield of double-stranded cDNA as required by the Roche library preparation protocols. Following size selection, cDNA was blunt-ended with the NEB Quick Blunting kit (New England Biolabs) and purified once more with the QiaQuick kit. After each step of cDNA synthesis, the size distribution was checked on 1.0% agarose gels, and the cDNA samples were quantified using a Qubit (Invitrogen), after observing that the NanoDrop 1000 (Thermo Scientific) did not reliably quantify ds-cDNA (C. Dunn, personal communication).</p>
<p>To prepare normalized cDNA for GS-FLX Titanium sequencing, 1 &#956;l of the twice-amplified, purified cDNA sample described above was subjected to Evrogen's DSN-treatment protocol, followed by a single round of further amplification, <it>Sfi</it>I/<it>Mme</it>I digestion, and size selection. Approximately 5 &#956;l of normalized and non-normalized cDNA were synthesized.</p>
</sec>
<sec>
<st>
<p>454 Titanium Pyrosequencing</p>
</st>
<p>For the pilot study using the GS-FLX platform, EnGenCore (University of South Carolina) conducted the final steps of library preparation, including nebulization, adaptor-ligation, and sequencing of each sample (&#188; plate each). For sequencing using the Titanium platform, the samples were nebulized, adaptor-ligated, and pyrosequenced by the Institute for Genome Science and Policy DNA Sequencing Facility (Duke University).</p>
</sec>
<sec>
<st>
<p>Sequence Assembly</p>
</st>
<p>Raw reads were assembled using the cDNA assembly algorithm of Newbler v2.3 (Roche) with default assembly parameters. An adaptor-trimming step was included in the assembly (the "-v" flag), and the "-nosplit" flag was also used to reduce the generation of extremely short contigs that might otherwise have been created. All of the raw reads generated in this study have been submitted to the NCBI Short Read Archive (Study Accession Number: SRP002610.1).</p>
<p>Because redundancy was observed among the singletons generated by Newbler v2.3, the singletons were reassembled using CAP3 <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>, with the '-z' option set to 1. Prior to this secondary assembly, the singletons were screened for adaptor sequences using both cross_match <abbrgrp>
<abbr bid="B66">66</abbr>
<abbr bid="B67">67</abbr>
<abbr bid="B68">68</abbr>
</abbrgrp> and a custom Python script (Casey Dunn, personal communication). We note that Newbler can also be used to produce .fasta and corresponding .qual files of trimmed reads using the '-tr' option. The final assembly thus consists of three types of sequences: Newbler-assembled sequences, cap3_contigs, and cap3_singlets, all of which were subjected to subsequent analyses.</p>
</sec>
<sec>
<st>
<p>Sequence Annotation</p>
</st>
<p>Sequences were first mapped against the RefSeq Protein database [<abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp>, downloaded from <url>ftp://ftp.ncbi.nih.gov/blast/db/</url> on April 27, 2010] using BLASTX. All BLAST searches were conducted using BLAST v2.2.23+ <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp> with an e-value cut-off of 1e-10. We then used Blast2GO v1.2.7 <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp> to retrieve the Gene Ontology (GO) <abbrgrp>
<abbr bid="B71">71</abbr>
</abbrgrp> terms and their parents associated with the top 20 BLAST hits for each sequence. To avoid potentially double-counting sequences that might represent un-assembled portions of the same transcript, a custom python script ("transcriptome_blast_summarizer.py", available at <url>http://www.extavourlab.com/protocols/index.html</url>) was used to identify sequences with identical top BLAST hits prior to GO annotation. If multiple sequences hit non-overlapping portions of the same top BLAST hit, we used the conservative assumption that these sequences represented unassembled portions of the same transcript, and therefore only tallied the GO terms of one of these sequences. However, if multiple sequences hit overlapping portions of the same top BLAST hit, we considered these sequences potential paralogs and retained them all. Thus, the counts of sequences in each GO term only include one sequence per top BLAST hit, unless the multiple sequences mapped to overlapping portions of the same BLAST hit. These counts were used to compare the distribution of sequences among specific GO terms between the transcriptomes of <it>O. fasciatus </it>and the <it>Drosophila melanogaster </it>genome. For this comparison, we used a precomputed GO annotation of the <it>D. melanogaster </it>genome <abbrgrp>
<abbr bid="B72">72</abbr>
</abbrgrp>.</p>
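<p>The tallying rule described above can be sketched in Python as follows. This is an illustration only and is not the authors' "transcriptome_blast_summarizer.py": the input layout, the function names ("overlaps", "select_for_go_tally") and the accession-like labels are hypothetical. It assumes that each sequence is described by the subject start and end coordinates of its top hit (as reported, for example, in BLAST tabular output), and that a sequence is retained when it overlaps at least one sequence already retained for the same hit, a choice the text above does not specify for mixed cases.</p>

```python
from collections import defaultdict


def overlaps(a, b):
    """Return True if two inclusive (start, end) intervals share a position."""
    return min(a[1], b[1]) >= max(a[0], b[0])


def select_for_go_tally(top_hits):
    """Pick the sequences whose GO terms are counted.

    top_hits: iterable of (sequence_id, subject_id, subject_start, subject_end),
    one row per assembled sequence, describing its top BLAST hit only.
    """
    by_subject = defaultdict(list)
    for seq_id, subject_id, start, end in top_hits:
        # Normalise so that start is never greater than end.
        by_subject[subject_id].append((min(start, end), max(start, end), seq_id))

    kept = []
    for subject_id in sorted(by_subject):
        rows = sorted(by_subject[subject_id])
        accepted = [rows[0]]  # the first sequence on a hit is always counted
        for row in rows[1:]:
            # Overlapping hits: treated as potential paralogs, so keep them.
            # Non-overlapping hits: assumed to be pieces of one transcript, so skip.
            if any(overlaps(row[:2], other[:2]) for other in accepted):
                accepted.append(row)
        kept.extend(seq_id for _, _, seq_id in accepted)
    return kept


# Hypothetical sequences and subject identifiers, for illustration only.
example = [
    ("contig1", "NP_001", 1, 120),
    ("contig2", "NP_001", 300, 450),   # same hit, no overlap with contig1
    ("contig3", "NP_002", 10, 200),
    ("contig4", "NP_002", 150, 320),   # same hit, overlaps contig3
]
print(select_for_go_tally(example))
```

<p>In this example contig2 is dropped as a presumed unassembled fragment of the transcript represented by contig1, whereas contig3 and contig4 are both retained as potential paralogs.</p>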
<p>The FASTA-formatted transcriptome data set file was examined in TextWrangler (v. 3.1, Bare Bones Software, Inc.). Candidate genes were sought via whole gene names and, where possible, via the gene name abbreviations, while avoiding irrelevant hits. The FASTA header annotation of transcriptome sequences includes the top 20 BLASTX hits to the RefSeq database as described above.</p>
<p>Sequencher (v4.8, Gene Codes Corporation; default settings: minimum 20 bp overlap between sequences, &#8805;85% sequence identity) and CLC Combined Workbench (v5.6.1, CLC Bio) were used to examine whether transcriptome sequences could be further assembled.</p>
</sec>
<sec>
<st>
<p>Estimating sequencing depth</p>
</st>
<p>To estimate how thoroughly our sequencing efforts sampled the <it>O. fasciatus </it>transcriptome, eight progressively larger subsets of the reads were independently assembled. The number of genes represented in each assembly was then estimated via BLASTX. For these smaller assemblies, reads from one plate each of normalized and non-normalized cDNA were combined in random order and sampled without replacement. For each assembly, we BLASTed the longest isotig of each isogroup, and all of the singletons, against the SwissProt database [<abbrgrp>
<abbr bid="B73">73</abbr>
</abbrgrp>, downloaded from <url>ftp://ftp.ncbi.nih.gov/blast/db/</url> on April 21, 2010]. We used the relatively small SwissProt database in order to reduce computation time. However, the absolute values of BLAST hits against this database are likely to be underestimates of those values that would have been obtained from a larger database such as RefSeq or nr. If multiple isotigs or contigs hit non-overlapping portions of the same top BLAST hit, only one of these sequences was counted. However, because frequent cases of identical, unassembled singletons were observed, we counted only one singleton per top BLAST hit, regardless of whether these hits overlapped or not.</p>
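<p>The subsampling step can be sketched in Python as shown below. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function name "progressive_subsets" and the read identifiers are hypothetical, the subset sizes are assumed to be evenly spaced, and each subset is assumed to be a prefix of a single shuffled pool (which guarantees sampling without replacement, but also makes the subsets nested).</p>

```python
import random


def progressive_subsets(normalized_ids, non_normalized_ids, n_subsets=8, seed=0):
    """Return n_subsets lists of read IDs of increasing size.

    The pooled reads are shuffled once, and each subset is a prefix of the
    shuffled pool, so no read is drawn twice within a subset.
    """
    pool = list(normalized_ids) + list(non_normalized_ids)
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    rng.shuffle(pool)
    subsets = []
    for i in range(1, n_subsets + 1):
        size = len(pool) * i // n_subsets  # evenly spaced sizes (an assumption)
        subsets.append(pool[:size])
    return subsets


# Hypothetical read identifiers standing in for one plate of each library.
normalized = ["N_read%d" % i for i in range(40)]
non_normalized = ["NN_read%d" % i for i in range(40)]
subsets = progressive_subsets(normalized, non_normalized)
print([len(s) for s in subsets])
```

<p>Each list of identifiers would then be used to extract the corresponding reads for an independent assembly.</p>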
<p>We used a custom Python script to calculate the ortholog hit ratio. This script ("ortholog_hit_ratio_calculator.py") is available at <url>http://www.extavourlab.com/protocols/index.html</url>.</p>
</sec>
<sec>
<st>
<p>Assessing the importance of cDNA normalization</p>
</st>
<p>To assess the relative contribution of cDNA normalization to the quality of our assembly, the screened, raw reads from both normalized (N) and non-normalized (NN) samples were mapped against the complete assembly of all reads using the BLASTN algorithm <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp> with an e-value cut-off of 1e-4. Based on these results, Fisher's exact test was used to identify over- and under-represented GO terms in each gene list. This test was performed using Blast2GO (two-tailed, removing double IDs so that only those genes hit uniquely by either N or NN reads were considered). The BLASTN results were also investigated using text searches to find whether certain genes of interest were present in only one of the two cDNA samples.</p>
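<p>For readers unfamiliar with the test, a two-tailed Fisher's exact test on a single 2x2 table can be sketched in Python as follows. This is not the Blast2GO implementation; the function name "fisher_exact_two_tailed" and the counts in the example are hypothetical. The table is assumed to hold, for one GO term, the number of genes with and without that term among genes hit only by N reads (first row) and only by NN reads (second row).</p>

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        if observed * (1 + 1e-9) >= p:  # tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


# Hypothetical counts: 8 of 10 N-only genes carry the term, 1 of 6 NN-only genes.
p = fisher_exact_two_tailed(8, 2, 1, 5)
print("two-tailed p = %.4f" % p)
```

<p>In an analysis across many GO terms, one such test would be run per term, and a correction for multiple testing would normally be considered.</p>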
</sec>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>BEC helped design the research, performed the experiments, collected and analyzed the data, and wrote the manuscript. NS contributed new protocols and helped write the manuscript. KAP helped analyze the data and write the manuscript. YS helped analyze the data and write the manuscript, and obtained funding for the research. SR helped design the research and review the manuscript, and obtained funding for the research. CE proposed the idea for the research, helped design the research and analyze the data, wrote the manuscript and obtained funding for the research. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>Thanks to Casey Dunn and Freya Goetz for helpful discussions on cDNA preparation and bioinformatic analysis; Amir Karger, Jiangwen Zhang, and Suvendra Dutta of the Harvard FAS Life Science Computing team for help with bioinformatic analysis; Ana Conesa and Stefan Gotz for assistance with Blast2GO via the Blast2GO mailing list; Joe Jones and Lisa Bukovnik for their administration of the sequencing; Evelyn Schwager, Frederike Alwes, and other members of the Extavour lab for discussions of the results and manuscript. We thank David and Z Behl for the photograph of an <it>Oncopeltus </it>adult (Figure <figr fid="F1">1A</figr>). This work was partially supported by National Science Foundation (NSF) award IOS-0817678 to CE, an NSF Predoctoral Fellowship to BEC, DFG Collaborative Research Center grant 680 "The molecular basis of evolutionary innovations" to KP and SR, and the Wellesley College research fund to YS.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Invertebrate Embryology</p></title><aug><au><snm>Kum&#233;</snm><fnm>M</fnm></au><au><snm>Dan</snm><fnm>K</fnm></au></aug><publisher>Belgrade: Prosveta</publisher><pubdate>1968</pubdate></bibl><bibl id="B2"><title><p>Principles of Comparative Anatomy of Invertebrates: Promorphology</p></title><aug><au><snm>Beklemishev</snm><fnm>WN</fnm></au></aug><publisher>Chicago: University of Chicago Press</publisher><edition>3</edition><pubdate>1969</pubdate><volume>1</volume></bibl><bibl id="B3"><title><p>Principles of Comparative Anatomy of Invertebrates: Organology</p></title><aug><au><snm>Beklemishev</snm><fnm>WN</fnm></au></aug><publisher>Chicago: University of Chicago Press</publisher><edition>3</edition><pubdate>1969</pubdate><volume>2</volume></bibl><bibl id="B4"><title><p>The Role of Models in Science</p></title><aug><au><snm>Rosenblueth</snm><fnm>A</fnm></au><au><snm>Wiener</snm><fnm>N</fnm></au></aug><source>Philosophy of Science</source><pubdate>1945</pubdate><volume>12</volume><issue>4</issue><fpage>316</fpage><lpage>321</lpage><xrefbib><pubid idtype="doi">10.1086/286874</pubid></xrefbib></bibl><bibl id="B5"><title><p>The origin and evolution of model organisms</p></title><aug><au><snm>Hedges</snm><fnm>SB</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2002</pubdate><volume>3</volume><issue>11</issue><fpage>838</fpage><lpage>849</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg929</pubid><pubid idtype="pmpid" link="fulltext">12415314</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>The choice of model organisms in evo-devo</p></title><aug><au><snm>Jenner</snm><fnm>RA</fnm></au><au><snm>Wills</snm><fnm>MA</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2007</pubdate><volume>8</volume><issue>4</issue><fpage>311</fpage><lpage>319</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg2062</pubid><pubid idtype="pmpid" link="fulltext">17339879</pubid></pubidlist></xrefbib></bibl><bibl 
id="B7"><title><p>Model systems in developmental biology</p></title><aug><au><snm>Bolker</snm><fnm>JA</fnm></au></aug><source>BioEssays</source><pubdate>1995</pubdate><volume>17</volume><issue>5</issue><fpage>451</fpage><lpage>455</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/bies.950170513</pubid><pubid idtype="pmpid">7786291</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Are we there yet? Tracking the development of new model systems</p></title><aug><au><snm>Abzhanov</snm><fnm>A</fnm></au><au><snm>Extavour</snm><fnm>CG</fnm></au><au><snm>Groover</snm><fnm>A</fnm></au><au><snm>Hodges</snm><fnm>SA</fnm></au><au><snm>Hoekstra</snm><fnm>HE</fnm></au><au><snm>Kramer</snm><fnm>EM</fnm></au><au><snm>Monteiro</snm><fnm>A</fnm></au></aug><source>Trends in Genetics</source><pubdate>2008</pubdate><volume>24</volume><issue>7</issue><fpage>353</fpage><lpage>360</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2008.04.002</pubid><pubid idtype="pmpid" link="fulltext">18514356</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>The future of evo-devo: model systems and evolutionary theory</p></title><aug><au><snm>Sommer</snm><fnm>RJ</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2009</pubdate><volume>10</volume><issue>6</issue><fpage>416</fpage><lpage>422</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">19369972</pubid></xrefbib></bibl><bibl id="B10"><title><p>Emerging Market Organisms</p></title><aug><au><snm>Slack</snm><fnm>JMW</fnm></au></aug><source>Science</source><pubdate>2009</pubdate><volume>323</volume><issue>5922</issue><fpage>1674</fpage><lpage>1675</lpage><xrefbib><pubid idtype="doi">10.1126/science.1171948</pubid></xrefbib></bibl><bibl id="B11"><title><p>Insulated piggyBac vectors for insect 
transgenesis</p></title><aug><au><snm>Sarkar</snm><fnm>A</fnm></au><au><snm>Atapattu</snm><fnm>A</fnm></au><au><snm>Belikoff</snm><fnm>EJ</fnm></au><au><snm>Heinrich</snm><fnm>JC</fnm></au><au><snm>Li</snm><fnm>X</fnm></au><au><snm>Horn</snm><fnm>C</fnm></au><au><snm>Wimmer</snm><fnm>EA</fnm></au><au><snm>Scott</snm><fnm>MJ</fnm></au></aug><source>BMC Biotechnol</source><pubdate>2006</pubdate><volume>6</volume><fpage>27</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1472-6750-6-27</pubid><pubid idtype="pmcid">1525164</pubid><pubid idtype="pmpid">16776846</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Self-regulatory circuits in dorsoventral axis formation of the short-germ beetle <it>Tribolium castaneum</it></p></title><aug><au><snm>Nunes da Fonseca</snm><fnm>R</fnm></au><au><snm>von Levetzow</snm><fnm>C</fnm></au><au><snm>Kalscheuer</snm><fnm>P</fnm></au><au><snm>Basal</snm><fnm>A</fnm></au><au><snm>van der Zee</snm><fnm>M</fnm></au><au><snm>Roth</snm><fnm>S</fnm></au></aug><source>Dev Cell</source><pubdate>2008</pubdate><volume>14</volume><issue>4</issue><fpage>605</fpage><lpage>615</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.devcel.2008.02.011</pubid><pubid idtype="pmpid" link="fulltext">18410735</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals</p></title><aug><au><snm>Grimson</snm><fnm>A</fnm></au><au><snm>Srivastava</snm><fnm>M</fnm></au><au><snm>Fahey</snm><fnm>B</fnm></au><au><snm>Woodcroft</snm><fnm>BJ</fnm></au><au><snm>Chiang</snm><fnm>HR</fnm></au><au><snm>King</snm><fnm>N</fnm></au><au><snm>Degnan</snm><fnm>BM</fnm></au><au><snm>Rokhsar</snm><fnm>DS</fnm></au><au><snm>Bartel</snm><fnm>DP</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>455</volume><issue>7217</issue><fpage>1193</fpage><lpage>1197</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature07415</pubid><pubid idtype="pmpid" 
link="fulltext">18830242</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Origins and evolution of eukaryotic RNA interference</p></title><aug><au><snm>Shabalina</snm><fnm>SA</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au></aug><source>Trends Ecol Evol (Amst)</source><pubdate>2008</pubdate><volume>23</volume><issue>10</issue><fpage>578</fpage><lpage>587</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tree.2008.06.005</pubid><pubid idtype="pmcid">2695246</pubid><pubid idtype="pmpid">18715673</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>A wing expressed sequence tag resource for <it>Bicyclus anynana </it>butterflies, an evo-devo model</p></title><aug><au><snm>Beldade</snm><fnm>P</fnm></au><au><snm>Rudd</snm><fnm>S</fnm></au><au><snm>Gruber</snm><fnm>JD</fnm></au><au><snm>Long</snm><fnm>AD</fnm></au></aug><source>BMC Genomics</source><pubdate>2006</pubdate><volume>7</volume><fpage>130</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-7-130</pubid><pubid idtype="pmcid">1534037</pubid><pubid idtype="pmpid">16737530</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution</p></title><aug><au><snm>Danley</snm><fnm>PD</fnm></au><au><snm>Mullen</snm><fnm>SP</fnm></au><au><snm>Liu</snm><fnm>F</fnm></au><au><snm>Nene</snm><fnm>V</fnm></au><au><snm>Quackenbush</snm><fnm>J</fnm></au><au><snm>Shaw</snm><fnm>KL</fnm></au></aug><source>BMC Genomics</source><pubdate>2007</pubdate><volume>8</volume><fpage>109</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-8-109</pubid><pubid idtype="pmcid">1878485</pubid><pubid idtype="pmpid">17459168</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Anteroposterior patterning in hemichordates and the origins of the chordate nervous 
system</p></title><aug><au><snm>Lowe</snm><fnm>CJ</fnm></au><au><snm>Wu</snm><fnm>M</fnm></au><au><snm>Salic</snm><fnm>A</fnm></au><au><snm>Evans</snm><fnm>L</fnm></au><au><snm>Lander</snm><fnm>E</fnm></au><au><snm>Stange-Thomann</snm><fnm>N</fnm></au><au><snm>Gruber</snm><fnm>CE</fnm></au><au><snm>Gerhart</snm><fnm>J</fnm></au><au><snm>Kirschner</snm><fnm>M</fnm></au></aug><source>Cell</source><pubdate>2003</pubdate><volume>113</volume><issue>7</issue><fpage>853</fpage><lpage>865</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(03)00469-0</pubid><pubid idtype="pmpid" link="fulltext">12837244</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Transcriptome sequencing and comparative transcriptome analysis of the scleroglucan producer <it>Sclerotium rolfsii</it></p></title><aug><au><snm>Schmid</snm><fnm>J</fnm></au><au><snm>M&#252;ller-Hagen</snm><fnm>D</fnm></au><au><snm>Bekel</snm><fnm>T</fnm></au><au><snm>Funk</snm><fnm>L</fnm></au><au><snm>Stahl</snm><fnm>U</fnm></au><au><snm>Sieber</snm><fnm>V</fnm></au><au><snm>Meyer</snm><fnm>V</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>329</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-329</pubid><pubid idtype="pmcid">2887420</pubid><pubid idtype="pmpid">20504312</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Uncovering the evolutionary origin of plant molecular processes: comparison of Coleochaete (Coleochaetales) and Spirogyra (Zygnematales) transcriptomes</p></title><aug><au><snm>Timme</snm><fnm>RE</fnm></au><au><snm>Delwiche</snm><fnm>CF</fnm></au></aug><source>BMC Plant Biol</source><pubdate>2010</pubdate><volume>10</volume><fpage>96</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2229-10-96</pubid><pubid idtype="pmcid">2890016</pubid><pubid idtype="pmpid">20500869</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Population-level transcriptome sequencing of nonmodel organisms 
<it>Erynnis </it><it>propertius </it>and <it>Papilio zelicaon</it></p></title><aug><au><snm>O&apos;Neil</snm><fnm>ST</fnm></au><au><snm>Dzurisin</snm><fnm>JDK</fnm></au><au><snm>Carmichael</snm><fnm>RD</fnm></au><au><snm>Lobo</snm><fnm>NF</fnm></au><au><snm>Emrich</snm><fnm>SJ</fnm></au><au><snm>Hellmann</snm><fnm>JJ</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>310</fpage><xrefbib><pubidlist><pubid idtype="pmcid">2887415</pubid><pubid idtype="pmpid">20478048</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Massively parallel pyrosequencing-based transcriptome analyses of small brown planthopper (<it>Laodelphax striatellus</it>), a vector insect transmitting rice stripe virus (RSV)</p></title><aug><au><snm>Zhang</snm><fnm>F</fnm></au><au><snm>Guo</snm><fnm>H</fnm></au><au><snm>Zheng</snm><fnm>H</fnm></au><au><snm>Zhou</snm><fnm>T</fnm></au><au><snm>Zhou</snm><fnm>Y</fnm></au><au><snm>Wang</snm><fnm>S</fnm></au><au><snm>Fang</snm><fnm>R</fnm></au><au><snm>Qian</snm><fnm>W</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>303</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-303</pubid><pubid idtype="pmcid">2885366</pubid><pubid idtype="pmpid">20462456</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Gene expression in proliferating cells of the dinoflagellate <it>Alexandrium catenella </it>(Dinophyceae)</p></title><aug><au><snm>Toulza</snm><fnm>E</fnm></au><au><snm>Shin</snm><fnm>MS</fnm></au><au><snm>Blanc</snm><fnm>G</fnm></au><au><snm>Audic</snm><fnm>S</fnm></au><au><snm>Laabir</snm><fnm>M</fnm></au><au><snm>Collos</snm><fnm>Y</fnm></au><au><snm>Claverie</snm><fnm>JM</fnm></au><au><snm>Grzebyk</snm><fnm>D</fnm></au></aug><source>Appl Environ Microbiol</source><pubdate>2010</pubdate><volume>76</volume><issue>13</issue><fpage>4521</fpage><lpage>4529</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1128/AEM.02345-09</pubid><pubid idtype="pmcid">2897438</pubid><pubid idtype="pmpid">20435767</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Transcriptome sequencing and development of an expression microarray platform for the domestic ferret</p></title><aug><au><snm>Bruder</snm><fnm>CE</fnm></au><au><snm>Yao</snm><fnm>S</fnm></au><au><snm>Larson</snm><fnm>F</fnm></au><au><snm>Camp</snm><fnm>JV</fnm></au><au><snm>Tapp</snm><fnm>R</fnm></au><au><snm>McBrayer</snm><fnm>A</fnm></au><au><snm>Powers</snm><fnm>N</fnm></au><au><snm>Granda</snm><fnm>WV</fnm></au><au><snm>Jonsson</snm><fnm>CB</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>251</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-251</pubid><pubid idtype="pmcid">2873475</pubid><pubid idtype="pmpid">20403183</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery</p></title><aug><au><snm>Parchman</snm><fnm>TL</fnm></au><au><snm>Geist</snm><fnm>KS</fnm></au><au><snm>Grahnen</snm><fnm>JA</fnm></au><au><snm>Benkman</snm><fnm>CW</fnm></au><au><snm>Buerkle</snm><fnm>CA</fnm></au></aug><source>BMC genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>180</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-180</pubid><pubid idtype="pmcid">2851599</pubid><pubid idtype="pmpid">20233449</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>A 454 sequencing approach for large scale phylogenomic analysis of the common emperor scorpion (<it>Pandinus imperator</it>)</p></title><aug><au><snm>Roeding</snm><fnm>F</fnm></au><au><snm>Borner</snm><fnm>J</fnm></au><au><snm>Kube</snm><fnm>M</fnm></au><au><snm>Klages</snm><fnm>S</fnm></au><au><snm>Reinhardt</snm><fnm>R</fnm></au><au><snm>Burmester</snm><fnm>T</fnm></au></aug><source>Mol Phylogenet 
Evol</source><pubdate>2009</pubdate><volume>53</volume><issue>3</issue><fpage>826</fpage><lpage>834</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ympev.2009.08.014</pubid><pubid idtype="pmpid" link="fulltext">19695333</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly <it>Sarcophaga crassipalpis</it></p></title><aug><au><snm>Hahn</snm><fnm>DA</fnm></au><au><snm>Ragland</snm><fnm>GJ</fnm></au><au><snm>Shoemaker</snm><fnm>DD</fnm></au><au><snm>Denlinger</snm><fnm>DL</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>234</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-234</pubid><pubid idtype="pmcid">2700817</pubid><pubid idtype="pmpid">19454017</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Combining next-generation pyrosequencing with microarray for large scale expression analysis in non-model species</p></title><aug><au><snm>Bellin</snm><fnm>D</fnm></au><au><snm>Ferrarini</snm><fnm>A</fnm></au><au><snm>Chimento</snm><fnm>A</fnm></au><au><snm>Kaiser</snm><fnm>O</fnm></au><au><snm>Levenkova</snm><fnm>N</fnm></au><au><snm>Bouffard</snm><fnm>P</fnm></au><au><snm>Delledonne</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>555</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-555</pubid><pubid idtype="pmcid">2790472</pubid><pubid idtype="pmpid">19930683</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Characterization of the <it>Zoarces viviparus </it>liver transcriptome using massively parallel pyrosequencing</p></title><aug><au><snm>Kristiansson</snm><fnm>E</fnm></au><au><snm>Asker</snm><fnm>N</fnm></au><au><snm>F&#246;rlin</snm><fnm>L</fnm></au><au><snm>Larsson</snm><fnm>DGJ</fnm></au></aug><source>BMC 
Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>345</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-345</pubid><pubid idtype="pmcid">2725146</pubid><pubid idtype="pmpid">19646242</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Pyrosequencing of the midgut transcriptome of the poplar leaf beetle <it>Chrysomela tremulae </it>reveals new gene families in Coleoptera</p></title><aug><au><snm>Pauchet</snm><fnm>Y</fnm></au><au><snm>Wilkinson</snm><fnm>P</fnm></au><au><snm>van Munster</snm><fnm>M</fnm></au><au><snm>Augustin</snm><fnm>S</fnm></au><au><snm>Pauron</snm><fnm>D</fnm></au><au><snm>ffrench-Constant</snm><fnm>RH</fnm></au></aug><source>Insect Biochem Mol Biol</source><pubdate>2009</pubdate><volume>39</volume><issue>5-6</issue><fpage>403</fpage><lpage>413</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ibmb.2009.04.001</pubid><pubid idtype="pmpid" link="fulltext">19364528</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx</p></title><aug><au><snm>Meyer</snm><fnm>E</fnm></au><au><snm>Aglyamova</snm><fnm>GV</fnm></au><au><snm>Wang</snm><fnm>S</fnm></au><au><snm>Buchanan-Carter</snm><fnm>J</fnm></au><au><snm>Abrego</snm><fnm>D</fnm></au><au><snm>Colbourne</snm><fnm>JK</fnm></au><au><snm>Willis</snm><fnm>BL</fnm></au><au><snm>Matz</snm><fnm>MV</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>219</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-219</pubid><pubid idtype="pmcid">2689275</pubid><pubid idtype="pmpid">19435504</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Construction of a robust microarray from a non-model species (largemouth bass) using pyrosequencing 
technology</p></title><aug><au><snm>Garcia-Reyero</snm><fnm>N</fnm></au><au><snm>Griffitt</snm><fnm>RJ</fnm></au><au><snm>Liu</snm><fnm>L</fnm></au><au><snm>Kroll</snm><fnm>KJ</fnm></au><au><snm>Farmerie</snm><fnm>WG</fnm></au><au><snm>Barber</snm><fnm>DS</fnm></au><au><snm>Denslow</snm><fnm>ND</fnm></au></aug><source>J Fish Biol</source><pubdate>2008</pubdate><volume>72</volume><issue>9</issue><fpage>2354</fpage><lpage>2376</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1095-8649.2008.01904.x</pubid><pubid idtype="pmcid">2779536</pubid><pubid idtype="pmpid">19936325</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing</p></title><aug><au><snm>Vera</snm><fnm>JC</fnm></au><au><snm>Wheat</snm><fnm>CW</fnm></au><au><snm>Fescemyer</snm><fnm>HW</fnm></au><au><snm>Frilander</snm><fnm>MJ</fnm></au><au><snm>Crawford</snm><fnm>DL</fnm></au><au><snm>Hanski</snm><fnm>I</fnm></au><au><snm>Marden</snm><fnm>JH</fnm></au></aug><source>Mol Ecol</source><pubdate>2008</pubdate><volume>17</volume><issue>7</issue><fpage>1636</fpage><lpage>1647</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-294X.2008.03666.x</pubid><pubid idtype="pmpid" link="fulltext">18266620</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>High-throughput gene and SNP discovery in <it>Eucalyptus grandis</it>, an uncharacterized genome</p></title><aug><au><snm>Novaes</snm><fnm>E</fnm></au><au><snm>Drost</snm><fnm>DR</fnm></au><au><snm>Farmerie</snm><fnm>WG</fnm></au><au><snm>Pappas</snm><fnm>GJ</fnm><suf>Jr</suf></au><au><snm>Grattapaglia</snm><fnm>D</fnm></au><au><snm>Sederoff</snm><fnm>RR</fnm></au><au><snm>Kirst</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>312</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-312</pubid><pubid idtype="pmcid">2483731</pubid><pubid 
idtype="pmpid">18590545</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Analysis of the <it>Pythium ultimum </it>transcriptome using Sanger and Pyrosequencing approaches</p></title><aug><au><snm>Cheung</snm><fnm>F</fnm></au><au><snm>Win</snm><fnm>J</fnm></au><au><snm>Lang</snm><fnm>JM</fnm></au><au><snm>Hamilton</snm><fnm>J</fnm></au><au><snm>Vuong</snm><fnm>H</fnm></au><au><snm>Leach</snm><fnm>JE</fnm></au><au><snm>Kamoun</snm><fnm>S</fnm></au><au><snm>Andr&#233; L&#233;vesque</snm><fnm>C</fnm></au><au><snm>Tisserat</snm><fnm>N</fnm></au><au><snm>Buell</snm><fnm>CR</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>542</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-542</pubid><pubid idtype="pmcid">2612028</pubid><pubid idtype="pmpid">19014603</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Next generation transcriptomes for next generation genomes using est2assembly</p></title><aug><au><snm>Papanicolaou</snm><fnm>A</fnm></au><au><snm>Stierli</snm><fnm>R</fnm></au><au><snm>Ffrench-Constant</snm><fnm>RH</fnm></au><au><snm>Heckel</snm><fnm>DG</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2009</pubdate><volume>10</volume><fpage>447</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-10-447</pubid><pubid idtype="pmpid" link="fulltext">20034392</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>FlyBase: enhancing <it>Drosophila </it>Gene Ontology annotations</p></title><aug><au><snm>Tweedie</snm><fnm>S</fnm></au><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Falls</snm><fnm>K</fnm></au><au><snm>Leyland</snm><fnm>P</fnm></au><au><snm>McQuilton</snm><fnm>P</fnm></au><au><snm>Marygold</snm><fnm>S</fnm></au><au><snm>Millburn</snm><fnm>G</fnm></au><au><snm>Osumi-Sutherland</snm><fnm>D</fnm></au><au><snm>Schroeder</snm><fnm>A</fnm></au><au><snm>Seal</snm><fnm>R</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>2009</pubdate><volume>37</volume><issue>Database issue</issue><fpage>D555</fpage><lpage>559</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn788</pubid><pubid idtype="pmcid">2686450</pubid><pubid idtype="pmpid">18948289</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>VectorBase: a data resource for invertebrate vector genomics</p></title><aug><au><snm>Lawson</snm><fnm>D</fnm></au><au><snm>Arensburger</snm><fnm>P</fnm></au><au><snm>Atkinson</snm><fnm>P</fnm></au><au><snm>Besansky</snm><fnm>NJ</fnm></au><au><snm>Bruggner</snm><fnm>RV</fnm></au><au><snm>Butler</snm><fnm>R</fnm></au><au><snm>Campbell</snm><fnm>KS</fnm></au><au><snm>Christophides</snm><fnm>GK</fnm></au><au><snm>Christley</snm><fnm>S</fnm></au><au><snm>Dialynas</snm><fnm>E</fnm></au><au><snm>Hammond</snm><fnm>M</fnm></au><au><snm>Hill</snm><fnm>CA</fnm></au><au><snm>Konopinski</snm><fnm>N</fnm></au><au><snm>Lobo</snm><fnm>NF</fnm></au><au><snm>MacCallum</snm><fnm>RM</fnm></au><au><snm>Madey</snm><fnm>G</fnm></au><au><snm>Megy</snm><fnm>K</fnm></au><au><snm>Meyer</snm><fnm>J</fnm></au><au><snm>Redmond</snm><fnm>S</fnm></au><au><snm>Severson</snm><fnm>DW</fnm></au><au><snm>Stinson</snm><fnm>EO</fnm></au><au><snm>Topalis</snm><fnm>P</fnm></au><au><snm>Birney</snm><fnm>E</fnm></au><au><snm>Gelbart</snm><fnm>WM</fnm></au><au><snm>Kafatos</snm><fnm>FC</fnm></au><au><snm>Louis</snm><fnm>C</fnm></au><au><snm>Collins</snm><fnm>FH</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><issue>Database issue</issue><fpage>D583</fpage><lpage>587</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn857</pubid><pubid idtype="pmcid">2686483</pubid><pubid idtype="pmpid">19028744</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Genome sequence of the pea aphid <it>Acyrthosiphon pisum</it></p></title><aug><au><cnm>International Aphid Genomics Consortium</cnm></au></aug><source>PLoS 
Biology</source><pubdate>2010</pubdate><volume>8</volume><issue>2</issue><fpage>e1000313</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.1000313</pubid><pubid idtype="pmcid">2826372</pubid><pubid idtype="pmpid">20186266</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>The <it>Rhodnius </it>Genome Project: The promises and challenges it affords in our understanding of reduviid biology and their role in Chagas' transmission</p></title><aug><au><snm>Huebner</snm><fnm>E</fnm></au></aug><source>Comparative Biochemistry and Physiology, Part A</source><pubdate>2007</pubdate><volume>148</volume><fpage>S130</fpage><xrefbib><pubid idtype="doi">10.1016/j.cbpa.2007.06.325</pubid></xrefbib></bibl><bibl id="B40"><title><p>Dissection and fixation of large milkweed bug (<it>Oncopeltus</it>) embryos</p></title><aug><au><snm>Liu</snm><fnm>P</fnm></au><au><snm>Kaufman</snm><fnm>TC</fnm></au></aug><source>CSH Protocols</source><pubdate>2009</pubdate><volume>2009</volume><issue>8</issue><note>pdb.prot5261</note></bibl><bibl id="B41"><title><p>In situ hybridization of large milkweed bug (<it>Oncopeltus</it>) tissues</p></title><aug><au><snm>Liu</snm><fnm>P</fnm></au><au><snm>Kaufman</snm><fnm>TC</fnm></au></aug><source>CSH Protocols</source><pubdate>2009</pubdate><volume>2009</volume><issue>8</issue><note>pdb.prot5262</note></bibl><bibl id="B42"><title><p>Morphology and husbandry of the large milkweed bug, <it>Oncopeltus fasciatus</it></p></title><aug><au><snm>Liu</snm><fnm>P</fnm></au><au><snm>Kaufman</snm><fnm>TC</fnm></au></aug><source>CSH Protocols</source><pubdate>2009</pubdate><volume>2009</volume><issue>8</issue><note>pdb.emo127</note></bibl><bibl id="B43"><title><p>The hormonal control of the development of hairs and bristles in the milkweed bug, <it>Oncopeltus fasciatus</it>, Dall</p></title><aug><au><snm>Lawrence</snm><fnm>PA</fnm></au></aug><source>Journal of Experimental 
Biology</source><pubdate>1966</pubdate><volume>44</volume><issue>3</issue><fpage>507</fpage><lpage>522</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">5962706</pubid></xrefbib></bibl><bibl id="B44"><title><p>Embryology of the Milkweed Bug, <it>Oncopeltus fasciatus </it>(Hemiptera)</p></title><aug><au><snm>Butt</snm><fnm>FH</fnm></au></aug><source>Cornell Experiment Station Memoir</source><pubdate>1949</pubdate><volume>283</volume><fpage>2</fpage><lpage>43</lpage></bibl><bibl id="B45"><title><p>Some new mutants of the large milkweed bug <it>Oncopeltus fasciatus </it>Dall</p></title><aug><au><snm>Lawrence</snm><fnm>PA</fnm></au></aug><source>Genetical Research Cambridge</source><pubdate>1970</pubdate><volume>15</volume><fpage>347</fpage><lpage>350</lpage><xrefbib><pubid idtype="doi">10.1017/S0016672300001713</pubid></xrefbib></bibl><bibl id="B46"><title><p>RNAi analysis of <it>Deformed</it>, <it>proboscipedia </it>and <it>Sex combs reduced </it>in the milkweed bug <it>Oncopeltus fasciatus</it>: novel roles for Hox genes in the hemipteran head</p></title><aug><au><snm>Hughes</snm><fnm>CL</fnm></au><au><snm>Kaufman</snm><fnm>TC</fnm></au></aug><source>Development</source><pubdate>2000</pubdate><volume>127</volume><issue>17</issue><fpage>3683</fpage><lpage>3694</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">10934013</pubid></xrefbib></bibl><bibl id="B47"><title><p>Late extraembryonic morphogenesis and its <it>zen</it>(RNAi)-induced failure in the milkweed bug <it>Oncopeltus fasciatus</it></p></title><aug><au><snm>Panfilio</snm><fnm>KA</fnm></au></aug><source>Dev Biol</source><pubdate>2009</pubdate><volume>333</volume><issue>2</issue><fpage>297</fpage><lpage>311</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ydbio.2009.06.036</pubid><pubid idtype="pmpid" link="fulltext">19580800</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>CAP3: A DNA sequence assembly 
program</p></title><aug><au><snm>Huang</snm><fnm>X</fnm></au><au><snm>Madan</snm><fnm>A</fnm></au></aug><source>Genome Res</source><pubdate>1999</pubdate><volume>9</volume><issue>9</issue><fpage>868</fpage><lpage>877</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.9.9.868</pubid><pubid idtype="pmcid">310812</pubid><pubid idtype="pmpid">10508846</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Quality scores and SNP detection in sequencing-by-synthesis systems</p></title><aug><au><snm>Brockman</snm><fnm>W</fnm></au><au><snm>Alvarez</snm><fnm>P</fnm></au><au><snm>Young</snm><fnm>S</fnm></au><au><snm>Garber</snm><fnm>M</fnm></au><au><snm>Giannoukos</snm><fnm>G</fnm></au><au><snm>Lee</snm><fnm>WL</fnm></au><au><snm>Russ</snm><fnm>C</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Nusbaum</snm><fnm>C</fnm></au><au><snm>Jaffe</snm><fnm>DB</fnm></au></aug><source>Genome Res</source><pubdate>2008</pubdate><volume>18</volume><issue>5</issue><fpage>763</fpage><lpage>770</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.070227.107</pubid><pubid idtype="pmcid">2336812</pubid><pubid idtype="pmpid">18212088</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Quantification of insect genome divergence</p></title><aug><au><snm>Zdobnov</snm><fnm>EM</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Trends Genet</source><pubdate>2007</pubdate><volume>23</volume><issue>1</issue><fpage>16</fpage><lpage>20</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2006.10.004</pubid><pubid idtype="pmpid" link="fulltext">17097187</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Optimization of <it>de novo </it>transcriptome assembly from next-generation sequencing data</p></title><aug><au><snm>Surget-Groba</snm><fnm>Y</fnm></au><au><snm>Montoya-Burgos</snm><fnm>JI</fnm></au></aug><source>Genome 
Res</source><pubdate>2010</pubdate><volume>20</volume><issue>10</issue><fpage>1432</fpage><lpage>1440</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.103846.109</pubid><pubid idtype="pmpid" link="fulltext">20693479</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Determination of anteroposterior polarity in <it>Drosophila</it></p></title><aug><au><snm>N&#252;sslein-Volhard</snm><fnm>C</fnm></au><au><snm>Frohnh&#246;fer</snm><fnm>HG</fnm></au><au><snm>Lehmann</snm><fnm>R</fnm></au></aug><source>Science</source><pubdate>1987</pubdate><volume>238</volume><fpage>1675</fpage><lpage>1681</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">3686007</pubid></xrefbib></bibl><bibl id="B53"><title><p>The molecular machinery of germ line specification</p></title><aug><au><snm>Ewen-Campen</snm><fnm>B</fnm></au><au><snm>Schwager</snm><fnm>EE</fnm></au><au><snm>Extavour</snm><fnm>CG</fnm></au></aug><source>Mol Reprod Dev</source><pubdate>2010</pubdate><volume>77</volume><issue>1</issue><fpage>3</fpage><lpage>18</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/mrd.21091</pubid><pubid idtype="pmpid" link="fulltext">19790240</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research</p></title><aug><au><snm>Conesa</snm><fnm>A</fnm></au><au><snm>G&#246;tz</snm><fnm>S</fnm></au><au><snm>Garcia-Gomez</snm><fnm>JM</fnm></au><au><snm>Terol</snm><fnm>J</fnm></au><au><snm>Talon</snm><fnm>M</fnm></au><au><snm>Robles</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>18</issue><fpage>3674</fpage><lpage>3676</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti610</pubid><pubid idtype="pmpid" link="fulltext">16081474</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>The construction of an EST database for <it>Bombyx mori </it>and its 
application</p></title><aug><au><snm>Mita</snm><fnm>K</fnm></au><au><snm>Morimyo</snm><fnm>M</fnm></au><au><snm>Okano</snm><fnm>K</fnm></au><au><snm>Koike</snm><fnm>Y</fnm></au><au><snm>Nohata</snm><fnm>J</fnm></au><au><snm>Kawasaki</snm><fnm>H</fnm></au><au><snm>Kadono-Okuda</snm><fnm>K</fnm></au><au><snm>Yamamoto</snm><fnm>K</fnm></au><au><snm>Suzuki</snm><fnm>MG</fnm></au><au><snm>Shimada</snm><fnm>T</fnm></au><au><snm>Goldsmith</snm><fnm>MR</fnm></au><au><snm>Maeda</snm><fnm>S</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2003</pubdate><volume>100</volume><issue>24</issue><fpage>14121</fpage><lpage>14126</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.2234984100</pubid><pubid idtype="pmcid">283556</pubid><pubid idtype="pmpid">14614147</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>Normalization of full-length enriched cDNA</p></title><aug><au><snm>Bogdanova</snm><fnm>EA</fnm></au><au><snm>Shagin</snm><fnm>DA</fnm></au><au><snm>Lukyanov</snm><fnm>SA</fnm></au></aug><source>Mol Biosyst</source><pubdate>2008</pubdate><volume>4</volume><issue>3</issue><fpage>205</fpage><lpage>212</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1039/b715110c</pubid><pubid idtype="pmpid" link="fulltext">18437263</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>A scaling normalization method for differential expression analysis of RNA-seq data</p></title><aug><au><snm>Robinson</snm><fnm>MD</fnm></au><au><snm>Oshlack</snm><fnm>A</fnm></au></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><issue>3</issue><fpage>R25</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-3-r25</pubid><pubid idtype="pmcid">2864565</pubid><pubid idtype="pmpid">20196867</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Functional analyses in the hemipteran <it>Oncopeltus </it><it>fasciatus </it>reveal conserved and derived aspects of appendage patterning in 
insects</p></title><aug><au><snm>Angelini</snm><fnm>DR</fnm></au><au><snm>Kaufman</snm><fnm>TC</fnm></au></aug><source>Dev Biol</source><pubdate>2004</pubdate><volume>271</volume><issue>2</issue><fpage>306</fpage><lpage>321</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ydbio.2004.04.005</pubid><pubid idtype="pmpid" link="fulltext">15223336</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>The nuclear receptor E75A has a novel pair-rule-like function in patterning the milkweed bug, <it>Oncopeltus fasciatus</it></p></title><aug><au><snm>Erezyilmaz</snm><fnm>D</fnm></au><au><snm>Kelstrup</snm><fnm>H</fnm></au><au><snm>Riddiford</snm><fnm>L</fnm></au></aug><source>Dev Biol</source><pubdate>2009</pubdate><volume>334</volume><issue>1</issue><fpage>300</fpage><lpage>310</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ydbio.2009.06.038</pubid><pubid idtype="pmcid">2749522</pubid><pubid idtype="pmpid">19580803</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>The role of the pupal determinant <it>broad </it>during embryonic development of a direct-developing insect</p></title><aug><au><snm>Erezyilmaz</snm><fnm>DF</fnm></au><au><snm>Rynerson</snm><fnm>MR</fnm></au><au><snm>Truman</snm><fnm>JW</fnm></au><au><snm>Riddiford</snm><fnm>LM</fnm></au></aug><source>Dev Genes Evol</source><pubdate>2009</pubdate><volume>219</volume><issue>11-12</issue><fpage>535</fpage><lpage>544</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00427-009-0315-7</pubid><pubid idtype="pmcid">2884998</pubid><pubid idtype="pmpid" link="fulltext">20127251</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>Key roles of the Broad-Complex gene in insect embryogenesis</p></title><aug><au><snm>Piulachs</snm><fnm>MD</fnm></au><au><snm>Pagone</snm><fnm>V</fnm></au><au><snm>Belles</snm><fnm>X</fnm></au></aug><source>Insect Biochem Mol 
Biol</source><pubdate>2010</pubdate><volume>40</volume><issue>6</issue><fpage>468</fpage><lpage>475</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ibmb.2010.04.006</pubid><pubid idtype="pmpid" link="fulltext">20403438</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Precocene-induced effects and possible role of juvenile hormone during embryogenesis of the milkweed bug <it>Oncopeltus fasciatus</it></p></title><aug><au><snm>Dorn</snm><fnm>A</fnm></au></aug><source>Gen Comp Endocrinol</source><pubdate>1982</pubdate><volume>46</volume><issue>1</issue><fpage>42</fpage><lpage>52</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0016-6480(82)90161-7</pubid><pubid idtype="pmpid">7060934</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Embryonic expression of juvenile hormone binding protein and its relationship to the toxic effects of juvenile hormone in <it>Manduca sexta</it></p></title><aug><au><snm>Orth</snm><fnm>AP</fnm></au><au><snm>Tauchman</snm><fnm>SJ</fnm></au><au><snm>Doll</snm><fnm>SC</fnm></au><au><snm>Goodman</snm><fnm>WG</fnm></au></aug><source>Insect Biochem Mol Biol</source><pubdate>2003</pubdate><volume>33</volume><issue>12</issue><fpage>1275</fpage><lpage>1284</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ibmb.2003.06.002</pubid><pubid idtype="pmpid" link="fulltext">14599499</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Comparing <it>de novo </it>assemblers for 454 transcriptome data</p></title><aug><au><snm>Kumar</snm><fnm>S</fnm></au><au><snm>Blaxter</snm><fnm>ML</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>571</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-571</pubid><pubid idtype="pmpid" link="fulltext">20950480</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>Reverse transcriptase template switching: a SMART approach for full-length cDNA library 
construction</p></title><aug><au><snm>Zhu</snm><fnm>YY</fnm></au><au><snm>Machleder</snm><fnm>EM</fnm></au><au><snm>Chenchik</snm><fnm>A</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Siebert</snm><fnm>PD</fnm></au></aug><source>BioTechniques</source><pubdate>2001</pubdate><volume>30</volume><issue>4</issue><fpage>892</fpage><lpage>897</lpage><xrefbib><pubid idtype="pmpid">11314272</pubid></xrefbib></bibl><bibl id="B66"><title><p>Consed: a graphical tool for sequence finishing</p></title><aug><au><snm>Gordon</snm><fnm>D</fnm></au><au><snm>Abajian</snm><fnm>C</fnm></au><au><snm>Green</snm><fnm>P</fnm></au></aug><source>Genome Res</source><pubdate>1998</pubdate><volume>8</volume><issue>3</issue><fpage>195</fpage><lpage>202</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9521923</pubid></xrefbib></bibl><bibl id="B67"><title><p>Base-calling of automated sequencer traces using phred. I. Accuracy assessment</p></title><aug><au><snm>Ewing</snm><fnm>B</fnm></au><au><snm>Hillier</snm><fnm>L</fnm></au><au><snm>Wendl</snm><fnm>MC</fnm></au><au><snm>Green</snm><fnm>P</fnm></au></aug><source>Genome Res</source><pubdate>1998</pubdate><volume>8</volume><issue>3</issue><fpage>175</fpage><lpage>185</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9521921</pubid></xrefbib></bibl><bibl id="B68"><title><p>Base-calling of automated sequencer traces using phred. II. 
Error probabilities</p></title><aug><au><snm>Ewing</snm><fnm>B</fnm></au><au><snm>Green</snm><fnm>P</fnm></au></aug><source>Genome Res</source><pubdate>1998</pubdate><volume>8</volume><issue>3</issue><fpage>186</fpage><lpage>194</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9521922</pubid></xrefbib></bibl><bibl id="B69"><title><p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p></title><aug><au><snm>Pruitt</snm><fnm>KD</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Maglott</snm><fnm>DR</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><issue>Database issue</issue><fpage>D61</fpage><lpage>65</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl842</pubid><pubid idtype="pmcid">1716718</pubid><pubid idtype="pmpid">17130148</pubid></pubidlist></xrefbib></bibl><bibl id="B70"><title><p>BLAST+: architecture and applications</p></title><aug><au><snm>Camacho</snm><fnm>C</fnm></au><au><snm>Coulouris</snm><fnm>G</fnm></au><au><snm>Avagyan</snm><fnm>V</fnm></au><au><snm>Ma</snm><fnm>N</fnm></au><au><snm>Papadopoulos</snm><fnm>J</fnm></au><au><snm>Bealer</snm><fnm>K</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2009</pubdate><volume>10</volume><fpage>421</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-10-421</pubid><pubid idtype="pmcid">2803857</pubid><pubid idtype="pmpid">20003500</pubid></pubidlist></xrefbib></bibl><bibl id="B71"><title><p>Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium</p></title><aug><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Blake</snm><fnm>JA</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><au><snm>Cherry</snm><fnm>JM</fnm></au><au><snm>Davis</snm><fnm>AP</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Dwight</snm><fnm>SS</fnm></au><au><snm>Eppig</snm><fnm>JT</fnm></au><au><snm>Harris</snm><fnm>MA</fnm></au><au><snm>Hill</snm><fnm>DP</fnm></au><au><snm>Issel-Tarver</snm><fnm>L</fnm></au><au><snm>Kasarskis</snm><fnm>A</fnm></au><au><snm>Lewis</snm><fnm>S</fnm></au><au><snm>Matese</snm><fnm>JC</fnm></au><au><snm>Richardson</snm><fnm>JE</fnm></au><au><snm>Ringwald</snm><fnm>M</fnm></au><au><snm>Rubin</snm><fnm>GM</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au></aug><source>Nat Genet</source><pubdate>2000</pubdate><volume>25</volume><issue>1</issue><fpage>25</fpage><lpage>29</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/75556</pubid><pubid idtype="pmpid" link="fulltext">10802651</pubid></pubidlist></xrefbib></bibl><bibl id="B72"><title><p>B2G-FAR: A Species Centered GO Annotation Repository</p></title><url>http://bioinfo.cipf.es/b2gfar/showspecies?species=7227</url></bibl><bibl id="B73"><title><p>The Universal Protein Resource (UniProt) in 2010</p></title><aug><au><cnm>UniProt Consortium</cnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>Database issue</issue><fpage>D142</fpage><lpage>148</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2808944</pubid><pubid idtype="pmpid">19843607</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>
