<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-217</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Sampling <it>Daphnia</it>'s expressed genes: preservation, expansion and invention of crustacean genes with reference to insect genomes</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Colbourne</snm>
               <mi>K</mi>
               <fnm>John</fnm>
               <insr iid="I1"/>
               <email>jcolbour@cgb.indiana.edu</email>
            </au>
            <au id="A2">
               <snm>Eads</snm>
               <mi>D</mi>
               <fnm>Brian</fnm>
               <insr iid="I1"/>
               <email>bdeads@cgb.indiana.edu</email>
            </au>
            <au id="A3">
               <snm>Shaw</snm>
               <fnm>Joseph</fnm>
               <insr iid="I2"/>
               <email>joseph.r.shaw@dartmouth.edu</email>
            </au>
            <au id="A4">
               <snm>Bohuski</snm>
               <fnm>Elizabeth</fnm>
               <insr iid="I1"/>
               <email>ebohuski@indiana.edu</email>
            </au>
            <au id="A5">
               <snm>Bauer</snm>
               <mi>J</mi>
               <fnm>Darren</fnm>
               <insr iid="I3"/>
               <email>djbauer@cisunix.unh.edu</email>
            </au>
            <au id="A6">
               <snm>Andrews</snm>
               <fnm>Justen</fnm>
               <insr iid="I1"/>
               <email>jandrew@cgb.indiana.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Center for Genomics and Bioinformatics, and Department of Biology, Indiana University, Bloomington, Indiana 47405, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biology, Dartmouth College, Hanover, New Hampshire 03755, USA</p>
            </ins>
            <ins id="I3">
               <p>Hubbard Center for Genome Studies, University of New Hampshire, Durham, New Hampshire 03824, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>217</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/217</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17612412</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-217</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>06</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Colbourne et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Functional and comparative studies of insect genomes have shed light on the complement of genes that, in part, account for shared morphologies, developmental programs and life-histories. Contrasting the gene inventories of insects with those of the nematodes provides insight into the genomic changes responsible for their diversification. However, nematodes are only distantly related to insects, as the two groups belong to separate animal phyla. A better outgroup to distinguish lineage specific novelties would include other members of Arthropoda. For example, crustaceans are close allies to the insects (together forming Pancrustacea) and their fascinating aquatic lifestyle provides an important comparison for understanding the genetic basis of adaptations to life on land versus life in water.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>This study reports on the first characterization of cDNA libraries and sequences for the model crustacean <it>Daphnia pulex</it>. We analyzed 1,546 ESTs of which 1,414 represent approximately 787 nuclear genes, by measuring their sequence similarities with insect and nematode proteomes. The provisional annotation of genes is supported by expression data from microarray studies described in companion papers. Loci expected to be shared between crustaceans and insects because of their mutual biological features are identified, including genes for reproduction, regulation and cellular processes. We identify genes that are likely derived within Pancrustacea or lost within the nematodes. Moreover, lineage specific gene family expansions are identified, which suggest certain biological demands associated with their ecological setting. In particular, up to seven distinct ferritin loci are found in <it>Daphnia </it>compared to three in most insects. Finally, a substantial fraction of the sampled gene transcripts shares no sequence similarity with those from other arthropods. Genes functioning during development and reproduction are comparatively well conserved between crustaceans and insects. By contrast, genes that were responsive to environmental conditions (metal stress) and not sex-biased included the greatest proportion of genes with no matches to insect proteomes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This study and the associated microarray experiments are the initial steps in a coordinated effort by the <it>Daphnia </it>Genomics Consortium to build the genomic platform needed to discover genes that account for the phenotypic diversity within the genus and to gain new insights into crustacean biology. This effort will soon include the first crustacean genome sequence.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Among the major groups of the phylum Arthropoda &#8211; Chelicerata, Myriapoda, Crustacea, Hexapoda (insects and relatives) &#8211; the crustaceans and insects are allies <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. They are together classified as members of the Pancrustacea, although their reciprocal monophyly is currently disputed <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Despite this phylogenetic uncertainty for taxa that likely diverged some 600 million years ago <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, the model crustacean <it>Daphnia </it>is expected to share genes that are central to arthropod biology and development with well studied insects, such as <it>Drosophila</it>, <it>Anopheles</it>, <it>Bombyx </it>and <it>Apis</it>. Indeed, gene-by-gene investigations have already demonstrated the functional conservation of selected loci involved in germline formation and embryonic patterning between representative crustaceans and insects <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Yet, these two classes of animals have also evolved in radically different environments; branchiopod crustaceans are adapted to aquatic habitats, while the insects are predominantly adapted to terrestrial habitats. It is therefore expected that proteins required for life in these environments will reflect the biotic and abiotic challenges faced by each taxonomic group. Furthermore, model crustaceans like <it>Daphnia </it>(order Cladocera) have a highly specialized mode of reproduction called cyclical parthenogenesis <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, which involves environmental sex determination and is derived from obligate sex <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Thus, the genetic control of cyclical parthenogenesis may have arisen from modifications in the structure and/or the regulation of arthropod reproductive genes. 
Similar mechanisms may apply for a variety of other adaptations, including <it>Daphnia</it>'s morphological transmutations in response to predator kairomones (called cyclomorphosis), their ability to shift from direct development into diapause within ephemeral habitats, and mechanisms for acclimating to both natural and anthropogenic stressors such as hypoxia or metal contamination. The evolution of these traits is expected to involve species-specific modifications of gene regulation, the restructuring of genes common to arthropods <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and innovations unique to their aquatic habitats. Additionally, transitions in breeding systems and the origins of other adaptive traits probably also involve novel genes or lineage specific gene family expansions <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>Comparative studies into the functional conservation of genes and the genetic basis of adaptation are made easier by the rapid development of genomic data and technologies. For example, cross-species comparisons within the Nematoda, based on over 265,000 expressed sequence tags (ESTs) from 30 species, indicate that roughly 40% of the 93,000 characterized genes have no known homologues within the phylum <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> while 23% of genes are unique to each species <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. These large differences in gene content reflect (in part) the ecological diversity of the sampled nematodes, including free-living species and others that are plant or animal parasites. Not surprisingly, genetic novelty can be linked to an organism's specialized lifestyle. For instance, unique sequences of the parasitoid nematode <it>Nippostrongylus brasiliensis </it>are nearly 10 times enriched with signal peptides compared to conserved sequences, suggesting that the proliferation of these genes is accelerated because of their defensive role against host immunity <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Aside from the deeply divergent nematode comparisons <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, studies have thus far been restricted to the eukaryotic crown group <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, contrasts among species from the same order (e.g., primates, rodents) or species belonging to a similar class (insects). This situation is a consequence of the currently sparse coverage of genome sequencing projects along the metazoan phylogenetic tree. Therefore, the addition of a crustacean to the growing list of sequenced insect genomes will expand the analysis of gene content among the ecologically diverse arthropod assemblage and provide information on the degree of protein family expansions by appropriately rooting the insect phylogeny.</p>
         <p>Given the diversity of crustacean body plans, their fascinating biology and their key phylogenetic relationship to model invertebrates with sequenced genomes, the paucity of crustacean molecular data is striking. Indeed, protein sequences from <it>all </it>crustaceans represent only 0.1% of 6.9 million records in the NCBI taxonomic database. Among crustaceans, the freshwater zooplankton <it>Daphnia pulex </it>has a rich history of attracting attention from biologists &#8211; which now involves researchers in the fields of ecology and evolution, development, toxicology and genetics. Here, we present the first systematic study of transcribed sequences in <it>D. pulex</it>. The results of our survey highlight the diversity of crustacean genes that are shared with insects, and also uncover gene family expansions that likely reflect the demands of aquatic existence, particularly homeostasis, defense/immunity, oxyregulation, and chemical sensing. In companion papers, we describe the development of the first <it>D. pulex </it>microarray used to investigate sex-biased transcriptional regulation of these genes (Eads et al. submitted) and the genomic response of this sentinel species to toxic metals commonly found in the environment (Shaw et al. submitted, and in prep). These studies are the initial steps in a coordinated effort by the <it>Daphnia </it>Genomics Consortium <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> to build the data banks and reagents needed to discover genomic changes responsible for the phenotypic diversity within the genus and to gain new insights into crustacean biology. This effort will soon include the first crustacean genome sequence.</p>
         <p>We report on the construction of <it>D. pulex </it>cDNA libraries and the sequences and analyses of 1,546 ESTs, of which 1,414 represent approximately 787 nuclear genes. We analyze these transcribed sequences against those of sequenced model invertebrates. Comparing gene inventories by assigning homology among distantly related genomes is not trivial <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The first challenge is to discriminate between genetic gains or losses and genes whose sequences are sufficiently divergent to escape detection. The problem is exacerbated by lineage specific gene or genome duplications, by varying rates of molecular evolution and by the sometimes fragile association between sequence similarity and the preservation of gene functions <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The second challenge is to recognize that reference genome annotations and data banks are fluid, even those for premier model systems. Therefore, this study uses sequence similarity searches for <it>Daphnia </it>genes against a number of different genomic databases for five reference species while intentionally setting low statistical cut-off values. By comparing <it>Daphnia </it>sequences to genes from four insect species and using <it>Caenorhabditis </it>for an outgroup, we point to functional classes that are shared with the insects. The related microarray data of Eads et al. and Shaw et al. in companion papers demonstrate that most of the sequenced <it>Daphnia </it>genes are differentially transcribed in a manner consistent with their putative functions, thus reinforcing their provisional annotations based on sequence alignments to genes from model insects. Our study details the comparative and functional characterization of <it>Daphnia </it>transcripts using well studied insects and a phylogenetic approach.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Production and quality assessment of cDNA libraries</p>
            </st>
            <p>Equivalent non-normalized cDNA libraries were constructed from a genetically clonal <it>Daphnia </it>isolate sampled from a natural pond along the Oregon coast. The clone was cultured under growing conditions favoring parthenogenetic reproduction. Consequently, the animals were predominantly juvenile females, adult females and brood-carrying females with a small proportion of males.</p>
            <p>The strength of conclusions derived from the comparative analysis of expressed gene sequences rests, in large part, on the quality of cDNA libraries. Therefore, we performed quality control tests on 768 randomly chosen cDNA isolates. The cDNA size distribution was determined by agarose gel electrophoresis of PCR amplified inserts. The average molecular weight of inserts sampled from the libraries was 825 bp. To assess the cDNA diversity within the libraries, we sequenced single pass 5' reads from the cDNA inserts. Of the 768 sequence reads, 619 were informative. Only four plasmids were void of inserts and the failed reads (19%) were the result of capillary failures of the sequencer. Following an assembly of the ESTs, unique sequences comprised 68% of the total, reflecting the relative abundances of specific cDNA within the non-normalized libraries. This number diminished to 43% with over twice as much sequencing effort (see below).</p>
            <p>To assess potential contamination of cDNA clones derived from prokaryotes and mitochondria, and to measure the distribution of full length ORFs, we aligned the translated sequences to proteins in Genbank. Of the unique sequences, 50% matched Genbank entries with e-values &lt; 1 &#215; 10<sup>-10</sup>. A separate query of the NCBI non-redundant protein database identified 204 sequences with e-value scores &lt; 1 &#215; 10<sup>-27</sup>. A total of 34 ESTs (6%) were identified as mitochondrial gene transcripts. No ESTs were identified as non-<it>Daphnia</it>. Thus, the cDNA libraries are high quality with a high level of diversity and low levels of contaminant sequences.</p>
            <p>To investigate whether the libraries contained full-length or nearly full-length inserts, sequences from 170 clones with high similarity to known proteins (Blastx &lt; 1 &#215; 10<sup>-27</sup>) were investigated for the presence of a translational start site. Of these sequences, 109 ESTs (64%) contained unambiguous open reading frames with an annotated ATG translational start site at their 5' end, and 44 ESTs (26%) did not contain an ATG that aligned with the start sites of corresponding database sequences. Of the remaining sequences, 7 ESTs (4%) were likely full-length because gapped alignments of the amino acids suggested poor evolutionary conservation at the N-terminus of the proteins, and 10 ESTs (6%) were unresolved because alignments failed altogether. We therefore estimate that 64&#8211;68% of the cDNAs are full-length, or close to full-length. This result may be an overestimate since many conserved genes within our non-normalized libraries encode ribosomal proteins (34% of 170), which seldom have long transcripts. Indeed, the maximum length of investigated cDNA for open reading frames was &lt; 2 kb, whereas the maximum length of PCR amplified inserts was nearly 3.5 kb. However, when the numbers of cDNAs with and without annotated start sites were compared and sorted by their molecular weights, no association was found between the proportion of full-length transcripts and the size of cDNA, either when including ribosomal genes (t = 0.39, df = 86; p = 0.70) or when excluding these genes from the comparison (t = 0.19; df = 60; p = 0.85). A separate investigation of the consistency in our production of full-length or near full-length cDNA was conducted by calculating the proportions of sequences that shared nucleotides within the first 50 bases of the longest EST within contigs (see below for assembly of contigs). Of 233 ESTs forming 81 separate contigs, 202 (87%) shared the first 50 bp of the longest EST from each contig. 
These data suggest that the majority of the cDNA clones are near full-length.</p>
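            <p>The 64&#8211;68% range quoted above can be reproduced from the four clone counts reported in this section. The following is a minimal sketch of that arithmetic only; the dictionary keys and the helper name percent are illustrative labels introduced here and are not terms used in the study. The lower bound counts clones with an annotated start site, and the upper bound adds the clones judged likely full-length despite poor N-terminal conservation.</p>
            <p>
```python
# Worked arithmetic for the full-length estimate, using the clone counts
# reported in the text. The category labels are illustrative names only.
counts = {
    "start_site_found": 109,    # annotated ATG start site at the 5' end
    "no_aligned_start": 44,     # no ATG aligned with database start sites
    "likely_full_length": 7,    # poorly conserved N-terminus, judged likely full-length
    "unresolved": 10,           # alignments failed altogether
}

total = sum(counts.values())    # 170 clones examined


def percent(n, denominator=total):
    """Return n as a whole-number percentage of the denominator."""
    return round(100 * n / denominator)


# Lower bound: clones with a confirmed start site.
lower = percent(counts["start_site_found"])
# Upper bound: confirmed clones plus those judged likely full-length.
upper = percent(counts["start_site_found"] + counts["likely_full_length"])

print(f"{total} clones: {lower}-{upper}% estimated full-length")
```
            </p>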
         </sec>
         <sec>
            <st>
               <p>Analysis of EST sequences</p>
            </st>
            <p>In total, we produced 5' sequence reads from 1,648 cDNA isolates. In addition to the 768 randomly selected clones, 880 were selected on the basis of their transcription profiles in microarray experiments. After the removal of vector, poly-A tails and poor quality reads (Table <tblr tid="T1">1</tblr>), 1,546 high quality ESTs with an average size of 540 bp (SD = 188, min = 107, max = 852 bp) remained to be clustered (Genbank accession numbers <ext-link ext-link-type="gen" ext-link-id="EE681877">EE681877</ext-link>-<ext-link ext-link-type="gen" ext-link-id="EE683416">EE683416</ext-link>). The ESTs were assembled into 804 clusters (including 568 singletons) with an average of 1.93 sequences/cluster (SD = 3.95, min = 1, max = 95). After excluding clusters identified as mitochondrial DNA sequences, 787 nuclear genes remained. These non-redundant sequences are hereafter referred to as assembled sequences. We expect that some pairs of assembled sequences will be found to derive from the same locus, either due to excessive polymorphisms between alleles, or because of the alternative use of 5'-exons, or due to sequences from truncated cDNA clones that failed to overlap. However, given the high proportion of estimated full-length clones in the libraries, we anticipate the latter class to be small.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Sequencing and clustering statistics for cDNA isolates printed on microarrays</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Number of sequenced cDNA isolates</p>
                     </c>
                     <c ca="left">
                        <p>1,529</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of sequences obtained</p>
                     </c>
                     <c ca="left">
                        <p>1,648</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of low quality sequences removed</p>
                     </c>
                     <c ca="left">
                        <p>82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of plasmids containing inserts &lt;100 bp</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of cDNA isolates with ESTs</p>
                     </c>
                     <c ca="left">
                        <p>1,435</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of ESTs left to assemble</p>
                     </c>
                     <c ca="left">
                        <p>1,546</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of assembled sequences (contigs + singlets)</p>
                     </c>
                     <c ca="left">
                        <p>804</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of cDNA isolates represented by a single EST</p>
                     </c>
                     <c ca="left">
                        <p>612</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of mtDNA gene clusters</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of mtDNA ESTs</p>
                     </c>
                     <c ca="left">
                        <p>132</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of assembled sequences from nuclear genes</p>
                     </c>
                     <c ca="left">
                        <p>787</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sequencing and clustering statistics for cDNA isolates printed on microarrays. Based on these numbers, 43% of the elements are unique.</p>
               </tblfn>
            </tbl>
            <p>We investigated the proportion of assembled sequences that may be composed of alternative transcripts of the same genes by further clustering these sequences using more relaxed parameters (see methods). Forty-seven additional clusters were discovered. Eleven were composed of 2 assembled sequences that are likely allelic variants of the same genes, based on their matches to a single location in a preliminary draft assembly of the <it>Daphnia </it>genome sequence at the mid-point of the genome sequencing project (4 &#215; coverage; deposited at wFleaBase). These allelic sequences were 84% to 97% similar to each other over a range from 77 to 665 overlapping nucleotides. By contrast, 21 of the additional clusters were of transcripts derived from duplicated genes or from conserved gene families, based on their matches to different locations in the draft genome sequence. In 14 cases, the clusters were composed of 2 sequences. Six clusters consisted of 3 sequences and a single cluster contained 4 similar sequences that shared 86&#8211;93% of their nucleotides in pair-wise comparisons. Overall similarities between sequences from closely related genes ranged from 62% to 93% among 220 to 807 overlapping bases. As expected for sequences originating from separate loci, their average similarity (85.5%) was significantly lower than that of allelic sequences (91%) (t = 2.02; df = 44; p = 0.01). Finally, 4 of the additional clusters were composed of paired splice variants from unique loci, while 8 clusters contained from 2 to 4 alternatively spliced transcripts from multiple loci. Therefore, the ESTs from this survey provided sequence tags for up to 787 new <it>Daphnia </it>genes, although some of these sequences represent alternative transcripts or closely related transcripts from duplicated genes.</p>
         </sec>
         <sec>
            <st>
               <p>Functional annotation of assembled sequences</p>
            </st>
            <p>Confident in the quality of the <it>D. pulex </it>cDNA libraries and EST sequences, we explored the range of likely biological or biochemical functions of the genes represented by the EST sequences by querying the NCBI non-redundant protein databank (NR) using Blastx <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Of the 787 assembled sequences, 452 (58%) matched at least one known protein with an e-value threshold of 1 &#215; 10<sup>-3 </sup>and a minimal value of 33 aligned amino acids (Additional file <supplr sid="S1">1</supplr>). The distribution of their e-value scores showed that 26% of matched sequences have scores &lt; 1 &#215; 10<sup>-50</sup>, while 79% have scores &lt; 1 &#215; 10<sup>-10 </sup>(Figure <figr fid="F1">1a</figr>). Therefore, searches for putative homologues in the protein database gave strong to suggestive information regarding possible biological and biochemical functions. As expected, a survey of the distribution of best Blastx matches against the NCBI taxonomic domains showed that the majority (72%) of assembled <it>Daphnia </it>sequences matched best with those derived from other invertebrates (Figure <figr fid="F1">1b</figr>), whereas 25% of the highest scoring hits matched best with vertebrate sequences (including those from rodents, primates and other mammals). The cDNA libraries are largely free of contaminants, as only 6 assembled sequences (1%) matched bacterial proteins. A closer examination of the distribution of the best Blastx hits within the classes of invertebrates showed that 79% of 323 assembled sequences matched annotated proteins from insects (Figure <figr fid="F1">1c</figr>): 23% from <it>Drosophila</it>, 16% from <it>Anopheles</it>, 15% from <it>Apis</it>, 4% from <it>Bombyx </it>and 21% from other insects. This large insect constituency within the best Blastx matches is clearly a consequence of the limited representation of sequences from Crustacea in the databanks. 
A survey of the NCBI protein database revealed that out of 6,897,314 archived sequences, only 10,485 (or 0.1%) were from Crustacea. Only 19 assembled sequences (6%) best matched proteins from Branchiopoda, the class that includes <it>Daphnia</it>, while an additional 10 assembled sequences best matched proteins from other classes of Crustacea (Malacostraca, Ostracoda).</p>
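            <p>The match criteria used in this section (an e-value threshold of 1 &#215; 10<sup>-3</sup> and at least 33 aligned amino acids) can be illustrated with a short filter over tabular search output. This is a sketch under stated assumptions, not the procedure used in the study: it assumes the common 12-column tab-separated BLAST layout, treats the threshold as inclusive, and uses a hypothetical function name (best_hits) and made-up example rows.</p>
            <p>
```python
# Sketch of the hit filter described in the text: keep a query's best hit only
# if its e-value is at or below 1e-3 and the alignment spans at least 33 residues.
# ASSUMPTION: input is 12-column tab-separated BLAST output (the common
# "outfmt 6" layout); the study does not state which output format was parsed.

MAX_EVALUE = 1e-3    # e-value threshold quoted in the text (assumed inclusive)
MIN_ALIGNED = 33     # minimum number of aligned amino acids quoted in the text


def best_hits(lines, max_evalue=MAX_EVALUE, min_aligned=MIN_ALIGNED):
    """Return {query_id: (subject_id, evalue)} for the lowest e-value passing hit."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        aligned = int(fields[3])          # column 4: alignment length
        evalue = float(fields[10])        # column 11: e-value
        if evalue > max_evalue or min_aligned > aligned:
            continue                      # fails one of the two criteria
        if query not in best or best[query][1] > evalue:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows for illustration only; these are not data from the study.
example = [
    "clusterA\tprotX\t62.0\t120\t45\t1\t10\t369\t5\t124\t2e-40\t160",
    "clusterA\tprotY\t40.0\t90\t54\t0\t10\t279\t1\t90\t1e-12\t70",
    "clusterB\tprotZ\t35.0\t25\t16\t0\t1\t75\t1\t25\t1e-05\t40",
    "clusterC\tprotW\t30.0\t60\t42\t0\t1\t180\t1\t60\t0.5\t30",
]
hits = best_hits(example)
print(hits)
```
            </p>
            <p>In this toy input, only the first query retains a hit: the second query's alignment is shorter than 33 residues and the third query's e-value exceeds the threshold.</p>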
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Characterization of the <it>Daphnia pulex </it>EST sequences.</p>
                  <p>This file contains the following information about the analysis of EST sequences obtained for the study.</p>
                  <p>(1) EST information.</p>
                  <p>(2) Top match from Blastx searches of clustered <it>Daphnia </it>ESTs against the NCBI non-redundant (nr) protein database.</p>
                  <p>(3) Results using Blast2GO (<url>http://www.blast2go.de/</url>).</p>
                  <p>(4) Top match from Blastx searches of clustered <it>Daphnia </it>ESTs against all <it>Drosophila melanogaster </it>predicted gene translations from annotation 4.2.1 &#8211; 16 columns describe the results. &#8226; Cluster id (<it>Daphnia</it>, this study). &#8226; Subject id (<it>D. melanogaster </it>dmel-all-translation-r4.2.1 data).</p>
                  <p>(5) Differential expression results in 3 microarray experiments.</p>
                  <p>(6) Top match from Blastx searches of <it>Daphnia </it>ESTs against NCBI non-redundant (nr) protein database &#8211; 12 columns describe the results. &#8226; Subject id of the best match in the nr database.</p>
               </text>
               <file name="1471-2164-8-217-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Results from Blastx searches of the assembled <it>Daphnia pulex </it>cDNA sequences against the NCBI non-redundant protein database</p>
               </caption>
               <text>
                  <p>Results from Blastx searches of the assembled <it>Daphnia pulex </it>cDNA sequences against the NCBI non-redundant protein database. (A) Distribution of e-value scores. (B) Distribution of top matches against the NCBI taxonomic domains. (C) A more refined distribution of the best hits that were matched to protein sequences belonging to invertebrates.</p>
               </text>
               <graphic file="1471-2164-8-217-1"/>
            </fig>
            <p>The assembled <it>Daphnia </it>sequences that matched annotated proteins from genetic model species were assigned Gene Ontology (GO) terms using Blast2GO <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Their putative functions spanned a spectrum of biological and biochemical processes (Figure <figr fid="F2">2a</figr>). A total of 227 assembled sequences were assigned 799 biological process terms from the fourth level of the GO. The predominant terms were for metabolic processes ascribed to 190 assembled sequences. These terms included cellular metabolism (22%), primary metabolism (21%), macromolecule metabolism (18%), biosynthesis (10%), biopolymer metabolism (7%), catabolism (5%) and regulation of metabolism (&lt; 1%). From among these processes, 72 assembled sequences were annotated to involve protein biosynthesis, while 30 sequences involved the catabolism of proteins. Sixteen sequences were attributed roles in chitin metabolism, including chitinases and peritrophins. The next most predominant biological process terms were related to the localization of cellular components (establishment of localization, transport, protein localization), which were ascribed to 33 assembled sequences. Four of these sequences coded for genes involved in oxygen transport (hemoglobin). Thirteen of these sequences encoded genes with putative homologues in insects that specifically transport charged atoms like metals, of which seven were also ascribed the functions of cell and ion homeostasis (Figure <figr fid="F2">2a</figr>). Finally, the remaining assembled sequences with GO biological process terms were likely involved in cell communication (such as signal transduction, cell signaling and adhesion), development, and physiological processes that specify a response to external stimuli, stress and cell death.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The distribution of gene annotations for the list of 787 <it>Daphnia pulex </it>ESTs based on results from Blastx searches against the NCBI non-redundant protein database</p>
               </caption>
               <text>
                  <p>The distribution of gene annotations for the list of 787 <it>Daphnia pulex </it>ESTs based on results from Blastx searches against the NCBI non-redundant protein database. (A) The assignment of 799 annotations of biological process to 227 EST clusters from level 4 of the Gene Ontology. (B) The assignment of 371 annotations of molecular function to 288 EST clusters from level 3 of the Gene Ontology. Blastx queries recorded the best 5 matches with an E-value threshold of 1 &#215; 10<sup>-3 </sup>and a minimal value of 33 aligned amino acids. Gene Ontology (GO) terms were assigned to ESTs using Blast2GO [22] with the following configurations: Pre-eValue-hit filter 1 &#215; 10<sup>-3</sup>; Pre-similarity-hit filter 2; Annotation cut-off 35; GO weight 5.</p>
               </text>
               <graphic file="1471-2164-8-217-2"/>
            </fig>
            <p>A total of 288 assembled sequences were additionally assigned 371 GO molecular function terms (Figure <figr fid="F2">2b</figr>). One hundred and thirteen sequences were suggested to have catalytic activities that included hydrolase (17%), transferase (6%), oxidoreductase (4%), ligase (1%) and isomerase activities, among others. Another 105 assembled sequences were likely involved in structural activities. Their GO terms included structural constituent of the ribosome (18%), structural constituent of the cuticle (9%) and structural constituent of the cytoskeleton (1%). Indeed, <it>Daphnia </it>genes matched 67 of the 194 listed <it>Drosophila </it>ribosomal components. The next major functional class was represented by 86 sequences putatively involved in binding, which included 27 sequences coding for nucleic acid binding proteins, 18 carbohydrate (also listed as pattern and chitin) binding proteins, 17 nucleotide binding proteins, and 15 proteins that bind ions (calcium, zinc, iron). The remaining 11 sequences within this class were protein or lipid binding. The final major functional class contained 26 assembled sequences assigned transporter or carrier activities; twelve of these were annotated as ion transporters.</p>
            <p>A number of assembled sequences likely encode conserved proteins involved in gene regulatory functions. <it>Daphnia </it>genes with such potential functions based on sequence homologies included 19 sequences involved in transcription regulation and 14 sequences with translational regulator activity (Additional file <supplr sid="S2">2</supplr>). Examples of regulatory genes involved in arthropod development included a putative homologue to <it>maf-S</it>, which is a basic-leucine zipper (bZIP) transcription factor in <it>Drosophila </it>that is required for the development of pharyngeal structures <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. A <it>Daphnia </it>sequence also matched closely to the <it>Dorsal switch protein 1 </it>(<it>Dsp1</it>) gene that regulates a number of homeotic genes in <it>Drosophila </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and is therefore involved in many developmental pathways. The putative homologue to the fly gene <it>shaggy </it>(<it>sgg</it>) was identified, which is part of the Notch, Wnt and Smoothened signaling pathways. Interestingly, two other regulators of the Notch signaling pathway called <it>Cdc42 </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and <it>neurotic </it>(<it>nti </it>or <it>O-fut1</it>) were also identified (Additional file <supplr sid="S3">3</supplr>). The fly gene <it>nti </it>is specifically required for the proper localization of <it>Notch </it>at the cell surface <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, is essential for the physical interaction of <it>Notch </it>with its ligand <it>Delta</it>, and is an essential component for neurogenesis <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. A second gene involved in neurogenesis was identified as homologous to <it>similar to Deadpan </it>(<it>Side</it>), which is a basic helix-loop-helix (bHLH) transcription factor. 
A <it>Daphnia </it>gene was also matched with the fly zinc-finger-C4 transcription factor (ZnFC4) called <it>ftz transcription factor 1 </it>(<it>ftz-f1</it>), which coordinates stage-specific responses to the steroid hormone ecdysone during metamorphosis <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and directs key developmental events at the transition between prepupal and pupal stages of <it>Drosophila </it>development <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The putative functions for the other identified transcription regulators (Additional file <supplr sid="S2">2</supplr>) included the regulation of mitotic progression (the TFIIH transcription factor <it>Cdk7</it>) and important roles during gametogenesis (<it>Rab11, bic</it>, and the C2H2-zinc finger transcription factor <it>Meics</it>). Other putative transcription factors included genes matching <it>CG1876I </it>that contains a helix-turn-helix (HTH) DNA-binding motif, <it>CG18619 </it>that contains a bZIP DNA-binding motif and <it>CG3224 </it>that contains a putative zinc-finger DNA-binding motif. Finally, among the regulators of translation (Additional file <supplr sid="S2">2</supplr>), <it>Daphnia </it>genes matched to 7 of the total number of 24 listed <it>Drosophila </it>translational elongation genes, yet matched only to 2 of the 58 listed translational initiation genes in flies. Their putative functions include DNA repair (<it>RpLP0</it>), autophagic cell death (<it>eIF-5A</it>, <it>Ef1gamma</it>), immune response (<it>RpS6</it>, <it>Thor</it>), regulation of cell growth (<it>Thor</it>) and germ-line stem cell division (<it>piwi</it>).</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Supplemental Table 1. <it>Daphnia </it>genes annotated as regulators of transcription and translation based on sequence conservation with <it>Drosophila </it>genes with known functions. Scores are reported from results obtained by Blastx against all predicted translations from version 4.2.1 of the <it>D. melanogaster </it>genome annotation. First and second columns under DE show genes that are differentially expressed (+ = yes) in microarray experiments comparing male versus female transcripts and metals versus no metals exposure, respectively. TF = transcription factor; TR = transcriptional regulation; TE = transcript elongation; E = translation elongation; R = translation regulation.</p>
               </text>
               <file name="1471-2164-8-217-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Supplemental Table 2. <it>Daphnia </it>genes annotated as signaling proteins and other regulators based on sequence conservation with <it>Drosophila </it>genes with known functions. Scores are reported from results obtained by Blastx against all predicted translations from version 4.2.1 of the <it>D. melanogaster </it>genome annotation. First and second columns under DE show genes that are differentially expressed (+ = yes) in microarray experiments comparing male versus female transcripts and metals versus no metals exposure, respectively.</p>
               </text>
               <file name="1471-2164-8-217-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>To discover <it>Daphnia </it>genes that may jointly participate in conserved biological processes or within gene interaction networks, we investigated GO classes that were highly represented within our list of putative homologues to fly proteins. Significant functional groupings of <it>Daphnia </it>genes were expected, because more than half of the sequenced ESTs were chosen based on their differential expression patterns in separate microarray experiments examining developmental differences among males and females, juveniles and adults, and toxicological responses to metals (Eads et al. submitted; Shaw et al. submitted). Seventeen genes were identified as candidates for gametogenesis (Additional file <supplr sid="S4">4</supplr>); the majority of these loci are involved in oocyte development in flies. <it>Daphnia </it>loci matching the fly genes <it>mago</it>, <it>Rab11</it>, <it>chic</it>, <it>Tm1</it>, <it>tsu </it>and <it>bic </it>may play conserved evolutionary roles in specifying the anterior-posterior axis of the oocyte. All 6 genes save <it>bicaudal </it>(<it>bic</it>) coordinate to assemble the pole plasm at the posterior end of the <it>Drosophila </it>oocyte by localizing maternally derived transcripts for <it>oskar</it>. Moreover, <it>mago</it>, <it>Tm1 </it>and <it>Rab11 </it>are known to interact in genetic screens <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, while a two-hybrid-based fly protein interaction study <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> implicates <it>tsu </it>and possibly a translational elongation <it>RpLP1 </it>homologue (Additional file <supplr sid="S3">3</supplr>) within this network. 
One other <it>Daphnia </it>gene with weak sequence similarity to <it>Sop2 </it>may have transport functions during oogenesis <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, and two genes similar to the signaling gene <it>Cdc42 </it>and to <it>tsr</it>, both of which are involved in follicle cell development <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, were also identified. Other genes similar to <it>snf</it>, <it>RpS3A </it>and <it>sgg </it>in flies are also candidates for oogenesis. Only two genes from our survey have known homologues in flies that function in spermatogenesis. The gene <it>Act5C </it>has a role in sperm individualization <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and <it>Meics </it>is associated with central spindle and mid-body microtubules during meiosis <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The <it>chic </it>gene in flies is important in gametogenesis for both sexes <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. These <it>Daphnia </it>genes, which share sequence similarity with <it>Drosophila </it>loci known to coordinate conserved developmental processes, are prime candidates for functional investigations of early crustacean development.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Supplemental Table 3. <it>Daphnia </it>genes annotated as candidates for gametogenesis based on sequence conservation with <it>Drosophila </it>genes with known functions. Processes include: SP = spermatogenesis; OO = oogenesis; FCD = follicle cell development; GCD = germ cell development; GT = gametogenesis. Two assembled sequences matched CG4027 and two other sequences matched CG2168.</p>
               </text>
               <file name="1471-2164-8-217-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>In contrast to genes that participate in biological processes that are shared between Crustacea and Insecta, gene families that have expanded in <it>Daphnia </it>compared to insects may be indicative of new gene functions linked to their specific biology and ecological setting. Therefore, we identified GO classes that were overrepresented within our dataset compared to the Gene Ontologies for the <it>D. melanogaster </it>proteome using Fisher's exact test, and counted multiple assembled sequences that matched to unique <it>Drosophila </it>proteins (Additional file <supplr sid="S1">1</supplr>). Some lineage expansions seemed to occur primarily by tandem duplication, while other radiations implied interesting functional specializations or innovations. Among the 53 <it>Daphnia </it>sequences that were provisionally annotated as cuticle proteins and genes involved in chitin metabolism and molting, only 13 matched a fly gene uniquely (Additional file <supplr sid="S5">5</supplr>). In most cases, 2&#8211;4 assembled sequences matched the same protein in the fly genome. In one further case, 15 sequences matched the <it>D. melanogaster </it>gene CG6305, which contains an insect cuticle protein domain. Because we could not produce a reliable sequence alignment for all 15 assembled sequences, we conclude that the observed gene expansion was not an artifact of inadequate clustering of redundant ESTs. Yet, from the pairwise comparisons of these 15 sequences, alternative splice variants for three genes were identified: Contigs 20 and 180 showed >90% sequence similarity, Contigs 23, 241 and 257 were >94% similar, and Contigs 19 and 24 were >85% identical over shared exons. Additional cDNA sequence data aligned to a completed genome sequence assembly for <it>Daphnia </it>are needed to confirm that cuticle proteins are expanded gene families compared to insects. However, our study also uncovered clearer examples of gene expansions.</p>
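            <p>As a hedged illustration of the kind of overrepresentation test described above, the following Python sketch computes a one-sided Fisher's exact (hypergeometric upper-tail) p-value for a single GO class. The function name, its arguments and the counts in the usage lines are hypothetical and are not taken from this study; the actual analysis may have used a different implementation or a two-sided test.</p>

```python
from math import comb


def enrichment_p_value(hits, sample_size, annotated, population):
    """One-sided Fisher's exact p-value for overrepresentation.

    hits        -- sequences in the sample that carry the GO term
    sample_size -- all sequences in the sample
    annotated   -- proteins in the reference set that carry the GO term
    population  -- all proteins in the reference set

    Assumption: the sample is treated as a draw without replacement
    from the reference set (hypergeometric model).
    """
    total = comb(population, sample_size)
    upper = min(annotated, sample_size)
    # Sum the probabilities of observing `hits` or more annotated sequences.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not values from this study).
p = enrichment_p_value(hits=12, sample_size=200, annotated=150, population=13000)
print(f"p = {p:.3g}")
```

            <p>Exact integer arithmetic is used here so that the sketch needs only the Python standard library; a statistics package would normally be preferred for large numbers of GO classes, together with a correction for multiple testing.</p>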
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p>Supplemental Table 4. <it>Daphnia </it>genes annotated as genes associated with exoskeletal function and molting. These include: CP = structural cuticle proteins; PM = peritrophic membrane; CM = cuticle metabolism; CA = chitinase; M = molting; CB = cuticle binding. Assignments are to proteins based on sequence conservation with <it>Drosophila </it>genes with known functions.</p>
               </text>
               <file name="1471-2164-8-217-S5.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Insects have three ferritin genes that play important roles in iron homeostasis of cells (Fer1HCH, Fer2LCH) and of organelles (Fer3HCH). In contrast, the annotation of <it>Daphnia </it>sequences revealed seven assembled sequences with strong matches to <it>Drosophila </it>ferritin proteins (Table <tblr tid="T2">2</tblr>). Singlet 73 showed a strong match to the <it>Drosophila </it>Fer1HCH protein via a Blastx alignment (bit score = 137), but was poorly matched to crustacean sequences, even to a <it>D. pulex </it>ferritin sequence within GenBank (AJ245734; bit score = 71). The remaining six <it>Daphnia </it>sequences, plus the GenBank entry, all aligned best to other crustacean sequences and to the single Fer3HCH locus of <it>Drosophila</it>. Therefore, Singlet 73 represents the first sequence of an orthologous crustacean Fer1HCH gene. This result was verified by constructing a phylogeny using representative insect and crustacean protein sequences and by including additional <it>Daphnia </it>ferritin-like sequences that were extracted from an ongoing <it>D. pulex </it>cDNA sequencing project (Colbourne et al. in preparation).</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p><it>Daphnia </it>ID</p>
                     </c>
                     <c ca="center">
                        <p><it>Drosophila </it>gene ID</p>
                     </c>
                     <c ca="center">
                        <p><it>Drosophila </it>gene name</p>
                     </c>
                     <c ca="center">
                        <p>FlyBase ID</p>
                     </c>
                     <c ca="center">
                        <p>% similarity</p>
                     </c>
                     <c ca="center">
                        <p>E-value</p>
                     </c>
                     <c ca="center">
                        <p>Bit score</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Singlet 73</p>
                     </c>
                     <c ca="center">
                        <p>CG2216</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer1HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0015222</p>
                     </c>
                     <c ca="center">
                        <p>39.81</p>
                     </c>
                     <c ca="center">
                        <p>4.00E-33</p>
                     </c>
                     <c ca="center">
                        <p>137</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 91</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>39.66</p>
                     </c>
                     <c ca="center">
                        <p>6.00E-27</p>
                     </c>
                     <c ca="center">
                        <p>117</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 26</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>38.22</p>
                     </c>
                     <c ca="center">
                        <p>8.00E-23</p>
                     </c>
                     <c ca="center">
                        <p>103</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 138</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>40.61</p>
                     </c>
                     <c ca="center">
                        <p>1.00E-22</p>
                     </c>
                     <c ca="center">
                        <p>103</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 217</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>40.16</p>
                     </c>
                     <c ca="center">
                        <p>4.00E-14</p>
                     </c>
                     <c ca="center">
                        <p>74.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 40</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>33.75</p>
                     </c>
                     <c ca="center">
                        <p>2.00E-08</p>
                     </c>
                     <c ca="center">
                        <p>54.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Contig 42</p>
                     </c>
                     <c ca="center">
                        <p>CG4349</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Fer3HCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>FBgn0030449</p>
                     </c>
                     <c ca="center">
                        <p>26.25</p>
                     </c>
                     <c ca="center">
                        <p>2.00E-06</p>
                     </c>
                     <c ca="center">
                        <p>47.8</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>Daphnia </it>genes annotated as candidates for iron ion homeostasis based on sequence conservation with <it>Drosophila </it>genes with known functions. Contigs 26 and 138 are two alleles from the same locus. Contigs 40, 42 and 91 are likewise sequence variants from a single locus.</p>
               </tblfn>
            </tbl>
            <p>The Neighbor-Joining tree of 35 aligned amino acid sequences clustered the pancrustacean ferritins into three main groups (Figure <figr fid="F3">3</figr>). The ferritin 1 group contained insect genes plus two cDNAs from <it>Daphnia </it>libraries; the Singlet 73 amino acid sequence was identical to a sequence extracted from other cDNA libraries (branch G, Figure <figr fid="F3">3</figr>), while branch F was a unique gene that stemmed from the base of the group. The ferritin 2 group was composed solely of insect genes. However, both crustacean and insect sequences clustered into the third group containing the insect ferritin 3 genes. Although this group contained single copies of the insect genes, the <it>Daphnia </it>genes were further subdivided among five branches representing distinct ferritin 3 loci within the <it>D. pulex </it>genome. Branches D and E were gene sequences derived from the other libraries and showed 45% sequence divergence from each other, while branches A, B and C each contained at least one sequence from the present study. Further investigations indicated that the multiple sequences clustering within each of branches A, B and C are different alleles of the same locus. Given that insects and more distant outgroups have only three ferritin genes, <it>Daphnia </it>ferritins clearly expanded to include <it>possibly </it>one additional ferritin 1 locus and <it>minimally </it>four additional ferritin 3 genes.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Lineage specific expansion of the <it>Daphnia pulex </it>ferritin genes</p>
               </caption>
               <text>
                  <p>Lineage specific expansion of the <it>Daphnia pulex </it>ferritin genes. Neighbor-Joining (NJ) tree inferred from the deduced amino acid sequences of the <it>Daphnia </it>ferritin genes, including loci from <it>Drosophila melanogaster </it>plus other representative insect and crustacean amino acid sequences obtained from the NCBI and FlyBase protein sequence repositories. The Ferritin 1 group contains insect and <it>Daphnia </it>Fer1HCH gene(s). The Ferritin 2 group contains only insect Fer2LCH loci, and the Ferritin 3 group contains insect and crustacean Fer3HCH genes. The amino acid sequence alignment was obtained using t-coffee [72] and is available on request. The NJ tree was constructed in MEGA3 [73], using the Poisson correction to calculate the distance matrix. The bootstrap support values shown at the main branch nodes of the tree are derived from 1000 pseudo-replicates of the data. <it>D. pulex </it>sequences denoted by * were obtained from an ongoing cDNA sequencing project by the Joint Genome Institute and the <it>Daphnia </it>Genomics Consortium (Colbourne et al. in prep) and are deposited in GenBank under accession numbers <ext-link ext-link-type="gen" ext-link-id="DQ983425">DQ983425</ext-link>-<ext-link ext-link-type="gen" ext-link-id="DQ983438">DQ983438</ext-link>. GenInfo (GI) accessions for all other sequences: 6946692; 61744051; 26006755; 46561742; 91081285; 87083910; 66504201; 1807496; 13195275; 55242312; 66524157; 91077442; 24651358; 95702694; 18031707; 62722854; 6409191; 91077446; 66524161; 7272336; 62722856.</p>
               </text>
               <graphic file="1471-2164-8-217-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Gene conservation between the crustacean <it>Daphnia </it>and the true insects</p>
            </st>
            <p>To examine the conservation of genes represented by the EST sequences across Pancrustacea, we used tBlastx to match the <it>Daphnia </it>assembled sequences to the NCBI UniGene sets for <it>Drosophila melanogaster</it>, <it>Anopheles gambiae</it>, <it>Bombyx mori </it>and <it>Apis mellifera</it>, and included <it>Caenorhabditis elegans </it>as an outgroup. The assembled sequences were clustered into 27 groups, based on the strength of these sequence matches across these taxonomic data sets. This arrangement identified a variety of gene classes that share patterns of sequence conservation (Figure <figr fid="F4">4</figr>). The first class of interest was composed of 124 genes (16%) that are conserved equally among all species included in this study. This class was mostly enriched for genes that participate in protein metabolism including protein modifications (67 genes to GO:0044267; p = 4.8 &#215; 10<sup>-7</sup>) plus 32 genes that were likely involved in cellular metabolism (total of 99 genes to GO:0044237; p = 3.2 &#215; 10<sup>-8</sup>). Other genes enriched within this class included transcriptional regulators (12 genes to GO:0045449; p = 2.4 &#215; 10<sup>-3</sup>). The second class of interest was composed of <it>Daphnia </it>sequences that matched a nematode protein plus at least one insect locus (183 genes) and others that had no matches to nematode proteins yet matched to at least one insect proteome (167 genes). Therefore, 21% of the sequences were derived within the Pancrustacea &#8211; thus shared by <it>Daphnia </it>plus at least one insect in our set &#8211; or lost within the nematode. Especially noticeable were 43 assembled sequences that had no detectable homologues in worms and were uniformly conserved across the four insects (Figure <figr fid="F4">4</figr>). 
Genes that were absent in nematodes were enriched with structural constituents of the cuticle (26 genes to GO:0042302; p = 1.1 &#215; 10<sup>-12</sup>) and loci having serine-type endopeptidase activity (10 genes to GO:0004252; p = 3.6 &#215; 10<sup>-4</sup>), of which 8 genes were annotated as also having chymotrypsin activity (GO:0004263; p = 0.003). The third class of interest consisted of 309 genes (39%) that had no matches to insect proteomes. At face value, this result suggests that these orphaned genes are unique to <it>Daphnia </it>or Crustacea; either they have been lost in the insects &#8211; as suggested by 17 <it>Daphnia </it>genes (2%) showing sequence similarity to proteins in the nematode database &#8211; or acquired by <it>Daphnia </it>or Crustacea since diverging from their last common ancestor.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The clustering of the <it>Daphnia pulex </it>assembled ESTs based on their matches to genes from multiple databases, obtained from tBlastx searches against NCBI UniGene sets for <it>Caenorhabditis elegans </it>(CE)(build #23), <it>Bombyx mori </it>(BM)(build #7), <it>Apis mellifera </it>(AM)(build #5), <it>Anopheles gambiae </it>(AG)(build #29), <it>Drosophila melanogaster </it>(DM)(build #37) and from Blastx searches against the NCBI non-redundant (NR) protein database</p>
               </caption>
               <text>
                  <p>The clustering of the <it>Daphnia pulex </it>assembled ESTs based on their matches to genes from multiple databases, obtained from tBlastx searches against NCBI UniGene sets for <it>Caenorhabditis elegans </it>(CE)(build #23), <it>Bombyx mori </it>(BM)(build #7), <it>Apis mellifera </it>(AM)(build #5), <it>Anopheles gambiae </it>(AG)(build #29), <it>Drosophila melanogaster </it>(DM)(build #37) and from Blastx searches against the NCBI non-redundant (NR) protein database. The color intensity is proportional to the Bit Score, which ranges from &lt;50 (black) to 535 (bright yellow). Three classes of interesting genes are indicated (see text).</p>
               </text>
               <graphic file="1471-2164-8-217-4"/>
            </fig>
            <p>There are many potential sources of errors that can inflate our estimate of the fraction of unique <it>Daphnia </it>genes compared to the selected pancrustaceans. For instance, sequences may fail to align for technical reasons. This may occur if the <it>Daphnia </it>sequences included untranslated regions (UTR) of the cDNA and not the coding regions. Indeed, the mean size of predicted open reading frames (ORFs) within this class differed significantly from that of genes having sequence matches to insect proteomes (t = 12; p &lt; 0.0001; df = 785). For example, over half of the assembled sequences with no matches had ORFs smaller than 225 bases compared to 16% of matched sequences (Figure <figr fid="F5">5</figr>). Therefore, the trivial explanation that these sequences were mostly UTR cannot be dismissed for a large fraction of these genes. Other technical explanations for the absence of matches include genes that had not been annotated or included in the insect UniGene sets. Further analysis by aligning the non-matching sequences to all predicted <it>Drosophila </it>gene translations uncovered 9 additional matches with e-values ranging from 4 &#215; 10<sup>-3 </sup>to 2 &#215; 10<sup>-10</sup>. Another 4 <it>Daphnia </it>sequences were found to have matches to <it>Drosophila </it>proteins, based on tBlastx searches against the full genome sequence (e-values ranging from 4 &#215; 10<sup>-3 </sup>to 9 &#215; 10<sup>-29</sup>). Finally, given the tremendous evolutionary divergence between <it>Daphnia </it>and insects, matches may not have been detected from loci that are not under similar evolutionary constraints. We are unable to investigate this last point with the current data.</p>
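            <p>The ORF-size comparison above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the predicted ORF of an assembled sequence is the longest stop-free stretch in any of the six reading frames, with no start codon required because ESTs are often partial; the ORF prediction method actually used in this study is not specified here. All function names and the example sequences are hypothetical.</p>

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_in_frame(seq, frame):
    """Length in bases of the longest stop-free run of codons in one frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0          # a stop codon ends the current open stretch
        else:
            run += 3
            best = max(best, run)
    return best


def longest_orf(seq):
    """Longest stop-free stretch over all six reading frames."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return max(longest_orf_in_frame(s, f) for s in strands for f in range(3))


def fraction_shorter_than(lengths, cutoff=225):
    """Fraction of ORF lengths strictly below the cutoff (in bases)."""
    return sum(1 for n in lengths if cutoff > n) / len(lengths)


# Hypothetical toy sequences, for illustration only.
toy = ["ATGAAATAA", "atgcccgggtag"]
lengths = [longest_orf(s) for s in toy]
print(lengths, fraction_shorter_than(lengths))
```

            <p>The 225-base cutoff is the threshold quoted in the text; applying such a function separately to the matched and unmatched classes would give the two proportions being compared.</p>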
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The distribution of predicted open reading frames (ORFs) for two classes of assembled EST sequences for <it>Daphnia pulex</it></p>
               </caption>
               <text>
                  <p>The distribution of predicted open reading frames (ORFs) for two classes of assembled EST sequences for <it>Daphnia pulex</it>. Black bars represent genes with no detectable matches to insect proteomes. Grey bars represent genes with matches to insect proteins based on Blastx searches.</p>
               </text>
               <graphic file="1471-2164-8-217-5"/>
            </fig>
            <p>It was previously shown that gene preservation is correlated with gene function. In particular, correlations have been found between the level of gene conservation and sex-biased gene expression among insects <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. There is reason to believe that such correlations extend to other biological functions. In comparing 787 <it>Daphnia </it>assembled sequences to those of insects, 39% of the genes were characterized as orphans because no sequence matches were detected. Interestingly, the orphan genes were not randomly distributed among the gene expression classes. Three specific observations were made by incorporating the gene expression datasets (Table <tblr tid="T3">3</tblr>). First, 47% of male-biased genes did not match insect proteins compared to only 22% of female-biased genes. The twofold difference in sequence similarity among the sex-biased genes in <it>Daphnia </it>is consistent with differences seen among the insects, reflecting the overall accelerated evolution of male reproductive genes <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Second, 46% of metal responsive genes did not match insect proteins compared to only 34% of non metal responsive genes. Third, genes that were responsive to metals and not sex-biased included the greatest proportion of orphans (49%), whereas genes that were female-biased and not responsive to metals included the fewest (12%). These results suggest that lineage specific genes are correlated with certain biological functions associated with an organism's ecological challenges.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Response to metals</p>
                     </c>
                     <c ca="left">
                        <p>Up in males</p>
                     </c>
                     <c ca="left">
                        <p>Up in females</p>
                     </c>
                     <c ca="left">
                        <p>No change between sexes</p>
                     </c>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Responsive to cadmium and/or arsenic</p>
                     </c>
                     <c ca="left">
                        <p>45% (85/188)</p>
                     </c>
                     <c ca="left">
                        <p>42% (25/59)</p>
                     </c>
                     <c ca="left">
                        <p>49% (47/96)</p>
                     </c>
                     <c ca="left">
                        <p>46% (157/343)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>No change in both metals</p>
                     </c>
                     <c ca="left">
                        <p>48% (95/197)</p>
                     </c>
                     <c ca="left">
                        <p>12% (13/112)</p>
                     </c>
                     <c ca="left">
                        <p>33% (44/135)</p>
                     </c>
                     <c ca="left">
                        <p>34% (152/444)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="left">
                        <p>47% (180/385)</p>
                     </c>
                     <c ca="left">
                        <p>22% (38/171)</p>
                     </c>
                     <c ca="left">
                        <p>39% (91/231)</p>
                     </c>
                     <c ca="left">
                        <p>39% (309/787)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The percentage of <it>Daphnia pulex </it>assembled EST sequences with no matches to insect proteins, partitioned by their differential expression patterns in experiments designed to detect transcriptional differences between the sexes (Eads et al. submitted) and transcriptional responses to the toxic metals cadmium and arsenic (Shaw et al. submitted, and in prep). A 5% false discovery rate is applied to all three experimental results. The number of orphan genes over the total number of genes within the partition is indicated in parentheses.</p>
               </tblfn>
            </tbl>
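            <p>As a check on the arithmetic, the marginal percentages of Table 3 can be recomputed from the cell counts. The following is a minimal sketch only: the dictionary layout, the key labels and the function names are illustrative assumptions and are not part of the original analysis.</p>
            <p>
```python
# Orphan counts and partition totals transcribed from Table 3.
# Keys are (metal response, sex bias); the labels are illustrative
# assumptions introduced for this sketch.
TABLE3 = {
    ("responsive", "male"): (85, 188),
    ("responsive", "female"): (25, 59),
    ("responsive", "none"): (47, 96),
    ("unresponsive", "male"): (95, 197),
    ("unresponsive", "female"): (13, 112),
    ("unresponsive", "none"): (44, 135),
}


def orphan_percent(cells):
    """Return the pooled orphan percentage for an iterable of (orphans, total)."""
    cells = list(cells)
    orphans = sum(o for o, _ in cells)
    total = sum(t for _, t in cells)
    return 100.0 * orphans / total


def marginal(axis, level):
    """Pool the cells whose key equals `level` at position `axis` (0 or 1)."""
    return orphan_percent(v for k, v in TABLE3.items() if k[axis] == level)


if __name__ == "__main__":
    for level in ("male", "female", "none"):
        print("sex bias:", level, round(marginal(1, level)))
    for level in ("responsive", "unresponsive"):
        print("metals:", level, round(marginal(0, level)))
    print("all:", round(orphan_percent(TABLE3.values())))
```
            </p>
            <p>Pooling counts before dividing, instead of averaging the cell percentages, weights each partition by its size, which is how the row and column totals of the table are obtained.</p>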
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Diversity in the gene complement of species arises from the expansion of shared ancestral gene families, the loss of existing genes, or the acquisition of newly invented genes <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> and can account for lineage specific innovations. It is estimated that nearly half of the paralogous gene families within eukaryotic genomes originated by lineage specific gene expansions; many are related to an organism's unique mode of life <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. For example, the evolution of disease resistance and of self-incompatibility in plant mating systems can partly be attributed to the radiation of novel receptor-like kinases within the plant genome <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. In <it>Drosophila</it>, the trypsin-like serine proteases have expanded to 178 genes <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, suggesting important novel defenses by the fly immune system. Odorant receptors form the largest recorded nematode-specific gene family expansion &#8211; numbering ~800 genes compared to 60 genes in flies <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> &#8211; which suggests the importance of chemosensing in the soil environment. A similar genomic inventory for a branchiopod crustacean genome will soon be made available by the <it>Daphnia </it>Genomics Consortium to ultimately contrast the evolutionary diversification of arthropod genes in relation to the aquatic and terrestrial habits of these animals. Yet, crustaceans and insects also share many key biological features due to their common ancestry as members of the Pancrustacea. This study presents the results of a first investigation into the sequence conservation and putative function of <it>D. pulex </it>genes, which are identified by sequencing a set of 1,648 cDNA isolates that were interrogated in three microarray studies. 
From clustering the ESTs, we characterize 787 <it>Daphnia </it>loci based on their sequence similarity to genes within a variety of databases, including those for the insects <it>Bombyx</it>, <it>Apis</it>, <it>Anopheles, Drosophila</it>, and for the nematode <it>Caenorhabditis</it>.</p>
         <sec>
            <st>
               <p>Shared genes</p>
            </st>
            <p>In this study, we characterize two non-normalized cDNA libraries from a clonal population reared under standard laboratory conditions. As a result, the diversity of biological processes and molecular functions represented among the sequenced <it>Daphnia </it>genes is relatively modest. Almost one quarter of the genes are likely involved in metabolic processes. More than one quarter of the genes are predicted to have catalytic or structural activities. Although a large fraction of arthropod genomes is composed of genes having these basic cellular functions (35&#8211;40% of <it>Drosophila </it>genes for instance), the diversity of transcribed genes discovered in <it>Daphnia </it>would be augmented by creating libraries from animals under a variety of environmental conditions. Yet, our libraries do contain cDNA from daphniids of mixed life stages, including gravid females, embryos, juveniles and a small number of males. Therefore, some <it>Daphnia </it>genes have sequence similarity to insect proteins associated with reproduction, development and growth. These include regulatory genes like <it>Dsp1</it>, which operates in patterning the developing fly embryo by acting as a corepressor of the transcriptional regulator Dorsal protein <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Under different circumstances, <it>Dsp1 </it>can also act as an activator or repressor of thorax-group and polycomb-group homeotic genes in <it>Drosophila </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Because all known homeotic targets of <it>Dsp1 </it>are conserved in sequence and function in metazoans, <it>Daphnia</it>'s putative orthologue is likely to share this regulatory function. As expected, this gene transcript is enriched in pregnant females compared to males in <it>Daphnia </it>microarray experiments (Additional file <supplr sid="S2">2</supplr>). 
Putative homologues to three regulatory genes (<it>sgg</it>, <it>Cdc42</it>, <it>O-fut1</it>) within the Notch signaling pathway are identified, which is one of a small number of signal transduction pathways that are highly conserved in insects and throughout animal evolution. The gene <it>sgg </it>is a point of convergence between the Notch and Wnt/wingless signaling pathways <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. We predict that further sequencing of <it>Daphnia </it>cDNA will uncover more genes operating within these and other conserved signaling mechanisms: nuclear receptors, Sonic Hedgehog, receptor tyrosine kinases, JAK/STAT, and BMP/TGF-beta. An example is a homologue to the <it>ftz-f1 </it>nuclear hormone receptor that is also found on the microarray. Like all arthropods, <it>Daphnia </it>growth is synchronized with molting and the regeneration of the cuticle, which is governed by pulses of ecdysteroid hormones. Although one isoform of <it>ftz-f1 </it>is a transcriptional regulator of the embryonic segmentation gene <it>fushi tarazu</it>, a second isoform is necessary for larval molting in <it>Drosophila </it>and its premature expression results in the disruption of the epicuticle, suggesting that targets for this transcription factor in flies include genes involved in cuticle formation <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. The function of this gene is conserved in <it>Caenorhabditis</it>, where it is also required for epidermal development and regulates molting <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. No differential expression is observed on the microarray for the signaling genes discussed above (Additional file <supplr sid="S3">3</supplr>). Overall, our survey of <it>Daphnia </it>ESTs uncovers genes expected to be present in crustacean genomes based on their important regulatory roles in conserved cellular and developmental processes.</p>
            <p>Roughly half of the cDNA isolates that are sequenced for this study are chosen based on their differential expression patterns between males and females and on their responses to toxic metals. These experimental conditions reflect two research interests of our labs involving <it>Daphnia</it>: the genetic basis of environmental sex determination and cyclical parthenogenesis, and understanding how populations adapt to environmental change in aquatic habitats including industrial pollutants. Therefore, we expect that the provisional annotations of 787 assembled sequences include a fraction of genes sharing functional attributes that would in part be shared with other arthropods and others that are more reflective of <it>Daphnia</it>'s unique biology. Certainly, 17 genes are candidates for gametogenesis. The majority of these genes play significant roles in oogenesis. In particular, six sequences have strong matches to conserved genes that specify the oocyte polarity and four loci are known to genetically interact in flies. By contrast, only three of these 17 <it>Daphnia </it>sequences match genes that are known to function during spermatogenesis in <it>Drosophila</it>. This discrepancy between the numbers of sex specific transcripts for genes involved in reproduction is likely caused by the small representation of males within the <it>Daphnia </it>cultures used to create the cDNA libraries. Equally impressive is the large number of cuticle proteins identified.</p>
         </sec>
         <sec>
            <st>
               <p>Expanded protein families</p>
            </st>
            <p>In the comparative study of the <it>Anopheles </it>and <it>Drosophila </it>proteomes <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, cuticular proteins were noted to be particularly active in their lineage specific expansions and deletions. The present study identifies 53 sequences that are either structural components of the cuticle or involved in chitin metabolism. Their abundance within our dataset is a consequence of their transcriptional responses to the microarray experiments; all but four of the assembled sequences were differentially regulated on the arrays. Yet as noted in the comparisons between the two dipteran insects, many <it>Daphnia </it>sequences share similarities to single loci within insect genomes. Among these cuticle loci are 15 assembled sequences that are best aligned to a single <it>Drosophila </it>gene when compared to the rest of the fly proteome. Alternative transcripts account for only 4 sequences. Thus, no fewer than 11 loci remain as possible representatives of a large lineage specific gene expansion of <it>Daphnia </it>cuticle genes. Further investigations are required to verify this finding, including more thorough sampling of the <it>Daphnia </it>transcriptome and functional data such as <it>in situ </it>hybridizations to support the notion that these genes may have contributed to biological innovations. Regarding the 15 transcripts on the current array, all but one are enriched in males compared to females, and with the addition of differential expression patterns under metal stress, these transcripts can be grouped into five separate expression profiles.</p>
            <p>A first compelling case for lineage specific gene expansion is made by investigating the diversity of ferritin genes within the <it>Daphnia </it>ESTs. Among insects, with the exception of <it>Aedes aegypti</it>, the number of ferritin genes is evolutionarily conserved <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Ferritins are the principal iron storage proteins for nearly all animals, and their abundance within cells is controlled in part by iron-regulatory proteins that interact with iron-regulatory elements (IREs) within alternatively spliced 5' UTRs of certain mRNAs <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>. In insects, ferritins consist of a heavy-chain homolog (HCH) and a light-chain homolog (LCH) forming heterodimers that function in the secretory pathways of cells, and which also appear to act as iron transporters <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. The genes encoding the subunits (Fer1HCH, Fer2LCH) are positioned in the <it>Drosophila </it>genome in a back-to-back orientation, enabling coordinated regulation of their transcription <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. This feature is conserved in all insects studied thus far <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Except in <it>Bombyx</it>, which has IREs within the UTRs of both subunits, insect IREs are predominantly localized to the Fer1HCH locus. Recently, a third <it>Drosophila </it>ferritin (Fer3HCH) has been described that controls iron homeostasis of the mitochondria, yet its transcription is not responsive to iron treatment <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. As in humans and mice, the gene is predominantly expressed in adult testis. Our phylogeny of <it>Daphnia </it>ferritin gene transcripts uncovers six or seven distinct <it>Daphnia </it>loci (Figure <figr fid="F3">3</figr>). 
The branch F locus cannot be unequivocally included as part of the ferritin expansion until a description of the gene is available based on its alignment to the genome sequence. However, all of the other loci are defined based on their sequence alignments to distinct genome scaffolds assembled at this point in the <it>Daphnia </it>genome sequencing project (not shown).</p>
            <p>Branch G of the ferritin phylogeny represents the first characterized crustacean orthologue to the insect Fer1HCH genes. Following naming conventions, we designate this gene as Dpu_Fer1HCH. The gene has four introns and is the only <it>Daphnia </it>ferritin on the array showing differential expression for all three experimental conditions; its transcripts are enriched in males, depleted when exposed to cadmium and enriched when challenged by arsenic (Eads et al. submitted; Shaw et al. submitted and in prep). Therefore, like the insect subunit, this locus responds to metal ion treatments. The other five <it>Daphnia </it>ferritin genes are homologous to the insect Fer3HCH loci and are arranged within a monophyletic cluster, suggesting that they originate from a series of gene duplications. Indeed, each has retained two introns, despite showing amino acid sequence divergences from 14 to 56%. These genes are designated Dpu_Fer3HCH-1 to Dpu_Fer3HCH-5. Regrettably, expression data are not available for genes representing branches D and E. However, the microarray results show that the remaining three loci differ in their transcriptional responses to the experimental treatments. Like the <it>Drosophila </it>Fer3HCH gene, elements on the array whose sequences cluster within branch C do not respond to metals, yet unlike the fly gene, Contig 26 transcripts are enriched in females. By contrast, a single element on the array representing branch B is enriched in males, and all 4 cDNA elements from this gene respond to arsenic treatment. These differences are likely the result of undetected splice variants. Lastly, all elements representing branch A show elevated expression patterns when treated with arsenic, yet no sex specific expression is detected. These additional observations strongly support the existence of a Fer3HCH gene expansion that diversified within a crustacean lineage leading to <it>D. pulex</it>. 
A homologue to the insect Fer2LCH genes has yet to be discovered in Crustacea.</p>
         </sec>
         <sec>
            <st>
               <p>Orphan genes</p>
            </st>
            <p>The present study is a preliminary annotation of the emerging <it>D. pulex </it>genome using comparative and functional data to predict the fraction of genes unique to <it>Daphnia</it>. Aside from gene expansions that are suggestive of adaptations specific to aquatic environments, accounts of orphan genes (defined here as having no matches to insects) can offer equally important insights into crustacean biology from the perspective of a class of genes that usually has the shortest average length and is the most rapidly evolving <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B57">57</abbr></abbrgrp>. Our searches using Blastx of 787 assembled sequences against the proteomes of four insects suggest that ~68% of the <it>Daphnia </it>genes are shared with at least one insect. Less than half of these show matches to the individual protein databases of <it>Bombyx</it>, <it>Apis</it>, <it>Anopheles </it>and <it>Drosophila</it>. This discrepancy is likely a combined effect of incomplete datasets and of lineage specific gene losses among the insects. Taking into account associated matches to proteins from our chosen outgroup (<it>Caenorhabditis</it>), we discover that 21% of the <it>Daphnia </it>genes are either derived within Pancrustacea or lost within the nematodes. Those genes that are uniformly conserved across all four insect species are primarily cuticle proteins and serine proteases having trypsin activity. These gene families are also listed as two of the top 20 most significant expansions or reductions between the <it>Anopheles </it>and <it>Drosophila </it>proteomes <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, which diverged some 250 million years ago. It is tempting to speculate that, in both crustaceans and insects, a fraction of gene families are equally active in their evolutionary diversification. 
Such gene families would be candidates for detailed investigations leading to a better understanding of how Pancrustacea succeeded in exploiting its range of ecological settings.</p>
            <p>A careful evaluation of assembled sequences showing no matches to the insect proteomes suggests that ~1/3 of the genes are either derived in crustaceans or lost within insects. This estimation is admittedly from a very limited sampling of the total number of <it>Daphnia </it>genes and is derived from sequencing non-normalized cDNA libraries that were created under standard laboratory conditions and interrogated by microarrays. Although this fraction cannot be extrapolated to the genome, it is comparable to findings from other taxa. In <it>D. melanogaster</it>, 10% of the genes have homologous best hits in non-insect species plus 19% have no homologous hits to other species, while the combined estimate for <it>A. gambiae </it>is 21% <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Within the nematodes, which diverged ca. 600 mya, 23% of the genes are estimated to be unique to species <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Of course, the fraction of species specific genes declines dramatically when evaluating close allies; comparing two <it>Caenorhabditis </it>species reveals that 4% of their genes are unique <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>, and the mouse gene set differs from the human set by only 1% <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. A future investigation of a larger <it>D. pulex </it>gene collection against an equivalent dataset for the congener <it>D. magna </it><abbrgrp><abbr bid="B60">60</abbr></abbrgrp> will help define the true estimate of species specific genes in <it>Daphnia</it>. Additional contributions of EST data for non-branchiopod crustaceans will further define the crustacean proteome and shed light on the biological factors that led to the group's divergence from insects. 
However, the sequences presented by our present study are accompanied by expression data from three microarray experiments, which authenticate the orphan sequences as genes and support the notion that ecological factors are more likely to contribute to sequence and functional divergences among genomes.</p>
            <p>Combining the gene expression data obtained by Eads et al. (submitted) and Shaw et al. (submitted, and in prep) for the 787 assembled sequences reveals that the majority of genes with sex biased expression, including developmental and regulatory loci, do not respond to the cadmium and arsenic metal toxicity. A clear example is provided by genes predicted to regulate translation. All 15 genes save two are differentially expressed in males versus females and only three genes also show transcriptional responses to metal toxicity (Additional file <supplr sid="S2">2</supplr>). We find that this class of sex biased genes proportionally contains the fewest orphans. The relatively larger number of sequences with matches to the insect proteome suggests that genes functioning during development and reproduction are generally well conserved between crustaceans and insects. Further work is required to elucidate crustacean and <it>Daphnia </it>specific components of these central processes. By contrast, nearly half of the <it>Daphnia </it>genes that respond to metals, but show no differences between the sexes, are likely absent in insects. This is explicable in light of the fact that metal exposure is an ecological stressor that varies between aquatic and terrestrial environments <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, which has catalyzed the evolution of certain protein types (cuticles, iron metabolism, defense) to increasingly specialized functions. The extent to which ecology has shaped the genome organization of pancrustaceans is an important future direction for research. For example, the mosquito <it>A. gambiae </it>spends part of its larval stage in water; by comparing genes differentially expressed during this stage to expression patterns in <it>D. pulex</it>, it may be possible to examine the effects of an aquatic lifestyle on the expression of particular protein families.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>This work investigates the sequence preservation and expansion of genes from the crustacean <it>D. pulex </it>compared to insect proteomes, based on the analysis of 1,546 ESTs that represent 787 unique transcripts. Our sampling of cDNA from this emerging genomic model species reveals sequences that have largely been conserved in both groups representing arthropods evolving in water or on land. Genes that function for reproduction, regulation of cellular processes and development are identified; some are known to genetically interact in the model insect species <it>Drosophila</it>. This provisional annotation of <it>Daphnia </it>sequences is further verified by companion studies using cDNA microarrays to examine transcription in males, females and embryos (Eads et al. submitted) and under toxic metal stress (Shaw et al. submitted, and in prep). Here we identify cases of lineage specific gene family expansions by a series of gene duplications. For instance, there are as many as seven distinct ferritin loci indicated by cDNA and genome data, including a crustacean orthologue to the insect Ferritin 1 locus and a monophyletic grouping of five Ferritin 3 genes. Finally, our results suggest that, as we study the genomes of organisms distantly related to the classic model laboratory organisms, the majority of unknown genes will be functionally linked to the organisms' ecology. Compared to the gene sets showing differential expression among developmental stages, we observe that sets responding to ecological stress contain a greater proportion of loci with no sequence similarity to previously characterized arthropod genes. A comprehensive inventory of putative orthologs, orphan genes, and lineage specific gene expansions coupled with functional genomics data will provide important insights into genomic changes that led to the adaptive radiation of crustaceans.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>cDNA library construction and quality assurance</p>
            </st>
            <p>For the purpose of creating a collection of cDNA for printing onto microarrays, a clonal isolate of a <it>D. pulex </it>/<it>D. pulicaria </it>hybrid (called log52) was cultured under standard laboratory conditions by Jim Haney (University of New Hampshire) within a large, aerated, 200 liter container of filtered lake water by feeding a concentrated monoculture of green algae (<it>Scenedesmus acutus</it>). Animals at all life stages were harvested and immediately processed. Total RNA was isolated using Trizol reagent (Invitrogen Life Sciences) and was subsequently purified using the RNeasy protocol (Qiagen). The cDNA libraries were constructed by Darren Bauer and Kelley Thomas (University of New Hampshire) using the Creator SMART (Clontech) system by following the manufacturer's instructions. The cDNA was ligated into the pDNR-LIB vector supplied by Clontech.</p>
            <p>To control for bias towards smaller fragments inserting during the ligation of cDNA into plasmids, reactions were performed on four cDNA size fractions. Size fractionation was performed as per the SMART cDNA protocol using the CHROMA SPIN-400 column. The column was prepared for the drip procedure by inverting several times to completely resuspend the gel matrix, and the storage buffer was drained by gravity flow. Seven hundred microliters of column buffer were added to the column and allowed to drain out, then 100 &#956;l of a mixture of <it>Sfi </it>I-digested cDNA and xylene cyanol dye were applied to the matrix and allowed to fully absorb. One hundred microliters of column buffer were added to the matrix and allowed to fully absorb, then 600 &#956;l of column buffer were added and single-drop fractions were collected in 16 tubes. The profile of the fractions was verified by running 3 &#956;l of each fraction on a 1.1% agarose/EtBr gel at 150 V for 10 minutes. The samples were then pooled into two size classes; fractions 7 and 8 were pooled into the "large size" and fractions 9 and 10 were pooled into the "small size".</p>
            <p>From the libraries, 768 colonies were chosen for quality assurance tests. The bacterial transformants were amplified in selective 2xYT media, plasmids were purified by an alkaline lysis protocol according to the manufacturer's instructions (PerfectPrep, Eppendorf) and quantified by spectrophotometry. The molecular weights of cDNA inserts were measured by PCR amplification of cDNA inserts using the M13 vector primers M13fw (GTG TAA AAC GAC GGC CAG TAG) and M13rev (AAA CAG CTA TGA CCA TGT TCA C) followed by agarose gel electrophoresis against standards and visualized using a Kodak 440cf imaging station. Sequencing reactions were performed by priming at the 5' end of cDNA using vector primer pDNRlib30-50 (TAT ACG AAG TTA TCA GTC GAC G), ABI BigDye chemistry and the 3730 sequencer. Vector and poor quality sequences were trimmed from the sequencing reads and ESTs were assembled into contigs using the SeqManII software (DNASTAR package). Homologies with Genbank entries were discovered using Blastx against the non-redundant (nr) protein database. Those sequences with expectation-values better than 1 &#215; 10<sup>-27 </sup>were further examined for the presence of an annotated ATG start codon at the 5' end of the open reading frame (ORF). This last step was accomplished using NCBI's ORF finder tool <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. Only those sequences whose Methionine aligned (including gaps) with the first amino acid of complete sequences were considered full-length transcripts.</p>
         </sec>
         <sec>
            <st>
               <p>Characterization of the ESTs</p>
            </st>
            <p>One thousand twenty-eight additional cDNA samples were chosen for sequencing based on the microarray results obtained by Eads et al. (submitted) and Shaw et al. (submitted, and in prep). The sequencing reactions were carried out as outlined above. All 1,648 sequence reads from this study, with their quality scores, were obtained from ABI sequencer data files using phred <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> with default parameter values. The reads were then processed by discarding low quality and vector sequences using Lucy v1.19p <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> with default parameter values, by removing poly-A tails using EMBOSS trimest <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and by discarding sequences with lengths under 100 bases. The remaining high quality EST set was reduced to a non-redundant set of unique gene transcripts by clustering with phrap <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> using the following parameters: mismatch penalty = -5; minimum match = 50; minimum score = 100. The resulting contigs and singlets that matched to mitochondrial gene transcripts (Genbank accession <ext-link ext-link-type="gen" ext-link-id="NC 000844">NC 000844</ext-link>) using Blastn were removed from subsequent analyses. To investigate whether the set of assembled sequences contain alternative transcripts of the same loci, the contigs and singlets were further clustered using the SeqManII software with the following relaxed parameters: match size = 12; maximum added gap length = 70; minimum percent match = 80; no gap penalty; gap length penalty = 0.70.</p>
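            <p>The last two read-processing steps, poly-A removal and the 100 base length cutoff, can be illustrated with a short sketch. This is not the procedure applied by Lucy or EMBOSS trimest; it is a simplified stand-in. The rule that a tail is a run of at least 10 trailing adenines and the function names are hypothetical choices made for illustration.</p>
            <p>
```python
import re

MIN_LENGTH = 100  # reads shorter than this are discarded, as in the text

# Assumption: a tail is a run of at least 10 trailing A's. This cutoff is
# hypothetical and is not the rule applied by EMBOSS trimest.
POLY_A_TAIL = re.compile(r"A{10,}$")


def trim_poly_a(seq):
    """Remove a trailing poly-A run from a read (case-insensitive input)."""
    return POLY_A_TAIL.sub("", seq.upper())


def filter_reads(reads):
    """Trim poly-A tails, then keep reads of at least MIN_LENGTH bases.

    `reads` maps a read name to its nucleotide sequence; the result uses
    the same layout and holds the trimmed sequences.
    """
    kept = {}
    for name, seq in reads.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) >= MIN_LENGTH:
            kept[name] = trimmed
    return kept


if __name__ == "__main__":
    example = {
        "long_read": "ACGT" * 30 + "A" * 20,   # 120 bases after trimming
        "short_read": "ACGT" * 10 + "A" * 15,  # 40 bases after trimming
    }
    print(sorted(filter_reads(example)))
```
            </p>
            <p>Trimming before measuring length matters here, because a read that passes the cutoff only by virtue of its tail would otherwise enter the assembly.</p>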
            <p>The putative open reading frames (ORFs) for the assembled sequences were determined in three steps using Prot4EST v2.2 <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> with DECODER disabled, ESTScan &#8211; which is an integral component of Prot4EST &#8211; and getorf from EMBOSS. The ORFs were selected during the first step when the assembled sequence translations aligned to proteins within the NCBI NR database with a Blastx e-value better than 1 &#215; 10<sup>-8</sup>. Failing this first step, the ORFs were selected during the second step using ESTScan or simply by recording the longest uninterrupted ORFs when they were located on the positive strands of the sequences. Otherwise, the longest ORFs were selected during step three, based on the results obtained by using the EMBOSS program that restricted sequence translations from the negative strand. This restriction was justified by observing only 3 ORFs on the negative strand from among 376 predictions from step one.</p>
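            <p>The idea of recording the longest uninterrupted ORF on the positive strand can be sketched in Python as follows. This is a hedged illustration rather than the behaviour of ESTScan or getorf: the function name is hypothetical, and an ORF is assumed here to be any stretch of whole codons free of the standard stop codons (TAA, TAG, TGA) in one of the three forward reading frames, without requiring a start codon.</p>
            <p>
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq):
    """Return (frame, start, end) of the longest stop-free stretch on the forward strand.

    Coordinates are 0-based and end-exclusive, and cover whole codons only.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = frame
        # Walk codon by codon; a stop codon closes the current open stretch.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start > best[2] - best[1]:
                    best = (frame, start, pos)
                start = pos + 3
        # Close the final stretch at the last complete codon of this frame.
        end = frame + (len(seq) - frame) // 3 * 3
        if end - start > best[2] - best[1]:
            best = (frame, start, end)
    return best
```
            </p>
            <p>Ties are resolved in favour of the earliest frame and position, which is an arbitrary choice of this sketch; searching the negative strand would amount to calling the same function on the reverse complement of the sequence.</p>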
            <p>Numerous sequence similarity searches were done for both the high quality EST set and the assembled sequences. First, queries were performed against the NCBI NR protein database (Genbank release 148) using a local installation of the WU-BLAST program <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>. The taxonomic domains were added to the results by parsing the taxon ID from the top match for each query and by retrieving the associated information from the NCBI <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>. Second, for a more confident assessment of whether the assembled sequences were shared with insects, they were compared to protein sequences archived in the NCBI UniGene sets for <it>Bombyx mori </it>(build #7), <it>Apis mellifera </it>(build #5), <it>Anopheles gambiae </it>(build #29) and <it>Drosophila melanogaster </it>(build #37) using tBlastx with an expectation threshold set at E &lt; 0.005. The same search was performed against the <it>Caenorhabditis elegans </it>(build #23) UniGene database to judge whether the differences could be attributed to gains or losses within the representative insects or crustacean. The assembled sequences were clustered based on the distribution of bit scores across the databases using self-organizing maps followed by k-means clustering within 28 nodes in Cluster v2.11 <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. Third, to further ascertain whether the assembled sequences could be aligned to known proteins, queries were made against all <it>Drosophila melanogaster </it>gene translations that are predicted by the annotation v4.2.1 of the genome sequence assembly and against the genome nucleotide sequences of <it>Drosophila melanogaster </it>and <it>Caenorhabditis elegans </it>using tBlastx.</p>
            <p>The assembled sequences were classified into gene ontology (GO)-defined functional classes using the program Blast2GO <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and by extracting the GO annotations from FlyBase for sequences that strongly matched <it>D. melanogaster </it>gene transcripts. The putative gene annotations were examined for functional classes that are enriched within our lists of <it>Daphnia </it>genes compared to the total set of GO terms for all <it>Drosophila </it>genes using Gostat <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> and by testing for the enrichment of GO terms within subsets of the assembled sequences using Fisher's Exact Test executed within Blast2GO.</p>
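            <p>For readers unfamiliar with the enrichment test named above, the following Python sketch computes a one-sided Fisher's Exact Test p-value for the over-representation of a single GO term in a subset of sequences. It is an illustration only: the function name and arguments are hypothetical, and whether the tests run within Blast2GO were one-sided or two-sided, or corrected for multiple testing, is not stated here and is not assumed.</p>
            <p>
```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: sequences in the subset annotated with the GO term
    n: size of the subset
    K: sequences in the full set annotated with the GO term
    N: size of the full set
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of all outcomes at least as extreme as the observed k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```
            </p>
            <p>As a worked case, if 3 of 10 sequences carry a term and a subset of 3 sequences contains all 3 of them, the call fisher_enrichment_p(3, 3, 3, 10) evaluates to 1/120, because only one of the 120 possible subsets of size 3 is that extreme.</p>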
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JKC, BDE and JA conceived the study, designed and implemented the comparative analyses, and drafted the manuscript. JS and BDE performed the microarray experiments that guided the sequencing efforts, contributed the expression data and interpreted the results in light of this study. DB created the cDNA libraries. EB and JKC characterized the cDNA libraries and contributed EST sequences with BDE. All authors read and improved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Note added to proof</p>
         </st>
         <p>The recently released draft <it>Daphnia pulex</it> genome sequence (July 7, 2007) suggests that <it>Daphnia</it> possesses a single copy of the Ferritin 1 gene, represented by Singlet 73 on the phylogenetic tree (Figure <figr fid="F3">3</figr>).</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This project was financed by collaborative grants from the National Science Foundation (DEB 0221837 and FIBR 0328516) and by seed funds from The Center for Genomics and Bioinformatics, supported in part by the Indiana Genomics Initiative (INGEN) under the Lilly Endowment and by Shared University Research grants from IBM, Inc. to Indiana University. Computer support was provided by Phillip Steinbachs and Sumit Middha at The Center for Genomics and Bioinformatics and by Dick Repasky at the Indiana University Information Technology Services. We thank Kelley Thomas and Jim Haney (University of New Hampshire) for their contributions towards the construction of cDNA libraries. Don Gilbert (Indiana University) and Sumit Middha provided important bioinformatics support. This work benefits from, and contributes to, the <it>Daphnia </it>Genomics Consortium.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Gene translocation links insects and crustaceans</p>
            </title>
            <aug>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Lavrov</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>392</volume>
            <issue>6677</issue>
            <fpage>667</fpage>
            <lpage>668</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/33577</pubid>
                  <pubid idtype="pmpid" link="fulltext">9565028</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Mitochondrial genomes suggest that hexapods and crustaceans are mutually paraphyletic</p>
            </title>
            <aug>
               <au>
                  <snm>Cook</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Yue</snm>
                  <fnm>QY</fnm>
               </au>
               <au>
                  <snm>Akam</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>P ROY SOC B-BIOL SCI</source>
            <pubdate>2005</pubdate>
            <volume>272</volume>
            <issue>1569</issue>
            <fpage>1295</fpage>
            <lpage>1304</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1098/rspb.2004.3042</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Ecdysozoan phylogeny and Bayesian inference: first use of nearly complete 28S and 18S rRNA gene sequences to classify the arthropods and their kin</p>
            </title>
            <aug>
               <au>
                  <snm>Mallatt</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Garey</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Shultz</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2004</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>178</fpage>
            <lpage>191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ympev.2003.07.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">15019618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Hexapod origins: Monophyletic or paraphyletic?</p>
            </title>
            <aug>
               <au>
                  <snm>Nardi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Spinsanti</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Carapelli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dallai</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Frati</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <issue>5614</issue>
            <fpage>1887</fpage>
            <lpage>1889</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1078607</pubid>
                  <pubid idtype="pmpid" link="fulltext">12649480</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The colonization of land by animals: molecular phylogeny and divergence times among arthropods</p>
            </title>
            <aug>
               <au>
                  <snm>Pisani</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Poling</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Lyons-Weiler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hedges</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>BMC Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">333434</pubid>
                  <pubid idtype="pmpid" link="fulltext">14731304</pubid>
                  <pubid idtype="doi">10.1186/1741-7007-2-1</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Hox genes and the evolution of the arthropod body plan</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Evol Dev</source>
            <pubdate>2002</pubdate>
            <volume>4</volume>
            <issue>6</issue>
            <fpage>459</fpage>
            <lpage>499</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1525-142X.2002.02034.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12492146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Exploring embryonic germ line development in the water flea, Daphnia magna, by zinc-finger-containing VASA as a marker</p>
            </title>
            <aug>
               <au>
                  <snm>Sagawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamagata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shiga</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Gene Expression Patterns</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>5</issue>
            <fpage>669</fpage>
            <lpage>678</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.modgep.2005.02.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">15939379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Pax3/7 genes reveal conservation and divergence in the arthropod segmentation hierarchy</p>
            </title>
            <aug>
               <au>
                  <snm>Davis</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>D'Alessio</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>NH</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2005</pubdate>
            <volume>285</volume>
            <issue>1</issue>
            <fpage>169</fpage>
            <lpage>184</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ydbio.2005.06.014</pubid>
                  <pubid idtype="pmpid" link="fulltext">16083872</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Genetics of Daphnia</p>
            </title>
            <aug>
               <au>
                  <snm>Hebert</snm>
                  <fnm>PDN</fnm>
               </au>
            </aug>
            <source>Memorie dell'Istituto Italiano di Idrobiologia</source>
            <editor>Peters RH, Bernardi R</editor>
            <pubdate>1987</pubdate>
            <volume>45</volume>
            <fpage>439</fpage>
            <lpage>460</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Phylogenetic evidence for a single long-lived clade of crustacean cyclic parthenogens and its implications for the evolution of sex</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Crease</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>P ROY SOC LOND B BIO</source>
            <pubdate>1999</pubdate>
            <volume>266</volume>
            <issue>1421</issue>
            <fpage>791</fpage>
            <lpage>797</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1098/rspb.1999.0707</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Comparative developmental genetics and the evolution of arthropod body plans</p>
            </title>
            <aug>
               <au>
                  <snm>Angelini</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2005</pubdate>
            <volume>39</volume>
            <fpage>95</fpage>
            <lpage>119</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.39.073003.112310</pubid>
                  <pubid idtype="pmpid" link="fulltext">16285854</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The role of lineage-specific gene family expansion in the evolution of eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Lespinet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>7</issue>
            <fpage>1048</fpage>
            <lpage>1059</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186617</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097341</pubid>
                  <pubid idtype="doi">10.1101/gr.174302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A transcriptomic analysis of the phylum Nematoda</p>
            </title>
            <aug>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mitreva</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Whitton</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Daub</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schmid</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>McCarter</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Blaxter</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <issue>12</issue>
            <fpage>1259</fpage>
            <lpage>1267</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1472</pubid>
                  <pubid idtype="pmpid" link="fulltext">15543149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Comparative genomics of nematodes</p>
            </title>
            <aug>
               <au>
                  <snm>Mitreva</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Blaxter</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Bird</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>McCarter</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>10</issue>
            <fpage>573</fpage>
            <lpage>581</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.08.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">16099532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Signal sequence analysis of expressed sequence tags from the nematode Nippostrongylus brasiliensis and the evolution of secreted proteins in parasites</p>
            </title>
            <aug>
               <au>
                  <snm>Harcus</snm>
                  <fnm>YM</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fernandez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Daub</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Selkirk</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Blaxter</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Maizels</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>6</issue>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">463072</pubid>
                  <pubid idtype="pmpid" link="fulltext">15186490</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Caenorhabditis elegans is a nematode</p>
            </title>
            <aug>
               <au>
                  <snm>Blaxter</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <issue>5396</issue>
            <fpage>2041</fpage>
            <lpage>2046</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2041</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Jacobs</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Krylov</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Makarova</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Mazumder</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mekhedov</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Nikolskaya</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Smirnov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sorokin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Sverdlov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Vasudevan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Yin</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>2</issue>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Daphnia Genomics Consortium [http://daphnia.cgb.indiana.edu/]</p>
            </title>
            <aug>
               <au>
                  <cnm>DGC</cnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Prediction of protein function and pathways in the genome era</p>
            </title>
            <aug>
               <au>
                  <snm>Gabaldon</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>2004</pubdate>
            <volume>61</volume>
            <issue>7-8</issue>
            <fpage>930</fpage>
            <lpage>944</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00018-003-3387-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">15095013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Orthologs, paralogs, and evolutionary genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2005</pubdate>
            <volume>39</volume>
            <fpage>309</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.39.073003.114725</pubid>
                  <pubid idtype="pmpid" link="fulltext">16285863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <issue>17</issue>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research</p>
            </title>
            <aug>
               <au>
                  <snm>Conesa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gotz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Garcia-Gomez</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Terol</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Talon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Robles</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>18</issue>
            <fpage>3674</fpage>
            <lpage>3676</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti610</pubid>
                  <pubid idtype="pmpid" link="fulltext">16081474</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Cap 'n' collar B cooperates with a small Maf subunit to specify pharyngeal development and suppress Deformed homeotic function in the Drosophila head</p>
            </title>
            <aug>
               <au>
                  <snm>Veraksa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>McGinnis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>XL</fnm>
               </au>
               <au>
                  <snm>Mohler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McGinnis</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2000</pubdate>
            <volume>127</volume>
            <issue>18</issue>
            <fpage>4023</fpage>
            <lpage>4037</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10952900</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>DSP1, an HMG-like protein, is involved in the regulation of homeotic genes</p>
            </title>
            <aug>
               <au>
                  <snm>Decoville</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Giacomello</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Leng</snm>
                  <fnm>M</fn