<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-5-96</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>FunnyBase: a systems level functional annotation of <it>Fundulus </it>ESTs for the analysis of gene expression</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Paschall</snm>
               <mi>E</mi>
               <fnm>Justin</fnm>
               <insr iid="I1"/>
               <email>jep67c@umkc.edu</email>
            </au>
            <au id="A2">
               <snm>Oleksiak</snm>
               <mi>F</mi>
               <fnm>Marjorie</fnm>
               <insr iid="I2"/>
               <email>mfoleksi@ncsu.edu</email>
            </au>
            <au id="A3">
               <snm>VanWye</snm>
               <mi>D</mi>
               <fnm>Jeffrey</fnm>
               <insr iid="I3"/>
               <email>jvanwye@rsmas.miami.edu</email>
            </au>
            <au id="A4">
               <snm>Roach</snm>
               <mi>L</mi>
               <fnm>Jennifer</fnm>
               <insr iid="I3"/>
               <email>jlroach@rsmas.miami.edu</email>
            </au>
            <au id="A5">
               <snm>Whitehead</snm>
               <fnm>J Andrew</fnm>
               <insr iid="I3"/>
               <email>awhitehead@rsmas.miami.edu</email>
            </au>
            <au id="A6">
               <snm>Wyckoff</snm>
               <mi>J</mi>
               <fnm>Gerald</fnm>
               <insr iid="I1"/>
               <email>wyckoffg@umkc.edu</email>
            </au>
            <au id="A7">
               <snm>Kolell</snm>
               <mi>J</mi>
               <fnm>Kevin</fnm>
               <insr iid="I1"/>
               <email>kolellk@umkc.edu</email>
            </au>
            <au id="A8" ca="yes">
               <snm>Crawford</snm>
               <mi>L</mi>
               <fnm>Douglas</fnm>
               <insr iid="I3"/>
               <email>dcrawford@rsmas.miami.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Division of Molecular Biology and Biochemistry, 5100 Rockhill Rd., University of Missouri-Kansas City 64110, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Environmental &amp; Molecular Toxicology, North Carolina State University; Raleigh, NC 27695-7633 USA</p>
            </ins>
            <ins id="I3">
               <p>Division of Marine Biology and Fisheries, NIEHS Marine and Freshwater Biomedical Sciences Center, Rosenstiel School of Marine &amp; Atmospheric Science, University of Miami, Miami, FL 33149, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>96</fpage>
         <url>http://www.biomedcentral.com/1471-2164/5/96</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15610557</pubid>
               <pubid idtype="doi">10.1186/1471-2164-5-96</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>17</day>
               <month>8</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>20</day>
               <month>12</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>20</day>
               <month>12</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Paschall et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>While studies of non-model organisms are critical for many research areas, such as evolution, development, and environmental biology, they present particular challenges for both experimental and computational genomic level research. Resources such as mass-produced microarrays and the computational tools linking these data to functional annotation at the system and pathway level are rarely available for non-model species. This type of "systems-level" analysis is critical to the understanding of patterns of gene expression that underlie biological processes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We describe a bioinformatics pipeline known as <it>FunnyBase </it>that has been used to store, annotate, and analyze 40,363 expressed sequence tags (ESTs) from the heart and liver of the fish, <it>Fundulus heteroclitus</it>. Primary annotations based on sequence similarity are linked to networks of systematic annotation in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and can be queried and computationally utilized in downstream analyses. Steps are taken to ensure that the annotation is self-consistent and that the structure of GO is used to identify higher level functions that may not be annotated directly. An integrated framework for cDNA library production, sequencing, quality control, expression data generation, and systems-level analysis is presented and utilized. In a case study, a set of genes whose expression levels showed a statistically significant regression on environmental temperature along the Atlantic Coast shows a statistically significant (P &lt; 0.001) enrichment in genes associated with amine metabolism.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The methods described have application for functional genomics studies, particularly among non-model organisms. The web interface for <it>FunnyBase </it>can be accessed at <url>http://genomics.rsmas.miami.edu/funnybase/super_craw4/</url>. Data and source code are available by request at <email>jpaschall@bioinfobase.umkc.edu</email>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Investigating patterns of gene expression using mouse and human microarrays has produced insights into cancer <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, cardiac diseases <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, and metabolic disorders <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. These and many other functional genomics studies rely on full genomic sequence to establish well-annotated databases. Yet, microarrays based on EST collections are increasingly being used for diverse species, from honey bees to fish <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp> and including simple diploblastic organisms <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. These studies within a diversity of organisms provide insights not provided by 'model' species (species that are genetically well defined or with annotated genomes <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>). For example, 'non-model' organisms have provided insight into the natural variation in gene expression <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, social castes among bees <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, hypoxia <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, and physiological responses to variation in the thermal environment <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. To investigate adaptive variation in gene expression we use the teleost <it>Fundulus heteroclitus </it>(killifish) <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B29">29</abbr></abbrgrp>.</p>
         <p>The killifish <it>Fundulus heteroclitus </it>are distributed along the eastern coast of North America which has one of the steepest thermal clines in the world: northern populations have environmental temperatures more than 12&#176;C below southern populations across 12 degrees of latitude. Migration among populations is sufficient to minimize random genetic drift <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> but not frequent enough to extinguish local adaptation <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Populations are large (>10,000) and affected by historical, demographic and selective constraints, providing a framework for the partitioning of variation in gene expression within and among populations. Additionally, the well-established phylogenetic relationship among <it>Fundulus </it>species can be used to discern adaptive changes <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. These characteristics make <it>F. heteroclitus </it>an ideal species to investigate adaptive variation in gene expression.</p>
         <p>Microarrays from diverse EST collections offer opportunities to address many biological problems, but to effectively use this information often requires a locally generated bioinformatics approach. Tools like the TIGR Gene index <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and Unigene <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> provide significant information on many species, yet these databases do not meet the needs of functional genomics projects for many non-model species. Currently, TIGR and NCBI provide gene indices for 28 and 23 animal species, respectively. Yet, there are 63 animal species with more than 10,000 ESTs <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The number of species with ESTs >10,000 has continued to grow, and there was approximately a 20% increase in the preceding three months. While annotation from these resources can be accessed through web-based homology searches, for many laboratory collections of ESTs it is difficult to use existing tools to achieve a systems-level view of gene functions and relationships. Rather than simply browsing functional information over the web for a different group's project, laboratories that produce novel EST collections and microarrays require customized databases providing access to integrated functional annotation as expression data are being analyzed.</p>
         <p>We have developed <it>FunnyBase </it>to meet these functional genomics needs. <it>FunnyBase </it>provides functional information for >40,000 ESTs from the teleost fish <it>Fundulus heteroclitus</it>, provides the means to quickly process, evaluate, and store annotation based on similarity searches of public resources, and integrates these data with species-specific clustering and microarray analysis. Perhaps ironically, the greatest challenge for functional annotation based on similarity searches is an overabundance of data. There are a number of databases to choose from, and often the single best hit from a given database search is not the most informative. <it>FunnyBase </it>implements a strategy to make maximum use of systems-level functional information from Gene Ontology (GO) <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> assignments and membership in metabolic pathways as defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG) <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Specifically, several sequence databases are queried and the results are integrated to maximize the number of annotated sequences. Alignments and scores for all homology-based associations are tracked, allowing further evaluation and statistical studies.</p>
         <p>Microarray data using genes annotated in <it>FunnyBase </it>can be systematically analyzed in the context of biological functions. We present a case study to illustrate how assessment of systems-level annotation can identify statistically significant functional differences among sets of genes.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p><it>FunnyBase </it>(Fig. <figr fid="F1">1</figr>) is divided into 3 modules: Sequence Pipeline, Hierarchical Annotation, and Microarray Production and Analysis. The Sequence Pipeline takes sequences and quality output files from the sequencer and applies vector screening, quality trimming, clone tracking, and clustering (described below) to produce a set of unique sequences that are deposited in the 'Sequence Data' and 'Cluster Data' tables. Note that "unique sequences" are a combination of singletons (single unique ESTs) and clusters of overlapping sequences.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p><it>FunnyBase </it>annotation scheme</p>
            </caption>
            <text>
               <p><it>FunnyBase </it>annotation scheme. The integration of the three <it>FunnyBase </it>modules: sequence pipeline, hierarchical annotations and microarray production and analysis. Database tables are shown in cylinders, arrows are data flow, and dashed lines indicate the integration of data from multiple sources.</p>
            </text>
            <graphic file="1471-2164-5-96-1"/>
         </fig>
         <p>The Hierarchical Annotation module uses the consensus sequences from the clusters or singletons, and integrates primary annotation such as gene name and description with associated pathways and systems-level functional annotation. This may include gene function (e.g., enzyme catalyst), metabolic or signal pathway (e.g., oxidative phosphorylation), or biological function (e.g., protein translation). Sequence data from the first module and functional annotation from the second are matched using database similarity searches (BLASTX and BLASTN). E-values, bit scores and local alignments are stored in the 'Similarity Data' table for all significant matches. One of the strengths of <it>FunnyBase </it>is the use of different sequence databases (SwissProt <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, NCBI UniGene <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, and NCBI non-redundant NR <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>) to provide separate annotations. Although these databases are not completely independent, the three separate annotations provide verification of gene names.</p>
         <p>The third module, Microarray Production &amp; Analysis, provides a list of unique genes to be printed and integrates expression data from microarray experiments with the Hierarchical Annotation module. This provides functional annotation for expression data. <it>FunnyBase </it>annotation is accessible through the web or through local SQL queries and data-mining scripts.</p>
         <sec>
            <st>
               <p>EST isolation and sequencing</p>
            </st>
            <p>The overall strategy used to isolate and sequence thousands of <it>Fundulus </it>cDNAs was (1) generate a high quality unidirectional cDNA library, (2) normalize the library, (3) randomly pick colonies and amplify by PCR the cDNA within the vector, (4) sequence and identify PCR products, and (5) after approximately every 1,000 clones, subtract these from the normalized library and repeat steps 3&#8211;5. Details for all protocols are provided at <url>http://crawford.rsmas.miami.edu/</url> and were used in the Comparative Functional Genomic course at Mount Desert Island, ME 2000.</p>
            <p>We sequenced 46,433 ESTs and 40,043 of these are available in the dbEST database at NCBI (dbEST identification numbers: 23,480,307 to 23,515,306; 23,520,047 to 23,525,409 and 24,320,128 to 24,320,184) as of June 26, 2004. Sequences in <it>FunnyBase </it>are identified by a number series: unique sequence number, array number, plate number and well identifier (example: 23434_125_001_H04). The remaining 6,966 un-submitted sequences failed to meet one of the sequence quality parameters. A sequence is defined as "good" if it meets either of two criteria: 1) it contains >100 bp of sequence with Phred score >20, or 2) it forms an overlapping cluster with other sequences. Of the 19,937 sequences processed with the current version of <it>FunnyBase</it>, 17,893 (90%) passed one of these quality parameters. In earlier iterations of <it>FunnyBase</it>, visual inspection and later a sequencer-specific quality measure equivalent to a Phred score of 15 were used as filters, resulting in 3,603 of the first 5,760 sequences (63%) and 13,922 of the next 15,168 sequences (92%) meeting quality standards, respectively. Re-sequences account for 5,668 sequences, and 4,625 of these were submitted.</p>
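            <p>The two acceptance criteria can be expressed as a small filter. The following is a minimal sketch rather than the <it>FunnyBase </it>implementation: the function names are hypothetical, and it assumes that the first criterion means a simple count of bases whose Phred score exceeds 20 in the trimmed read.</p>

```python
from typing import Sequence


def count_high_quality_bases(phred_scores: Sequence[int], min_phred: int = 20) -> int:
    """Count bases whose Phred score is strictly greater than min_phred."""
    return sum(1 for score in phred_scores if score > min_phred)


def is_good_sequence(
    phred_scores: Sequence[int],
    in_cluster: bool,
    min_bases: int = 100,
    min_phred: int = 20,
) -> bool:
    """Apply the two acceptance criteria described in the text.

    Criterion 1: more than min_bases bases with Phred above min_phred
    (assumed here to be a plain count over the trimmed read).
    Criterion 2: the read overlaps other reads in a cluster.
    A read passing either criterion is accepted.
    """
    enough_quality = count_high_quality_bases(phred_scores, min_phred) > min_bases
    return enough_quality or in_cluster
```

            <p>Under these assumptions, a short or low-quality read is still retained when it clusters with other reads, which matches the "either criterion" rule stated above.</p>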
         </sec>
         <sec>
            <st>
               <p>Controls</p>
            </st>
            <p>One of the most important steps for producing microarrays from cDNA libraries is being able to associate the bacteria containing the cDNAs of interest with the EST annotation. High-throughput procedures are highly prone to tracking errors, including loading plates into an automatic sequencer in the wrong order, orienting symmetric plates in the wrong direction, or mislabeling plates. The ability to identify these types of mistakes requires controls for identifying plates and plate orientation. The <it>FunnyBase </it>system has a number of integrated quality control steps. First, a <it>Ctenophore </it>cDNA (NCBI accession number: CN992733) that is unlike anything else in GenBank is used as a control. Controls are placed in 96-well plates in wells corresponding to the plate number and two orientation wells (A5 and F9). Sequences from 96-well plates are automatically scanned for these controls so that the identity and orientation are confirmed and a report is generated for manual review. Second, all clones used for microarray production are re-sequenced. This is necessary because individual cDNAs are cherry picked, re-grown and re-amplified, and each of these steps has the potential to introduce or magnify an error. For example, for a 6,000 gene array, a 5% error rate would result in 300 incorrect clones. Re-sequenced array plates are compared using pair-wise BLAST <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> against previous sequencing results so that the identity of each printed microarray spot is verified.</p>
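            <p>The control-well check lends itself to a short illustration. The sketch below is hypothetical: the text does not say how a plate number maps to a well, so the row-major mapping in <it>plate_number_well </it>is an assumption made only for this example, as are the function names and the zero-padded well labels.</p>

```python
ROWS = "ABCDEFGH"   # 8 rows of a 96-well plate
COLUMNS = 12        # 12 columns of a 96-well plate
ORIENTATION_WELLS = frozenset({"A05", "F09"})  # orientation wells named in the text


def plate_number_well(plate_number: int) -> str:
    """Hypothetical mapping: plate 1 -> A01, plate 12 -> A12, plate 13 -> B01."""
    if plate_number not in range(1, len(ROWS) * COLUMNS + 1):
        raise ValueError("this sketch only encodes plate numbers 1 to 96")
    row, col = divmod(plate_number - 1, COLUMNS)
    return f"{ROWS[row]}{col + 1:02d}"


def check_plate(expected_plate: int, control_wells) -> dict:
    """Compare wells where the control sequence was detected with the expected wells."""
    expected = set(ORIENTATION_WELLS) | {plate_number_well(expected_plate)}
    observed = set(control_wells)
    return {
        "ok": observed == expected,
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }
```

            <p>For plate 3 read in the correct orientation, the control appears in A03, A05 and F09 and the check passes. The same plate rotated by 180 degrees would show the control in H10, H08 and C04, so the check fails and the report lists all three expected wells as missing. In this simplified mapping, plate numbers whose well coincides with an orientation well would need separate handling.</p>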
         </sec>
         <sec>
            <st>
               <p>EST clustering</p>
            </st>
            <p>EST projects generate a number of redundant sequences due to the random selection of cDNAs from tissue libraries (Table <tblr tid="T1">1</tblr>). Clustering redundant sequences is a critical first step of analysis in order to identify genes to target for subtraction. The program CAP3 by Xiaoqiu Huang <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> was used to cluster EST sequences, requiring a minimum overlap of 30 bp with at least 75 percent similarity.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The Ten Most Frequently Annotated ESTs. Clusters with the greatest number of annotated ESTs, the sequence id, number of ESTs that cluster together, and the e-value (probability of similarity) are listed.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>
                           <b>ID</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number of ESTs</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Evalue</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Description</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1616</p>
                     </c>
                     <c ca="center">
                        <p>977</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>Vitellogenin I precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>2262</p>
                     </c>
                     <c ca="center">
                        <p>734</p>
                     </c>
                     <c ca="left">
                        <p>2e-88</p>
                     </c>
                     <c ca="left">
                        <p>Cytochrome c oxidase polypeptide II</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1348</p>
                     </c>
                     <c ca="center">
                        <p>507</p>
                     </c>
                     <c ca="left">
                        <p>1e-102</p>
                     </c>
                     <c ca="left">
                        <p>Alpha-1-antitrypsin homolog precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1026</p>
                     </c>
                     <c ca="center">
                        <p>481</p>
                     </c>
                     <c ca="left">
                        <p>5e-61</p>
                     </c>
                     <c ca="left">
                        <p>Zona pellucida sperm-binding protein 3 precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1727</p>
                     </c>
                     <c ca="center">
                        <p>401</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>Serotransferrin precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>555</p>
                     </c>
                     <c ca="center">
                        <p>397</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>Cytochrome c oxidase polypeptide I</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>640</p>
                     </c>
                     <c ca="center">
                        <p>369</p>
                     </c>
                     <c ca="left">
                        <p>8e-61</p>
                     </c>
                     <c ca="left">
                        <p>Zona pellucida sperm-binding protein B precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1549</p>
                     </c>
                     <c ca="center">
                        <p>351</p>
                     </c>
                     <c ca="left">
                        <p>2e-77</p>
                     </c>
                     <c ca="left">
                        <p>Apolipoprotein A-I precursor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>570</p>
                     </c>
                     <c ca="center">
                        <p>331</p>
                     </c>
                     <c ca="left">
                        <p>1e-111</p>
                     </c>
                     <c ca="left">
                        <p>Cytochrome c oxidase polypeptide III</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1178</p>
                     </c>
                     <c ca="center">
                        <p>304</p>
                     </c>
                     <c ca="left">
                        <p>2e-45</p>
                     </c>
                     <c ca="left">
                        <p>ATP synthase a chain</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p><it>FunnyBase </it>contains a total of 40,043 EST sequences from <it>F. heteroclitus </it>heart and liver. Clustering with CAP3 yields 3,776 clusters that contain 30,688 ESTs (77%). The remaining 8,991 ESTs (23%) are singletons. By storing the results of clustering with annotation, <it>FunnyBase </it>easily identifies the most redundant genes and aids in the selection of genes to be used for library subtraction, with the goal of picking less common transcripts.</p>
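            <p>Once cluster membership is stored, both the summary figures and a ranked list of subtraction candidates follow from simple counting. The sketch below assumes a hypothetical in-memory representation (a mapping from cluster identifier to member EST identifiers plus a list of singletons); it is not the <it>FunnyBase </it>schema.</p>

```python
from typing import Dict, List, Tuple


def summarize_clusters(clusters: Dict[str, List[str]], singletons: List[str]) -> Dict[str, float]:
    """Return the number of unique sequences and the share of ESTs in clusters."""
    clustered = sum(len(members) for members in clusters.values())
    total = clustered + len(singletons)
    if total == 0:
        raise ValueError("no ESTs supplied")
    return {
        "unique_sequences": len(clusters) + len(singletons),
        "clustered_fraction": clustered / total,
        "singleton_fraction": len(singletons) / total,
    }


def subtraction_candidates(clusters: Dict[str, List[str]], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank clusters by EST count; the largest are the most redundant transcripts."""
    sizes = [(cluster_id, len(members)) for cluster_id, members in clusters.items()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]
```

            <p>Ranking by cluster size is one plausible way to choose subtraction targets; in practice the choice may also depend on whether a cluster is annotated.</p>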
            <p>The 10 annotated clusters with the most sequences are listed in Table <tblr tid="T1">1</tblr>. In microarray experiments these genes tend to be highly expressed and the fluorescent signal tends to saturate the photomultiplier tube. These genes also serve to verify microarray printing because the predicted spots for these genes have the strongest signal.</p>
            <p>The distribution of the number of clusters with two or more ESTs is depicted in Figure <figr fid="F2">2</figr>. Although most clusters (2,581 or 68%) in <it>FunnyBase </it>have two or three-to-four sequences (Fig. <figr fid="F2">2</figr>), a small number of highly expressed genes form clusters with a large number of ESTs. For example, there are three clusters that contain 512 to 1,024 ESTs and ten clusters that contain 256 to 512 ESTs (bottom two bars for killifish in Fig. <figr fid="F2">2</figr>). This distribution is similar to other teleost fish EST collections (Fig. <figr fid="F2">2</figr>). Note that as more ESTs are added, clusters tend to get larger (more ESTs per cluster) rather than new small clusters growing in frequency. Of the 3,779 killifish clusters, 14% have more than eight ESTs, yet of the 14,714 Medaka clusters, 48% have eight or more ESTs. These distributions suggest that adding more EST sequences has diminishing returns.</p>
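            <p>The size classes used for this kind of distribution double at each step (2, 3 to 4, 5 to 8, and so on). A minimal sketch of such binning is shown here; the exact bin edges are an assumption (upper bound inclusive), and the function names are hypothetical.</p>

```python
from collections import Counter
from typing import Iterable


def size_class(n_ests: int) -> str:
    """Label the doubling size class that contains a cluster of n_ests ESTs."""
    if 2 > n_ests:
        raise ValueError("a cluster contains at least two ESTs")
    exponent = (n_ests - 1).bit_length()  # smallest k with 2**k >= n_ests
    upper = 2 ** exponent
    lower = upper // 2 + 1
    return str(upper) if lower == upper else f"{lower}-{upper}"


def size_class_histogram(cluster_sizes: Iterable[int]) -> Counter:
    """Count clusters per size class."""
    return Counter(size_class(n) for n in cluster_sizes)
```

            <p>Applied to the EST counts of the ten annotated clusters in Table 1 (977, 734, 507, 481, 401, 397, 369, 351, 331 and 304), this binning places two clusters in the 513-1024 class and eight in the 257-512 class.</p>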
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Frequency of cluster size class in teleosts</p>
               </caption>
               <text>
                  <p>Frequency of cluster size class in teleosts. The frequency of the number of ESTs in each cluster is shown for <it>Fundulus heteroclitus </it>(Killifish: total number of clusters 3,779), <it>Oryzias latipes </it>(Medaka: total number of clusters 7,401), <it>Oncorhynchus mykiss </it>(Rainbow Trout: total number of clusters 11,405) and <it>Danio rerio </it>(Zebrafish: total number of clusters 14,714). For example, the first bar indicates that approximately 1,000 clusters contain 2 ESTs in all four teleost fish. Clusters for other species (not <it>F. heteroclitus</it>) are based on NCBI UniGene.</p>
               </text>
               <graphic file="1471-2164-5-96-2"/>
            </fig>
            <p>One of the objectives of EST projects is to isolate most, if not all, genes expressed in a tissue or organism. The increasing size of larger clusters with more sequencing efforts indicates that strategies to increase the probability of isolating new genes need to be employed. We used two strategies. First, we normalized the library to reduce the difference in abundance between rare and abundant mRNAs to less than 10-fold <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Using this technique we were able to reduce redundancy in annotated genes from 33% in the non-normalized library to 11% after normalization. Second, we targeted specific sequences for subtraction: annotated cDNAs with high frequencies were targeted in order to focus effort on picking new, rare sequences. Through subtraction we were able to increase the rate of discovery of new annotatable sequences from 24% to 36%. However, analysis of these results indicates that a set of highly expressed sequences, some of which were not subtracted because they were not in the set of annotated genes, still makes up much of the EST library and should be the focus of future subtractions.</p>
         </sec>
         <sec>
            <st>
               <p>Gene annotation</p>
            </st>
            <p>Of the 12,776 unique ESTs (3,776 clusters and 8,991 singletons), 3,877 (30%) were annotated. The distribution of e-values for these annotations is shown in Figure <figr fid="F3">3</figr>. Most (84%) of these ESTs have e-values less than 10<sup>-5</sup>. Among the clusters, 2,265 of 3,779 (60%) were annotated, as compared to 333 of 1,131 (30%) confirmed high quality singletons. The lower percentage of annotated singletons suggests that these are either rare fish-specific genes or otherwise divergent, likely non-coding (5' or 3' UTR), regions.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Distribution of e-values for annotations</p>
               </caption>
               <text>
                  <p>Distribution of e-values for annotations. Gene annotation is based on sequence similarity. The e-values, which describe the probability of random sequence similarity, are shown as the negative log value (e.g., 10<sup>-5 </sup>= 5).</p>
               </text>
               <graphic file="1471-2164-5-96-3"/>
            </fig>
            <p>Annotations for ESTs are based on similarity using BLASTX or BLASTN <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> to sequences in one of six public databases: Swiss-Prot, Human UniGene, <it>Danio rerio </it>UniGene, <it>Oncorhynchus mykiss </it>UniGene, <it>Oryzias latipes </it>UniGene, or GenBank NR. <it>FunnyBase </it>includes locally parsed copies of these databases in a relational format. Thus, all searches are done locally and annotation features beyond the FASTA description can be queried. Consensus sequences from the <it>Fundulus </it>EST clusters as well as high quality single unique sequences (singletons) are used as query sequences for BLAST searches against these public databases. The use of consensus sequences, when available, allows sequences that do not contain regions of significant similarity with known proteins (e.g., 5' or 3' noncoding regions) to be annotated if they are members of an annotated cluster. All hits with an e-value less than 0.001 and their associated alignments are stored in the database and tracked with any associated functional annotation. Users can specify a custom level of significance when assessing the validity of homology-based annotation. This record, which goes beyond storing a certain number of 'best hits', is critical because in many cases additional results may have a negligibly lower alignment score but provide much more useful functional data.</p>
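            <p>Keeping every significant hit makes it possible to prefer a slightly lower-scoring hit that carries functional annotation. The sketch below illustrates one way this could work. Only the 0.001 storage cutoff comes from the text; the <it>Hit </it>record, the 5% score tolerance and the ranking rule are assumptions introduced for the example, and the GO identifiers in the usage note are placeholders.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional

STORAGE_CUTOFF = 1e-3  # hits with an e-value below this are stored, per the text


@dataclass
class Hit:
    subject_id: str
    evalue: float
    bit_score: float
    go_terms: List[str] = field(default_factory=list)


def significant_hits(hits: List[Hit], cutoff: float = STORAGE_CUTOFF) -> List[Hit]:
    """Keep hits whose e-value is below a user-chosen cutoff."""
    return [hit for hit in hits if cutoff > hit.evalue]


def most_informative_hit(
    hits: List[Hit],
    cutoff: float = STORAGE_CUTOFF,
    score_tolerance: float = 0.05,
) -> Optional[Hit]:
    """Among hits scoring within score_tolerance of the best bit score,
    prefer one that carries functional annotation (illustrative rule only)."""
    candidates = significant_hits(hits, cutoff)
    if not candidates:
        return None
    best_score = max(hit.bit_score for hit in candidates)
    near_best = [
        hit for hit in candidates
        if hit.bit_score >= best_score * (1 - score_tolerance)
    ]
    return max(near_best, key=lambda hit: (bool(hit.go_terms), hit.bit_score))
```

            <p>With a top hit of bit score 200 that has no GO terms and a second hit of bit score 198 annotated with a GO term, this rule returns the second hit; an annotated hit with bit score 100 would not displace the top hit.</p>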
            <p>The use of multiple databases increases the total number of annotated ESTs (Fig. <figr fid="F4">4</figr>) compared with any one source and provides an opportunity to compare annotation among all three sources for 1,841 (47%) sequences. GenBank NR provided the largest number of annotations, but these tend to be less informative (see <it>systematic functional annotations </it>below). Human UniGene provided an additional 311 (8% of total) annotations. Swiss-Prot provided an additional 32 annotations (1% of total), with 743 fewer annotated sequences than NR. However, Swiss-Prot is uniformly well annotated compared with NR, where informative functional annotation can easily be buried by numerous uninformative hits at similar e-values. Besides increasing the number of annotations, comparing the annotations from multiple databases allows mistakes in curation to be detected and information such as alternative gene names to be compiled from multiple sources.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Venn diagram of annotations from three different databases</p>
               </caption>
               <text>
                  <p>Venn diagram of annotations from three different databases. Number of unique ESTs annotated by three different databases: GenBank non-redundant (NR, total annotated = 3,534), Swiss-Prot (total annotated = 2,829), and human Unigene (total annotated = 2,390).</p>
               </text>
               <graphic file="1471-2164-5-96-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Systematic functional annotation: KEGG and Gene Ontology</p>
            </st>
            <p>In conjunction with performing similarity searches by BLAST, <it>FunnyBase </it>includes locally parsed representations of public databases such as Swiss-Prot in a relational database format. These databases provide additional information that cross-references other public resources such as GO, KEGG or OMIM <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and that is not available in the single FASTA description line returned by a BLAST search.</p>
            <p>KEGG <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> is a unique tool that represents metabolic and signal-transduction pathways both visually and computationally. <it>FunnyBase </it>links annotated genes to enzymes in KEGG pathways based on enzyme commission (EC) numbers. These pathway associations are stored and queries can readily identify genes from a given pathway that show specific patterns of expression. For visual inspection of the pathway, the web interface <url>http://genomics.rsmas.miami.edu/funnybase/super_craw4/</url> links directly to the graphical KEGG pathways in which a gene occurs.</p>
            <p>Of the 3,877 annotated ESTs in <it>FunnyBase</it>, 588 (14%) participate in one or more pathways defined by KEGG. These 588 ESTs represent 105 different pathways. Table <tblr tid="T2">2</tblr> provides a breakdown of the number of ESTs in <it>FunnyBase </it>for the 10 pathways associated with the largest number of distinct sequences (contigs or singletons). The extent that a given pathway is represented in <it>FunnyBase </it>can be used to identify metabolic differences among tissues <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> or in different species.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Number of distinct sequences in the Top 10 most common KEGG pathways. The KEGG pathway name and number of distinct sequences (clusters or singletons) from <it>FunnyBase </it>are presented. There are more distinct sequences than enzymes in a pathway because many enzymes have several protein subunits and many proteins have several different loci encoding the same subunit (e.g., NADH dehydrogenase, a protein complex of oxidative phosphorylation, has 26 protein subunits and 42 loci for these subunits).</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Sequence count for TOP 10 pathways</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glycolysis/Gluconeogenesis</p>
                     </c>
                     <c ca="left">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Oxidative phosphorylation</p>
                     </c>
                     <c ca="left">
                        <p>86</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fatty acid metabolism</p>
                     </c>
                     <c ca="left">
                        <p>70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pyruvate metabolism</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tryptophan metabolism</p>
                     </c>
                     <c ca="left">
                        <p>66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Butanoate metabolism</p>
                     </c>
                     <c ca="left">
                        <p>48</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glycerolipid metabolism</p>
                     </c>
                     <c ca="left">
                        <p>47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Valine, leucine and isoleucine degradation</p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glycine, serine and threonine metabolism</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Propanoate metabolism</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The Gene Ontology project (GO) has produced a structured vocabulary in the form of a directed acyclic graph that biologists can use to annotate genes in a systematic manner <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. <it>FunnyBase </it>includes two non-trivial steps to make the best possible use of GO terms. First, many GO annotations are lost if only the single 'best hit' from a homology search is considered, because GO annotation is applied most often to a few model species, such as human, that may not appear as the single 'best hit' in a list of BLAST results. <it>FunnyBase </it>identifies the gene name associated with the 'best hit' BLAST result and then uses all GO annotation associated with hits from the complete BLAST results that have the same gene name as the 'best hit' and an e-value &lt; 10<sup>-12</sup>. The goal of this approach is to identify annotation associated with a single 'best hit' gene based on results that may come from multiple species (orthologous genes) and may therefore have varying degrees of sequence similarity due to phylogenetic distance. At the same time, it avoids selecting an inconsistent set of GO terms arising from gene families that share regions of sequence similarity but may have different functions.</p>
            <p>Second, GO annotation in public databases tends to annotate sequences with only the most specific GO term available, for example <it>RNA polymerase II transcription factor activity, enhancer binding </it>(GO:0003705) rather than the more general parent term <it>transcription regulator activity </it>(GO:0030528). However, in functional genomic analysis, significant patterns of expression may exist at the more general level of functional description. To identify such relationships, <it>FunnyBase </it>takes advantage of the connected parent-child relationships of GO terms provided by the relational database version of GO, available for download at <url>http://www.geneontology.org</url>. These data are used to extract the tree of more general GO terms related to those provided by public databases. A <it>FunnyBase </it>script then re-annotates genes with this more complete set of GO terms.</p>
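            <p>The parent-child backtracking described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the <it>FunnyBase </it>script: the function names and the dictionary representation of parent links are assumptions, and the toy graph contains just the two terms named in the text plus the molecular function root (GO:0003674), linked in the simplest possible way.</p>

```python
def ancestors(term, parents):
    """Return every more general term reachable from term via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_annotations(direct_terms, parents, excluded=frozenset()):
    """Add all ancestor terms to a gene's directly assigned GO terms.

    Terms listed in excluded (for example the three category roots)
    are dropped from the result.
    """
    expanded = set(direct_terms)
    for term in direct_terms:
        expanded |= ancestors(term, parents)
    return expanded - set(excluded)


if __name__ == "__main__":
    # Toy parent map (child to list of parents); the edges are illustrative.
    toy_parents = {
        "GO:0003705": ["GO:0030528"],
        "GO:0030528": ["GO:0003674"],
    }
    terms = expand_annotations(
        {"GO:0003705"}, toy_parents, excluded={"GO:0003674"}
    )
    print(sorted(terms))
```

            <p>Because a term can have several parents in GO, the traversal records the terms it has already visited so that shared ancestors are counted once per gene.</p>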
            <p>Of the 3,877 annotated genes, 1,912 (54%) are assigned one or more GO terms, with a total of 6,728 GO assignments being made directly from information in public databases such as Swiss-Prot. Using parent-child GO term relationship backtracking, an additional 34,112 GO term assignments were made, resulting in a final count of GO assignments of 36,024 excluding the most general terms that divide GO into three categories. Thus, on average, 19 GO terms are assigned to each of the 1,912 annotated genes.</p>
         </sec>
         <sec>
            <st>
               <p>Gene scaffolding: clustering of clusters</p>
            </st>
            <p>Humans have approximately 30,000 expressed genes, yet there are over 1,000,000 human UniGenes (NCBI). Clearly, these clusters of cDNAs greatly overestimate the number of unique genes. Similarly, FunnyBase has multiple clusters for the same gene: 15 apolipoprotein I, 10 cytochrome oxidase I, and 53 vitellogenin clusters. To provide a more precise estimate of the number of unique genes, consensus sequences were queried against the 27,695 sequences from the Human RefSeq <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> database, then grouped by identical gene symbol. Of the 2,376 <it>Fundulus </it>clusters that were similar to a sequence in Human RefSeq (e-value &lt; 10<sup>-10</sup>), 1,818 (76%) had distinct gene annotations. This method of clustering clusters by similarity to well-annotated reference sequences provides a method to more accurately define the number of unique genes represented by an EST set.</p>
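            <p>The grouping step can be sketched as follows, assuming the best Human RefSeq hit for each cluster is already available as a gene symbol and an e-value. The function name, the input layout, and the cluster identifiers and gene symbols in the example are hypothetical; only the e-value threshold of 10<sup>-10</sup> is taken from the text.</p>

```python
from collections import defaultdict


def scaffold_by_symbol(best_hits, max_evalue=1e-10):
    """Group cluster ids by the gene symbol of their best RefSeq hit.

    best_hits maps a cluster id to a (gene_symbol, evalue) pair.
    Clusters whose hit does not pass the e-value threshold are left out.
    """
    groups = defaultdict(list)
    for cluster_id, (symbol, evalue) in sorted(best_hits.items()):
        if max_evalue > evalue:
            groups[symbol].append(cluster_id)
    return dict(groups)


if __name__ == "__main__":
    # Invented example: two clusters hit the same gene, one hit is too weak.
    example = {
        "cluster_001": ("GENE_A", 1e-50),
        "cluster_002": ("GENE_A", 1e-20),
        "cluster_003": ("GENE_B", 1e-12),
        "cluster_004": ("GENE_C", 1e-3),
    }
    scaffolds = scaffold_by_symbol(example)
    print(len(scaffolds), "distinct gene symbols")
    for symbol, clusters in scaffolds.items():
        print(symbol, clusters)
```

            <p>The number of keys in the returned dictionary corresponds to the count of distinct gene annotations reported above.</p>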
         </sec>
         <sec>
            <st>
               <p>Case study: using functional annotation for microarray analysis</p>
            </st>
            <p>As a case study in how functional annotation in <it>FunnyBase </it>can be integrated with microarray data in a rigorous manner, we used a data set based on a microarray of metabolic genes printed from ESTs annotated in <it>FunnyBase </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Statistical analysis of this set of 363 metabolic genes identified 62 genes that showed a statistically significant regression between gene expression level and temperature along the Atlantic coast. That is, among individuals collected from different locations along the thermal cline and then acclimated to common physiological conditions for at least nine months before analysis, 17% of the metabolic genes had a linear relationship between the amount of mRNA and the environmental temperature in which these animals evolved. Our hypothesis was that this set of 62 genes is functionally different from the genes that do not show regression with temperature. To test this hypothesis, we compared the frequency of genes annotated with a given GO term in the statistically significant gene set with that in the non-significant set. Figure <figr fid="F5">5</figr> shows the relevant proportions in each set for GO terms that are represented by 5 or more ESTs in the significant set. For example, the GO term <it>Amine Metabolism </it>(GO:0009308) is assigned to 14% of the 62 statistically significant genes but only 3% of the non-significant genes (those that do not show significant regression with temperature). Fisher's exact test indicates that these frequencies (14% vs. 3%) represent different underlying distributions (p &lt; 0.001). Specifically, genes involved in amine metabolism are overrepresented in the set of genes that show regression with temperature compared with the remaining sequences. This significant increase is also found for two other, non-mutually exclusive GO terms: <it>amino acid and derivative metabolism</it>, and <it>amino acid metabolism </it>(p &lt; 0.05). 
Other GO terms show the reverse trend, although none was statistically significant. For example, ion transport (p = 0.08) and cell growth (p = 0.16) had few genes with clinal variation in expression. These data suggest that the function of a gene influences whether it shows ecologically interesting patterns of expression (Fig. <figr fid="F5">5</figr>).</p>
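            <p>For readers who wish to apply the same kind of enrichment test, a two-sided Fisher's exact test on a 2x2 table can be written with the Python standard library alone. The sketch below is not the analysis code used here. The example counts (9 of 62 significant genes and 9 of 301 non-significant genes) are approximations back-calculated from the rounded percentages in the text, so the resulting p-value is illustrative and need not match the reported value.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if observed * (1 + 1e-9) >= prob(x)
    )


if __name__ == "__main__":
    # Rows: significant / non-significant genes (approximate counts).
    # Columns: annotated with the GO term / not annotated with it.
    p_value = fisher_exact_two_sided(9, 53, 9, 292)
    print("two-sided p-value:", p_value)
```

            <p>Because many GO terms are tested and the terms are nested rather than independent, p-values obtained in this way are best read as descriptive unless a correction for multiple testing is applied.</p>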
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Distribution of significant and non-significant genes relative to GO terms</p>
               </caption>
               <text>
                  <p>Distribution of significant and non-significant genes relative to GO terms. The relationship between gene expression from "common gardened" fish and the environment they evolved in was statistically analyzed and grouped by GO terms. Black bars represent genes whose levels of expression have a significant regression with the environmental thermal cline among populations (p &lt; 0.05). Hatched bars represent the set of genes with no significant relationship to the thermal environment.</p>
               </text>
               <graphic file="1471-2164-5-96-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Web interface</p>
            </st>
            <p>The web interface <url>http://genomics.rsmas.miami.edu/funnybase/super_craw4/</url> provides public access to the <it>FunnyBase </it>system and dataset. Searches can be made by keyword in annotation, gene name, GO term, metabolic pathway, clone or plate id, and BLAST homology search. All data, including raw sequences, cluster memberships, cluster alignments, and alignments with homologous sequences, are provided so that the user can examine the source of annotations. Each annotation is linked to external resources such as GO's AmiGO browser, KEGG pathways, Swiss-Prot, and NCBI records.</p>
         </sec>
         <sec>
            <st>
               <p>Other <it>Fundulus </it>sequences</p>
            </st>
            <p><it>FunnyBase </it>was constructed to annotate sequences for the analysis of gene expression. It provides identification and annotation for genes in the Crawford laboratory with a primary goal of identifying clones useful for the construction of microarrays. As such, other <it>Fundulus </it>sequences in GenBank are not included. However, <it>FunnyBase </it>forms the basis of the TIGR Killifish gene index <url>http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=killifish</url>, which includes publicly available <it>F. heteroclitus </it>sequences.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Customized species-specific EST databases are available for many species <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B21">21</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. <it>FunnyBase </it>provides an integrated method to annotate ESTs with the most biologically relevant set of associations and provides several innovations for the production of ESTs for microarrays. Control sequences are identified in each 96-well plate so that mislabelled or inverted plates are automatically detected. Annotations are based upon several different public databases. The multiple annotations provide greater assurance about gene description and greater frequency of annotation than any one database. The most functionally informative innovation of <it>FunnyBase </it>is the process of culling through numerous primary similarity search results in order to identify links to systematic functional databases in GO and KEGG. These provide a discrete set of terms that can be analyzed statistically and that are organized into networks that represent biological knowledge of higher-level functional and pathway associations. The range of databases queried by similarity search and the tracking of homology beyond a single 'best' hit maximize the opportunity to obtain this annotation. A richer set of GO terms is achieved by using all hits with e-values less than 10<sup>-11 </sup>that represent the same gene as the 'best hit'. Additional GO terms that represent more general functions than those found in public annotation are derived through the parent-child relationship of the Gene Ontology. 
EC numbers provide links, <it>via </it>KEGG, to metabolic pathways, and these stored terms can be used to investigate gene expression in specific metabolic pathways, including cardiac metabolism <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. To provide a more accurate accounting of the number of unique genes, consensus sequences from clusters of ESTs were queried against the Human RefSeq database, and those sequences sharing the same gene symbol were grouped based on this scaffolding information. These approaches use publicly available bioinformatics tools (BLAST, CAP3, Phred, Cross-Match, Perl, and the MySQL database management system). The application of these tools in an appropriate framework, as outlined in <it>FunnyBase</it>, can be used to create a systems level functional genomics annotation system useful for EST databases to study biological processes among a rich diversity of organisms.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Organism</p>
            </st>
            <p>The animal protocols used in the present study have been approved by the University of Miami Institutional Animal Care and Use Committee. The teleost fish <it>Fundulus heteroclitus </it>used for ESTs were collected from two sites: Scorton Creek in Sandwich, MA, and Stone Harbor, NJ. These populations are in the central portion of the thermal cline and have relatively high levels of heterozygosity <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Before tissues were harvested for mRNA extraction, these fish were kept in controlled temperature and aeration conditions and acclimated to common conditions (20&#176;C, 15 ppt salinity) in re-circulating aquaria for at least nine months. Following this common acclimation, a subset of fish was subjected to one of several stresses: 4&#176;C, 34&#176;C, hypoxia, or a complex mix of hydrocarbons.</p>
         </sec>
         <sec>
            <st>
               <p>cDNA library</p>
            </st>
            <p>To effectively isolate and sequence thousands of cDNAs for the production of microarrays, a unidirectional cDNA library with few non-recombinants was required. We created four cDNA libraries: heart libraries from non-stressed and stressed fish and liver libraries from non-stressed and stressed fish. The non-stressed <it>F. heteroclitus </it>cardiac and liver libraries were provided by Drs. S. Karchner and M. Hahn, WHOI <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> and were constructed using the UniZap &#955; cDNA Gigapack Gold cloning kit (Stratagene, La Jolla, CA, USA). The cardiac library was produced from 27 fish hearts (both sexes) sampled from Scorton Creek in Sandwich, MA. The cDNAs in these libraries are oriented such that the 5' end of each cDNA is ligated to EcoRI and the 3' poly A is ligated to XhoI. These libraries had less than 1% non-recombinants, i.e., 2 of 300 random clones from a non-normalized library had no inserts. The stressed libraries included 4 fish subjected to the four stressors (above) and 4 non-stressed individuals. Unidirectional heart and liver libraries were constructed such that the 5' end of each cDNA is ligated to EcoRI and the 3' poly A is ligated to XhoI of the plasmid vector pSmart (Lucigen, Middleton, WI, USA). The pSmart-cDNA vector was designed for EST work. The vector confers kanamycin resistance and has a terminator on both sides of the cDNA insertion site, preventing expression of the cDNA. These two attributes (non-expression and Kan-resistance) increase the stability of different genes in the library compared with cDNA libraries in Amp-resistant vectors with Lac promoters (Crawford, unpublished). These libraries had less than 1% non-recombinants.</p>
            <p>Normalization of cDNA libraries reduces the differences among expressed genes to less than 10-fold among rare and abundant mRNAs <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Normalized libraries were produced by isolating cDNAs from approximately 10<sup>12 </sup>plasmids. The cDNAs were isolated using PCR amplification with vector-specific primers immediately 5' and 3' to the insertion site (EcoRI and XhoI sites). These PCR products (PCR-cDNAs) were denatured and hybridized to single-stranded plasmids from the cardiac cDNA library. Taking advantage of Cot values, the most abundant cDNAs were annealed to the more abundant PCR products and were removed selectively by hydroxyapatite-column chromatography. The single-stranded plasmids in the flow-through were converted to double strands using the Sequenase DNA polymerase (Amersham, Piscataway, NJ, USA). DH10s <it>E. coli </it>(BRL) were transformed with these double-stranded plasmids by electroporation. The number of recovered plasmids and the resulting complexity of the normalized library depended on the duration of hybridization or Cot values. Two normalized libraries were made using either a 12- or 24-hour hybridization. The library from the 12-hour hybridization yielded 250,000 plasmids. The library from the 24-hour hybridization yielded 3,000 plasmids and had a greater representation of rare mRNAs and a greater frequency of non-recombinants.</p>
         </sec>
         <sec>
            <st>
               <p>Isolation and sequencing of cDNAs</p>
            </st>
            <p>Characterization of cDNAs (growth of individual bacterial colonies containing plasmids, PCRs, purification of PCR products, sequencing reactions) used 96-well plates and octopipettes. To characterize cDNAs, 96 individual bacterial colonies from the normalized library were randomly chosen, and each was grown in 1.25 ml of Superbroth in 2 ml 96-well plates. After 18 hours of growth, two 250 &#181;l bacterial glycerol stocks were made and stored in 96-well plates at -80&#176;C. One microliter of these bacterial growths was used for PCR reactions using forward and reverse plasmid-specific primers (PucF = CGCCAGGGTTTTCCCAGTCACG, PucR = GAGCGGATAACAATTTCACACAGGAAA). PCR reactions had 0.2 mM dNTPs, 10 pmoles of each primer, 1 unit of Promega Taq (0.2 &#181;l), and reaction buffer with detergents and DMSO (final concentrations: 50 mM Tris HCl, pH 9.2 (25&#176;C), 16 mM (NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub>, 2.25 mM MgCl<sub>2</sub>, 2% (v/v) DMSO, 0.1% (v/v) Tween 20). Two-step thermal cycle conditions were used (94&#176;C for 10 seconds; then 32 cycles of 94&#176;C for 30 seconds followed by 70&#176;C for 5 minutes; then 72&#176;C for 15 minutes). PCR products were purified manually in 96-well format using Sephadex G-50 in a deep-well plate with a 0.2 microfilter (Millipore, Billerica, MA, USA) or robotically using AMPure (Agencourt, Beverly, MA, USA) and the EvolP<sup>3 </sup>96 pipetting liquid handling system (PerkinElmer Life Sciences Inc., Boston, MA, USA).</p>
            <p>PCR products were sequenced from the 5' end (relative to the mRNA) on an ABI 373 or ABI 3730 sequencer using ABI "Big Dye" reaction mix. We typically used 1/16 of the recommended amount of reaction mix, yielding 300 to 400 unambiguous bases. Sequences were purified using biotin primers and streptavidin-coated magnetic beads (for the ABI 373) or Agencourt CleanSEQ (for the ABI 3730).</p>
         </sec>
         <sec>
            <st>
               <p>Validation</p>
            </st>
            <p>We used three procedures to verify that the correct sequence was associated with each cDNA. 1) Each 96-well plate had three wells with a "marker cDNA" (<it>Ctenophore </it>cDNA #5, a random cDNA with no similarity to any sequence in GenBank). Two wells (#40 and #67) always contained the marker cDNA, and thus any misloading or mislabeling of sequencing lanes was identifiable. The third marker cDNA was placed in a well that corresponds to the plate number (e.g., plate 2 had the marker in well 2). 2) After the production of 12 plates, one row (8 wells) from each plate was re-sequenced. Thus, 8/96 or ~8% of all sequences and their locations were confirmed. 3) cDNAs used for microarrays were re-sequenced. These measures are important to ensure that the correct and known cDNAs are printed.</p>
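            <p>The first procedure lends itself to an automated check. The following Python sketch assumes that wells are numbered 1 to 96 and that the set of wells whose reads match the marker cDNA has already been determined (for example, by a similarity search against the marker sequence). The function names are hypothetical, and the handling of plate numbers above 96 is not described in the text, so the sketch simply rejects them.</p>

```python
# Wells that always hold the marker cDNA, as described in the text.
FIXED_MARKER_WELLS = frozenset({40, 67})


def expected_marker_wells(plate_number):
    """Wells (1 to 96) expected to contain the marker cDNA on a plate."""
    if plate_number not in range(1, 97):
        # Assumption: the plate number must correspond to an existing well.
        raise ValueError("plate number must map to a well between 1 and 96")
    return FIXED_MARKER_WELLS | {plate_number}


def check_plate(plate_number, observed_marker_wells):
    """Compare wells where the marker was read against the expectation.

    Returns a tuple (ok, missing, unexpected).
    """
    expected = expected_marker_wells(plate_number)
    observed = set(observed_marker_wells)
    missing = expected - observed
    unexpected = observed - expected
    return (not missing and not unexpected, sorted(missing), sorted(unexpected))


if __name__ == "__main__":
    # Plate 2 read correctly: marker found in wells 2, 40 and 67.
    print(check_plate(2, {2, 40, 67}))
    # Plate 2 with the marker read in other wells, as a misloaded or
    # inverted plate would produce.
    print(check_plate(2, {30, 57, 95}))
```

            <p>A plate passes only when the marker is found in exactly the expected wells; the lists of missing and unexpected wells indicate whether the plate was mislabeled (wrong plate-number well) or misloaded (fixed wells displaced).</p>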
         </sec>
         <sec>
            <st>
               <p>Subtraction</p>
            </st>
            <p>The complexity of the normalized library was reduced by subtracting the characterized cDNAs previously isolated from the normalized library. Subtraction greatly reduced the probability of isolating the same cDNA and thus improved the efficiency of screening the library for unique clones. Subtraction used a 100-fold molar excess of biotin-labeled antisense cDNAs produced by PCR using all the characterized cDNAs as substrates and vector-specific primers in which the 3' primer was labeled with biotin. These PCR products were hybridized to the cDNA libraries in the presence of oligo-dA and vector-specific oligos (that prevented non-specific hybridization to oligo-dT or vector sequences). After a 24 hour hybridization, genes in the library that bound to these biotin-labeled PCR products were removed with the use of magnetized, streptavidin coated beads. DH10s <it>E. coli </it>were transformed with the subtracted library by electroporation.</p>
         </sec>
         <sec>
            <st>
               <p>Hardware and software</p>
            </st>
            <p>Computational work was done on an Apple G5 dual 2 GHz processor system with 4 GB of RAM. Data are stored in a MySQL database, Perl scripts were used extensively for parsing and loading data, and PHP was used on an Apache web server to construct the user interface. Additional programs available from their authors are mentioned within context. Software and databases are described in Table <tblr tid="T3">3</tblr>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Software and Databases. Publicly available software and databases used for <it>FunnyBase</it>. The version and/or download date are listed.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Resource</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Version and/or Download Date</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Software</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Stand-alone BLAST</p>
                     </c>
                     <c ca="left">
                        <p>2.2.8</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Cross match</p>
                     </c>
                     <c ca="left">
                        <p>0.990329</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>CAP3</p>
                     </c>
                     <c ca="left">
                        <p>January, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Public Sequence Similarity Databases</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Swiss-Prot</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>NR</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Human RefSeq</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Human Unigene</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Zebrafish Unigene</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Medaka Unigene</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Rainbow Trout Unigene</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Functional Annotation</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Gene Ontology</p>
                     </c>
                     <c ca="left">
                        <p>2004-06-04</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>KEGG</p>
                     </c>
                     <c ca="left">
                        <p>June, 2004</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Microarrays</p>
            </st>
            <p>Microarrays were printed using 384 cDNAs selected from the <it>F. heteroclitus </it>cardiac library. These cDNAs encode proteins essential for cellular metabolism and were isolated from over 40,000 expressed sequences (<url>http://genomics.rsmas.miami.edu/FunnyBase/super_craw4/</url>). The 384 cDNAs were amplified with amine-linked primers, printed on 3-D Link Activated slides (Surmodics Inc., Eden Prairie, MN, USA) using a <it>GeneMachines OmniGrid</it>, and blocked following the slide manufacturer's protocols. The suite of 384 amplified cDNAs was printed as a group in four spatially separated replicates. Each slide carried four hybridization zones, each containing these four replicate arrays and separated from the other zones by a hydrophobic barrier. Each sample was hybridized twice, once with Cy3 and once with Cy5, so that four replicate spots with each of two dyes gave an overall technical replication of 8-fold per sample.</p>
         </sec>
         <sec>
            <st>
               <p>Sample preparation and hybridization</p>
            </st>
            <p>RNA was extracted from tissue homogenate in a chaotropic buffer using phenol/chloroform/isoamyl alcohol, and RNA quality was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). RNA for hybridization was amplified by a modified Eberwine protocol <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> with the Ambion Amino Allyl MessageAmp aRNA Kit. Cy3- and Cy5-labeled samples were hybridized to slides and incubated for 12&#8211;18 hours at 42&#176;C. Following hybridization, slides were scanned using the Packard Bioscience ScanArray Express microarray scanner (PerkinElmer Life Sciences Inc., Boston, MA, USA), and images were processed using ImaGene (Biodiscovery Inc., Marina del Rey, CA, USA).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JP designed, scripted and implemented <it>FunnyBase </it>and provided statistical analyses of the database. MFO initiated the <it>F. heteroclitus </it>EST project, designed its protocols and provided sequences. JDV optimized robotic interfaces for sequencing and sequenced ESTs. JLR and KJK sequenced ESTs. GJW collaborated on bioinformatics and database development. JAW provided microarray data and analyses. DLC initiated the <it>F. heteroclitus </it>EST project and developed the database and annotation schemes for <it>FunnyBase</it>. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Initial support for the <it>F. heteroclitus </it>EST project was provided by NSF BioInformatics post-doctoral fellowship 0074520 to MFO and NSF/IBN grant 9986602 to DLC. We thank Kristin Horgan of MJ Research for arranging the loan of a Tetrad thermal cycler for the Comparative Functional Genomic Course. We also thank AP-Biotech, and specifically Dr. Robert Feldman, for use of the MegaBACE used to re-sequence cDNAs.</p>
            <p>Current EST isolation, sequencing and bioinformatics are supported by NSF/OCE grant 0221879 and NIH/NHLBI R01 HL65470 to DLC and NIH/NIEHS ES011588 to MFO.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications</p>
            </title>
            <aug>
               <au>
                  <snm>Sorlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aas</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Geisler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>van de Rijn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Thorsen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Quist</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lonning</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Borresen-Dale</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>10869</fpage>
            <lpage>10874</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11553815</pubid>
                  <pubid idtype="doi">10.1073/pnas.191367098</pubid>
                  <pubid idtype="pmcid">58566</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Systematic variation in gene expression patterns in human cancer cell lines</p>
            </title>
            <aug>
               <au>
                  <snm>Ross</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Scherf</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van de Rijn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Waltham</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pergamenschikov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lashkari</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2000</pubdate>
            <volume>24</volume>
            <fpage>227</fpage>
            <lpage>235</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">10700174</pubid>
                  <pubid idtype="doi">10.1038/73432</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy</p>
            </title>
            <aug>
               <au>
                  <snm>Friddle</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Koga</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Bristow</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>6745</fpage>
            <lpage>6750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">10829065</pubid>
                  <pubid idtype="doi">10.1073/pnas.100127897</pubid>
                  <pubid idtype="pmcid">18725</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Identification of new genes differentially expressed in coronary artery disease by expression profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Archacki</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Angheloiu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>XL</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>FL</fnm>
               </au>
               <au>
                  <snm>DiPaola</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>GQ</fnm>
               </au>
               <au>
                  <snm>Moravec</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Topol</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Q</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2003</pubdate>
            <volume>15</volume>
            <fpage>65</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12902549</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Cardiac hypertrophy by hypertension and exercise training exhibits different gene expression of enzymes in energy metabolism</p>
            </title>
            <aug>
               <au>
                  <snm>Iemitsu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Miyauchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sakai</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fujii</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Miyazaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kakinuma</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Matsuda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yamaguchi</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Hypertens Res</source>
            <pubdate>2003</pubdate>
            <volume>26</volume>
            <fpage>829</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14621187</pubid>
                  <pubid idtype="doi">10.1291/hypres.26.829</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Transcriptomal analysis of failing and nonfailing human hearts</p>
            </title>
            <aug>
               <au>
                  <snm>Steenman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Le Cunff</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lamirault</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Varro</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Leger</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2003</pubdate>
            <volume>12</volume>
            <fpage>97</fpage>
            <lpage>112</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12429867</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Cardiac transcriptional response to acute and chronic angiotensin II treatments</p>
            </title>
            <aug>
               <au>
                  <snm>Larkin</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Gaspard</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Duka</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gavras</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2004</pubdate>
            <volume>18</volume>
            <fpage>152</fpage>
            <lpage>66</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1152/physiolgenomics.00057.2004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15126644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comprehensive analysis of the mouse metabolome based on the transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Bono</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nikaido</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Okazaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1345</fpage>
            <lpage>1349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12819132</pubid>
                  <pubid idtype="doi">10.1101/gr.974603</pubid>
                  <pubid idtype="pmcid">403659</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Alterations in mitochondrial function in a mouse model of hypertrophic cardiomyopathy</p>
            </title>
            <aug>
               <au>
                  <snm>Lucas</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Aryal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Szweda</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Leinwand</snm>
                  <fnm>LA</fnm>
               </au>
            </aug>
            <source>Am J Physiol Heart Circ Physiol</source>
            <pubdate>2003</pubdate>
            <volume>284</volume>
            <fpage>H575</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12414446</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Global transcription analysis of Krebs tricarboxylic acid cycle mutants reveals an alternating pattern of gene expression and effects on hypoxic and oxidative genes</p>
            </title>
            <aug>
               <au>
                  <snm>McCammon</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Epstein</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Przybyla-Zawislak</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>McAlister-Henn</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Butow</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2003</pubdate>
            <volume>14</volume>
            <fpage>958</fpage>
            <lpage>972</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12631716</pubid>
                  <pubid idtype="doi">10.1091/mbc.E02-07-0422</pubid>
                  <pubid idtype="pmcid">151572</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Genetics and functional genomics of type 2 diabetes mellitus</p>
            </title>
            <aug>
               <au>
                  <snm>Toye</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gauguier</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>241</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14659011</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-12-241</pubid>
                  <pubid idtype="pmcid">329413</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Coordinated patterns of gene expression for substrate and energy metabolism in skeletal muscle of diabetic mice</p>
            </title>
            <aug>
               <au>
                  <snm>Yechoor</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Patti</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Saccone</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>10587</fpage>
            <lpage>10592</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12149437</pubid>
                  <pubid idtype="doi">10.1073/pnas.142301999</pubid>
                  <pubid idtype="pmcid">124982</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Transposon-mediated enhancer trapping in medaka</p>
            </title>
            <aug>
               <au>
                  <snm>Grabher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Henrich</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sasado</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Arenz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wittbrodt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Furutani-Seiki</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>322</volume>
            <fpage>57</fpage>
            <lpage>66</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14644497</pubid>
                  <pubid idtype="doi">10.1016/j.gene.2003.09.009</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A survey of expressed genes in Japanese flounder (Paralichthys olivaceus) liver and spleen</p>
            </title>
            <aug>
               <au>
                  <snm>Inoue</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nam</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Hirono</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Aoki</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Molecular Marine Biology &amp; Biotechnology</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>376</fpage>
            <lpage>380</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>15,000 Unique zebrafish EST clusters and their future use in microarray for profiling gene expression patterns during embryogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Lo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ruan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Eun</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Peng</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>455</fpage>
            <lpage>466</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12618376</pubid>
                  <pubid idtype="doi">10.1101/gr.885403</pubid>
                  <pubid idtype="pmcid">430290</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Analysis of expressed sequence tags (EST) obtained from common carp, Cyprinus carpio L., head kidney cells after stimulation by two mitogens, lipopolysaccharide and concanavalin-A</p>
            </title>
            <aug>
               <au>
                  <snm>Savan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sakai</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Comparative Biochemistry &amp; Physiology Part B, Biochemistry &amp; Molecular Biology</source>
            <pubdate>2002</pubdate>
            <volume>131</volume>
            <issue>1</issue>
            <fpage>71</fpage>
            <lpage>82</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S1096-4959(01)00488-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>The construction of an EST database for Bombyx mori and its application</p>
            </title>
            <aug>
               <au>
                  <snm>Mita</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Morimyo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Okano</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Koike</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nohata</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kadono-Okuda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamamoto</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Shimada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Goldsmith</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>14121</fpage>
            <lpage>14126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14614147</pubid>
                  <pubid idtype="doi">10.1073/pnas.2234984100</pubid>
                  <pubid idtype="pmcid">283556</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics</p>
            </title>
            <aug>
               <au>
                  <snm>Rise</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>von Schalburg</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Mawer</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Devlin</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Kuipers</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Busby</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Beetz-Sargent</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Alberto</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Shukin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zeznik</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Smailus</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Schein</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Butterfield</snm>
                  <fnm>YS</fnm>
               </au>
               <au>
                  <snm>Stott</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Koop</snm>
                  <fnm>BF</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>478</fpage>
            <lpage>490</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14962987</pubid>
                  <pubid idtype="doi">10.1101/gr.1687304</pubid>
                  <pubid idtype="pmcid">353236</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Annotated Expressed Sequence Tags and cDNA Microarrays for Studies of Brain and Behavior in the Honey Bee</p>
            </title>
            <aug>
               <au>
                  <snm>Whitfield</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Band</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Bonaldo</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pardinas</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Soares</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>555</fpage>
            <lpage>566</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11932240</pubid>
                  <pubid idtype="doi">10.1101/gr.5302</pubid>
                  <pubid idtype="pmcid">187514</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The utility of natural populations for microarray analyses: isolation of genes necessary for functional genomic studies</p>
            </title>
            <aug>
               <au>
                  <snm>Oleksiak</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Kolell</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Marine Biotechnology</source>
            <pubdate>2001</pubdate>
            <volume>3</volume>
            <fpage>S203</fpage>
            <lpage>S211</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14961317</pubid>
                  <pubid idtype="doi">10.1007/s10126-001-0043-0</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>CnidBase: The Cnidarian Evolutionary Genomics Database</p>
            </title>
            <aug>
               <au>
                  <snm>Ryan</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Finnerty</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>159</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12519972</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg116</pubid>
                  <pubid idtype="pmcid">165563</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Functional genomics does not have to be limited to a few select organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>INTERACTIONS1001 http://www.genomebiology.com/2001/2/1/interactions/1001/</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2001-2-1-interactions1001</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Variation in gene expression within and among natural populations</p>
            </title>
            <aug>
               <au>
                  <snm>Oleksiak</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2002</pubdate>
            <volume>32</volume>
            <fpage>261</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12219088</pubid>
                  <pubid idtype="doi">10.1038/ng983</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Gene Expression Profiles in the Brain Predict Behavior in Individual Honey Bees</p>
            </title>
            <aug>
               <au>
                  <snm>Whitfield</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Cziko</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>296</fpage>
            <lpage>299</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14551438</pubid>
                  <pubid idtype="doi">10.1126/science.1086807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Expression profiles during honeybee caste determination</p>
            </title>
            <aug>
               <au>
                  <snm>Evans</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>RESEARCH0001</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">17597</pubid>
                  <pubid idtype="pmpid" link="fulltext">11178278</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Hypoxia-induced gene expression profiling in the euryoxic fish Gillichthys mirabilis</p>
            </title>
            <aug>
               <au>
                  <snm>Gracey</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Troll</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Somero</snm>
                  <fnm>GN</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences, USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>1993</fpage>
            <lpage>1998</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.98.4.1993</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Changes in gene expression associated with acclimation to constant temperatures and fluctuating daily temperatures in an annual killifish Austrofundulus limnaeus</p>
            </title>
            <aug>
               <au>
                  <snm>Podrabsky</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Somero</snm>
                  <fnm>GN</fnm>
               </au>
            </aug>
            <source>J Exp Biol</source>
            <pubdate>2004</pubdate>
            <volume>207</volume>
            <fpage>2237</fpage>
            <lpage>2254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15159429</pubid>
                  <pubid idtype="doi">10.1242/jeb.01016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Differential gene expression in the brain of channel catfish (Ictalurus punctatus) in response to cold acclimation</p>
            </title>
            <aug>
               <au>
                  <snm>Ju</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Dunham</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Molecular Genetics and Genomics</source>
            <pubdate>2002</pubdate>
            <volume>268</volume>
            <fpage>87</fpage>
            <lpage>95</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12242503</pubid>
                  <pubid idtype="doi">10.1007/s00438-002-0727-9</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Phylogenetic analysis of glycolytic enzyme expression</p>
            </title>
            <aug>
               <au>
                  <snm>Pierce</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>276</volume>
            <fpage>256</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1126/science.276.5310.256</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gene flow and mitochondrial DNA variation in the killifish Fundulus heteroclitus</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Evolution</source>
            <pubdate>1991</pubdate>
            <volume>45</volume>
            <fpage>1147</fpage>
            <lpage>1161</lpage>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Molecular basis of evolutionary adaptation at the lactate dehydrogenase-B locus in the fish Fundulus heteroclitus</p>
            </title>
            <aug>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Powers</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>1989</pubdate>
            <volume>86</volume>
            <fpage>9365</fpage>
            <lpage>9369</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">298496</pubid>
                  <pubid idtype="pmpid">2594773</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A multidisciplinary approach to the selectionist/neutralist controversy using the model teleost, Fundulus heteroclitus</p>
            </title>
            <aug>
               <au>
                  <snm>Powers</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gonzalez-Villasenor</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>DiMichele</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lauerman</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Oxford Surveys in Evolutionary Biology</source>
            <publisher>New York, NY, Oxford University Press</publisher>
            <editor>Futuyma D and Antonovics J</editor>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <fpage>43</fpage>
            <lpage>108</lpage>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Phylogenetic analysis of thermal acclimation of the glycolytic enzymes in the genus Fundulus</p>
            </title>
            <aug>
               <au>
                  <snm>Pierce</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Physiological Zoology</source>
            <pubdate>1997</pubdate>
            <volume>70</volume>
            <fpage>597</fpage>
            <lpage>609</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9361133</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Phylogenetic analysis of glycolytic enzyme expression</p>
            </title>
            <aug>
               <au>
                  <snm>Pierce</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>276</volume>
            <fpage>256</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">9092475</pubid>
                  <pubid idtype="doi">10.1126/science.276.5310.256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>http://www.tigr.org/tdb/tgi/</p>
            </title>
            <aug>
               <au>
                  <cnm>TIGR</cnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B36">
            <title>
               <p>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene</p>
            </title>
            <aug>
               <au>
                  <cnm>UniGene</cnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B37">
            <title>
               <p/>
            </title>
            <aug>
               <au>
                  <cnm>NCBI dbEST</cnm>
               </au>
            </aug>
            <source>(http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html)</source>
            <volume>1 July 2004</volume>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</p>
            </title>
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Issel-Tarver</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasarskis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
                  <pubid idtype="doi">10.1038/75556</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The Kyoto encyclopedia of genes and genomes--KEGG</p>
            </title>
            <aug>
               <au>
                  <snm>Wixon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kell</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Yeast</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>48</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">10928937</pubid>
                  <pubid idtype="doi">10.1002/(SICI)1097-0061(200004)17:1&lt;48::AID-YEA2&gt;3.0.CO;2-H</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>UniProt: the Universal Protein knowledgebase</p>
            </title>
            <aug>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>LS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2004</pubdate>
            <volume>32 Database issue</volume>
            <fpage>D115</fpage>
            <lpage>D119</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14681372</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh131</pubid>
                  <pubid idtype="pmcid">308865</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>NCBI NR (non-redundant) Database</p>
            </title>
            <aug>
               <au>
                  <cnm>NCBI</cnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>CAP3: A DNA Sequence Assembly Program</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Madan</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>868</fpage>
            <lpage>877</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">10508846</pubid>
                  <pubid idtype="doi">10.1101/gr.9.9.868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Generation and analysis of 280,000 human expressed sequence tags</p>
            </title>
            <aug>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Lennon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bonaldo</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Chiapelli</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Chissoe</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dietrich</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>DuBuque</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Favello</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hawkins</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hultman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kucaba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lacy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Prange</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rifkin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rohlfing</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schellenberg</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>807</fpage>
            <lpage>828</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8889549</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Normalization and subtraction: two approaches to facilitate gene discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Bonaldo</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Lennon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Soares</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>791</fpage>
            <lpage>806</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8889548</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Online Mendelian Inheritance in Man, OMIM (TM)</p>
            </title>
            <aug>
               <au>
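                  <cnm>McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD)</cnm>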
               </au>
            </aug>
            <url>http://www.ncbi.nlm.nih.gov/omim/</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Variation in tissue-specific gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Whitehead</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2004</pubdate>
            <volume>In Press</volume>
         </bibl>
         <bibl id="B48">
            <title>
               <p>RefSeq and LocusLink: NCBI gene-centered resources</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>137</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11125071</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.137</pubid>
                  <pubid idtype="pmcid">29787</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>A large database of chicken bursal ESTs as a resource for the analysis of vertebrate gene function</p>
            </title>
            <aug>
               <au>
                  <snm>Abdrakhmanov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lodygin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Geroth</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Arakawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Law</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Plachy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Korn</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Buerstedde</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>2062</fpage>
            <lpage>2069</lpa