<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-S5-S2</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>Comparative genomics in cyprinids: common carp ESTs help the annotation of the zebrafish genome</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Christoffels</snm>
               <fnm>Alan</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>alan@tll.org.sg</email>
            </au>
            <au id="A2">
               <snm>Bartfai</snm>
               <fnm>Richard</fnm>
               <insr iid="I3"/>
               <email>bartfai@tll.org.sg</email>
            </au>
            <au id="A3">
               <snm>Srinivasan</snm>
               <fnm>Hamsa</fnm>
               <insr iid="I1"/>
               <email>hamsa.srinivasan@gmail.com</email>
            </au>
            <au id="A4">
               <snm>Komen</snm>
               <fnm>Hans</fnm>
               <insr iid="I4"/>
               <email>Hans.Komen@wur.nl</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Orban</snm>
               <fnm>Laszlo</fnm>
               <insr iid="I3"/>
               <insr iid="I5"/>
               <email>laszlo@tll.org.sg</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Computational Biology Group, Temasek Life Sciences Laboratory, Singapore</p>
            </ins>
            <ins id="I2">
               <p>School of Biological Sciences, Nanyang Technological University, Singapore</p>
            </ins>
            <ins id="I3">
               <p>Reproductive Genomics Group, Temasek Life Sciences Laboratory, Singapore</p>
            </ins>
            <ins id="I4">
               <p>Animal Breeding and Genetics Group, Wageningen University, Wageningen, The Netherlands</p>
            </ins>
            <ins id="I5">
               <p>Department of Biological Sciences, The National University of Singapore, Singapore</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>APBioNet &#8211; Fifth International Conference on Bioinformatics (InCoB2006)</p>
            </title>
            <editor>Shoba Ranganathan, Martti Tammi, Michael Gribskov, Tin Wee Tan</editor>
            <note>Proceedings</note>
         </supplement>
         <conference>
            <title>
               <p>International Conference in Bioinformatics &#8211; InCoB2006</p>
            </title>
            <location>New Delhi, India</location>
            <date-range>18&#8211;20 December 2006</date-range>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>Suppl 5</issue>
         <fpage>S2</fpage>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17254304</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-S5-S2</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>18</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Christoffels et al; licensee BioMed Central Ltd</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Automatic annotation of sequenced eukaryotic genomes integrates a combination of methodologies such as <it>ab-initio </it>methods and alignment of homologous genes and/or proteins. For example, annotation of the zebrafish genome within Ensembl relies heavily on available cDNA and protein sequences from two distantly related fish species and other vertebrates that diverged several hundred million years ago. The scarcity of genomic information from other cyprinids provides the impetus to leverage EST collections to understand gene structures in this diverse teleost group.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have generated 6,050 ESTs from the differentiating testis of common carp <it>(Cyprinus carpio) </it>and clustered them with 9,303 non-gonadal ESTs from CarpBase as well as 1,317 ESTs and 652 common carp mRNAs from GenBank. Over 28% of the resulting 8,663 unique transcripts are exclusively testis-derived ESTs. Moreover, 974 of these transcripts did not match any sequence in the zebrafish or fathead minnow EST collection.</p>
               <p>A total of 1,843 unique common carp sequences could be stringently mapped to the zebrafish genome (version 5), of which 1,752 matched coding sequences of zebrafish genes with or without potential splice variants. We show that 91 common carp transcripts map to intergenic and intronic regions on the zebrafish genome assembly and regions annotated with non-teleost sequences. Interestingly, an additional 42 common carp transcripts indicate the potential presence of new splicing variants not found in zebrafish databases so far. The fact that common carp transcripts help the identification or confirmation of these coding regions in zebrafish exemplifies the usefulness of sequences from closely related species for the annotation of model genomes.</p>
               <p>We also demonstrate that 5' UTR sequences of common carp and zebrafish orthologs share a significant level of similarity based on preservation of motif arrangements for as many as 10 <it>ab-initio </it>motifs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our data show that there is sufficient homology between the transcribed sequences of common carp and zebrafish to warrant an even deeper cyprinid transcriptome comparison. On the other hand, the comparative analysis illustrates the value in utilizing partially sequenced transcriptomes to understand gene structure in this diverse teleost group. We highlight the need for integrated resources to leverage the wealth of fragmented genomic data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Eukaryotic gene prediction has been a challenging problem, explored over the last two decades and driven by the availability of large volumes of genomic data. The development of gene prediction methods has traditionally included (1) <it>ab-initio </it>approaches such as GENSCAN <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp> that do not use any experimental evidence, (2) alignment-based methods such as GENEWISE <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> that attempt to align a homologous protein sequence to a genomic sequence, and, more recently, (3) hybrid approaches that incorporate cDNA-defined splice junctions into <it>ab-initio </it>and protein alignment information <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Such hybrid approaches for automatic annotation of genome sequences have been implemented within the Ensembl annotation project <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Ensembl represents a bioinformatics project aimed at annotating sequenced genomes and integrating biological data that can be mapped or assigned to features described in the genomic data.</p>
         <p>At present, twenty fully or near-fully sequenced vertebrate genomes have been included in Ensembl (version 39). Teleosts, comprising about half the number of all extant vertebrate species, are represented by only five species, namely Japanese fugu (<it>Takifugu rubripes</it>), green spotted pufferfish (<it>Tetraodon nigroviridis</it>), zebrafish (<it>Danio rerio</it>), Japanese medaka (<it>Oryzias latipes</it>) and three-spined stickleback (<it>Gasterosteus aculeatus</it>), within the Ensembl data.</p>
         <p>The zebrafish is a representative of the most abundant and widespread primary freshwater fish family, <it>Cyprinidae </it><abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp> with ample genomic resources including a nearly fully sequenced genome and over a million expressed sequence tags (ESTs). However, genomic data for the rest of the cyprinids are quite scarce (for review see <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>), partly due to polyploidy that represents a characteristic feature of several members of the <it>Cyprinidae </it>family <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>In the absence of genome projects from closely related species, the automatic annotation of genomes relies heavily on available cDNA and protein sequences of other vertebrates for sequence comparisons. For example, mammalian and teleost genome comparisons have been used successfully to identify conserved protein-coding genes and regulatory elements despite the 450 million years that have elapsed since their last common ancestor <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. In contrast, a recent study by Thomas and colleagues <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> concluded that fish-mammal comparisons were unable to detect most non-coding regions that were conserved between amniotes. Theoretically, the annotation of the zebrafish genome could benefit from sequence data for a closely related species, excluding the annotated genomes of Japanese fugu and the green spotted pufferfish, which shared a common ancestor with zebrafish more than 200 million years ago <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>The UniGene collection <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> represents a database of species-specific mRNA and ESTs that are grouped into clusters or genes based on stringent sequence identity. Currently two cyprinid species are present in the UniGene collection (build 90 <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>), namely the zebrafish and fathead minnow <it>(Pimephales promelas)</it>. Zebrafish belongs to the subfamily <it>Rasborinae</it>, whereas fathead minnow is a member of <it>Leuciscinae </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Nearly 11,000 ESTs are present in dbEST <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp> for a third cyprinid species, common carp <it>(Cyprinus carpio, Cyprininae) </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>; however, they were not sampled in the recent UniGene collection (build 90). (These common carp ESTs were produced earlier by other teams from a range of tissues other than gonad <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.) Common carp is the most important fish species of freshwater aquaculture, probably with the earliest domestication records among fishes <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. It has been used in fish biology and aquaculture research quite extensively (for reviews see <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>).</p>
         <p>Common carp is a close relative of the zebrafish; both belong to the same family. The ancestors of common carp and zebrafish split about 50 million years ago (Mya) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, whereas the corresponding divergence data for fathead minnow are not available. The wealth of EST data for these three cyprinid species and the recent speciation event provide a valuable resource to aid the ongoing zebrafish genome annotation project.</p>
         <p>In order to facilitate the comparative genomic analysis of gonad development in cyprinid teleosts, primarily the zebrafish <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and common carp, we set out to complement the non-gonadal common carp transcriptome data by sequencing clones from testis-derived cDNA libraries. We then performed a cross-species analysis of cyprinids by comparing common carp EST sequences to those originating from zebrafish and fathead minnow, as well as to the partially sequenced zebrafish genome. We mapped common carp ESTs to un-annotated regions of the zebrafish genome. Our results identified novel testis-expressed transcripts in cyprinids and new splice variants in the common carp transcriptome. We were able to show that the two species share a significant level of similarity in the 5'UTR regions. Collectively, these results indicate that such a comparative approach, based on the usage of closely related species, could add value to the current ongoing improvements to the zebrafish genome assembly and annotation by the genomic community.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Testis-derived common carp cDNAs add nearly 2,500 unique sequences to the public EST collection</p>
            </st>
            <p>At the start of our work, GenBank <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and CarpBASE <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> together contained 10,615 common carp ESTs, all of which originated from non-gonadal cDNA libraries. We enriched the existing transcriptome dataset for common carp by generating an additional 6,050 ESTs through random sequencing of clones from five different cDNA libraries derived from differentiating common carp testis (60&#8211;100 days post fertilization or dpf; see <supplr sid="S1">Additional File 1</supplr>: Table S1 for details on the libraries). We also added 652 common carp mRNAs extracted from GenBank in order to assist the assembly of ESTs.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>The description of common carp cDNA libraries analyzed in this study. Details include tissue, developmental stage and source of cDNA libraries.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Following cleaning and quality control, over 15,000 ESTs {10,283 from GenBank plus CarpBASE and 5,073 from our own data (GenBank: <ext-link ext-link-type="gen" ext-link-id="DW719352">DW719352</ext-link>&#8211;<ext-link ext-link-type="gen" ext-link-id="DW724424">DW724424</ext-link>)} were retained and clustered (Fig. <figr fid="F1">1</figr>). The clustered dataset of 8,663 unique sequences (1,643 clusters and 7,020 singletons) contained 2,442 (28.1%) "testis-only" sequences, including clusters with exclusively testis-derived ESTs and singletons isolated from one of the testis cDNA libraries (Fig. <figr fid="F2">2</figr>).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Flow chart depicting the protocols used for comparing the sequences from common carp and zebrafish</p>
               </caption>
               <text>
                  <p>Flow chart depicting the protocols used for comparing the sequences from common carp and zebrafish.</p>
               </text>
               <graphic file="1471-2105-7-S5-S2-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Distribution of clusters and singletons according to the origin of the sequences</p>
               </caption>
               <text>
                  <p>Distribution of clusters and singletons according to the origin of the sequences. The combined common carp collection was thoroughly cleaned and clustered using the STACKPACK clustering tool.</p>
               </text>
               <graphic file="1471-2105-7-S5-S2-2"/>
            </fig>
            <p>In order to initiate functional annotation of the partial transcriptome of common carp, we identified open reading frames (ORFs) in our clustered EST set using ESTScan <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. An ORF prediction was obtained for 81% of the clusters and 47.5% of the singletons, yielding a total of 4,663 sequences (data not shown). The ORF-containing common carp transcripts were classified into functional categories using protein domain databases (<supplr sid="S2">Additional File 2</supplr>: Table S2; see Materials and Methods for databases used).</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>Distribution of functional categories identified in the partial transcriptome of common carp.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Mapping of common carp ESTs to the zebrafish genome</p>
            </st>
            <p>In the zebrafish Ensembl annotation (Ensembl_37) genes were annotated using mRNA and proteins from the target species as well as a range of other vertebrates, the closest to zebrafish being Japanese fugu and green spotted pufferfish. We mapped our common carp EST data to the zebrafish genome assembly (v5; <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>) according to a multi-step protocol (see Materials and Methods for details and <supplr sid="S8">Additional File 8</supplr>: Figure S1 for flow chart). A total of 1,182 common carp clusters (72% of all clusters) and 3,827 singletons (55% of all singletons) showed sequence similarity to the zebrafish genome with a BLAST E-value cutoff of 1e-04. After stringent filtering &#8211; selecting a unique zebrafish genomic location for each mapped common carp cluster (see Materials and Methods for detailed description) and sequence identity of 80% over 70% of the EST length &#8211; we assigned 484 clusters (29%) and 1,359 singletons (19%) to the zebrafish genome assembly (from here onwards these sequences will be referred to as "mapped common carp transcripts"). The common carp transcript map coordinates are available from Ensembl version 38 as a DAS track <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <suppl id="S8">
               <title>
                  <p>Additional File 8</p>
               </title>
               <text>
                  <p>Protocol to map common carp transcripts to the zebrafish genome assembly (v5). The flow chart depicts the pipeline implemented for mapping common carp transcripts to the zebrafish genome. Filter criteria are denoted in the decision tree. Total number of clusters and singletons are indicated in square brackets.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S8.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
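            <p>The stringent filtering step described above can be illustrated with a minimal sketch. It assumes that each alignment hit has already been reduced to a record holding the percent identity, the aligned length and the transcript length. The Hit record, the function names and the rule of discarding any transcript that passes the thresholds at more than one locus are illustrative assumptions and not the pipeline used in this study (see Materials and Methods and Additional File 8 for the actual protocol).</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one transcript-to-genome alignment."""
    transcript_id: str
    locus: str            # e.g. "chr1:100"; the format is an assumption
    identity: float       # percent identity of the alignment
    aligned_len: int      # number of transcript bases aligned
    transcript_len: int   # full length of the cluster or singleton


def passes_thresholds(hit, min_identity=80.0, min_coverage=0.70):
    """Apply the cutoffs from the text: 80% identity over 70% of the length."""
    coverage = hit.aligned_len / hit.transcript_len
    return hit.identity >= min_identity and coverage >= min_coverage


def stringent_map(hits):
    """Return {transcript_id: locus} for transcripts with one passing locus.

    Assumption: a transcript passing at several loci is dropped; the study
    may resolve such cases differently.
    """
    loci_by_transcript = {}
    for hit in hits:
        if passes_thresholds(hit):
            loci_by_transcript.setdefault(hit.transcript_id, set()).add(hit.locus)
    return {
        tid: next(iter(loci))
        for tid, loci in loci_by_transcript.items()
        if len(loci) == 1
    }
```

            <p>In this sketch the uniqueness requirement is applied after the identity and coverage cutoffs, so a transcript with one strong hit and several weak hits is still retained.</p>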
            <p>The 90th percentile of all intron lengths within the zebrafish Ensembl database is 4,657 nucleotides. There were 122 cases where two common carp clusters/singletons mapped to the zebrafish genome within 4,657 nucleotides of each other. These represent cases where the clusters and/or singletons potentially correspond to the same gene but were partitioned into separate clusters because of the absence of sequence data in the EST database.</p>
            <p>Interestingly, there were 84 cases where at least two clusters and/or singletons overlapped the same zebrafish locus. These represent potential gene family expansions in the common carp relative to zebrafish, but would require experimental validation in the future. These cases provide support for the incorporation of EST sequences from closely related "sequence-poor" species into the analysis pipeline of (nearly) completely sequenced genomes.</p>
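            <p>The proximity test can be sketched as follows, assuming that mapped transcripts are available as (identifier, chromosome, start, end) tuples. The function name and the choice to compare only neighbouring transcripts on the same chromosome are assumptions made for illustration; the default of 4,657 nucleotides is the intron length percentile quoted above.</p>

```python
def nearby_pairs(mapped, max_gap=4657):
    """List pairs of neighbouring mapped transcripts separated by at most max_gap.

    mapped: iterable of (transcript_id, chrom, start, end) tuples.
    Assumption: only transcripts adjacent in start order on the same
    chromosome are compared, and overlapping pairs (negative gap) are skipped.
    """
    by_chrom = {}
    for tid, chrom, start, end in mapped:
        by_chrom.setdefault(chrom, []).append((start, end, tid))

    pairs = []
    for items in by_chrom.values():
        items.sort()
        for (s1, e1, t1), (s2, e2, t2) in zip(items, items[1:]):
            gap = s2 - e1
            if max_gap >= gap >= 0:
                pairs.append((t1, t2))
    return pairs
```

            <p>Pairs reported by such a test are only candidates for belonging to the same gene; a gap shorter than a typical intron does not by itself establish that.</p>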
         </sec>
         <sec>
            <st>
               <p>Common carp ESTs map to regions lacking expressed sequence information in the zebrafish genome</p>
            </st>
            <p>Nearly 40% of ESTs obtained from GenBank and those sequenced in our lab are bi-directional due to the EST sequencing protocol used. As a result, the strandedness of the genome-aligned common carp ESTs was determined using the splice-site orientation as defined in the EST2GENOME algorithm <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. To identify un-annotated regions in the zebrafish genome, we required that both plus and minus strands of the zebrafish genome be free of any sequence similarity features to non-common carp cDNA and proteins.</p>
            <p>Of the 1,843 common carp transcripts mapped to the zebrafish genome assembly (Ensembl_37), 1,752 overlapped zebrafish cDNAs supported by genes and/or ESTs. The remaining 91 "mapped common carp transcripts" showed sequence identity to regions overlapping zebrafish introns (23), <it>ab-initio </it>predictions (22), non-zebrafish exons (22), intergenic regions (13) and non-zebrafish introns (11) (<supplr sid="S3">Additional File 3</supplr>: Table S3; see Materials and Methods for classification criteria).</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>Classification of 91 common carp ESTs that map to intergenic, intronic, <it>ab initio </it>predictions and non-zebrafish supported annotations.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S3.html">
                  <p>Click here for file</p>
               </file>
            </suppl>
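            <p>The counts reported for the mapped transcripts can be cross-checked with simple arithmetic. The short sketch below only restates figures given in the text; the variable and function names are illustrative.</p>

```python
# Figures quoted in the text; nothing here is new data.
TOTAL_CLUSTERS = 1643
TOTAL_SINGLETONS = 7020
MAPPED_CLUSTERS = 484
MAPPED_SINGLETONS = 1359
OVERLAPPING_ZEBRAFISH_CDNA = 1752

# Categories of the remaining mapped transcripts.
unannotated = {
    "zebrafish introns": 23,
    "ab-initio predictions": 22,
    "non-zebrafish exons": 22,
    "intergenic regions": 13,
    "non-zebrafish introns": 11,
}


def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


mapped_total = MAPPED_CLUSTERS + MAPPED_SINGLETONS
remaining = mapped_total - OVERLAPPING_ZEBRAFISH_CDNA

print(mapped_total, remaining, sum(unannotated.values()))
print(percent(MAPPED_CLUSTERS, TOTAL_CLUSTERS),
      percent(MAPPED_SINGLETONS, TOTAL_SINGLETONS))
```

            <p>The five categories account for all transcripts that do not overlap zebrafish cDNAs, and the rounded percentages agree with the 29% of clusters and 19% of singletons stated for the mapped set.</p>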
            <p>Five of the 13 common carp transcripts that map to intergenic regions are located less than 1 kb from the 5' end of the nearest neighbouring gene. Considering their close proximity to an annotated gene, these common carp transcripts represent potential untranslated regions (UTRs). In fact, the five neighbouring genes are annotated as developmental genes (data not shown). Developmental genes are highly conserved among species and very often the sequence conservation extends to their regulatory regions <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Furthermore, each of the common carp transcripts mapped to the zebrafish genome has sequence identity in excess of 80%, suggesting that the use of a lower threshold for common carp EST mapping might retrieve many more UTR sequences that could be subjected to similar UTR analyses as described in the Materials and Methods. The remaining eight common carp transcripts that map to intergenic regions are located between 5 and 150 kb away from the nearest zebrafish locus, suggesting the presence of novel gene loci that require experimental verification in the future.</p>
            <p>Forty-two of the 91 mapped common carp transcripts have not been identified in the zebrafish and fathead minnow EST collections so far; therefore, they represent novel cyprinid sequences. Another 16 of the 91 mapped common carp transcripts showed significant sequence similarity to the zebrafish and fathead minnow UniGene collection (build 91). (This indicated that the overlapping zebrafish transcripts might not have been available at the time of annotating the zebrafish genome version 5.) The remaining 33 common carp transcripts shared very weak sequence similarity (&lt;40% identity) with either zebrafish or fathead minnow, and thus might point to genes that diverged from their orthologs. Alternatively, these transcripts could represent sequences orthologous to zebrafish UTRs that are yet to be assigned to the annotated zebrafish genome.</p>
            <p>The above cases illustrate the potential advantages of utilizing partial transcriptomes from related species in order to provide information on the functional properties of (a) un-annotated parts of genomes to be assembled as well as (b) regions annotated with distantly related species.</p>
         </sec>
         <sec>
            <st>
               <p>Alternative exon usage identified by comparing cyprinid transcripts</p>
            </st>
            <p>EST-based analysis of alternative splicing has been performed earlier in mammals; the results suggest that 40&#8211;60% of the genes produce alternatively spliced transcripts <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Only a few studies have been performed on fish sequences (see e.g. <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>) resulting in a limited amount of data on splicing from teleosts.</p>
            <p>Interestingly, out of 1,752 common carp transcripts that map to coding regions, there were 26 cases where the exon structure showed evidence for a missing exon compared to the overlapping zebrafish Ensembl gene (example: Fig. <figr fid="F3">3A</figr>; full list: <supplr sid="S4">Additional File 4</supplr>: Table S4). Similar comparisons yielded 16 cases where an exon that was present in the overlapping common carp ESTs was missing from the zebrafish transcript (example: Fig. <figr fid="F3">3B</figr>; full list: <supplr sid="S4">Additional File 4</supplr>: Table S4). There are four possibilities to explain such differences: i) the exon in question is missing from one of the two genomes; ii) exclusive usage of different splice products in the two related species; iii) different preferences of alternative splice products; and iv) virtual difference due to partial transcriptomes.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Examples of potential splice variants identified by mapping common carp transcripts to the zebrafish genome</p>
               </caption>
               <text>
                  <p>Examples of potential splice variants identified by mapping common carp transcripts to the zebrafish genome. (a) Common carp transcript lacking an exon present in the zebrafish cDNA. (b) A common carp transcript with exons not present in the overlapping zebrafish gene.</p>
               </text>
               <graphic file="1471-2105-7-S5-S2-3"/>
            </fig>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>List of hyperlinks to potential common carp splice variants. Overlapping common carp and zebrafish transcripts are presented on a gbrowse viewer to highlight the missing exons in one of the two species.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S4.html">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>At the time of submission GenBank contained over 1.3 million ESTs for zebrafish, fathead minnow and common carp. We propose that the broad mRNA diversity contained in teleost EST resources could be leveraged to understand the extent of alternative splicing within this diverse group of teleosts using analyses similar to those reported for human ESTs <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Comparison of the partial cyprinid gonad transcriptomes identifies 974 novel testis-derived transcripts</p>
            </st>
            <p>The UniGene collection (build 91) contains datasets for two cyprinid species, namely the zebrafish and fathead minnow. The common carp EST data reported in this study, comprising nearly 9,000 unique transcripts, represent an additional cyprinid species that will be included in subsequent UniGene releases. The new EST data for common carp have also provided an opportunity to examine the value of tissue-specific sequencing for the existing gene collections. The common carp EST data were compared to the zebrafish UniGene collection (build 91) and subsequently to the fathead minnow data set using a BLAST E-value &lt;1e-04 and sequence identity over 40% of the sequence length.</p>
            <p>A total of 932 testis-derived common carp singletons and 42 clusters containing exclusively testis-derived common carp ESTs (<supplr sid="S5">Additional File 5</supplr>: Table S5) did not overlap any of the zebrafish and fathead minnow UniGene transcripts. This dataset added 974 potentially novel sequences to the combined testis transcriptome of cyprinid teleosts (a fraction of these might represent UTR or coding sequences that are derived from fast-evolving genes).</p>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>List of 974 testis-only transcripts that do not overlap any of the zebrafish and fathead minnow ESTs.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S5.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
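            <p>The selection of transcripts without a match in the other cyprinid collections can be sketched as below. The sketch assumes BLAST results in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) and a dictionary of query lengths. Reading "over 40% of the sequence length" as an alignment spanning at least 40% of the query is an assumption, as are all function and parameter names.</p>

```python
def queries_with_hits(blast_lines, query_lengths, max_evalue=1e-4, min_fraction=0.40):
    """Return the set of query identifiers having at least one accepted hit.

    blast_lines: iterable of tab-separated lines in 12-column tabular format.
    query_lengths: {query_id: length in nucleotides}.
    Assumption: a hit is accepted when its E-value is below max_evalue and
    the alignment covers at least min_fraction of the query length.
    """
    matched = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12:
            continue  # skip comments and malformed lines
        query_id = fields[0]
        alignment_length = int(fields[3])
        evalue = float(fields[10])
        fraction = alignment_length / query_lengths[query_id]
        if max_evalue > evalue and fraction >= min_fraction:
            matched.add(query_id)
    return matched


def novel_transcripts(all_query_ids, matched):
    """Transcripts with no accepted hit, in sorted order."""
    return sorted(set(all_query_ids) - matched)
```

            <p>Applied first to the zebrafish results and then to the fathead minnow results, the transcripts left unmatched in both comparisons would correspond to the candidate novel set; as noted above, some of these may be UTRs or fast-evolving coding sequences rather than genuinely new genes.</p>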
            <p>A total of 214 out of 974 testis-only transcripts contained an ORF. Among these 214 transcripts, three testis-derived clusters contained an interleukin-8-like domain (IPR001811). The absence of significant sequence identity to zebrafish and fathead minnow at the nucleotide level is partly due to cytokines representing rapidly diverging genes involved in regulation of the immune system. Another domain, tissue inhibitor of metalloproteinase (IPR001820), which was identified in a common carp testis-derived transcript was present in two zebrafish UniGene clusters (Dr.240 and Dr.31907), but not sampled by any gonad-derived zebrafish sequences. The remaining 210 common carp transcripts do not show the presence of any characterized domains. These unique testis-derived transcripts could provide starting material for the isolation of their zebrafish orthologs, if any, and their potential application as markers for functional studies on gonad differentiation.</p>
            <p>The potential homologs of 474 common carp clusters with at least one testis-derived EST were identified in the zebrafish UniGene data collection (<supplr sid="S6">Additional File 6</supplr>: Table S6). When compared to the fathead minnow EST collection, six of these 474 common carp clusters showed sequence identity to adult testis-derived ESTs only (<supplr sid="S6">Additional File 6</supplr>: Table S6). The common carp data correspond to differentiating testis (60&#8211;100 dpf), whereas the testis-derived zebrafish and fathead minnow clones presently found in the public databases are all from an adult organ. Therefore our results have complemented the previously available knowledge about the expression of these genes with experimental data on their activity during testis differentiation, providing indications on potentially conserved aspects of cyprinid gonad development. Moreover, the fact that common carp transcripts help the identification or confirmation of these coding regions in zebrafish exemplifies the usefulness of sequences from closely related species for the annotation of model genomes.</p>
            <suppl id="S6">
               <title>
                  <p>Additional File 6</p>
               </title>
               <text>
                  <p>List of 474 testis-derived clusters that show sequence identity to 474 zebrafish and 75 fathead minnow UniGene clusters. Testis-expression information was added to the adult-stage zebrafish expression data.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S6.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Comparing the overall architecture of UTR regions for a set of orthologous genes from common carp and zebrafish</p>
            </st>
            <p>There is an average 82% sequence identity between the coding regions of homologous gene pairs in zebrafish and common carp, whereas the same value for their 5' and 3' UTRs is only 61% and 58%, respectively (see Materials and Methods for details). We set out to explore the extent to which common carp and zebrafish retained similarity in the 5'UTR regions of their orthologous genes, as this can reveal aspects of the regulatory roles of these regions in both species. This task was difficult for two reasons: i) the limited sequence information available from common carp dramatically decreased our ability to identify a large number of orthologs between these species; and ii) the usual approaches to evaluating similarity, which are based on local alignments, are poorly suited to the similarity assessment of regulatory regions, as demonstrated by Blanco and colleagues <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
            <p>By analyzing 48 pairs of orthologous sequences and an additional six paralogs, which contained at least 50 bp at their 5' UTR (see <supplr sid="S7">Additional File 7</supplr>: Table S7 for the complete list) we identified motif families shared in the 5' UTR of common carp and zebrafish mRNAs. Analyzing each of the orthologous pairs individually (plus the paralogs, whenever applicable), we determined the order of a maximum of 10 shared motifs between common carp and zebrafish.</p>
            <suppl id="S7">
               <title>
                  <p>Additional File 7</p>
               </title>
               <text>
                  <p>Homologous gene pairs identified through manual curation of common carp and zebrafish genes. The table includes DNA and protein accession numbers and corresponding gene descriptions. Genes in rows highlighted with yellow contain 5' UTR sequence (>= 50 bp) and were used in UTR analysis.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S7.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The distribution of coverage for all orthologous pairs relative to the number of motifs in these arrangements is represented in <supplr sid="S9">Additional File 9</supplr>: Figure S2. About two-thirds of the orthologous 5'UTR pairs tested shared 4&#8211;6 motifs in the conserved positional arrangement, whereas most of the rest shared 7&#8211;10. The distribution of identified motifs together with the conserved arrangement in the zebrafish caudal type homeobox transcription factor 4 (<it>cdx4</it>) (RefSeq:NM_131109) and its common carp ortholog, <it>cdx1 </it>(Genbank:<ext-link ext-link-type="gen" ext-link-id="X80668">X80668</ext-link>) are shown in Figure <figr fid="F4">4</figr> as an example.</p>
            <suppl id="S9">
               <title>
                  <p>Additional File 9</p>
               </title>
               <text>
                  <p>Distribution of orthologous clusters with given number of common motifs using Dragon motif builder (blue bars) and CLUSTALW (red bars).</p>
               </text>
               <file name="1471-2105-7-S5-S2-S9.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Collection of motifs identified in the <it>cdx1-cdx4 </it>ortholog gene pair in the common carp and zebrafish</p>
               </caption>
               <text>
                  <p>Collection of motifs identified in the <it>cdx1-cdx4 </it>ortholog gene pair in the common carp and zebrafish. The arrangement of motifs (black boxes) identified in the 5' UTR regions of common carp <it>cdx1 </it>and zebrafish <it>cdx4 </it>genes (caudal type homeobox transcription factor 4 orthologs) is shown on the left. A black arrow indicates the start of the first coding exon. Motif sequences are shown on the right.</p>
               </text>
               <graphic file="1471-2105-7-S5-S2-4"/>
            </fig>
            <p>A detailed UTR analysis is beyond the scope of the present manuscript. We therefore propose a large-scale analysis to find out whether 5'UTR regions from different orthologous pairs share motifs from the same family. The presence of such shared motif families would suggest the existence of regulatory components common to both species that are suitable for further evaluation.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this study, we have demonstrated the value of using ESTs for comparative analysis of transcriptomes from species with vastly different amounts of sequence information. For example, common carp ESTs were successfully mapped to un-annotated regions of the zebrafish genome, demonstrating the value of using closely related species for sequence comparison. The existing cyprinid ESTs represent a useful resource for comparative genomics to understand the evolution of this family.</p>
         <p>Sequenced genomes are being integrated with functional information (e.g. expression data from microarray hybridisations, gene ontologies, etc.) to improve the efficiency of data mining. However, integrating fragmented genomic data for non-sequenced genomes remains a challenge for scientists who want to leverage inter-species comparisons. We suggest that there is a need to co-ordinate the isolated "in-house" integration attempts across laboratories in order to maximize and improve the quality of the information content that is currently under-utilized.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Isolation of differentiating testis from common carp individuals</p>
            </st>
            <p>Androgenetic common carp "supermales" (YY; <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>) were crossed with wild type females (XX) to give rise to an all-male offspring population. (This approach allows for testis isolation without the need for sexing the fish.) Gonads were isolated from a minimum of 6 individuals at each of 59/60 (a mixture of 59 and 60 days-old individuals), 70, 80 and 100 dpf. One of the two gonads from each individual was processed for histological analyses (data not shown), while the other one was stored in RNAlater (Ambion) for RNA isolation.</p>
         </sec>
         <sec>
            <st>
               <p>Construction of cDNA libraries from the differentiating carp testis</p>
            </st>
            <p>Total RNA was isolated from the testis of 59/60-, 80- and 100-day-old individuals, respectively. Full-length cDNA was synthesized using the Creator Smart Library Construction Kit (Clontech) according to the manufacturer's instructions. After <it>SfiI </it>restriction enzyme digestion the adaptors and short cDNAs were removed with a ChromaSpin 400 column (Clontech). The size-fractionated cDNA pool was then cloned into a pBluescript-based vector (detailed map is available on request) and transformed into <it>E. coli </it>XL10-Gold cells. Clones were picked into thirty, twenty and ten 96-well plates from the libraries generated from testes collected at 60, 80 and 100 dpf, respectively, and their inserts were sequenced using the M13 forward primer as described in <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
            <p>Total RNA was isolated from the testis of 70- and 100-day-old individuals, respectively. Two sets of subtractive hybridizations were performed: 70 dpf male gonad (driver) from 100 dpf testis (tester), and 100 dpf testis (d) from 70 dpf male gonad (t). The PCR-Select&#8482; cDNA subtraction kit (Clontech) was used to enrich for developmental stage-specific fragments from the SMART cDNA template according to the recommendations of the manufacturer. The selectively amplified cDNA fragments (on average 400&#8211;800 bp in length) were ligated into the pGEM-T (Promega) cloning vector. In total, 2,500 clones were picked from the two libraries and their inserts were sequenced using the M13 forward primer.</p>
         </sec>
         <sec>
            <st>
               <p>Sequence acquisition and EST clustering</p>
            </st>
            <p>A total of 10,620 common carp ESTs, sequenced from a range of tissues other than gonad, were downloaded from GenBank (26 April 2005); 9,303 of those sequences are also available from CarpBASE <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> (see <supplr sid="S1">Additional File 1</supplr>: Table S1 for details of clone origins). They were combined with 652 mRNAs from GenBank and with 6,050 gonad-derived common carp ESTs generated in our labs within the framework of this project (Fig. <figr fid="F1">1</figr>). Low quality regions were trimmed at the 3' end of ESTs prior to masking against libraries of repeats, mitochondrial and ribosomal sequences using RepeatMasker <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Sequences that comprised at least 70% unmasked nucleotides (10,283 GenBank and CarpBASE ESTs and mRNAs, 5,073 TLL ESTs) were retained for further analysis. (The processed TLL ESTs were submitted to GenBank and can be found under the following IDs: DW719352-DW724424.) The combined EST data set was clustered using the STACKPACK clustering tools <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp> on HPCompaq Alpha ES40 architecture.</p>
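            <p>The retention rule described above (keep a sequence only if at least 70% of its nucleotides remain unmasked) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that masked positions are written as N or X, and the function names and example sequences are hypothetical.</p>

```python
def unmasked_fraction(seq, mask_chars="NX"):
    """Return the fraction of positions not replaced by a masking character.

    Assumption: masking is "hard", i.e. masked bases appear as N or X.
    """
    if not seq:
        return 0.0
    masked = sum(1 for base in seq.upper() if base in mask_chars)
    return (len(seq) - masked) / len(seq)


def retain_sequences(records, min_unmasked=0.70):
    """Keep (name, sequence) pairs with at least min_unmasked unmasked bases."""
    return [(name, seq) for name, seq in records
            if unmasked_fraction(seq) >= min_unmasked]


# Hypothetical toy records, 10 nt each.
records = [
    ("est_a", "ACGTACGTAC"),  # 10 of 10 positions unmasked
    ("est_b", "ACGNNNNNNN"),  # 3 of 10 positions unmasked
    ("est_c", "ACGTACGNNN"),  # 7 of 10 positions unmasked
]
kept_names = [name for name, _ in retain_sequences(records)]
print(kept_names)
```

            <p>In this toy input only est_b falls below the 70% threshold; est_c sits exactly on it and is retained because the rule is stated as "at least 70%".</p>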
         </sec>
         <sec>
            <st>
               <p>Functional characterization of common carp transcripts</p>
            </st>
            <p>Common carp transcripts (clusters and singletons) were partitioned into ORF- and nonORF-containing sequences using ESTScan <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. The ORF-containing transcripts were annotated for protein domains and functional sites by matching them against the PFAM, PROSITE and PRINTS databases <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> using hmmpfam, a program within the HMMER package that uses hidden Markov models to do sensitive searching of a protein database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The protein domains were mapped to gene ontology categories using GO tables <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Mapping of common carp transcripts to the zebrafish genome</p>
            </st>
            <p>In order to further categorize the common carp transcripts, they were searched against the zebrafish genome assembly (version 5). The possibility of multigene families within EST clusters allows common carp clusters to map to multiple zebrafish genomic locations. A single high-quality zebrafish genomic location was identified for each mapped common carp cluster in order to screen for novel genes and potential alternative splice variants.</p>
            <p>Transcripts that mapped to the zebrafish genome with a BLAST E-value of 1e-04 or lower were passed through a set of stringent filters, as defined in <supplr sid="S8">Additional File 8</supplr>: Figure S1, in order to identify a single zebrafish genomic location for each of the mapped common carp clusters. The best zebrafish chromosome location for each EST in a common carp cluster was considered: the zebrafish chromosome locus shared by all ESTs within a cluster was chosen as the mapped genomic locus for the corresponding common carp cluster. EST clusters whose constituent ESTs had best hits to different chromosome locations were screened for a common zebrafish chromosome hit by considering the top five best hits for each EST in a cluster. A common zebrafish chromosome hit identified in the top five best hits was assigned as the unique map location for the common carp cluster. Mapped common carp clusters were not considered if there was not at least one zebrafish chromosome hit shared among all the ESTs in a cluster. All common carp transcripts that passed these filtering criteria were aligned to the specific segment of the overlapping zebrafish genome using EST2GENOME <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <p>Exon-intron boundaries were extracted from the EST2GENOME results and served as a DAS track on the Ensembl browser <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <p>Genome-aligned common carp ESTs were classified into one of six categories, each of which could be satisfied on the plus or minus strand: (1) zebrafish coding regions: common carp ESTs overlapped exons corresponding to an Ensembl gene or zebrafish EST; (2) zebrafish introns: the entire genome-aligned common carp EST was contained within the intronic region(s) of a zebrafish gene; (3) non-zebrafish exons: common carp ESTs mapped to regions of the zebrafish genome that overlapped non-zebrafish cDNA; (4) non-zebrafish introns: the non-coding portions of cDNA or proteins aligned to the zebrafish genome; (5) intergenic: regions of the zebrafish genome void of any annotations; or (6) <it>ab initio </it>predictions: common carp ESTs mapped to regions of the zebrafish genome with an <it>in silico </it>gene prediction only.</p>
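            <p>The locus-assignment logic described above can be sketched in Python as follows. This is an illustrative reading of the text, not the code used in the study. Loci are represented by hypothetical string labels, each EST carries a list of hits ranked best-first, and the tie-break applied when several loci are shared (lowest summed rank) is an assumption that the text does not specify.</p>

```python
def consensus_locus(hits_per_est, top_n=5):
    """Pick one zebrafish locus for a cluster, or None if no locus is shared.

    hits_per_est maps an EST identifier to its loci, ranked best-first.
    """
    ranked = list(hits_per_est.values())
    if not ranked or any(not hits for hits in ranked):
        return None

    # Step 1: every EST in the cluster has the same single best hit.
    best_hits = {hits[0] for hits in ranked}
    if len(best_hits) == 1:
        return best_hits.pop()

    # Step 2: look for a locus present in the top_n hits of every EST.
    shared = set(ranked[0][:top_n])
    for hits in ranked[1:]:
        shared.intersection_update(hits[:top_n])
    if not shared:
        return None  # such clusters were not considered further

    # Assumed tie-break: lowest summed rank, then alphabetical order.
    def summed_rank(locus):
        return sum(hits.index(locus) for hits in ranked)

    return min(shared, key=lambda locus: (summed_rank(locus), locus))


# Hypothetical clusters with made-up locus labels.
cluster_a = {"est1": ["chr3", "chr7"], "est2": ["chr3"]}
cluster_b = {"est1": ["chr3", "chr7"], "est2": ["chr7", "chr9"]}
cluster_c = {"est1": ["chr1"], "est2": ["chr2"]}

for cluster in (cluster_a, cluster_b, cluster_c):
    print(consensus_locus(cluster))
```

            <p>Here cluster_a is resolved from the best hits alone, cluster_b only after the top-ranked hits of each EST are compared, and cluster_c is discarded because its ESTs share no locus.</p>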
         </sec>
         <sec>
            <st>
               <p>Comparing testis derived common carp sequences with zebrafish and fathead minnow EST data</p>
            </st>
            <p>Testis-only transcripts for common carp were defined as clusters or singletons represented by ESTs obtained exclusively from common carp testis cDNA libraries. Gonad-derived genes for zebrafish were sampled from the UniGene zebrafish collection (build 91), where UniGene clusters contained ESTs that were sampled from zebrafish testis or ovary cDNA libraries. Common carp testis-derived transcripts were searched against the zebrafish gonad-derived UniGene dataset using BLASTN with (i) an E-value &lt; 1e-04; and (ii) sequence overlap where 40% of the query sequence overlapped the matching database sequence. The common carp transcripts without identity to zebrafish gonad-derived sequences were searched against the remainder of the zebrafish UniGene build 91 using an E-value &lt; 1e-04 but without the requirement for 50% of the query sequence overlapping the database sequence. These relaxed criteria resulted in the identification of fewer common carp ESTs without homologous zebrafish ESTs in UniGene (build 91). However, these common carp ESTs provide a minimum dataset of testis-derived sequences not sampled by the zebrafish EST collection. The resulting "unique" common carp transcripts were searched against the fathead minnow UniGene (build 91) EST data using the same criteria as used for zebrafish.</p>
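            <p>The two-stage BLASTN filtering described above (a strict pass with an E-value cut-off plus a query-overlap requirement, followed by a relaxed pass with the E-value cut-off only) might be sketched as below. The sketch assumes the standard 12-column tabular BLAST output, measures overlap on a single alignment, and takes query lengths from a separate table; the function name, identifiers and hit lines are hypothetical, and the overlap fraction is left as a parameter.</p>

```python
def filter_hits(lines, query_lengths, max_evalue=1e-4, min_query_cov=None):
    """Return (query, subject) pairs passing the E-value and overlap filters.

    Assumes tabular BLAST columns: query, subject, identity, length,
    mismatches, gap opens, qstart, qend, sstart, send, evalue, bit score.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= max_evalue:
            continue  # the text requires an E-value strictly below the cut-off
        if min_query_cov is not None:
            # Fraction of the query covered by this one alignment (assumption).
            coverage = (abs(qend - qstart) + 1) / query_lengths[query]
            if min_query_cov > coverage:
                continue
        kept.append((query, subject))
    return kept


# Hypothetical hits and query lengths.
lines = [
    "carp1\tDr.100\t85.0\t400\t60\t0\t1\t400\t10\t409\t1e-50\t300",
    "carp2\tDr.200\t82.0\t100\t18\t0\t1\t100\t5\t104\t1e-10\t90",
    "carp3\tDr.300\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t20",
]
query_lengths = {"carp1": 500, "carp2": 500, "carp3": 300}

strict = filter_hits(lines, query_lengths, min_query_cov=0.4)
relaxed = filter_hits(lines, query_lengths)
print(strict)
print(relaxed)
```

            <p>With these toy hits, carp2 fails the strict pass because its alignment covers only 20% of the query but is recovered by the relaxed pass, whereas carp3 fails both passes on E-value.</p>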
         </sec>
         <sec>
            <st>
               <p>Acquisition of sequence data for common carp and zebrafish orthologs and paralogs</p>
            </st>
            <p>A total of 652 common carp mRNA sequences were downloaded from GenBank. Of these, 292 represented partial mRNA sequences and were removed. The sequences corresponding to the remaining 360 mRNA records in GenBank were searched against NCBI's non-redundant database using protein-protein BLAST (blastp; <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>). The BLAST results were filtered for a significant sequence match to zebrafish (E-value &lt; 1e-05), and matching zebrafish mRNAs that were partial sequences were filtered out. The remaining 183 common carp and zebrafish homologous pairs were screened manually for orthologous relationships using cross-linked information including publications, curated annotations and filtering for redundant GenBank records. Eventually 120 pairs of orthologous genes were selected for sequence comparison between coding and non-coding regions (<supplr sid="S7">Additional File 7</supplr>: Table S7), and a subset containing 48 pairs, plus six additional paralogs (all with at least 50 nucleotides upstream of the first protein coding exon), was used for motif searches (highlighted sequences in <supplr sid="S7">Additional File 7</supplr>: Table S7).</p>
            <p>First we analyzed the sequence similarity among the coding regions and the UTRs for the orthologous gene set. At the nucleotide level, sequence conservation was observed more often in the CDS regions, followed by the 5' UTR and 3' UTR regions, respectively (<supplr sid="S10">Additional File 10</supplr>: Figure S3). Specifically, 75% of the orthologous pairs are captured when we set a sequence identity threshold of 80% at the CDS and protein levels. In comparison, only 25% of the 5' UTR sequences are captured under the same conditions (<supplr sid="S10">Additional File 10</supplr>: Fig. S3). The threshold of 80% sequence identity was implemented for subsequent BLAST searches of common carp ESTs against the zebrafish genome assembly.</p>
            <suppl id="S10">
               <title>
                  <p>Additional File 10</p>
               </title>
               <text>
                  <p>Percent sequence identity between common carp and zebrafish orthologous proteins, CDS, 5' UTR and 3' UTR regions.</p>
               </text>
               <file name="1471-2105-7-S5-S2-S10.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p><it>Ab-initio </it>motif identification, motif arrangement and 5'UTR sequence similarity</p>
            </st>
            <p>For the identification of motifs in 5'UTR regions, we compared the efficiency of the Dragon Motif Builder system <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> with a local alignment method, ClustalW <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. With the Dragon Motif Builder we searched for any motif with a length between 6 bp and 10 bp, used a matrix score threshold of 0.9, and searched for up to 10 motifs in the two sequences of a given orthologous pair (in some cases the 5'UTR regions were too short to harbour all 10 motifs). For ClustalW the common motifs were manually identified and restricted to the same criteria as those used by Dragon Motif Builder (motifs of length 6 to 10 bp). A significant difference was observed: ClustalW was not able to identify sufficient similarity between the ortholog sequences in the 5'UTR regions (<supplr sid="S9">Additional File 9</supplr>: Figure S2), as the segments that contain a similar arrangement of common motifs in the two species did not reside at similar genomic locations.</p>
            <p>Once the motifs were identified, we analyzed the motif arrangements. We selected the group of motifs that contained the largest number of common motifs while retaining the same positional arrangement in the two species (see Fig. <figr fid="F4">4</figr> for a specific example). Thus each of the ortholog pairs was screened for such a representative motif arrangement. We used the number of motifs in the representative arrangements as a possible measure of similarity between the 5'UTR regions. In most cases, the regions in which this arrangement was found lay at significantly different distances from the start codon.</p>
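            <p>One way to formalize the selection of the largest group of shared motifs that keeps the same positional arrangement is as a longest common subsequence of the two motif orderings. The Python sketch below takes that reading as an assumption; it is not the procedure implemented in Dragon Motif Builder, and the motif labels are hypothetical.</p>

```python
def conserved_arrangement(order_a, order_b):
    """Longest common subsequence of two motif orderings (5' to 3')."""
    rows, cols = len(order_a), len(order_b)
    # table[i][j] holds the LCS length of order_a[i:] and order_b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if order_a[i] == order_b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the 5' end to recover one optimal arrangement.
    arrangement = []
    i = j = 0
    while rows > i and cols > j:
        if order_a[i] == order_b[j]:
            arrangement.append(order_a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return arrangement


# Hypothetical motif orders along the 5' UTRs of one ortholog pair.
carp_order = ["m1", "m2", "m3", "m4", "m5"]
zebrafish_order = ["m2", "m1", "m3", "m5", "m4"]

shared = conserved_arrangement(carp_order, zebrafish_order)
print(len(shared), shared)
```

            <p>The length of the returned list corresponds to the number of motifs in the representative arrangement that is used above as a measure of similarity. When several arrangements of equal size exist, this sketch returns only one of them.</p>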
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AC designed the bioinformatics pathway, performed most computational analyses, generated most tables and figures and took part in the writing of the manuscript. RB contributed to the experimental design, constructed the full-length libraries and took part in the writing of the manuscript. HS participated in the bioinformatics analysis of the data and in the maintenance of the cyprinid EST database of TLL. HK supervised the generation of the YY androgenetic common carp line, generated the monosex populations, isolated the testis samples from them and contributed to the experimental design. LO initiated the project on comparative analysis of cyprinid ESTs, contributed to the experimental design and took part in the writing of the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Aarti Giri, James A. Hill, Mei Yin Ho, Balamuragan Kumarasamy, Yang Li and Tina Eyre for their technical help. They also acknowledge the help of Oliver Bezuidt, Cameron MacPherson and Vladimir Bajic for the comparative analysis of homologous 5'UTRs from common carp and zebrafish as well as Vladimir Bajic's helpful comments and corrections on an earlier version of the manuscript.</p>
            <p>This project was supported from internal funding by Temasek Life Sciences Laboratory.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 7, Supplement 5, 2006: APBioNet &#8211; Fifth International Conference on Bioinformatics (InCoB2006). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/7?issue=S5</url>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Identification of genes in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <publisher>[Ph.D. Thesis]. Stanford, CA, USA.: Stanford University</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>GeneWise and GenomeWise</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>988</fpage>
            <lpage>995</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.1865504</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123596</pubid>
                  <pubid idtype="pmcid">479130</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome annotation assessment in <it>Drosophila melanogaster</it></p>
            </title>
            <aug>
               <au>
                  <snm>Reese</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Ohler</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>483</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310877</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779488</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.483</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Prediction of genetic structure in eukaryotic DNA using reference point logistic regression and sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Hooper</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wishart</snm>
                  <fnm>DS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>425</fpage>
            <lpage>438</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.5.425</pubid>
                  <pubid idtype="pmpid" link="fulltext">10871265</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Ensembl 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D556</fpage>
            <lpage>561</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347495</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381931</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>An overview of Ensembl</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Bevan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>925</fpage>
            <lpage>928</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479121</pubid>
                  <pubid idtype="pmpid" link="fulltext">15078858</pubid>
                  <pubid idtype="doi">10.1101/gr.1860604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>FishBase</p>
            </title>
            <url>http://www.fishbase.org/</url>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Fishes of the World</p>
            </title>
            <aug>
               <au>
                  <snm>Nelson</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>New York, NY, USA: Wiley</publisher>
            <pubdate>1994</pubdate>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The genetics and genomics of cyprinids</p>
            </title>
            <aug>
               <au>
                  <snm>Orban</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>QJ</fnm>
               </au>
            </aug>
            <source>Genome Mapping in Fishes and Aquatic Animals</source>
            <publisher>Berlin Germany: Springer Verlag</publisher>
            <editor>Kole CR Kocher T</editor>
            <pubdate>2006</pubdate>
            <inpress/>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Molecular genetic aspects of tetraploidy in the common carp <it>Cyprinus carpio</it></p>
            </title>
            <aug>
               <au>
                  <snm>Larhammar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Risinger</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>1994</pubdate>
            <volume>3</volume>
            <fpage>59</fpage>
            <lpage>68</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/mpev.1994.1007</pubid>
                  <pubid idtype="pmpid" link="fulltext">8025730</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Recent duplication of the common carp (<it>Cyprinus carpio </it>L.) genome as revealed by analyses of microsatellite loci</p>
            </title>
            <aug>
               <au>
                  <snm>David</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Blum</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Lavi</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Hillel</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>1425</fpage>
            <lpage>1434</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg173</pubid>
                  <pubid idtype="pmpid" link="fulltext">12832638</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Embryonic epsilon and gamma globin genes of a prosimian primate (<it>Galago crassicaudatus</it>). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints</p>
            </title>
            <aug>
               <au>
                  <snm>Tagle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Koop</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Slightom</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>DL</fnm>
               </au>
               <etal/>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1988</pubdate>
            <volume>203</volume>
            <fpage>439</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90011-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">3199442</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Highly conserved syntenic blocks at the vertebrate Hox loci and conserved regulatory elements within and outside Hox gene clusters</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Koh</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Tay</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Venkatesh</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>6994</fpage>
            <lpage>6999</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0601492103</pubid>
                  <pubid idtype="pmpid" link="fulltext">16636282</pubid>
                  <pubid idtype="pmcid">1459007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Comparative analyses of multi-species sequences from targeted genomic regions</p>
            </title>
            <aug>
               <au>
                  <snm>Thomas</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Touchman</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Blakesley</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Bouffard</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Beckstrom-Sternberg</snm>
                  <fnm>SM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>424</volume>
            <fpage>788</fpage>
            <lpage>793</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01858</pubid>
                  <pubid idtype="pmpid" link="fulltext">12917688</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Novel relationships among ten fish model species revealed based on a phylogenomic analysis using ESTs</p>
            </title>
            <aug>
               <au>
                  <snm>Steinke</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Salzburger</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Evolution</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <fpage>772</fpage>
            <lpage>784</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-005-0170-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">16752215</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>UniGene</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The phylogenetic position of the zebrafish <it>(Danio rerio)</it>, a model system in developmental biology &#8211; an invitation to the comparative method</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Biermann</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Orti</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proceedings of the Royal Society of London Series B-Biological Sciences</source>
            <pubdate>1993</pubdate>
            <volume>252</volume>
            <fpage>231</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1098/rspb.1993.0070</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>dbEST &#8211; Database for "expressed sequence tags"</p>
            </title>
            <aug>
               <au>
                  <snm>Boguski</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TMJ</fnm>
               </au>
               <au>
                  <snm>Tolstoshev</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>1993</pubdate>
            <volume>4</volume>
            <fpage>332</fpage>
            <lpage>333</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng0893-332</pubid>
                  <pubid idtype="pmpid" link="fulltext">8401577</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>dbEST</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/dbEST/</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>CarpBASE</p>
            </title>
            <url>http://legr.liv.ac.uk/carpbase/index.htm</url>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Origin and domestication of the wild carp, <it>Cyprinus carpio </it>&#8211; from Roman gourmets to the swimming flowers</p>
            </title>
            <aug>
               <au>
                  <snm>Balon</snm>
                  <fnm>EK</fnm>
               </au>
            </aug>
            <source>Aquaculture</source>
            <pubdate>1995</pubdate>
            <volume>129</volume>
            <fpage>3</fpage>
            <lpage>48</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0044-8486(94)00227-F</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>About the oldest domesticates among fishes</p>
            </title>
            <aug>
               <au>
                  <snm>Balon</snm>
                  <fnm>EK</fnm>
               </au>
            </aug>
            <source>Journal of Fish Biology</source>
            <pubdate>2004</pubdate>
            <volume>65</volume>
            <fpage>1</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/j.0022-1112.2004.00563.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A review of genetic improvement of the common carp (<it>Cyprinus carpio </it>L.) and other cyprinids by crossbreeding, hybridization and selection</p>
            </title>
            <aug>
               <au>
                  <snm>Hulata</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Aquaculture</source>
            <pubdate>1995</pubdate>
            <volume>129</volume>
            <fpage>143</fpage>
            <lpage>155</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0044-8486(94)00244-I</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Genome and gene manipulation in the common carp</p>
            </title>
            <aug>
               <au>
                  <snm>Horvath</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Orban</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Aquaculture</source>
            <pubdate>1995</pubdate>
            <volume>129</volume>
            <fpage>157</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0044-8486(94)00325-I</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Comparative analysis of the testis and ovary transcriptomes in zebrafish by combining experimental and computational tools</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Bartfai</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yue</snm>
                  <fnm>GH</fnm>
               </au>
               <etal/>
            </aug>
            <source>Comparative and Functional Genomics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>403</fpage>
            <lpage>418</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/cfg.418</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>GenBank</p>
            </title>
            <aug>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Karsch-Mizrachi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Ostell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D16</fpage>
            <lpage>D20</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347519</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381837</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj157</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Iseli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1999</pubdate>
            <fpage>138</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10786296</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The <it>Danio rerio </it>Sequencing Project</p>
            </title>
            <url>http://www.sanger.ac.uk/Projects/D_rerio/</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Cyprinid Genome Browser @ TLL</p>
            </title>
            <url>http://www.bioinformatics.tll.org.sg/Cyprinids/CyprinidMapping.html</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>EST2GENOME</p>
            </title>
            <url>http://bioweb.pasteur.fr/seqanal/interfaces/est2genome.html</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A genomic view of alternative splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Modrek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>13</fpage>
            <lpage>19</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng0102-13</pubid>
                  <pubid idtype="pmpid" link="fulltext">11753382</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Novel splice variants associated with one of the zebrafish <it>dnmt3 </it>genes</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Dueck</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Mhanni</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>McGowan</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>BMC Developmental Biology</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>23</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1274307</pubid>
                  <pubid idtype="pmpid" link="fulltext">16236173</pubid>
                  <pubid idtype="doi">10.1186/1471-213X-5-23</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Variation in sequence and organization of splicing regulatory elements in vertebrate genes</p>
            </title>
            <aug>
               <au>
                  <snm>Yeo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hoon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Venkatesh</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>15700</fpage>
            <lpage>15705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0404901101</pubid>
                  <pubid idtype="pmpid" link="fulltext">15505203</pubid>
                  <pubid idtype="pmcid">524216</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The contribution of exon-skipping events on chromosome 22 to protein coding diversity</p>
            </title>
            <aug>
               <au>
                  <snm>Hide</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Babenko</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>van Heusden</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Seoighe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kelso</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1848</fpage>
            <lpage>1853</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311148</pubid>
                  <pubid idtype="pmpid" link="fulltext">11691849</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Transcription factor map alignment of promoter regions</p>
            </title>
            <aug>
               <au>
                  <snm>Blanco</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Messeguer</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>PLoS Computational Biology</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e49</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1464811</pubid>
                  <pubid idtype="pmpid" link="fulltext">16733547</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020049</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Viable androgenetic YY genotypes of common carp (<it>Cyprinus carpio </it>L.)</p>
            </title>
            <aug>
               <au>
                  <snm>Bongers</snm>
                  <fnm>ABJ</fnm>
               </au>
               <au>
                  <snm>Zandieh-Doulabi</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>CJJ</fnm>
               </au>
               <au>
                  <snm>Komen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of Heredity</source>
            <pubdate>1999</pubdate>
            <volume>90</volume>
            <fpage>195</fpage>
            <lpage>198</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/jhered/90.1.195</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>RepeatMasker</p>
            </title>
            <url>http://www.repeatmasker.org/</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A comprehensive approach to clustering of expressed human gene sequence: The sequence tag alignment and consensus knowledge base</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Gopalakrishnan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Burke</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ptitsyn</snm>
                  <fnm>AA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>1143</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310831</pubid>
                  <pubid idtype="pmpid" link="fulltext">10568754</pubid>
                  <pubid idtype="doi">10.1101/gr.9.11.1143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>STACK: Sequence Tag Alignment and Consensus Knowledgebase</p>
            </title>
            <aug>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>van Gelder</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Greyling</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hide</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>234</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29830</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125101</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>PROSITE: a documented database using patterns and profiles as motif descriptors</p>
            </title>
            <aug>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Falquet</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>Briefings in Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>265</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/3.3.265</pubid>
                  <pubid idtype="pmpid" link="fulltext">12230035</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The Pfam protein families database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D138</fpage>
            <lpage>D141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308855</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681378</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>The InterPro Database, 2003 brings increased coverage and new features</p>
            </title>
            <aug>
               <au>
                  <snm>Mulder</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>315</fpage>
            <lpage>318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165493</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520011</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg046</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Gene ontology: Tool for the unification of biology</p>
            </title>
            <aug>
               <au>
                  <cnm>The Gene Ontology Consortium</cnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>BLAST</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/BLAST/</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>An algorithm for <it>ab initio </it>DNA motif detection</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chowdhary</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kassim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bajic</snm>
                  <fnm>VB</fnm>
               </au>
            </aug>
            <source>Information Processing and Living Systems</source>
            <publisher>World Scientific, Singapore</publisher>
            <editor>Bajic VB, Tan TW</editor>
            <pubdate>2005</pubdate>
            <fpage>611</fpage>
            <lpage>614</lpage>
         </bibl>
         <bibl id="B48">
            <title>
               <p>ClustalW</p>
            </title>
            <url>http://www.ebi.ac.uk/clustalw/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
