<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-93</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Identification and characterization of insect-specific proteins by genome data analysis</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Zhang</snm>
               <fnm>Guojie</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>zhanggj@genomics.org.cn</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Wang</snm>
               <fnm>Hongsheng</fnm>
               <insr iid="I1"/>
               <email>wanghs@ioz.ac.cn</email>
            </au>
            <au id="A3" ce="yes">
               <snm>Shi</snm>
               <fnm>Junjie</fnm>
               <insr iid="I2"/>
               <email>shijj@genomics.org.cn</email>
            </au>
            <au id="A4">
               <snm>Wang</snm>
               <fnm>Xiaoling</fnm>
               <insr iid="I2"/>
               <email>wangxl@genomics.org.cn</email>
            </au>
            <au id="A5">
               <snm>Zheng</snm>
               <fnm>Hongkun</fnm>
               <insr iid="I2"/>
               <email>zhenghk@genomics.org.cn</email>
            </au>
            <au id="A6">
               <snm>Wong</snm>
               <mnm>Ka-Shu</mnm>
               <fnm>Gane</fnm>
               <insr iid="I2"/>
               <email>gksw@u.washington.edu</email>
            </au>
            <au id="A7">
               <snm>Clark</snm>
               <fnm>Terry</fnm>
               <insr iid="I4"/>
               <email>kedali@gmail.com</email>
            </au>
            <au id="A8">
               <snm>Wang</snm>
               <fnm>Wen</fnm>
               <insr iid="I3"/>
               <email>wwang@mail.kiz.ac.cn</email>
            </au>
            <au id="A9">
               <snm>Wang</snm>
               <fnm>Jun</fnm>
               <insr iid="I2"/>
               <insr iid="I5"/>
               <email>wangj@genomics.org.cn</email>
            </au>
            <au id="A10" ca="yes">
               <snm>Kang</snm>
               <fnm>Le</fnm>
               <insr iid="I1"/>
               <email>lkang@ioz.ac.cn</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology Chinese Academy of Sciences, Haidian Beijing 100080, China</p>
            </ins>
            <ins id="I2">
               <p>Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China</p>
            </ins>
            <ins id="I3">
               <p>CAS-Max Planck Junior Research Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences (CAS), Kunming, Yunnan 650223, China</p>
            </ins>
            <ins id="I4">
               <p>Department of Electrical Engineering and Computer Science, The University of Kansas, 2001 Eaton Hall, Lawrence, KS 66044, USA</p>
            </ins>
            <ins id="I5">
               <p>Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230, Odense M, Denmark</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>93</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/93</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17407609</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-93</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>04</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>04</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Zhang et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Insects constitute the vast majority of known species and are important to biodiversity, agriculture, and human health. It is likely that the successful adaptation of the Insecta clade depends on specific components in its proteome that give rise to specialized features. However, proteome determination is an intensive undertaking. Here we present results from a computational method that uses genome analysis to characterize insect and eukaryote proteomes as an approximation complementary to experimental approaches.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Homologs common to <it>Drosophila melanogaster</it>, <it>Anopheles gambiae</it>, <it>Bombyx mori, Tribolium castaneum</it>, and <it>Apis mellifera </it>were compared to the complete genomes of three non-insect eukaryotes (opisthokonts): <it>Homo sapiens</it>, <it>Caenorhabditis elegans </it>and <it>Saccharomyces cerevisiae</it>. This operation identified 154 groups of orthologous proteins in <it>Drosophila </it>as insect-specific homologs; 466 groups were determined to be common to eukaryotes (represented by three opisthokonts). ESTs from the hemimetabolous insect <it>Locusta migratoria </it>were also considered in order to approximate their corresponding genes in the insect-specific homologs. Stress and stimulus response proteins were found to constitute a higher fraction in the insect-specific homologs than in the homologs common to eukaryotes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The significant representation of stress response and stimulus response proteins in proteins determined to be insect-specific, along with specific cuticle and pheromone/odorant binding proteins, suggests that communication and adaptation to environments may distinguish insect evolution relative to other eukaryotes. The tendency for low <it>Ka/Ks </it>ratios in the insect-specific protein set suggests purifying selection pressure. The generally larger number of paralogs in the insect-specific proteins may indicate adaptation to environment changes. Instances in our insect-specific protein set have been arrived at through experiments reported in the literature, supporting the accuracy of our approach.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Insects constitute nearly 80% of species on earth and are among the most diverse groups of organisms in the history of life, giving them considerable potential to provide insight into evolutionary mechanisms. Insects, with their large number of species, their biomass, diversity of adaptation, and ecological impact, support the structure and function of terrestrial ecosystems and biodiversity. Numerous crops rely on insects for pollination, and the importance of insects extends into other agricultural and human health concerns. Insects have been in existence for at least 400 million years, making them among the earliest land animals. Though nearly one million insect species have been classified and named, their actual number is believed to be between 2.5 and 10 million. It is widely accepted that insects diverged as members of one of the largest subphyla in arthropods more than 390 million years ago. During this time, insects experienced rapid evolution and a radiation that is considered faster than any other group <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, migrating into nearly all available environmental niches except the benthic zone <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Mitochondrial DNA strongly supports an insect-crustacean clade as a sister group, which excludes the other arthropod subphyla collectively known as the myriapods <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The insects are a monophyletic group, a universally held view supported by morphological and molecular features.</p>
         <p>The structure of an organism is an outgrowth of development tailored to meet functional demands in an idiosyncratic evolutionary history. Like other segmented animals, insects are composed of a series of repeated units called metameres. Extant arthropods share many taxonomical characteristics, such as an exoskeleton, jointed appendages, and reduced coeloms and hemocoels. The segments of the insect body are organized into three major tagmata unique to this subclass: the head, thorax, and abdomen <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The thorax has three pairs of legs, and in pterygotes, the wings. In the abdomen, we find the presence of an ovipositor in females. In addition to the macro-scale features mentioned above, other defining features of the Insecta include: the loss of musculature and the presence of the Johnston's organ in the antenna, loss of articulations between the coxae and the sterna, sub-segmentation of the tarsus into units called tarsomeres, articulation of the pretarsal claws with the apical-most tarsomere <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, and the presence, at least primitively, of a long terminal filament <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Insects are one of only four lineages of animals with powered flight, the others being pterosaurs, birds, and bats. Wings refine insect design, vastly improving mobility, dispersal, and complex behaviors to adapt to environmental challenges. It is widely held that insects evolved flight just once, at least 100 million years before pterosaurs, perhaps 170 million years ago <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Other noteworthy features include the development of the posterior tentorium into a transverse bar, and metamorphosis and segmentation of metameres <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
         <p>It is likely that the specialized features of the Insecta clade are based on components specific to its proteome. Characterization of this protein set should improve understanding of the molecular basis for the diversification of insects and their extensive success in ecological niches. Toward elucidating this molecular basis, we have characterized the eukaryote and insect proteomes. The large number of eukaryote genome sequences now available, including various insect genomes, makes it possible to characterize proteomes computationally. In this work, we utilized the genome sequences of fruit fly, mosquito, silkworm, beetle, and honeybee, together with locust ESTs, and the non-insect eukaryote genomes of nematode, human, and yeast. (The insect species in our study cover <it>holometabolous </it>and <it>hemimetabolous </it>development.) Since our approach utilizes genome sequence for approximating the proteome, the resolution of the proteome characterization improves as more genomes become available. This rapid characterization of proteomes through computation facilitates rational hypothesis generation and experiment design in applied research in many areas, such as biodiversity, agriculture and human health.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Insect and Eukaryote protein sets</p>
            </st>
            <p>We modeled the insect proteome by selecting the subset of <it>Drosophila </it>protein sequences with homology to predicted genes in all insect species studied here. Similarly, we defined the subset in <it>Drosophila </it>common to the eukaryote species studied here: mosquito, silkworm, beetle, honeybee, human, nematode and yeast. Because at this time it is not possible to definitively determine the eukaryote and insect proteomes, estimates are useful for comparative assessments. Our protein sets were derived from a collection of 13,525 protein sequences established for <it>Drosophila melanogaster</it>, which we reduced to 10,018 orthologous groups; proteins with significant similarity were considered as singletons in our processing, since paralogs may have arisen after speciation.</p>
            <p>To determine the proteins in the <it>Drosophila </it>orthologous groups common to all insects studied here, called the <it>insect core set</it>, we used predicted proteins from insect genome sequences and EST sequences. We obtained 1346 orthologous groups from the intersection of the whole genomes of five <it>holometabolous </it>insects (see Methods). One aspect of our approximation is to use homologs to <it>Drosophila </it>proteins to characterize proteomes, implicitly assuming that function follows structure. This could contribute to differences in our characterization from the actual proteome, but it does not significantly detract from our use of the characterizations. We discuss further implications of our approximation in more detail below.</p>
            <p>Using the insect-core protein set, we removed proteins with significant similarity to any genome sequence in yeast, human, and nematode (see Methods). The remaining 154 orthologous groups (with 360 proteins) form the <it>insect-specific set</it>, and 73 of these groups are represented in the <it>hemimetabolous </it>insect locust ESTs [see Additional file <supplr sid="S1">1</supplr>]. The insect-specific set contains proteins with homology evidence to all insects studied here; in addition, these sequences are without significant similarity to the non-insect species. Since we are interested in genes and proteins in insects which developed in insects after their divergence from other eukaryotes, we searched entire non-insect eukaryotic genomes in alignments with the insect-core proteins in order to exclude remnants of common ancestral genes. To refine the insect-specific proteins, we removed proteins with similarity to non-insect proteins in the NCBI protein database as described in Methods (Figure <figr fid="F1">1</figr>). This reduced the 360 candidate insect-specific proteins to the final insect-specific set consisting of 51 proteins [see Additional file <supplr sid="S2">2</supplr>].</p>
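            <p>The selection logic described above can be sketched with set operations. This is a minimal illustration and not the pipeline used in this study: the group identifiers and hit tables are hypothetical toy data, the function names are chosen here for illustration, and the similarity searches themselves are assumed to have been run beforehand.</p>

```python
# Toy sketch of the set logic described in the text.
# core = groups with a homolog in every other insect studied;
# insect-specific = core groups with no significant hit in any non-insect genome.
# All identifiers and hit sets below are hypothetical.

def insect_core(groups, insect_hits):
    """Return the groups that have a homolog in every insect species."""
    core = set(groups)
    for hits in insect_hits.values():
        core = core.intersection(hits)
    return core

def insect_specific(core, outgroup_hits):
    """Remove core groups that have a hit in any non-insect genome."""
    with_outgroup_hit = set().union(*outgroup_hits.values())
    return core - with_outgroup_hit

groups = {"g1", "g2", "g3", "g4"}
insect_hits = {
    "mosquito": {"g1", "g2", "g3"},
    "silkworm": {"g1", "g2", "g3", "g4"},
    "beetle": {"g1", "g2", "g3"},
    "honeybee": {"g1", "g2", "g4"},
}
outgroup_hits = {"human": {"g1"}, "nematode": {"g1"}, "yeast": set()}

core = insect_core(groups, insect_hits)
specific = insect_specific(core, outgroup_hits)
print(sorted(core), sorted(specific))
```

            <p>In this toy example only the groups found in all four other insects survive the intersection, and any core group with a non-insect hit is then discarded. Under this scheme, adding a further outgroup genome can only shrink the insect-specific set or leave it unchanged.</p>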
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Insect-specific proteins from five whole genomes of insects</b>. Protein homologs in the five whole insect genomes are listed in this table.</p>
               </text>
               <file name="1471-2164-8-93-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Flowchart of computational analysis</p>
               </caption>
               <text>
                  <p><b>Flowchart of computational analysis</b>. The pipeline was based primarily on genome comparisons; insect core proteins were distilled from the putative protein sets of four insects, and were searched against non-insect genomes to arrive at the insect-specific proteins and eukaryote/opisthokont core proteins. Also see Figure 2.</p>
               </text>
               <graphic file="1471-2164-8-93-1"/>
            </fig>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Refined insect-specific proteins</b>. The refined 51 insect-specific proteins are listed in the table with <it>Ka/Ks</it>, Interpro annotations, GO terms, mutant phenotypes, and homologs with other insects. GO terms were downloaded from the Gene Ontology Consortium. Mutant phenotypes were downloaded from FlyBase. Proteins with significant mutant phenotypes are highlighted in red.</p>
               </text>
               <file name="1471-2164-8-93-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>We found 466 proteins with homology to all eukaryotes considered in this study using methods similar to those above [see Additional file <supplr sid="S3">3</supplr>].</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Eukaryote/opisthokont core proteins</b>. A list of 466 eukaryote/opisthokont core proteins with homologs in five insects is presented.</p>
               </text>
               <file name="1471-2164-8-93-S3.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>As the eukaryotes used in this study are all opisthokonts, this set of proteins should properly be considered opisthokont core proteins. Many of these eukaryotic core proteins &#8211; the opisthokont core proteins &#8211; are involved in housekeeping or general metabolic processes. We also defined 1850 proteins as <it>Drosophila </it>specific by eliminating proteins homologous to other insect proteins as discussed in Methods (Figure <figr fid="F2">2</figr>).</p>
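            <p>As a quick arithmetic check, the group counts reported in this section can be expressed as shares of the 10,018 <it>Drosophila </it>orthologous groups. The sketch below uses only counts quoted in the text; the function name and dictionary labels are chosen here for illustration.</p>

```python
# Share of the 10,018 Drosophila orthologous groups in each reported set.
# Counts are the ones quoted in the text; the dictionary keys are labels chosen here.
TOTAL_GROUPS = 10018

def share(count, total=TOTAL_GROUPS):
    """Return count as a percentage of total."""
    return 100.0 * count / total

counts = {
    "Drosophila specific": 1850,
    "insect core": 1346,
    "eukaryote/opisthokont core": 466,
    "insect-specific": 154,
}
shares = {name: share(n) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

            <p>The eukaryote/opisthokont core and insect-specific groups are subsets of the insect core set, so their shares are not additive with it.</p>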
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Clustering <it>Drosophila </it>proteins</p>
               </caption>
               <text>
                  <p><b>Clustering <it>Drosophila </it>proteins</b>. <it>Drosophila </it>proteins were clustered into paralogous groups based on their sequence similarity. Using methods described in the text, 1850 groups of <it>Drosophila </it>specific proteins make up 18% of fruitfly paralogous groups, and 1346 (13%) insect core proteins were identified. In the insect core set, 466 groups (5%) can be found in other eukaryotes, and 154 groups (1%) are insect specific.</p>
               </text>
               <graphic file="1471-2164-8-93-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>GO annotations and functional categories</p>
            </st>
            <p>We categorized proteins in the eukaryote (466 groups in opisthokont) and insect-specific sets (154 groups) using high-level gene ontology categories with results shown in Figure <figr fid="F3">3</figr>. In both the eukaryote and insect-specific sets, metabolic proteins constituted the highest fraction, 25% and 20%, respectively. Disproportionately represented categories are interesting to consider for candidate proteins that confer distinguishing characteristics. In the eukaryote/opisthokont set, genes responsible for processes such as cell division, cell motility, cell cycle, reproduction and cellular process are more highly represented by factors from about two to twenty. These proteins and their respective functional categories may distinguish insects less from eukaryotes/opisthokont than those proteins in categories that have a significant representation in the insect-specific set and are underrepresented in the eukaryotic/opisthokont set. These more highly represented categories in the insect-specific set are: larval development (2% in opisthokont, 4% in insect); defense response (0% in opisthokont, 6% in insect); and stress response (0.2% in opisthokont, 6% in insect). Moreover, a significant number of the insect-specific proteins were found to be related to pheromone/odorant binding proteins (OBP), insect cuticle proteins, and proline-rich proteins [see Additional file <supplr sid="S2">2</supplr>].</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Biological process categories</p>
            </st>
            <p>Our analysis of the <it>eukaryote/opisthokont core </it>and <it>insect-specific </it>protein sets was based on functional categories representative of high-level GO designations. Metabolism is the largest category of our eukaryotic/opisthokont core and of the insect-specific proteins. Significantly larger categories for the insect-specific proteins relative to the eukaryote core are stimulus and defense response (Figure <figr fid="F3">3</figr>). A representative insect-specific gene in the stimulus response category is PedIII/CG11390, which has been reported to function in sensory perception <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In the eukaryote/opisthokont core proteins, the more highly represented insect-specific categories are not pronounced fractions, thereby highlighting the insect-specific proteins as candidates for specialized roles. In the eukaryote/opisthokont core, other housekeeping processes such as cellular division, cell cycle and cellular organization processes constitute a larger fraction of the total protein set. The disproportionate distribution of the eukaryote/opisthokont core and insect-specific sets may be at the very foundation of insect evolution. It is important to note that the disproportionate distributions of functional types of proteins between insects and eukaryotes/opisthokont may be caused to some degree by the methodology; the small number of proteins in the insect-specific core may be caused by the limited number of insect genomes used, artificially underrepresenting the insect proteome. However, assuming an approximately representative distribution of unrepresented proteins makes it unlikely that the overrepresented categories are invalid.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Gene Ontology classifications</p>
               </caption>
               <text>
                  <p><b>Gene Ontology classifications</b>. Classification of insect specific proteins and eukaryote/opisthokont core proteins according to the <it>biological process </it>characterizations of the Gene Ontology System. Eukaryote/opisthokont core proteins are graphed with green bars and insect-specific proteins are shown with red bars. Plots show percentage differences for each category.</p>
               </text>
               <graphic file="1471-2164-8-93-3"/>
            </fig>
            <p>The five insects with whole genomes are all holometabolous and might not be representative of all insects. At present, no complete genome of a hemimetabolous insect has been sequenced, most likely because hemimetabolous insects often have large genomes (more than 2 gigabases) <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Fortunately, 45,474 high quality EST sequences from the hemimetabolous insect <it>migratory locust </it>permit us to extend the analysis beyond holometabolous insects <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. We determined the insect-specific orthologs in the locust ESTs to arrive at a collection of six sets of insect-specific proteins. Our analysis found the functional distribution of the orthologous proteins in the six insects to be similar to the functional distribution of the largest set from the five holometabolous insects [see Additional file <supplr sid="S2">2</supplr>].</p>
            <p>As noted above, the computed insect-specific protein dataset is an approximation dependent on available genome sequence. Inclusion of additional genomic data could alter the protein set. The lack of many representative outgroups might cause false positives, i.e. some proteins might be inaccurately included in our list. For example, the gene CG6895 related to immune function is identified as an insect-specific gene in this study, but its homolog was recently reported in the sea urchin <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Improved quality of genome sequences and gene annotations for the insects used in this study will improve the accuracy of our computed protein sets <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Molecular function categories</p>
            </st>
            <p>A considerable number of the 51 insect-specific proteins were found to be related to insect cuticle proteins and pheromone/odorant binding proteins (OBP) [see Additional file <supplr sid="S2">2</supplr>]. Molting and metamorphosis are crucial processes in the developmental history of the insects involving cuticular proteins. Cuticular proteins are involved in important composite structural materials for insect cuticles, which provide protection, support, and locomotion; these prevent water loss via a wax layer, provide sites for waste product deposition, and protect from ultraviolet radiation <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Olfaction is essential to insect survival and reproduction, for example in locating food sources and selecting mates. These olfactory driven behaviors contribute significantly to the ability of insects to adapt to the environment. The odorant-binding proteins, which compose the insect olfactory system, are involved in the recognition of odorants of plants by insects <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. The pheromone binding proteins (PBP), abundantly present in the sensillum lymph of pheromone-responsive antennal hairs, are thought to be important in the recognition and discrimination of species-specific pheromones <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. The olfactory system in insects evolved as a remarkably selective and sensitive system, approaching the theoretical limit for a detector. Even a single pheromone molecule is enough to elicit impulses at the olfactory neuron <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The large number of odorant and olfactory proteins in the insect-specific set suggests that in the evolution and diversification of insects, communication and adaptation to the environment played key roles in shaping their morphological and physiological characteristics.</p>
            <p>Other proteins in our insect-specific set have been found essential to development through experimental procedures <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, supporting our insect-specific proteome characterization. Moreover, these have been found to be active in insects and are of interest for evolutionary reasons including their suspected roles in diversification. For example, the gene <it>sinuous </it>(CG10624), which is active in tracheal system development, can partially rescue the tracheal defects of sinuous mutants <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The <it>Exuperantia (Exu) </it>protein in our insect-specific set is the earliest factor known to be required for the localization of <it>bicoid </it>mRNA to the anterior pole of the <it>Drosophila </it>oocyte. <it>Exu </it>is highly enriched in the sponge bodies; mutation of <it>exu </it>in <it>Drosophila </it>may result in defects in embryonic development <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. <it>Larval serum proteins </it>(<it>Lsp</it>), another type of protein in the insect-specific set, belong to the hemocyanin superfamily. This family is thought to function as storage proteins that provide amino acids and energy during non-feeding periods of immature and adult development <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Low mutation rate of insect-specific proteins</p>
            </st>
            <p>It is widely accepted that all insects have arisen from a common ancestor that diverged from an aquatic arthropod more than 390 million years ago, and that they coevolved with a specific plant group <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Homologs to the insect-specific proteins should be present in the ancestor and be conserved by natural selection. To test this, we analyzed the ratio of the number of nonsynonymous substitutions per nonsynonymous site (<it>Ka</it>) to the number of synonymous substitutions per synonymous site (<it>Ks</it>) for the insect-specific proteins in <it>Drosophila</it>; in this analysis eukaryote/opisthokont core proteins and <it>Drosophila </it>specific proteins were used as controls. A high percentage of insect-specific proteins have a <it>Ka/Ks </it>ratio lower than 0.5 (Figure <figr fid="F4">4</figr>), suggesting negative selection in these proteins <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. As non-synonymous changes are more likely to be deleterious, under negative or purifying selection pressure, these substitutions were eliminated in functionally active proteins, which may have provided a steady protein complement for insects <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Furthermore, the <it>Ka/Ks </it>ratio of insect-specific proteins is on average greater than that of the eukaryote/opisthokont core proteins. This may reflect the later appearance of the insect-specific set relative to proteins in the common eukaryote ancestor.</p>
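            <p>The ratio test described above can be illustrated with a short sketch that computes <it>Ka/Ks </it>for each protein and the fraction of proteins falling below the 0.5 threshold mentioned in the text. The substitution-rate values are hypothetical, the function names are chosen here, and the estimation of <it>Ka </it>and <it>Ks </it>from aligned sequences is assumed to have been done by a separate tool.</p>

```python
# Ka/Ks per protein and the fraction of proteins below a threshold.
# The (Ka, Ks) pairs are hypothetical illustration values, not data from this study.

def ka_ks(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates; None if Ks is zero."""
    if ks == 0:
        return None  # ratio undefined without synonymous substitutions
    return ka / ks

def fraction_below(pairs, threshold=0.5):
    """Fraction of proteins with a defined Ka/Ks ratio that lies below threshold."""
    ratios = [ka_ks(ka, ks) for ka, ks in pairs]
    defined = [r for r in ratios if r is not None]
    if not defined:
        return 0.0
    below = [r for r in defined if threshold > r]
    return len(below) / len(defined)

pairs = [(0.05, 0.50), (0.10, 0.40), (0.30, 0.40), (0.02, 0.0)]
result = fraction_below(pairs)
print(f"{result:.2f} of proteins with a defined ratio fall below 0.5")
```

            <p>A ratio well below one is conventionally read as a sign of purifying selection, which is how the cumulative distribution in Figure 4 is interpreted here. Proteins with no synonymous substitutions are left out of the fraction in this sketch rather than assigned an arbitrary value.</p>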
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p><it>Ka/Ks </it>distribution</p>
               </caption>
               <text>
                  <p><b><it>Ka/Ks </it>distribution</b>. Nonsynonymous and synonymous substitution rates (Ka and Ks) were estimated for <it>Drosophila </it>specific, insect-specific, and eukaryote/opisthokont core proteins. <it>Drosophila </it>specific proteins are shown in black, insect-specific proteins in red and eukaryote/opisthokont core proteins in green. (a) Cumulative percentage of <it>Ka/Ks </it>ratios; (b) <it>Ka/Ks </it>versus <it>Ks </it>ratios.</p>
               </text>
               <graphic file="1471-2164-8-93-4"/>
            </fig>
            <p>To determine whether these conserved genes appeared with low redundancy, we compared the number of paralogs in the insect-specific genes with the number of paralogs in the eukaryote/opisthokont core genes. Gene duplication is considered one of the principal mechanisms in generating new genes and redundant sequences of genes with the same function <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Duplicated sequences of established genes often degrade to pseudogenes because purifying selection preserves essential coding sequence, while non-essential duplicates may lose function through random mutations favorable to natural selection. The relationship between duplicates and their functional ancestor is not fully understood. Some authors suggest that the stronger selective constraints on housekeeping genes relative to tissue specific genes are not due to their lower genetic redundancy <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. However, our results agree with the observation of constrained duplication since most of the eukaryote/opisthokont core and insect-specific proteins are without paralogs (Figure <figr fid="F5">5</figr>). This suggests that genes with established function may tend to avoid duplication, thereby tolerating fewer genetic perturbations. However, the insect-specific proteins are inclined to arise from genes producing a greater number of paralogs, which is in contrast to proteins in the eukaryote/opisthokont core. This may confer insect adaptation to changes in the environment. For example, CG16799 and CG6421 have been found to function in defense response; both arise from paralogous groups in <it>Drosophila </it>with ten and four members, respectively.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Copy numbers of insect-specific proteins and eukaryote/opisthokont core proteins</p>
               </caption>
               <text>
                  <p><b>Copy numbers of insect-specific proteins and eukaryote/opisthokont core proteins</b>. This plot shows the distribution of proteins by copy number, with insect-specific proteins in red and eukaryote/opisthokont core proteins in green.</p>
               </text>
               <graphic file="1471-2164-8-93-5"/>
            </fig>
            <p>Our analysis suggests that our working set of insect-specific proteins has been shaped by strong natural selection, with environment as one of the selective influences.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>An analysis of the genetic basis of evolution and development in insects was performed by characterizing the <it>eukaryote/opisthokont core </it>and <it>insect-specific </it>proteomes through genome analysis. Studies of the conservation and divergence between different organisms can provide clues to the molecular basis of species diversity and adaptation. The characterization of proteomes based on genome sequences provides a rapid method to approximate and update putative proteomes as genome sequences become available. Using this approach, we isolated fifty insect-specific proteins, many supported by experimental studies.</p>
         <p>Proteins related to stress and immune responses constitute a significantly larger fraction of our insect-specific proteome than of our eukaryote/opisthokont core proteome. The large component of olfaction and cuticle development proteins specific to insects suggests the significance of communication and adaptation to the environment in insect evolution. Purifying selection in the evolution of insects was indicated by the analysis of nonsynonymous-to-synonymous substitution ratios, with a larger fraction of multi-paralog proteins possibly providing insects with an adaptive advantage over other eukaryotes. Due to the nature of our computational method, the number of insect-specific proteins may increase or decrease with the inclusion of additional genome data from insect and non-insect species.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Sequence data</p>
            </st>
            <p>The protein sets in this work were based on 18,282 protein sequences of <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp> obtained from Ensembl <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Genes were predicted in genome sequences for <it>Anopheles gambiae </it>(mosquito) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and <it>Bombyx mori </it>(silkworm) <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Proteins of <it>Tribolium castaneum </it>and <it>Apis mellifera </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp> were obtained from HGSC <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Homologs to the insect protein sequences were isolated in annotated genomes of human <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, yeast <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and nematode <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. We obtained the <it>Anopheles gambiae </it>(mosquito) genome annotated with 16,112 proteins (anopheles-21.2b) from Ensembl. The annotated human genome sequence draft (hg17) was obtained from UCSC <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, the worm genome (celegans-21.116a) from Ensembl, and the yeast genome from the Saccharomyces Genome Database (SGD) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Proteins were obtained for <it>D. yakuba </it>from FlyBase for use in <it>Ka/Ks </it>analysis. The locust (<it>Locusta migratoria</it>) UniGene collection with 12,161 ESTs and cDNA sequences was obtained from LocustDB <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B42">42</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Sequence analysis</p>
            </st>
            <p>Sequence alignment was performed with BLAST <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> using the BLOSUM62 scoring matrix and default parameters. Gene prediction was performed using the gene-finder algorithm <it>BGF </it>used in BGI GeneFinder <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> based on <it>GenScan </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp> and <it>FgeneSH </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Paralog definitions</p>
            </st>
            <p>We grouped homologous protein sequences into <it>paralogous groups</it>. Protein sequences were considered paralogous if their alignment had an E-value less than or equal to 1e-5 and the alignment covered 70% or more of one of the aligned proteins. We represented paralogous groups by the longest member in the group, with the size of the group determined by the number of unique sequences in it.</p>
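            <p>The grouping rule above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this work: the record type <it>Hit</it>, the function names, and the choice of single-linkage grouping (any qualifying alignment joins two groups) are assumptions made for the example. Only the E-value cutoff, the 70% coverage threshold, the longest-member representative, and the unique-sequence group size come from the definition above.</p>

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment within a proteome (hypothetical record layout)."""
    query: str
    subject: str
    e_value: float
    aligned_query: int    # residues of the query covered by the alignment
    aligned_subject: int  # residues of the subject covered by the alignment


E_VALUE_CUTOFF = 1e-5  # threshold stated in the text
MIN_COVERAGE = 0.70    # threshold stated in the text


def is_paralog_pair(hit, lengths):
    """True when the E-value passes and 70% or more of either protein is aligned."""
    if hit.query == hit.subject or hit.e_value > E_VALUE_CUTOFF:
        return False
    cov_query = hit.aligned_query / lengths[hit.query]
    cov_subject = hit.aligned_subject / lengths[hit.subject]
    return max(cov_query, cov_subject) >= MIN_COVERAGE


def paralogous_groups(hits, lengths):
    """Return {representative: group size} using single-linkage grouping (assumed)."""
    parent = {name: name for name in lengths}

    def find(name):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for hit in hits:
        if is_paralog_pair(hit, lengths):
            parent[find(hit.query)] = find(hit.subject)

    members = defaultdict(set)
    for name in lengths:
        members[find(name)].add(name)

    # Each group is represented by its longest member; size counts unique sequences.
    return {max(group, key=lengths.get): len(group) for group in members.values()}
```

            <p>Under this sketch a protein with no qualifying alignment forms a group of size one, which corresponds to a protein without paralogs in the copy-number distribution.</p>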
         </sec>
         <sec>
            <st>
               <p>Proteome characterization using a genome-based pipeline</p>
            </st>
            <p>We defined protein sets based on <it>Drosophila </it>proteins in our processing pipeline to characterize proteomes. Similarity with genome sequences, predicted proteins, and ESTs was used to cull sets determined in the processing pipeline as described below. Thus, it is important to note that the various protein sets we computationally arrive at characterize insect and eukaryote proteomes through homology.</p>
            <p>The insect core set was arrived at by selecting proteins in the <it>Drosophila </it>protein data set with similarity to mosquito and silkworm protein sequences predicted by genome analysis, and with similarity to the locust EST sequence data. Protein sequences for predicted genes in silkworm and mosquito were aligned against fruit fly using blastp <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> and considered homologous with an E-value cutoff of 1e-5 or less; in addition, we required that the length of the aligned sequences be within 70% of each other (Figure <figr fid="F5">5</figr>).</p>
            <p>The insect-specific protein set was derived from the insect core set by including proteins <it>without </it>significant alignment (E-value of 1e-5 or less) to the genome sequences of human, nematode, or yeast. In addition, sequences in the insect core set were retained for the insect-specific set if any alignment covered less than 30% of the insect protein sequence. The insect-specific proteins were further assessed against the NCBI protein database, retaining sequences without significant similarity and less than 30% alignment coverage with all non-insect proteins (Figure <figr fid="F5">5</figr>).</p>
            <p>Proteins in the insect core set with an E-value cutoff of 1e-5 or less in alignments with each of the non-insect eukaryotes, and involving 50% or more of the insect protein in the alignments, were included in the eukaryote core protein set.</p>
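            <p>The set definitions in this section can be read as a decision procedure applied to each <it>Drosophila </it>protein. The Python sketch below is one such reading under stated assumptions: the record type <it>BestHit</it>, the species keys, and the function names are hypothetical, and "within 70% of each other" is interpreted here as a shorter-to-longer length ratio of at least 0.70. The final screen against the NCBI protein database is not modelled.</p>

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-5                      # threshold stated in the text
NON_INSECTS = ("human", "nematode", "yeast")


class BestHit(NamedTuple):
    """Best alignment of one Drosophila protein against one species (hypothetical)."""
    e_value: float
    coverage: float      # fraction of the Drosophila protein covered by the alignment
    length_ratio: float  # shorter length divided by longer length (assumed reading)


def significant(hit):
    """A hit is significant when it exists and its E-value is 1e-5 or less."""
    return hit is not None and E_VALUE_CUTOFF >= hit.e_value


def in_insect_core(hits):
    """Similar to mosquito and silkworm predictions and to locust EST data."""
    for species in ("mosquito", "silkworm"):
        hit = hits.get(species)
        if not (significant(hit) and hit.length_ratio >= 0.70):
            return False
    return significant(hits.get("locust_est"))


def classify(hits):
    """Assign one Drosophila protein to a protein set from its best hits."""
    if not in_insect_core(hits):
        return "not in insect core"
    outside = [hits.get(species) for species in NON_INSECTS]
    # Eukaryote core: significant hit covering 50% or more in every non-insect.
    if all(significant(h) and h.coverage >= 0.50 for h in outside):
        return "eukaryote core"
    # Insect-specific: each non-insect hit is absent, not significant, or under 30%.
    if all(not significant(h) or 0.30 > h.coverage for h in outside):
        return "insect-specific"
    return "insect core only"
```

            <p>Proteins that fall between the two thresholds, for example a significant human alignment covering 40% of the insect protein, remain in the insect core set without entering either derived set in this sketch.</p>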
         </sec>
         <sec>
            <st>
               <p>Interpro annotation of insect proteins</p>
            </st>
            <p>Functional annotations for proteins in each of the working insect proteomes were determined using the annotation tool <it>Interproscan </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and Gene Ontology nomenclature <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. GO terms were downloaded from the Gene Ontology Consortium.</p>
         </sec>
         <sec>
            <st>
               <p><it>Ka/Ks </it>ratio calculation</p>
            </st>
            <p>We selected the most similar orthologs to <it>Drosophila melanogaster </it>proteins in the <it>Drosophila yakuba </it>proteome and used YN00 <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> to calculate <it>Ka/Ks </it>ratios.</p>
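            <p>As a small illustration of how per-gene estimates can be summarized, the Python sketch below forms the ratio and a cumulative percentage over chosen thresholds. It assumes that Ka and Ks have already been estimated for each ortholog pair; the function names and the handling of pairs with Ks equal to zero are assumptions made for the example and are not part of YN00. A ratio below 1 is conventionally read as evidence of purifying selection.</p>

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    return ka / ks if ks > 0 else None


def cumulative_percentage(ratios, thresholds):
    """Percentage of defined ratios that are at or below each threshold."""
    defined = [r for r in ratios if r is not None]
    if not defined:
        return [0.0 for _ in thresholds]
    return [
        100.0 * sum(1 for r in defined if t >= r) / len(defined)
        for t in thresholds
    ]


# Hypothetical (Ka, Ks) estimates for four ortholog pairs.
pairs = [(0.01, 0.10), (0.05, 0.25), (0.30, 0.20), (0.02, 0.0)]
ratios = [ka_ks_ratio(ka, ks) for ka, ks in pairs]
summary = cumulative_percentage(ratios, [0.5, 1.0, 2.0])
```

            <p>Pairs with an undefined ratio are excluded from the denominator here; a different convention would shift the cumulative curve.</p>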
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GJZ carried out the sequence alignment, analysis of the data, drafting and revision of the manuscript. HSW participated in the study concept and drafted the manuscript. SJJ participated in the acquisition of data and sequence alignment and drafted the manuscript. WXL and ZHK participated in sequence alignment. GKSW, JW and LK conceived of the study and participated in its design. WW, TC and LK critically revised the manuscript and assessed results. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This project was supported by the National Basic Research Program of China (No:2006CB102002), Chinese Academy of Sciences (GJHZ0518), Ministry of Science and Technology under program CNGI-04-15-7A, National Natural Science Foundation of China (90208019; 90403130; 30221004), and China National Grid. Other support came from Danish Platform for Integrative Biology, Ole R&#248;mer grants from the Danish Natural Science Research Council and National Science Foundation (DBI 0217241). We thank four anonymous reviewers for their generous and constructive suggestions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks</p>
            </title>
            <aug>
               <au>
                  <snm>Gaunt</snm>
                  <mi>W</mi>
                  <fnm>Michael</fnm>
               </au>
               <au>
                  <snm>Miles</snm>
                  <mi>A</mi>
                  <fnm>Michael</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>748</fpage>
            <lpage>761</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11961108</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Comparative analysis of morphological traits among <it>Drosophila melanogaster </it>and <it>D. simulans</it>: genetic variability, clines and phenotypic plasticity</p>
            </title>
            <aug>
               <au>
                  <snm>Gibert</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Capy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Imasheva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moreteau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Morin</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Petavy</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>David</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Genetica</source>
            <pubdate>2004</pubdate>
            <volume>120</volume>
            <fpage>165</fpage>
            <lpage>179</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/B:GENE.0000017639.62427.8b</pubid>
                  <pubid idtype="pmpid" link="fulltext">15088656</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Gene translocation links insects and crustaceans</p>
            </title>
            <aug>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Lavrov</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>392</volume>
            <fpage>667</fpage>
            <lpage>668</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/33577</pubid>
                  <pubid idtype="pmpid" link="fulltext">9565028</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <aug>
               <au>
                  <snm>Heming</snm>
                  <fnm>BS</fnm>
               </au>
            </aug>
            <source>Insect Development and Evolution</source>
            <publisher>New York: Cornell University Press</publisher>
            <pubdate>2003</pubdate>
            <fpage>139</fpage>
            <lpage>151</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Fossil Liposcelididae and the lice ages (Insecta: Psocodea)</p>
            </title>
            <aug>
               <au>
                  <snm>Grimaldi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Proc Biol Sci</source>
            <pubdate>2006</pubdate>
            <volume>273</volume>
            <fpage>625</fpage>
            <lpage>33</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1098/rspb.2005.3337</pubid>
                  <pubid idtype="pmpid" link="fulltext">16537135</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Phylogeny of extant hexapods</p>
            </title>
            <aug>
               <au>
                  <snm>Kristensen</snm>
                  <fnm>NP</fnm>
               </au>
            </aug>
            <source>The insects of Australia; A textbook for students and research workers</source>
            <publisher>Melbourne: Melbourne Univ. Press</publisher>
            <editor>Naumann ID, Carne PB, Lawrence JF, Nielsen ES, Spradberry JP, Taylor RW, Whitten MJ, Littlejohn MJ</editor>
            <edition>2</edition>
            <pubdate>1991</pubdate>
            <fpage>125</fpage>
            <lpage>140</lpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Generating patterns from fields of cells. Examples from <it>Drosophila </it>segmentation</p>
            </title>
            <aug>
               <au>
                  <snm>Sanson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>1083</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1084173</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743020</pubid>
                  <pubid idtype="doi">10.1093/embo-reports/kve255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Insect segmentation: Genes, stripes and segments in "Hoppers"</p>
            </title>
            <aug>
               <au>
                  <snm>French</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>R910</fpage>
            <lpage>3</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00552-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">11719236</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Pherokine-2 and -3: two Drosophila molecules related to pheromone/odor-binding proteins induced by viral and bacterial infections</p>
            </title>
            <aug>
               <au>
                  <snm>Sabatier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jouanguy</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dostert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zachary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dimarcq</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Bulet</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Imler</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Europ J Biochem</source>
            <pubdate>2003</pubdate>
            <volume>270</volume>
            <fpage>3398</fpage>
            <lpage>3407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1432-1033.2003.03725.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12899697</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Sequencing of a new target genome: the <it>Pediculus humanus humanus </it>(Phthiraptera: Pediculidae) genome project</p>
            </title>
            <aug>
               <au>
                  <snm>Pittendrigh</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Romero-Severson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dasch</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>J Med Entomol</source>
            <pubdate>2006</pubdate>
            <volume>43</volume>
            <fpage>1103</fpage>
            <lpage>11</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1603/0022-2585(2006)43[1103:SOANTG]2.0.CO;2</pubid>
                  <pubid idtype="pmpid">17162941</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The analysis of large-scale gene expression correlated to the phase changes of the migratory locust</p>
            </title>
            <aug>
               <au>
                  <snm>Kang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XY</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>RQ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>17611</fpage>
            <lpage>17615</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0407753101</pubid>
                  <pubid idtype="pmpid" link="fulltext">15591108</pubid>
                  <pubid idtype="pmcid">535406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Genomic insights into the immune system of the sea urchin</p>
            </title>
            <aug>
               <au>
                  <snm>Rast</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Loza-Coll</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hibino</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Litman</snm>
                  <fnm>GW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>314</volume>
            <fpage>952</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1134301</pubid>
                  <pubid idtype="pmpid" link="fulltext">17095692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Vertebrate gene predictions and the problem of large genes</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>741</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1160</pubid>
                  <pubid idtype="pmpid" link="fulltext">12951575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Current methods of gene prediction, their strengths and weaknesses</p>
            </title>
            <aug>
               <au>
                  <snm>Math&#233;</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>4103</fpage>
            <lpage>4117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140543</pubid>
                  <pubid idtype="pmpid" link="fulltext">12364589</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Insect cuticular proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Andersen</snm>
                  <fnm>SO</fnm>
               </au>
               <au>
                  <snm>Hojrup</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Roepstorff</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Insect Biochem Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>25</volume>
            <fpage>153</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0965-1748(94)00052-J</pubid>
                  <pubid idtype="pmpid" link="fulltext">7711748</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Odorant binding protein diversity and distribution among the insect orders as indicated by LAP an OBP-related protein of the true bug <it>Lygus lineolaris </it>(Hemiptera, Heteroptera)</p>
            </title>
            <aug>
               <au>
                  <snm>Vogt</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Callahan</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Dickens</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Chem Senses</source>
            <pubdate>1999</pubdate>
            <volume>24</volume>
            <fpage>481</fpage>
            <lpage>495</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/chemse/24.5.481</pubid>
                  <pubid idtype="pmpid" link="fulltext">10576256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genome-wide analysis of the odorant-binding protein gene family in <it>Drosophila melanogaster</it></p>
            </title>
            <aug>
               <au>
                  <snm>Hekmat-Scafe</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Scafe</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>McKinney</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Tanouye</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1357</fpage>
            <lpage>1369</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186648</pubid>
                  <pubid idtype="pmpid" link="fulltext">12213773</pubid>
                  <pubid idtype="doi">10.1101/gr.239402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Pheromone binding and inactivation by moth antennae</p>
            </title>
            <aug>
               <au>
                  <snm>Vogt</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Riddiford</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1981</pubdate>
            <volume>293</volume>
            <fpage>161</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1038/293161a0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Peripheral mechanisms of pheromone reception in moths</p>
            </title>
            <aug>
               <au>
                  <snm>Kaissling</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Chem Senses</source>
            <pubdate>1996</pubdate>
            <volume>21</volume>
            <fpage>257</fpage>
            <lpage>268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/chemse/21.2.257</pubid>
                  <pubid idtype="pmpid" link="fulltext">8670704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Wake up and smell the pheromones</p>
            </title>
            <aug>
               <au>
                  <snm>Vosshall</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Stensmyr</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Neuron</source>
            <pubdate>2005</pubdate>
            <volume>45</volume>
            <fpage>179</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.neuron.2005.01.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15664166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Pheromone reception</p>
            </title>
            <aug>
               <au>
                  <snm>Leal</snm>
                  <fnm>WS</fnm>
               </au>
            </aug>
            <source>Topics in current chemistry</source>
            <pubdate>2005</pubdate>
            <volume>240</volume>
            <fpage>1</fpage>
            <lpage>36</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Sinuous is a <it>Drosophila </it>claudin required for septate junction organization and epithelial tube size control</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Schulte</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hirschi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tepass</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Beitel</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>J Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>164</volume>
            <fpage>313</fpage>
            <lpage>323</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1083/jcb.200309134</pubid>
                  <pubid idtype="pmpid" link="fulltext">14734539</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Implications for bcd mRNA localization from spatial distribution of exu protein in <it>Drosophila </it>oogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hazelrigg</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1994</pubdate>
            <volume>369</volume>
            <fpage>400</fpage>
            <lpage>03</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/369400a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">7910952</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The evolution of hexamerins and the phylogeny of insects</p>
            </title>
            <aug>
               <au>
                  <snm>Burmester</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Massey</snm>
                  <fnm>HC</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Zakharkin</snm>
                  <fnm>SO</fnm>
               </au>
               <au>
                  <snm>Benes</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1998</pubdate>
            <volume>47</volume>
            <fpage>93</fpage>
            <lpage>108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006366</pubid>
                  <pubid idtype="pmpid" link="fulltext">9664700</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>The major serum protein of <it>Drosophila </it>larvae, Larval Serum Protein 1, is dispensable</p>
            </title>
            <aug>
               <au>
                  <snm>Roberts</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Jowett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Glover</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Eur J Biochem</source>
            <pubdate>1991</pubdate>
            <volume>195</volume>
            <fpage>195</fpage>
            <lpage>201</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1432-1033.1991.tb15695.x</pubid>
                  <pubid idtype="pmpid">1703957</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Plant-insect coevolution and inhibition of acetylcholinesterase</p>
            </title>
            <aug>
               <au>
                  <snm>Ryan</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Byrne</snm>
                  <fnm>Oonagh</fnm>
               </au>
            </aug>
            <source>J Chem Ecol</source>
            <pubdate>1988</pubdate>
            <volume>14</volume>
            <fpage>1965</fpage>
            <lpage>1975</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF01013489</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The <it>Ka/Ks </it>ratio: Diagnosing the form of sequence evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>486</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02722-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12175810</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Molecular Evolution (Sinauer Associates, Sunderland, Massachusetts, 1997)</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>W-H</fnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Mammalian housekeeping genes evolve more slowly than tissue-specific genes</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Liqing</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Wen-Hsiung</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>21</volume>
            <fpage>236</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh010</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Finishing a whole-genome shotgun: release 3 of the <it>Drosophila melanogaster </it>euchromatic genome sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Champe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dugan</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Frise</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hodgson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Laverty</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Pacleb</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sodergren</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Svirskas</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tabor</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Wan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Stapleton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0079.1</fpage>
            <lpage>14</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2002-3-12-research0079</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Ensembl Genome Browser</p>
            </title>
            <url>http://www.ensembl.org/index.html</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The genome sequence of the malaria mosquito <it>Anopheles gambiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Holt</snm>
                  <mi>A</mi>
                  <fnm>Robert</fnm>
               </au>
               <au>
                  <snm>Mani Subramanian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>Aaron</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <mi>G</mi>
                  <fnm>Granger</fnm>
               </au>
               <au>
                  <snm>Charlab</snm>
                  <fnm>Rosane</fnm>
               </au>
               <au>
                  <snm>Nusskern</snm>
                  <mi>R</mi>
                  <fnm>Deborah</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>Patrick</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <mi>G</mi>
                  <fnm>Andrew</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <mi>MC</mi>
                  <fnm>Jos&#233;</fnm>
               </au>
               <au>
                  <snm>Wides</snm>
                  <fnm>Ron</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <mi>L</mi>
                  <fnm>Steven</fnm>
               </au>
               <au>
                  <snm>Loftus</snm>
                  <fnm>Brendan</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>Mark</fnm>
               </au>
               <au>
                  <snm>Majoros</snm>
                  <mi>H</mi>
                  <fnm>William</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <mi>B</mi>
                  <fnm>Douglas</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>Zhongwu</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <mi>L</mi>
                  <fnm>Cheryl</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <mi>F</mi>
                  <fnm>Josep</fnm>
               </au>
               <au>
                  <snm>Anthouard</snm>
                  <fnm>Veronique</fnm>
               </au>
               <au>
                  <snm>Arensburger</snm>
                  <fnm>Peter</fnm>
               </au>
               <au>
                  <snm>Atkinson</snm>
                  <mi>W</mi>
                  <fnm>Peter</fnm>
               </au>
               <au>
                  <snm>Baden</snm>
                  <fnm>Holly</fnm>
               </au>
               <au>
                  <snm>de Berardinis</snm>
                  <fnm>Veronique</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>Danita</fnm>
               </au>
               <au>
                  <snm>Benes</snm>
                  <fnm>Vladimir</fnm>
               </au>
               <au>
                  <snm>Biedler</snm>
                  <fnm>Jim</fnm>
               </au>
               <au>
                  <snm>Blass</snm>
                  <fnm>Claudia</fnm>
               </au>
               <au>
                  <snm>Bolanos</snm>
                  <fnm>Randall</fnm>
               </au>
               <au>
                  <snm>Boscus</snm>
                  <fnm>Didier</fnm>
               </au>
               <au>
                  <snm>Barnstead</snm>
                  <fnm>Mary</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>129</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1076181</pubid>
                  <pubid idtype="pmpid" link="fulltext">12364791</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>SilkDB: a knowledgebase for silkworm biology and genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Xia</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ruan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Xiang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <issue>33 Database</issue>
            <fpage>D399</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540070</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608225</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>A Draft Sequence for the Genome of the Domesticated Silkworm (<it>Bombyx mori</it>)</p>
            </title>
            <aug>
               <au>
                  <snm>Xia</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zha</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chai</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Qian</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>1937</fpage>
            <lpage>40</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1102210</pubid>
                  <pubid idtype="pmpid" link="fulltext">15591204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Insights into social insects from the genome of the honeybee <it>Apis mellifera</it></p>
            </title>
            <aug>
               <au>
                  <cnm>The Honeybee Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <fpage>931</fpage>
            <lpage>949</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05260</pubid>
                  <pubid idtype="pmpid">17073008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Honeybee Genome Project</p>
            </title>
            <url>http://www.hgsc.bcm.tmc.edu/projects/honeybee/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Finishing the euchromatic sequence of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>931</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>SGD: <it>Saccharomyces </it>Genome Database</p>
            </title>
            <aug>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Adler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chervitz</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Hester</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Jia</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Juvik</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Roe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>73</fpage>
            <lpage>79</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147204</pubid>
                  <pubid idtype="pmpid" link="fulltext">9399804</pubid>
                  <pubid idtype="doi">10.1093/nar/26.1.73</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The genome sequence of <it>Caenorhabditis briggsae</it>: a platform for comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Bao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Blasiar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chinwalla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Coghlan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coulson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>D'Eustachio</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Kamath</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kuwabara</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Miner</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Minx</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Plumb</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schein</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Sohrmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <fpage>E45</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">261899</pubid>
                  <pubid idtype="pmpid" link="fulltext">14624247</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0000045</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The UCSC Genome Browser Database</p>
            </title>
            <url>http://genome-test.cse.ucsc.edu/</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p><it>Saccharomyces </it>Genome Database (SGD)</p>
            </title>
            <url>http://www.yeastgenome.org/</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>LocustDB: a relational database for the transcriptome and biology of the migratory locust (<it>Locusta migratoria</it>)</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>11</fpage>
            <url>http://locustdb.genomics.org.cn/</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Test datasets and evaluation of gene prediction programs on the rice genome</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>Heng</fnm>
               </au>
               <au>
                  <snm>Gao</snm>
                  <fnm>Lei</fnm>
               </au>
               <au>
                  <snm>Fang</snm>
                  <fnm>Lin</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Tao</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Hai-Hong</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Yan</fnm>
               </au>
               <au>
                  <snm>Fang</snm>
                  <fnm>Li-Jun</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>Hui-Min</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>Wei-Mou</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Jin-Song</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Zhao</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>Jiao</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Yu-Dong</fnm>
               </au>
               <au>
                  <snm>Xing</snm>
                  <fnm>Zi-Xing</fnm>
               </au>
               <au>
                  <snm>Gao</snm>
                  <fnm>Shao-Gen</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>Bai-Lin</fnm>
               </au>
            </aug>
            <source>J Comput Sci &amp; Technol</source>
            <pubdate>2005</pubdate>
            <volume>20</volume>
            <fpage>446</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s11390-005-0446-x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Ab initio gene finding in <it>Drosophila </it>genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>VV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>516</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310882</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779491</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.516</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>InterProScan &#8211; an integration platform for the signature-recognition methods in InterPro</p>
            </title>
            <aug>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>847</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.9.847</pubid>
                  <pubid idtype="pmpid" link="fulltext">11590104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium</p>
            </title>
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Issel-Tarver</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasarskis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>ZH</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>32</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666704</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
