<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-436</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Database</dochead>
      <bibl>
         <title>
            <p>ORENZA: a web resource for studying ORphan ENZyme activities</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Lespinet</snm>
               <fnm>Olivier</fnm>
               <insr iid="I1"/>
               <email>olivier.lespinet@igmors.u-psud.fr</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Labedan</snm>
               <fnm>Bernard</fnm>
               <insr iid="I1"/>
               <email>bernard.labedan@igmors.u-psud.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institut de G&#233;n&#233;tique et Microbiologie, CNRS UMR 8621, Universit&#233; Paris-Sud, B&#226;timent 400, 91405 Orsay Cedex, France</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>436</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/436</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17026747</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-436</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>25</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Lespinet and Labedan; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Despite the current availability of several hundred thousand amino acid sequences, more than 36% of the enzyme activities (EC numbers) defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) are not associated with any amino acid sequence in major public databases. This wide gap separating knowledge of biochemical function and sequence information is found for nearly all classes of enzymes. Thus, there is an urgent need to explore these sequence-less EC numbers, in order to progressively close this gap.</p>
            </sec>
            <sec>
               <st>
                  <p>Description</p>
               </st>
               <p>We designed ORENZA, a PostgreSQL database of ORphan ENZyme Activities, to collate information about the EC numbers defined by the NC-IUBMB with specific emphasis on orphan enzyme activities. Complete lists of all EC numbers and of orphan EC numbers are available and will be periodically updated. ORENZA allows one to browse the complete list of EC numbers or the subset associated with orphan enzymes or to query a specific EC number, an enzyme name or a species name for those interested in particular organisms. It is possible to search ORENZA for the different biochemical properties of the defined enzymes, the metabolic pathways in which they participate, the taxonomic data of the organisms whose genomes encode them, and many other features. The association of an enzyme activity with an amino acid sequence is clearly underlined, making it easy to identify at once the orphan enzyme activities. Interactive publishing of suggestions by the community would provide expert evidence for re-annotation of orphan EC numbers in public databases.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>ORENZA is a Web resource designed to progressively bridge the unwanted gap between function (enzyme activities) and sequence (dataset present in public databases). ORENZA should increase interactions between communities of biochemists and of genomicists. This is expected to reduce the number of orphan enzyme activities by allocating gene sequences to the relevant enzymes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Since 1956, the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) has been classifying enzyme activities (EC numbers) in order to organize all contributions made by individual biochemists and to check their validity and consistency <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Such a standardization effort is based on the definition of the so-called EC numbers, each of which comprises four numbers separated by periods. The first number (from 1 to 6) delineates the broad type of activity: Oxidoreductase, Transferase, Hydrolase, Lyase, Isomerase, and Ligase, respectively. The second and third numbers detail the reaction that an enzyme catalyzes. For example (Table <tblr tid="T1">1</tblr>), among the 1065 items forming the class Hydrolases (EC 3), there are 163 Glycosylases forming the subclass EC 3.2, of which 140 enzymes hydrolyse O- and S-glycosyl compounds (sub-subclass EC 3.2.1) and 23 hydrolyse N-Glycosyl compounds (sub-subclass EC 3.2.2). The last number is a serial number that is used to identify a particular enzyme. For instance, EC 3.2.2.1 corresponds to the purine nucleosidase and EC 3.2.2.3 to the uridine nucleosidase. The EC categorization is constantly evolving as new enzyme activities are determined and new information comes to light on previously classified enzymes. Presently (June 2006), 3927 EC numbers correspond to a defined unambiguous activity encoded by a protein. Note that IntEnz, the integrated relational enzyme database <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, now provides easy access to updated and curated data of the NC-IUBMB <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Browsing the EC hierarchy. For each level, the total number of EC numbers is indicated, followed by the number of orphan EC numbers in square brackets.</p>
            </caption>
            <tblbdy cols="15">
               <r>
                  <c ca="center">
                     <p>
                        <b>Class</b>
                     </p>
                  </c>
                  <c cspan="14" ca="center">
                     <p><b>3 </b>1065 [336]</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>Subclass</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p><b>3.1 </b>267 [113]</p>
                  </c>
                  <c cspan="2" ca="center">
                     <p><b>3.2 </b>163 [56]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.3 </b>10 [4]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.4 </b>317 [49]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.5 </b>171 [70]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.6 </b>109 [36]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.7 </b>10 [4]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.8 </b>10 [1]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.9 </b>1 [1]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.10 </b>2 [1]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.11 </b>2 [0]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.12 </b>1 [1]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.13 </b>2 [0]</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>Sub-subclass</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p><b>3.2.1 </b>140 [45]</p>
                  </c>
                  <c ca="center">
                     <p><b>3.2.2 </b>23 [11]</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
            </tblbdy>
         </tbl>
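         <p>The hierarchy illustrated in Table 1 can be made concrete with a short sketch. The Python code below is an illustrative assumption and is not part of ORENZA: the names ECNumber, parse_ec, prefix and count_by_level are hypothetical, and the sketch only shows how a complete four-field EC number could be split into its fields and tallied at the class, subclass or sub-subclass level.</p>
         <p><![CDATA[
```python
from collections import Counter
from typing import Iterable, NamedTuple


class ECNumber(NamedTuple):
    """The four dot-separated fields of a complete EC number."""
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int


def parse_ec(text: str) -> ECNumber:
    """Parse a string such as 'EC 3.2.2.1' or '3.2.2.1'.

    Raises ValueError for anything that is not four integer fields,
    which includes partial EC numbers.
    """
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    fields = cleaned.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields in {text!r}")
    return ECNumber(*(int(field) for field in fields))


def prefix(ec: ECNumber, depth: int) -> str:
    """Return the first `depth` fields, e.g. depth=2 gives the subclass '3.2'."""
    return ".".join(str(number) for number in ec[:depth])


def count_by_level(ec_strings: Iterable[str], depth: int) -> Counter:
    """Count EC numbers per class (depth=1), subclass (2) or sub-subclass (3)."""
    return Counter(prefix(parse_ec(s), depth) for s in ec_strings)


if __name__ == "__main__":
    # Toy input: the two nucleosidases named in the text plus one made-up sibling.
    sample = ["EC 3.2.2.1", "EC 3.2.2.3", "EC 3.2.1.1"]
    print(count_by_level(sample, 3))
```
]]></p>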
         <p>Unexpectedly, Peter Karp <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and we <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp> independently observed that a significant fraction of these curated and approved EC numbers do not correspond to any amino acid sequence in public databases. Recent updates of our previous results confirm this very large gap between known enzyme function and recorded protein sequence. There are presently only 2483 EC numbers having at least one associated sequence in release 8.1 (13-Jun-2006) of the UniProt Knowledgebase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. We have used the term orphan enzyme activities <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> for the 1444 EC numbers that do not have a sequence associated with them. Remarkably, these orphan enzyme activities currently represent 36.8% of the 3927 retained EC numbers.</p>
         <p>We have already shown that orphans are present at about the same proportion in every class and subclass of enzyme activities <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Likewise, we found no correlation between orphan distribution and main functional categories. 25.3% of the enzyme activities involved in well-studied metabolic pathways are sequence-less while we found 49.5% orphans among non-metabolic enzyme activities <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Thus, it appears that there is an important gap between function and sequence, which implies that its progressive bridging would require a concerted effort as already underlined <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Accordingly, we have built ORENZA, a database of ORphan ENZyme Activities, to offer such a tool to the research community. Hereafter, we describe the content of this resource and we detail how to use it in order to reach the goals defined above.</p>
      </sec>
      <sec>
         <st>
            <p>Construction and content</p>
         </st>
         <sec>
            <st>
               <p>Structure of the ORENZA database</p>
            </st>
            <p>In order to build an efficient relational database that will help to identify the encoding gene for the maximum number of sequence-less enzyme activities (the so-called orphan enzymes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>) we have retrieved data from various public databases and we have organized them as described below.</p>
         </sec>
         <sec>
            <st>
               <p>Data collection</p>
            </st>
            <p>There are two primary sources of information about each enzyme: the Enzyme Nomenclature <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, which provides all data about its activity (EC number), and the UniProt Knowledgebase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, which records its amino acid sequences. Fig. <figr fid="F1">1</figr> shows the different fields that were collected from both these sources and how they are organized in one main table. Moreover, we added further &#8211; but highly important &#8211; features about each enzyme, such as its role in metabolism (data recovered from KEGG <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>), the names of the organism(s) where it has been studied (data extracted from BRENDA <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and from UniProt <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>), the taxonomy of these organisms (data extracted from the NCBI <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>), and various pieces of information extracted from ENZYME <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> such as cofactors, possible role in disease and motifs found in PROSITE <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. These secondary characteristics are stored in small separate tables or added directly to the main one, as in the case of the 3D structure (data recovered from PDB <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). We wrote Perl scripts in order to extract and periodically update the relevant information from the following public resources: NC-IUBMB, ENZYME, KEGG, BRENDA, UniProt, and PDB. Note also the addition of two other tables: one lists ribozymes (only one, presently), and the other lists the individual contributions made by external experts on their sequence data (see below for more details).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Schema of the ORENZA relational database</p>
               </caption>
               <text>
                  <p><b>Schema of the ORENZA relational database</b>. The primary key of each table is in bold underlined type. Dashed arrows indicate references to foreign keys. Plain arrows represent the origin of the data stored in each table. Moreover, for the table Enzyme_activities the origin of the data is indicated by the same color code used to identify each of the following major primary databases used in our analysis: UniProt (purple), BRENDA (blue), ENZYME (green), PDB (orange) and NC-IUBMB (beige).</p>
               </text>
               <graphic file="1471-2105-7-436-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Checking orphanity</p>
            </st>
            <p>A Perl script screened the occurrence of EC numbers in the UniProt Knowledgebase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Any EC number assigned by the NC-IUBMB <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> that is not referenced in UniProt is defined as an orphan enzyme activity. Note that we did not take into account partial or incomplete EC numbers, which are present in UniProt (318 in the present version) but are too ambiguous <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> for sound use.</p>
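            <p>The orphanity check described above amounts to a set difference. The following Python sketch is a minimal illustration under stated assumptions and is not the authors' Perl script: the function names (is_complete, find_orphans, orphan_percentage) are hypothetical, the input lists are toy data, and partial EC numbers are assumed to carry a '-' placeholder in place of each missing field.</p>
            <p><![CDATA[
```python
from typing import Iterable, List


def is_complete(ec: str) -> bool:
    """True when the EC number has four all-digit fields.

    Assumption: partial EC numbers use '-' for missing fields
    (for example '1.1.1.-'), so they fail the digit test.
    """
    fields = ec.split(".")
    return len(fields) == 4 and all(field.isdigit() for field in fields)


def find_orphans(approved: Iterable[str], referenced: Iterable[str]) -> List[str]:
    """Return approved EC numbers that no sequence entry references.

    `approved` stands for the list of EC numbers from the nomenclature;
    `referenced` stands for EC numbers found in sequence records.
    Partial EC numbers in `referenced` are ignored.
    """
    with_sequence = {ec for ec in referenced if is_complete(ec)}
    return sorted(set(approved) - with_sequence)


def orphan_percentage(total: int, with_sequence: int) -> float:
    """Percentage of EC numbers lacking a sequence, rounded to one decimal."""
    return round(100 * (total - with_sequence) / total, 1)


if __name__ == "__main__":
    # Toy data: EC 1.1.1.125 has sequences, EC 1.1.1.126 has none,
    # and the partial number '1.1.1.-' must not count as a match.
    approved = ["1.1.1.125", "1.1.1.126"]
    referenced = ["1.1.1.125", "1.1.1.-"]
    print(find_orphans(approved, referenced))
    print(orphan_percentage(3927, 2483))
```
]]></p>
            <p>With the counts reported in the Background (3927 approved EC numbers, of which 2483 have at least one associated sequence), orphan_percentage reproduces the stated proportion of 36.8%.</p>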
         </sec>
         <sec>
            <st>
               <p>Structuring the relational database and implementing the web resource</p>
            </st>
            <p>We chose to use exclusively open source tools to build the ORENZA database.</p>
            <p>Accordingly, PostgreSQL 8.1 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, one of the most advanced open source databases, was installed on a Linux platform. The PHP language <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> was used to build the Web service and to make better use of queries against the relational database.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Utility</p>
         </st>
         <sec>
            <st>
               <p>Browsing and searching ORENZA</p>
            </st>
            <p>One can browse and/or search ORENZA using three main avenues as described in detail below.</p>
            <sec>
               <st>
                  <p>Browsing the whole set of EC numbers</p>
               </st>
               <p>The complete list of EC numbers is directly available with a single click. It corresponds to the most recent version of the NC-IUBMB nomenclature <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The resulting view displays the list as a three-column table in which each line gives a specific EC number, the common name of the corresponding enzyme and a computed annotation about its possible orphanity (Fig. <figr fid="F2">2</figr>). Note also that the top line of this view shows a summary indicating the total number of EC numbers present in the selection (including the ribozyme) as well as the number of orphan EC numbers. The entire list, which can be easily downloaded as a text file, is completely dynamic. A click on a line opens a new view delivering a wealth of information about the selected EC number, structured in three successive levels. Fig. <figr fid="F3">3A</figr> shows an example for EC 1.1.1.125, annotated with many features.</p>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>Extract from the full list of enzymes classified by the NC-IUBMB, along with their associated orphanity</p>
                  </caption>
                  <text>
                     <p><b>Extract from the full list of enzymes classified by the NC-IUBMB, along with their associated orphanity</b>. Each line indicates the EC number, the common name and the orphanity. The total number of enzymes and the total number of orphan enzyme activities are indicated at the top.</p>
                  </text>
                  <graphic file="1471-2105-7-436-2"/>
               </fig>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Details of specific enzymes</p>
                  </caption>
                  <text>
                     <p><b>Details of specific enzymes</b>. 3A: example of an enzyme entry with associated amino acid sequences. 3B: example of an orphan EC number. The fact that the enzyme is an orphan enzyme is noted after the EC number and in the Swiss-Prot and TrEMBL fields.</p>
                  </text>
                  <graphic file="1471-2105-7-436-3"/>
               </fig>
               <p>The first level consists of characteristics of the enzymatic activity and its history. The description section contains information taken from the NC-IUBMB data such as the different names (common, systematic, and others) of the enzyme, a scheme of the reaction(s) it catalyses and other data about the cofactors and NC-IUBMB comments about the reaction that are extracted from the ENZYME database <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In the history part, we list fundamental references, and the date of creation of the entry in the official NC-IUBMB nomenclature.</p>
               <p>The second level presents information about the position of the enzyme in the cell metabolism with the corresponding number of a KEGG map <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and its taxonomic ubiquity with a list of organisms where this enzymatic activity has been characterized as recorded in the BRENDA database <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
               <p>The third level presents information about the peptidic molecule, such as motifs (from PROSITE <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>) and the lists of amino acid sequences found in SwissProt and TrEMBL, respectively <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. If there is no sequence, as is the case for EC 1.1.1.126, which is labeled "orphan", this is clearly mentioned (Fig. <figr fid="F3">3B</figr>).</p>
            </sec>
            <sec>
               <st>
                  <p>Browsing the orphan EC numbers</p>
               </st>
               <p>The second main avenue offered by ORENZA to explore the enzyme universe is the entire list, periodically updated, of the orphan enzyme activities. As described below, there are several ways to retrieve these orphans besides browsing the list in its entirety.</p>
               <p>First, one can browse the different levels (class, subclass, etc.) of the EC hierarchy exactly as already described for the whole dataset of EC numbers.</p>
               <p>A second approach is to explore the metabolism hierarchy proposed by KEGG. For instance, clicking on Lipid Metabolism (56 orphans out of 246) opens a view showing the distribution of these orphans inside the 12 corresponding pathways (Fig. <figr fid="F4">4A</figr>). Among these 12 pathways, glycerophospholipid metabolism appears to have the most orphans (19). Another click unveils the full list of these enzyme activities involved in glycerophospholipid metabolism for which no amino acid sequence is available (Fig. <figr fid="F4">4B</figr>). Again, one can explore each enzyme in detail and copy/paste the corresponding information to save it as a text file.</p>
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>List of orphan enzyme activities for various KEGG pathways</p>
                  </caption>
                  <text>
                     <p><b>List of orphan enzyme activities for various KEGG pathways</b>. 4A: List for pathway <it>'01130 Lipid Metabolism'</it>, sorted by sub-pathways. 4B: List of orphan EC numbers for pathway <it>'00564 Glycerophospholipid metabolism'</it>.</p>
                  </text>
                  <graphic file="1471-2105-7-436-4"/>
               </fig>
               <p>A third way to browse the orphan EC numbers is to sort them by their year of creation. This permits one to observe that the relative proportion of orphans is independent of the progress of genome sequencing. Fig. <figr fid="F5">5A</figr> shows that many orphans appeared during the period of gene sequencing and that the level remained unexpectedly high during the present era of heavy genome sequencing. Fig. <figr fid="F5">5B</figr> zooms in on the last seven years and confirms this trend with a high proportion of orphans in 2000, 2004 and 2005.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Distribution of the creation year of orphan enzyme activities</p>
                  </caption>
                  <text>
                     <p><b>Distribution of the creation year of orphan enzyme activities</b>. 5A: Distribution during the pre-sequencing era (yellow), the gene sequencing era (pink) and the genome sequencing era (cyan). 5B: Number of enzymes created within the past seven years that have/lack sequence data. Total number of EC numbers is in blue, total number of orphan EC numbers in red and percentage of orphans in green.</p>
                  </text>
                  <graphic file="1471-2105-7-436-5"/>
               </fig>
               <p>A fourth way to explore orphan enzyme activities is based on their occurrence in different organisms. Here, we access the entire list of orphan enzyme activities sorted by the number of organisms where these activities have been detected and experimentally studied. Besides the 39 EC numbers for which there is no information in the BRENDA resource, we find that a large majority (1286) of orphans are found in a limited number (1 to 10) of species (Fig. <figr fid="F6">6</figr>), but a few (132) have been found to have a large taxonomic distribution (Fig. <figr fid="F6">6</figr>, inset).</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Taxonomic distribution of orphan enzyme activities</p>
                  </caption>
                  <text>
                     <p><b>Taxonomic distribution of orphan enzyme activities</b>. Green bars correspond to the distribution of the number of organisms (ranging from one to ten) where orphan EC numbers have been experimentally identified. In the inset, pink bars correspond to the number of orphan EC numbers identified in various ranges of number of organisms larger than ten organisms.</p>
                  </text>
                  <graphic file="1471-2105-7-436-6"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Searching ORENZA</p>
            </st>
            <p>It is possible to query ORENZA for a specific enzyme activity by entering either the EC number or the enzyme name. For example, entering the word "aspartate" recovers 41 EC numbers, 13 of which are presently not assigned to a sequence.</p>
            <p>Another interesting feature is the possibility of searching by species. For instance, entering the phrase "<it>Homo sapiens</it>" retrieves 1560 EC numbers that are present in human cells. The resulting list again contains a significant number of orphans (225). The same observation is true for four other model organisms as shown in Table <tblr tid="T2">2</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Distribution of orphan enzyme activities in a few model organisms</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>EC numbers</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>Model organisms</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Orphans (/total)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Species specific orphans (/total orphans)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Escherichia coli</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1792</p>
                     </c>
                     <c ca="center">
                        <p>189 (0.11)</p>
                     </c>
                     <c ca="center">
                        <p>25 (0.13)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Arabidopsis thaliana</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>651</p>
                     </c>
                     <c ca="center">
                        <p>22 (0.03)</p>
                     </c>
                     <c ca="center">
                        <p>0 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Saccharomyces cerevisiae</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1254</p>
                     </c>
                     <c ca="center">
                        <p>129 (0.10)</p>
                     </c>
                     <c ca="center">
                        <p>14 (0.11)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Drosophila melanogaster</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>417</p>
                     </c>
                     <c ca="center">
                        <p>16 (0.04)</p>
                     </c>
                     <c ca="center">
                        <p>4 (0.25)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>Homo sapiens</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1560</p>
                     </c>
                     <c ca="center">
                        <p>225 (0.14)</p>
                     </c>
                     <c ca="center">
                        <p>6 (0.02)</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Interestingly, the proportion of orphans that are common to these five model organisms is extremely low. Only three EC numbers are found as orphans in the five organisms: EC 3.6.1.18 (FAD diphosphatase), EC 3.6.4.4 (plus-end-directed kinesin ATPase), and EC 3.6.4.5 (minus-end-directed kinesin ATPase). Moreover, only three EC numbers are found as orphans in <it>E. coli</it>, fungi and animals but not in plants: EC 1.1.1.43 (phosphogluconate 2-dehydrogenase), EC 1.5.3.2 (N-methyl-L-amino-acid oxidase), and EC 3.6.4.1 (myosin ATPase).</p>
            <p>On the other hand, there are a few species-specific orphans, as also shown in Table <tblr tid="T2">2</tblr>. For instance, six orphan EC numbers are reported uniquely in human cells (listed in Table <tblr tid="T3">3</tblr>) but the corresponding figures are as high as 25 for <it>E. coli </it>and 14 for <it>S. cerevisiae</it>, two organisms that have been intensively studied at the biochemical level for 60 years by thousands of laboratories worldwide.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>The six orphan enzyme activities that are specific to <it>Homo sapiens</it>.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="center">
                        <p>
                           <b>EC number</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Enzyme name</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Role in human physiology</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 2.3.1.125</p>
                     </c>
                     <c ca="center">
                        <p>1-alkyl-2-acetylglycerol O-acyltransferase</p>
                     </c>
                     <c ca="center">
                        <p>platelet activation</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 3.1.6.15</p>
                     </c>
                     <c ca="center">
                        <p>N-sulfoglucosamine-3-sulfatase</p>
                     </c>
                     <c ca="center">
                        <p>urinary infection by <it>Flavobacterium heparinum</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 1.1.1.160</p>
                     </c>
                     <c ca="center">
                        <p>dihydrobunolol dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>liver physiology</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 2.4.1.153</p>
                     </c>
                     <c ca="center">
                        <p>dolichyl-phosphate &#945;-N-acetylglucosaminyltransferase</p>
                     </c>
                     <c ca="center">
                        <p>liver physiology</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 3.1.2.13</p>
                     </c>
                     <c ca="center">
                        <p>S-succinylglutathione hydrolase</p>
                     </c>
                     <c ca="center">
                        <p>liver physiology</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>EC 5.1.3.19</p>
                     </c>
                     <c ca="center">
                        <p>chondroitin-glucuronate 5-epimerase</p>
                     </c>
                     <c ca="center">
                        <p>blood coagulation, cardiovascular disease, carcinogenesis</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Building an ORENZA community</p>
            </st>
            <p>We clearly need the help of a large array of experts to identify the putative sequence(s) associated with orphan enzyme activities <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. To encourage such a collective effort, we propose, as part of the ORENZA resource, a user-friendly tool that allows people with sound knowledge of specific enzyme activities to make helpful suggestions. Moreover, such a resource could help to establish fruitful and dynamic interactions between experts interested in the same field. Each suggestion (with identification of its author) will appear in the ORENZA resource as a new item in the individual file of the relevant EC number. If several experts agree on the same suggestion, it will be transmitted to the curators of UniProt with a high degree of confidence. Where experts provide conflicting advice, all versions will be published as they were submitted and validated, allowing the community to decide eventually.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The presence of so many EC numbers that do not have an associated sequence appears rather extraordinary at a time when we are inundated by genomic data. Such a situation hampers research at different levels. Alleviating this problem would be very helpful for the difficult task of annotating and/or reannotating genomes. Thus, there is an urgent need to bridge this unwanted gap between biochemical knowledge and the massive identification of coding sequences, and we and others (see Karp <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>) think that the whole community must contribute to this task. This is why we built the ORENZA resource.</p>
         <p>We designed this database as an interactive tool that allows each expert to contribute his/her knowledge about an enzyme, or a group of related enzymes, that has been registered as an orphan enzyme activity.</p>
         <p>Different cases may exist, and we have already described three of them where personal expertise would eliminate many errors and/or neglected instances. (i) A trivial error occurs when the enzyme has been correctly described in a sequence database but its EC number is not indicated. This is the case, for example, for glyceraldehyde 3-phosphate dehydrogenases, as already shown <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. One of these sequences (GAPOR, EC 1.2.7.6) has been entered in UniProt without its EC number although the information was given in the relevant published papers. At present, we estimate that up to 20% of the so-called orphan EC numbers might correspond to such trivial incomplete annotation in the sequence databases (OL &amp; BL, unpublished results). (ii) A sequence or a partial sequence has been previously determined but has not been published. We recently described such an instance in the case of putrescine carbamoyltransferases <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. (iii) We further observed that around 50% of the present orphan EC numbers are found in only one species or a few closely related organisms, as shown in Fig. <figr fid="F6">6</figr>. In the large majority of cases, this is because genetic tools are lacking for such imperfectly studied organisms. Moreover, the availability of genomic sequences for closely related species is of no help when the orphan EC numbers are specific to the studied organisms (see Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>).</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We consider ORENZA to be a useful resource for all categories of biologists. Take, for instance, the data summarized in Table <tblr tid="T2">2</tblr>, and more precisely the observation that human cells harbour six enzyme activities that are found nowhere else and are not associated with any amino acid sequence (Table <tblr tid="T3">3</tblr>).</p>
         <p>Any biologist would attempt to better understand the origin of such metabolic specificities. Any progress in this field could have positive consequences in terms of medical advances (see Table <tblr tid="T3">3</tblr>).</p>
         <p>The genomicist would wonder whether the occurrence of these six orphans indicates a major annotation problem in the current analysis of the human genome. The expert on either a specific enzyme or a physiological aspect related to these orphan enzyme activities would feel personally concerned, and we hope that he/she will promptly respond to such a challenge.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>The ORENZA resource is freely available via the Internet at <url>http://www.orenza.u-psud.fr</url>. The web site has been tested with the Mozilla 1.7.12, Mozilla Firefox 1.5, and Internet Explorer 6.0 web browsers.</p>
         <p>Complete lists of all EC numbers and of orphan EC numbers are available and will be periodically updated. All data can be easily downloaded as text files.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>OL wrote the different programs necessary to collect all data from public sources and to build the relational database and the web server. Both authors participated in the data analysis and wrote the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank the two anonymous reviewers for their constructive comments and Claudio Scazzocchio for critical reading of the manuscript and help with the English language. The Agence Nationale de Recherche (programme Masse de Donn&#233;es) and the CNRS have funded this project, including the processing charge for publishing this paper.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB)</p>
            </title>
            <source>Eur J Biochem</source>
            <pubdate>1999</pubdate>
            <volume>264</volume>
            <fpage>610</fpage>
            <lpage>650</lpage>
            <url>http://www.chem.qmul.ac.uk/iubmb/enzyme/index.html</url>
            <note>Enzyme Nomenclature</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1432-1327.1999.nomen.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10491110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>IntEnz, the integrated relational enzyme database</p>
            </title>
            <aug>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Darsow</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Degtyarenko</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Boyce</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Axelsen</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schomburg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tipton</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D434</fpage>
            <lpage>D437</lpage>
            <url>http://www.ebi.ac.uk/intenz/index.html</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308853</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681451</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh119</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Call for an enzyme genomics initiative</p>
            </title>
            <aug>
               <au>
                  <snm>Karp</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>401</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">507876</pubid>
                  <pubid idtype="pmpid" link="fulltext">15287973</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-8-401</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Orphan enzymes?</p>
            </title>
            <aug>
               <au>
                  <snm>Lespinet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Labedan</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>307</volume>
            <fpage>42</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.307.5706.42a</pubid>
                  <pubid idtype="pmpid" link="fulltext">15637255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Puzzling over orphan enzymes</p>
            </title>
            <aug>
               <au>
                  <snm>Lespinet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Labedan</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>2006</pubdate>
            <volume>63</volume>
            <fpage>517</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00018-005-5520-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">16465439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The Universal Protein Resource (UniProt)</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>LS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D154</fpage>
            <lpage>D159</lpage>
            <url>http://www.expasy.uniprot.org/index.shtml</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540024</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608167</pubid>
                  <pubid idtype="doi">10.1093/nar/gki070</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The KEGG resources for deciphering the genome</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Okuno</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D277</fpage>
            <lpage>D280</lpage>
            <url>http://www.genome.ad.jp/kegg</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308797</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681412</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>BRENDA, the enzyme database: updates and major new developments</p>
            </title>
            <aug>
               <au>
                  <snm>Schomburg</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ebeling</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gremse</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Heldt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Huhn</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schomburg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D431</fpage>
            <lpage>D433</lpage>
            <url>http://www.brenda.uni-koeln.de/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308815</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681450</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh081</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Database resources of the National Center for Biotechnology Information</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Chappey</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lash</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Leipe</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Rapp</snm>
                  <fnm>BA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>10</fpage>
            <lpage>14</lpage>
            <url>http://www.ncbi.nlm.nih.gov/Taxonomy/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102437</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592169</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.10</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The ENZYME database in 2000</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>304</fpage>
            <lpage>305</lpage>
            <url>http://www.expasy.org/enzyme/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102465</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592255</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The PROSITE database</p>
            </title>
            <aug>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bulliard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Castro</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Langendijk-Genevaux</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D227</fpage>
            <lpage>D230</lpage>
            <url>http://www.expasy.org/prosite/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347426</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381852</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Announcing the worldwide Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Henrick</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>980</fpage>
            <url>http://www.wwpdb.org/</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb1203-980</pubid>
                  <pubid idtype="pmpid" link="fulltext">14634627</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers</p>
            </title>
            <aug>
               <au>
                  <snm>Green</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Karp</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>4035</fpage>
            <lpage>4039</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1179732</pubid>
                  <pubid idtype="pmpid" link="fulltext">16034025</pubid>
                  <pubid idtype="doi">10.1093/nar/gki711</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>PostgreSQL</p>
            </title>
            <url>http://www.postgresql.org/</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>PHP</p>
            </title>
            <url>http://www.php.net/</url>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Retrieving sequences of enzymes experimentally characterized but erroneously annotated: the case of the putrescine carbamoyltransferase</p>
            </title>
            <aug>
               <au>
                  <snm>Naumoff</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Glansdorff</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Labedan</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>52</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514541</pubid>
                  <pubid idtype="pmpid" link="fulltext">15287962</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-5-52</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
