<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-11-290</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Fitzpatrick</snm><mi>A</mi><fnm>David</fnm><insr iid="I1"/><insr iid="I2"/><email>david.fitzpatrick@nuim.ie</email></au>
<au id="A2"><snm>O'Gaora</snm><fnm>Peadar</fnm><insr iid="I3"/><email>peadar.ogaora@ucd.ie</email></au>
<au id="A3"><snm>Byrne</snm><mi>P</mi><fnm>Kevin</fnm><insr iid="I4"/><email>kevin.byrne@tcd.ie</email></au>
<au ca="yes" id="A4"><snm>Butler</snm><fnm>Geraldine</fnm><insr iid="I1"/><email>geraldine.butler@ucd.ie</email></au>
</aug>
<insg>
<ins id="I1"><p>UCD School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland</p></ins>
<ins id="I2"><p>Department of Biology, The National University of Ireland, Maynooth, County Kildare, Ireland</p></ins>
<ins id="I3"><p>UCD School of Medicine and Medical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland</p></ins>
<ins id="I4"><p>Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Ireland</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2010</pubdate>
<volume>11</volume>
<issue>1</issue>
<fpage>290</fpage>
<url>http://www.biomedcentral.com/1471-2164/11/290</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-290</pubid><pubid idtype="pmpid">20459735</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>21</day><month>12</month><year>2009</year></date></rec><acc><date><day>10</day><month>5</month><year>2010</year></date></acc><pub><date><day>10</day><month>5</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Fitzpatrick et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>
<it>Candida </it>species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of <it>Candida </it>genomic data. We have developed the <it>Candida </it>Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of <it>Candida </it>species. CGOB incorporates all available <it>Candida </it>clade genome sequences including two <it>Candida albicans </it>isolates (SC5314 and WO-1) and 8 closely related species (<it>Candida dubliniensis</it>, <it>Candida tropicalis</it>, <it>Candida parapsilosis</it>, <it>Lodderomyces elongisporus</it>, <it>Debaryomyces hansenii</it>, <it>Pichia stipitis</it>, <it>Candida guilliermondii </it>and <it>Candida lusitaniae</it>). <it>Saccharomyces cerevisiae </it>is also included as a reference genome.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total, CGOB includes 65,617 genes arranged into 13,625 homology columns. We have also generated improved <it>Candida </it>gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all <it>Candida </it>species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species- or lineage-specific. We also identified one example of intron gain in <it>C. albicans</it>.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Our analysis provides an important resource that is now available for the <it>Candida </it>community. CGOB is available at <url>http://cgob.ucd.ie</url>.</p>
</sec>
</sec>
</abs>
</fm><meta>
<classifications>
<classification id="endnote" subtype="user_supplied_xml" type="bmc"/>
</classifications>
</meta><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Fungal infections are the fourth most common nosocomial bloodstream infection in the United States. <it>Candida </it>species account for approximately 10% of all bloodstream infections <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> and worldwide are the most common cause of opportunistic fungal infection <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Due to their increasing clinical importance, recent sequencing projects have determined the complete sequence of ten <it>Candida </it>genomes, including common pathogenic species and species rarely, if ever, associated with disease <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
</abbrgrp>.</p>
<p>The term <it>Candida </it>was originally assigned to imperfect yeast species, with no known sexual cycle. This term now covers a variety of species of diverse origins (both sexual and asexual), and provides little information regarding evolutionary relationships. For example, <it>Candida glabrata </it>is more closely related to <it>Saccharomyces cerevisiae </it>than it is to <it>Candida albicans</it>. <it>Debaryomyces hansenii </it>and <it>Pichia stipitis </it>are close relatives of <it>Candida </it>species <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>. Some species, such as <it>C. lusitaniae</it>, were assigned two names, one (<it>Candida lusitaniae</it>) referring to the asexual (anamorph) form, and one (<it>Clavispora lusitaniae</it>) to the sexual (teleomorph) form. Similarly, <it>Candida guilliermondii </it>is also known as <it>Pichia guilliermondii</it>, and <it>Candida famata </it>as <it>D. hansenii</it>. These species share a relatively recent common ancestor (Figure <figr fid="F1">1</figr>), and in all cases the codon CUG is translated as serine rather than leucine <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. For brevity, we refer to the above as <it>Candida </it>species that belong to the CTG clade <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B8">8</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Phylogenetic supertree of <it>Candida </it>species represented in CGOB</p></caption><text>
   <p><b>Phylogenetic supertree of <it>Candida </it>species represented in CGOB</b>. <it>Candida glabrata </it>and <it>Saccharomyces cerevisiae </it>have been selected as outgroups. Numbers on branches represent tandem duplications gained along each lineage.</p>
</text><graphic file="1471-2164-11-290-1" hint_layout="double"/></fig>
<p>Previous comparative analysis of eight <it>Candida </it>genomes led to the identification of gene families that are highly represented in strongly pathogenic species (such as <it>C. albicans, C. tropicalis, C. parapsilosis</it>), compared to weak pathogens such as <it>C. lusitaniae </it>and <it>C. guilliermondii</it>, and very rare or non-pathogenic species such as <it>D. hansenii </it>
<abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. These include three cell wall families: the ALS-like adhesins, which in <it>C. albicans </it>have been associated with virulence, biofilm development and acquisition of iron from the host <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>; the Pga30-like family <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>, and the Hyr/Iff family <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>. Many families are highly enriched for gene duplications <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>.</p>
<p>There are currently two major browsers that display <it>Candida </it>genomes, CandidaDB, which contains information for several <it>Candida </it>genomes, and the Candida Genome Database, which predominantly describes <it>C. albicans </it>
<abbrgrp>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
</abbrgrp>. The major disadvantage of these browsers is that it is difficult to compare genomes to each other. They also display a to-scale representation of a chromosomal region, which is unsuitable for analysis of gene order and evolution. To overcome these problems we developed the Candida Gene Order Browser (CGOB <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>).</p>
<p>CGOB incorporates all available genome sequences from <it>Candida </it>species, including two isolates of <it>C. albicans </it>(SC5314 and WO-1), its close relative and minor pathogen <it>C. dubliniensis</it>, the major pathogens <it>C. tropicalis </it>and <it>C. parapsilosis</it>, the minor pathogens <it>L. elongisporus</it>, <it>C. lusitaniae </it>and <it>C. guilliermondii</it>, the marine yeast <it>D. hansenii</it>, and <it>P. stipitis</it>, a xylose-digesting yeast that is associated with beetles found in wood <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
</abbrgrp>. CGOB is based on the engine developed for the Yeast Gene Order Browser (YGOB) <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>, which has been applied to the analysis of genome duplication in the <it>Saccharomyces </it>group. To construct CGOB, all assignments of homology were manually curated, based on sequence similarity and gene order (synteny). Partial genes in each genome were identified and removed, leading to the generation of improved gene sets. CGOB was then used to analyze gene duplication, intron localization and clustering of genes involved in metabolic pathways. We found that the majority of tandemly duplicated genes are under strong purifying selection and that there are both conserved and species-specific clusters of metabolically related genes in <it>Candida</it>. CGOB is available at <url>http://cgob.ucd.ie</url>.</p>
</sec>
<sec>
<st>
<p>Results and discussion</p>
</st>
<sec>
<st>
<p>CGOB structure and <it>Candida </it>genome editing</p>
</st>
<p>Version 1 of CGOB includes ten <it>Candida </it>genomes obtained from a variety of sequencing centers (Table <tblr tid="T1">1</tblr>), together with the genome from <it>S. cerevisiae</it>. CGOB's visual display consists of horizontal tracks representing chromosomal segments and pillars (Figure <figr fid="F2">2</figr>). Pillars are the core data structures used to store lists of homologous genes across all species represented in the gene order browser <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Pillars contain vacant slots when homologous genes cannot be found in a particular genome. Genes were initially added to pillars based on automated assignments derived from best bidirectional BLASTP searches. The CGOB pillar dataset was manually refined by examining regions of dubious synteny and singleton genes. A combination of BLASTP scores, synteny and phylogenetic data was used to confirm assignments to pillars.</p>
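<p>The automated first pass described above can be illustrated with a minimal Python sketch. It assumes that the best BLASTP hit in each direction has already been extracted into a dictionary; the gene names, the function names and the dictionary layout are hypothetical and are not taken from the CGOB or YGOB code. A pair of genes is kept only when each is the best hit of the other, and a reference gene without such a partner receives a vacant slot.</p>

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    pairs = []
    for gene_a, gene_b in sorted(best_a_to_b.items()):
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


def build_pillars(reference_genes, pairs, species):
    """Build one pillar per reference gene; None marks a vacant slot."""
    partner = dict(pairs)
    return [{"reference": gene, species: partner.get(gene)} for gene in reference_genes]


# Hypothetical best hits between two genomes (illustrative names only).
best_a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
best_b_to_a = {"b1": "a1", "b2": "a2", "b3": "a3"}

pairs = reciprocal_best_hits(best_a_to_b, best_b_to_a)
pillars = build_pillars(["a1", "a2", "a3"], pairs, "species_b")
for pillar in pillars:
    print(pillar)
```

<p>In this toy input the gene a3 has no reciprocal partner, so its pillar keeps a vacant slot. In the workflow described above, such cases are the ones that would be passed on to manual inspection of synteny and phylogeny.</p>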
<tbl id="T1"><title><p>Table 1</p></title><caption><p><it>Candida </it>species displayed in CGOB.</p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b>Species</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Citation</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Genes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Partial ORFs</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Refined Gene set</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Singletons</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. albicans </it>SC5314</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B6">6</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>6,185</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>6,185</p>
         </c>
         <c ca="center">
            <p>43</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. albicans </it>WO-1</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>6,197</p>
         </c>
         <c ca="center">
            <p>91</p>
         </c>
         <c ca="center">
            <p>6,148</p>
         </c>
         <c ca="center">
            <p>99</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. dubliniensis </it>CD36</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B7">7</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,924</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>5,924</p>
         </c>
         <c ca="center">
            <p>200</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. tropicalis </it>MYA-3404</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>6,258</p>
         </c>
         <c ca="center">
            <p>116</p>
         </c>
         <c ca="center">
            <p>6,198</p>
         </c>
         <c ca="center">
            <p>737</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. parapsilosis </it>CDC 317</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,823</p>
         </c>
         <c ca="center">
            <p>28</p>
         </c>
         <c ca="center">
            <p>5,809</p>
         </c>
         <c ca="center">
            <p>553</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>L. elongisporus </it>NRRL YB-4239</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,802</p>
         </c>
         <c ca="center">
            <p>173</p>
         </c>
         <c ca="center">
            <p>5,710</p>
         </c>
         <c ca="center">
            <p>596</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>P. stipitis </it>CBS6054</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B3">3</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,838</p>
         </c>
         <c ca="center">
            <p>12</p>
         </c>
         <c ca="center">
            <p>5,832</p>
         </c>
         <c ca="center">
            <p>470</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>D. hansenii </it>CBS767</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B4">4</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>6,317</p>
         </c>
         <c ca="center">
            <p>12</p>
         </c>
         <c ca="center">
            <p>6,311</p>
         </c>
         <c ca="center">
            <p>981</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. guilliermondii </it>ATCC6260</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,920</p>
         </c>
         <c ca="center">
            <p>142</p>
         </c>
         <c ca="center">
            <p>5,844</p>
         </c>
         <c ca="center">
            <p>666</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. lusitaniae </it>ATCC 42720</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="center">
            <p>5,941</p>
         </c>
         <c ca="center">
            <p>135</p>
         </c>
         <c ca="center">
            <p>5,869</p>
         </c>
         <c ca="center">
            <p>881</p>
         </c>
      </r>
   </tblbdy></tbl>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Candida Gene Order Browser (CGOB) screenshot for tandem cluster 32</p></caption><text>
   <p><b>Candida Gene Order Browser (CGOB) screenshot for tandem cluster 32</b>. Each box represents a gene and each color a chromosome (horizontal tracks). The gene in focus is highlighted with an orange border. Gene identifiers in the center of each box relate to annotations from the relevant sequencing centers. The "b" button performs a BLASTP search against CGOB's sequence database. The "S" button displays all the protein sequences in a vertical pillar. The "+" button outputs CGOB pillar data in a tabulated format. The "T" button reconstructs phylogenies on the fly. The "i" button retrieves functional data for <it>C. albicans </it>SC5314 and <it>S. cerevisiae via </it>the <it>Candida </it>Genome Database and the <it>Saccharomyces </it>Genome Database respectively. Non-syntenic genes are colored in grey. Genes lying in close proximity are joined by connectors: a solid bar connects adjacent genes, two narrow bars connect genes up to 5 genes apart (not shown) and one narrow bar connects genes up to 20 genes apart (not shown). Inversions are denoted by orange connectors.</p>
</text><graphic file="1471-2164-11-290-2" hint_layout="double"/></fig>
<p>Similar to YGOB, CGOB allows the user to focus the screen display on a gene of interest and to view phylogenetic trees, sequences and BLASTP results (Figure <figr fid="F2">2</figr>). Hyperlinks to functional data can also be accessed for <it>C. albicans </it>SC5314 and <it>S. cerevisiae via </it>the <it>Candida </it>Genome Database <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp> and the <it>Saccharomyces </it>Genome Database <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> respectively.</p>
<p>During manual editing of CGOB we observed that the original genome annotations contained a substantial number of apparently partial open reading frames, which we merged into full-length gene models. For example, <it>LELG_01495 </it>and <it>LELG_01496 </it>in <it>L. elongisporus </it>are both similar to parts of <it>C. albicans orf19.6045 </it>(<it>PSD1</it>). Closer inspection showed that one of these ORFs aligns with the N-terminus of <it>orf19.6045 </it>whereas the other matches the C-terminus (Additional file <supplr sid="S1">1</supplr>). In cases like this we deleted the partial open reading frames from CGOB's homology pillars and inserted a new "merged" gene model. Overall, we identified 709 ORFs in 8 genomes that were subsequently merged into 335 full-length genes (Additional file <supplr sid="S2">2</supplr>). The <it>L. elongisporus </it>genome contains the largest number of partial ORFs (173 in total), while the highly curated genomes of <it>C. albicans </it>SC5314 and <it>C. dubliniensis </it>contain none (Table <tblr tid="T1">1</tblr>). The corrected gene sets are available for download from CGOB <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>.</p>
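<p>As a worked check, the partial ORF totals quoted above can be recomputed from the "Partial ORFs" column of Table 1. The short Python sketch below only re-adds the published counts; the variable names are illustrative.</p>

```python
# Partial ORF counts per genome, copied from the "Partial ORFs" column of Table 1.
partial_orfs = {
    "C. albicans SC5314": 0,
    "C. albicans WO-1": 91,
    "C. dubliniensis": 0,
    "C. tropicalis": 116,
    "C. parapsilosis": 28,
    "L. elongisporus": 173,
    "P. stipitis": 12,
    "D. hansenii": 12,
    "C. guilliermondii": 142,
    "C. lusitaniae": 135,
}

# Keep only the genomes that contain at least one partial ORF.
affected = {name: count for name, count in partial_orfs.items() if count > 0}
total_partial = sum(affected.values())
most_fragmented = max(affected, key=affected.get)

print(len(affected), total_partial, most_fragmented)
# prints: 8 709 L. elongisporus
```

<p>The result agrees with the 709 partial ORFs in 8 genomes reported in the text, with <it>L. elongisporus </it>contributing the largest share.</p>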
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Merging partial open reading frames</b>. Section of alignment illustrating that the original automatically called gene sets contained partial open reading frames. In this example <it>LELG_01496 </it>and <it>LELG_01495 </it>from <it>L. elongisporus </it>are merged to give a new single gene (<it>LELG_01496</it>*).</p>
</text>
<file name="1471-2164-11-290-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>List of partial ORFs in datasets obtained from sequencing centers</b>. Merged genes all have a * suffix and are present in CGOB. Partial ORFs have been removed from CGOB pillars but are present in the CGOB Blast database.</p>
</text>
<file name="1471-2164-11-290-S2.DOC">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Tandem Gene Duplications</p>
</st>
<p>Tandem gene duplication is one mechanism by which species acquire new genes, and by extrapolation, new functions. We therefore used both similarity and synteny measurements in CGOB to identify gene duplications in all <it>Candida </it>genomes. We identified and numbered all tandem clusters in each genome (Additional file <supplr sid="S3">3</supplr>). For some rapidly evolving genes, sequence similarity is not high enough to identify family members. For example, our initial BLAST-based approach suggested that <it>orf19.2508 </it>(<it>PRM9</it>) and <it>orf19.2509 </it>in <it>C. albicans </it>are tandem duplicates (cluster 30, Additional file <supplr sid="S3">3</supplr>). Their orthologs in <it>C. dubliniensis</it>, <it>C. tropicalis</it>, <it>C. parapsilosis </it>and <it>L. elongisporus </it>are found adjacent to one another. However, these genes were not initially identified as tandem duplicates in the other species because they do not have a BLASTP <it>E</it>-value below our initial cut-off (see Methods). Therefore, slower evolving tandem duplicates in one <it>Candida </it>species (<it>C. albicans </it>SC5314 in this example) can be used to locate rapidly evolving tandems (or tandems with low sequence complexity) in other <it>Candida </it>genomes.</p>
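<p>The similarity-plus-adjacency idea behind the initial search can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used to build Additional file 3: it requires strict adjacency, the cut-off is a caller-supplied parameter (the value used below is hypothetical), and the gene names and <it>E</it>-values are invented.</p>

```python
def tandem_clusters(gene_order, evalues, cutoff):
    """Group runs of adjacent genes whose pairwise E-value is at or below cutoff.

    gene_order: gene identifiers in chromosomal order.
    evalues: dict mapping frozenset((gene_x, gene_y)) to a BLASTP E-value.
    cutoff: maximum E-value accepted as evidence of similarity (hypothetical).
    """
    clusters = []
    current = [gene_order[0]] if gene_order else []
    for left, right in zip(gene_order, gene_order[1:]):
        # Pairs without a recorded hit are treated as having no similarity.
        evalue = evalues.get(frozenset((left, right)), float("inf"))
        if cutoff >= evalue:
            current.append(right)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [right]
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Invented example: g2, g3 and g4 are adjacent and similar to each other.
gene_order = ["g1", "g2", "g3", "g4", "g5"]
evalues = {
    frozenset(("g2", "g3")): 1e-40,
    frozenset(("g3", "g4")): 1e-30,
    frozenset(("g4", "g5")): 0.5,
}
clusters = tandem_clusters(gene_order, evalues, cutoff=1e-10)
print(clusters)
```

<p>A rapidly evolving pair would fail the <it>E</it>-value test in such a scan. That is the situation in which the text above relies on pillar membership in a slower evolving species to recover the tandem arrangement.</p>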
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>List of all tandem duplicates located by CGOB</b>. <b>a) </b>Clusters are labeled 1-502. Those with a d<sub>N</sub>/d<sub>S </sub>value &gt; 1 are highlighted in red. Clusters displaying a ^ indicate that the initial BLAST search strategy failed to infer homology. Clusters displaying * indicate that there are intervening genes but they may be spurious gene models. Clusters displaying (INS) indicate that there is one intervening gene. <b>b) </b>List of tandem clusters with a d<sub>N</sub>/d<sub>S </sub>&gt; 1.</p>
</text>
<file name="1471-2164-11-290-S3.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>Tandem duplicates that subsequently underwent chromosomal rearrangement are difficult to identify. However, the ancestral arrangement can be inferred from an analysis of homologous genes in CGOB. For example, in <it>C. albicans </it>SC5314, the duplicate genes <it>orf19.7362 </it>(<it>SKN1</it>) and <it>orf19.7363 </it>(<it>KRE6</it>) are located beside one another on chromosome 3 (Figure <figr fid="F2">2</figr>, Additional file <supplr sid="S3">3</supplr> (cluster 32)). Their orthologs in <it>C. albicans </it>WO-1, <it>C. dubliniensis</it>, <it>C. tropicalis</it>, <it>C. parapsilosis</it>, <it>L. elongisporus </it>and <it>D. hansenii </it>are also adjacent (Figure <figr fid="F2">2</figr>). However, in <it>P. stipitis</it>, <it>SKN1 </it>(<it>PICST_63619</it>) is located on chromosome 7 while <it>KRE6 </it>(<it>PICST_75745</it>) is located on chromosome 2 (Figure <figr fid="F2">2</figr>). The most parsimonious explanation is that a duplication of <it>SKN1 </it>or <it>KRE6 </it>occurred in an ancestor of all the <it>Candida </it>species, and that the resulting tandem arrangement has been conserved in most of them. However, relocation of <it>SKN1 </it>has occurred exclusively in <it>P. stipitis </it>(Figure <figr fid="F2">2</figr>).</p>
<p>In total, 901 tandem clusters were identified across all the <it>Candida </it>genomes (Table <tblr tid="T2">2</tblr>, Additional file <supplr sid="S3">3</supplr>). <it>C. lusitaniae </it>has the smallest number of tandem clusters (44) whereas <it>C. parapsilosis </it>has the largest number (139). This is noticeably high, as the closest relative of <it>C. parapsilosis</it>, <it>L. elongisporus </it>(Figure <figr fid="F1">1</figr>), contains only 93 tandem clusters (Table <tblr tid="T2">2</tblr>). The average number of genes per tandem cluster in all <it>Candida </it>species is slightly greater than 2 (Table <tblr tid="T2">2</tblr>).</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>The total number of tandem duplicates found in each <it>Candida </it>species displayed in CGOB.</p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b>Species</b>
            </p>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>Tandem</b>
            </p>
            <p>
               <b>Clusters</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Species specific</b>
            </p>
            <p>
               <b>clusters</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b># of genes</b>
            </p>
            <p>
               <b>in clusters</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Average # of</b>
            </p>
            <p>
               <b>genes per cluster</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Tandem</b>
            </p>
            <p>
               <b>Duplicates</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. albicans</it>
            </p>
         </c>
         <c ca="center">
            <p>106</p>
         </c>
         <c ca="center">
            <p>21</p>
         </c>
         <c ca="center">
            <p>230</p>
         </c>
         <c ca="center">
            <p>2.17</p>
         </c>
         <c ca="center">
            <p>124 (1.99%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. dubliniensis</it>
            </p>
         </c>
         <c ca="center">
            <p>83</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>179</p>
         </c>
         <c ca="center">
            <p>2.16</p>
         </c>
         <c ca="center">
            <p>96 (1.62%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. tropicalis</it>
            </p>
         </c>
         <c ca="center">
            <p>125</p>
         </c>
         <c ca="center">
            <p>62</p>
         </c>
         <c ca="center">
            <p>284</p>
         </c>
         <c ca="center">
            <p>2.27</p>
         </c>
         <c ca="center">
            <p>159 (2.56%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. parapsilosis</it>
            </p>
         </c>
         <c ca="center">
            <p>139</p>
         </c>
         <c ca="center">
            <p>63</p>
         </c>
         <c ca="center">
            <p>328</p>
         </c>
         <c ca="center">
            <p>2.36</p>
         </c>
         <c ca="center">
            <p>189 (3.25%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L. elongisporus</it>
            </p>
         </c>
         <c ca="center">
            <p>93</p>
         </c>
         <c ca="center">
            <p>33</p>
         </c>
         <c ca="center">
            <p>206</p>
         </c>
         <c ca="center">
            <p>2.20</p>
         </c>
         <c ca="center">
            <p>114 (1.99%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>D. hansenii</it>
            </p>
         </c>
         <c ca="center">
            <p>132</p>
         </c>
         <c ca="center">
            <p>84</p>
         </c>
         <c ca="center">
            <p>294</p>
         </c>
         <c ca="center">
            <p>2.23</p>
         </c>
         <c ca="center">
            <p>162 (2.56%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>P. stipitis</it>
            </p>
         </c>
         <c ca="center">
            <p>96</p>
         </c>
         <c ca="center">
            <p>40</p>
         </c>
         <c ca="center">
            <p>204</p>
         </c>
         <c ca="center">
            <p>2.12</p>
         </c>
         <c ca="center">
            <p>108 (1.85%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. guilliermondii</it>
            </p>
         </c>
         <c ca="center">
            <p>83</p>
         </c>
         <c ca="center">
            <p>44</p>
         </c>
         <c ca="center">
            <p>181</p>
         </c>
         <c ca="center">
            <p>2.18</p>
         </c>
         <c ca="center">
            <p>98 (1.67%)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. lusitaniae</it>
            </p>
         </c>
         <c ca="center">
            <p>44</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
         <c ca="center">
            <p>97</p>
         </c>
         <c ca="center">
            <p>2.20</p>
         </c>
         <c ca="center">
            <p>53 (0.90%)</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Percentages in parentheses refer to the total percentage of the genome that has arisen through tandem duplication.</p>
   </tblfn></tbl>
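<p>The derived columns of Table 2 can be illustrated with a small Python calculation. The relationships used here are an assumption inferred from the table layout rather than a stated method: tandem duplicates are taken as the number of genes in clusters minus the number of clusters, and the percentage is taken relative to the refined gene set in Table 1. The sketch is checked against the <it>C. parapsilosis </it>row only, and the function name is illustrative.</p>

```python
def cluster_summary(clusters, genes_in_clusters, refined_gene_set):
    """Derive the summary columns of Table 2 under the assumptions stated above."""
    # Assumption: each cluster contributes one original gene plus its duplicates.
    duplicates = genes_in_clusters - clusters
    return {
        "average_genes_per_cluster": round(genes_in_clusters / clusters, 2),
        "tandem_duplicates": duplicates,
        "percent_of_gene_set": round(100 * duplicates / refined_gene_set, 2),
    }


# C. parapsilosis: 139 clusters and 328 genes in clusters (Table 2),
# refined gene set of 5,809 genes (Table 1).
summary = cluster_summary(139, 328, 5809)
print(summary)
```

<p>For these inputs the sketch gives 2.36 genes per cluster, 189 tandem duplicates and 3.25% of the gene set, which matches the <it>C. parapsilosis </it>row of Table 2.</p>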
<p>We used CGOB to map species- and lineage-specific tandem duplications (Figure <figr fid="F1">1</figr>). For example, since they last shared a common ancestor, <it>C. albicans </it>has undergone at least 21 species-specific tandem duplications gaining 24 paralogs, while its close relative <it>C. dubliniensis </it>has undergone a single tandem duplication (<it>Cd36_11890</it>, <it>Cd36_11900</it>) gaining 1 additional gene (Additional file <supplr sid="S3">3</supplr>, cluster 463). Similarly, <it>C. parapsilosis </it>has undergone 63 species-specific tandem duplications gaining 78 paralogs since diverging from its closest relative, <it>L. elongisporus</it>, which has undergone 33 tandem duplications gaining 41 paralogs in the same time (Additional file <supplr sid="S3">3</supplr>).</p>
<p>Cluster 22 (Additional file <supplr sid="S3">3</supplr>) illustrates an ancient duplication, resulting in a family of peroxisomal acyl-CoA thioesterases that are present in 3-5 tandem copies in all the <it>Candida </it>species. The cluster is particularly large in the branch containing <it>C. albicans, C. dubliniensis, C. tropicalis, C. parapsilosis </it>and <it>L. elongisporus</it>, where 5 family members are immediately adjacent to each other. The single homolog in <it>S. cerevisiae </it>is likely to be involved in fatty acid oxidation <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. There is significant up-regulation of fatty acid &#946;-oxidation when <it>C. albicans </it>cells are engulfed by macrophages <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>, although this pathway does not appear to be essential for virulence <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>. Many of the other tandem duplication clusters include members of larger gene families, such as lipases (cluster 10, Additional file <supplr sid="S3">3</supplr>), glucose transporters (clusters 15 and 61, Additional file <supplr sid="S3">3</supplr>) and ferric reductases (cluster 57, Additional file <supplr sid="S3">3</supplr>). Some clusters are lineage-specific, such as the triplication of the pirin-domain genes <it>PRN2, PRN3 </it>and <it>PRN4 </it>(cluster 6, Additional file <supplr sid="S3">3</supplr>) in <it>C. albicans</it>, <it>C. dubliniensis </it>and <it>C. tropicalis</it>. The function of these genes is unknown, but they are likely to localize to the nucleus. There is an amplification of the <it>FRP6 </it>family in <it>D. hansenii </it>and <it>P. stipitis</it>; the <it>S. cerevisiae </it>orthologs are required for export of ammonia <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp> (cluster 497, Additional file <supplr sid="S3">3</supplr>). Other clusters are species-specific, such as the five adjacent 2' hydroxyisoflavone reductases (CIP1) described by Jeffries and Van Vleet <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>, which our analysis confirms are unique to <it>P. stipitis </it>(cluster 228). Most of the other species have a single copy, except for <it>C. parapsilosis</it>, which has two. <it>L. elongisporus </it>contains 5 tandem repeats of a large family with up to 13 members in this species, which is absent from all the other <it>Candida </it>genomes (cluster 280, Additional file <supplr sid="S3">3</supplr>). The function of this family is unclear, but all members contain a phosphatidylinositol phosphate kinase (PIPKc) domain.</p>
<p>We also determined whether tandem duplicates in individual <it>Candida </it>species are undergoing positive selection. Recent genome-wide studies have shown that positive selection after tandem duplication can give rise to novel gene functions <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, which may help pathogens evade the human immune response <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. At the DNA level, positive selection may be detected by comparing the rate of amino acid altering (nonsynonymous) nucleotide substitutions with the rate of synonymous substitution (d<sub>N</sub>/d<sub>S</sub>). A d<sub>N</sub>/d<sub>S </sub>ratio &gt; 1 is indicative of positive selection. The average d<sub>N</sub>/d<sub>S </sub>ratio for all tandem clusters was found to be 0.27 (not shown). Of the 901 <it>Candida </it>clusters examined, only 12 displayed a d<sub>N</sub>/d<sub>S </sub>ratio &gt; 1 (Additional file <supplr sid="S3">3</supplr>). Five of these are species-specific and have no homologs in any other <it>Candida </it>species (or any species in GenBank). The remaining 7 clusters under the influence of positive selection do not share homology with gene families (cell wall, hyphal, pseudohyphal, filamentous growth and biofilm functions) normally associated with pathogenicity in <it>Candida </it>
<abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. In <it>P. stipitis</it>, one cluster encodes putative ubiquitin protein ligases, one encodes zinc finger-containing proteins, and one encodes potential siderophore transporters (Additional file <supplr sid="S3">3</supplr>). In <it>D. hansenii</it>, one cluster encodes orthologs of <it>TFS1</it>, whose expression is induced during filamentation in <it>C. albicans </it>
<abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. Overall, our results suggest that the majority of <it>Candida </it>tandem duplicates are under strong purifying selection, presumably to conserve gene function.</p>
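<p>The d<sub>N</sub>/d<sub>S </sub>threshold described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the helper name <it>classify_selection</it>, the cluster labels and the rate values are hypothetical and are not taken from Additional file 3.</p>

```python
def classify_selection(dn, ds):
    """Label a cluster from its dN and dS rates (hypothetical helper)."""
    if ds == 0:
        # The ratio is undefined when there are no synonymous substitutions.
        return "undefined"
    ratio = dn / ds
    if ratio > 1:
        return "positive"   # excess of amino acid altering substitutions
    if ratio == 1:
        return "neutral"
    return "purifying"      # ratio below 1

# Made-up example rates, for illustration only (not data from the study).
clusters = {"clusterA": (0.027, 0.10), "clusterB": (0.30, 0.20)}
labels = {name: classify_selection(dn, ds) for name, (dn, ds) in clusters.items()}
print(labels)
```

<p>In this toy input, the first cluster has a ratio of about 0.27 and is labelled as being under purifying selection, whereas the second has a ratio above 1 and is labelled as a candidate for positive selection.</p>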
<p>We have extended an earlier analysis of duplicate genes in <it>Candida </it>genomes, which considered only members of multigene families <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. We also identified some clusters by manual inspection. For example, we first identified 85 tandem clusters in <it>C. albicans </it>SC5314 using a simple BLAST approach, and this was increased to 106 using synteny information, whereas only 24 were reported in Butler et al. <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. In some species (such as <it>L. elongisporus </it>and <it>C. guilliermondii</it>) we identified fewer clusters than Butler et al. <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>, partly because we removed partial ORFs from the gene sets.</p>
</sec>
<sec>
<st>
<p>The <it>Candida </it>Paranome</p>
</st>
<p>Using our reannotated <it>Candida </it>genomes we determined the number of multigene families (the paranome <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>) for each <it>Candida </it>species. <it>C. tropicalis </it>has the highest number (557), whereas <it>C. lusitaniae </it>(390) has the lowest (Table <tblr tid="T3">3</tblr>). In contrast, <it>P. stipitis </it>has the lowest number (4335) of genes that do not belong to families, whereas <it>C. lusitaniae </it>(4871) has the highest (Table <tblr tid="T3">3</tblr>). The average number of genes per multigene family is approximately 3 for all species, although every species also contains some larger families (Table <tblr tid="T3">3</tblr>).</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>The <it>Candida </it>paranome.</p></caption><tblbdy cols="8">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>Unique genes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Multigene Families</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Average # genes per family</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>Gene families containing</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>2 members</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>3 members</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>4 members</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>&#8805;5 members</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. albicans </it>SC5314</p>
         </c>
         <c ca="center">
            <p>4662 (76.3%)</p>
         </c>
         <c ca="center">
            <p>484</p>
         </c>
         <c ca="center">
            <p>2.99</p>
         </c>
         <c ca="center">
            <p>10.0%</p>
         </c>
         <c ca="center">
            <p>3.9%</p>
         </c>
         <c ca="center">
            <p>2.7%</p>
         </c>
         <c ca="center">
            <p>7.1%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>C. albicans </it>WO1</p>
         </c>
         <c ca="center">
            <p>4556 (76.8%)</p>
         </c>
         <c ca="center">
            <p>467</p>
         </c>
         <c ca="center">
            <p>2.95</p>
         </c>
         <c ca="center">
            <p>10.3%</p>
         </c>
         <c ca="center">
            <p>3.4%</p>
         </c>
         <c ca="center">
            <p>2.7%</p>
         </c>
         <c ca="center">
            <p>6.8%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. dubliniensis</it>
            </p>
         </c>
         <c ca="center">
            <p>4449 (75.1%)</p>
         </c>
         <c ca="center">
            <p>513</p>
         </c>
         <c ca="center">
            <p>2.87</p>
         </c>
         <c ca="center">
            <p>11.4%</p>
         </c>
         <c ca="center">
            <p>3.8%</p>
         </c>
         <c ca="center">
            <p>3.0%</p>
         </c>
         <c ca="center">
            <p>6.7%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. tropicalis</it>
            </p>
         </c>
         <c ca="center">
            <p>4603 (73.6%)</p>
         </c>
         <c ca="center">
            <p>557</p>
         </c>
         <c ca="center">
            <p>2.97</p>
         </c>
         <c ca="center">
            <p>11.8%</p>
         </c>
         <c ca="center">
            <p>3.7%</p>
         </c>
         <c ca="center">
            <p>2.6%</p>
         </c>
         <c ca="center">
            <p>8.3%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. parapsilosis</it>
            </p>
         </c>
         <c ca="center">
            <p>4377 (75.2%)</p>
         </c>
         <c ca="center">
            <p>472</p>
         </c>
         <c ca="center">
            <p>3.06</p>
         </c>
         <c ca="center">
            <p>10.5%</p>
         </c>
         <c ca="center">
            <p>3.5%</p>
         </c>
         <c ca="center">
            <p>2.4%</p>
         </c>
         <c ca="center">
            <p>8.4%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L. elongisporus</it>
            </p>
         </c>
         <c ca="center">
            <p>4616 (79.6%)</p>
         </c>
         <c ca="center">
            <p>413</p>
         </c>
         <c ca="center">
            <p>2.86</p>
         </c>
         <c ca="center">
            <p>9.3%</p>
         </c>
         <c ca="center">
            <p>3.6%</p>
         </c>
         <c ca="center">
            <p>2.2%</p>
         </c>
         <c ca="center">
            <p>5.3%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>D. hansenii</it>
            </p>
         </c>
         <c ca="center">
            <p>4808 (76.2%)</p>
         </c>
         <c ca="center">
            <p>519</p>
         </c>
         <c ca="center">
            <p>2.89</p>
         </c>
         <c ca="center">
            <p>11.1%</p>
         </c>
         <c ca="center">
            <p>3.7%</p>
         </c>
         <c ca="center">
            <p>2.3%</p>
         </c>
         <c ca="center">
            <p>6.7%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>P. stipitis</it>
            </p>
         </c>
         <c ca="center">
            <p>4335 (74.3%)</p>
         </c>
         <c ca="center">
            <p>497</p>
         </c>
         <c ca="center">
            <p>3.02</p>
         </c>
         <c ca="center">
            <p>10.5%</p>
         </c>
         <c ca="center">
            <p>4.6%</p>
         </c>
         <c ca="center">
            <p>2.8%</p>
         </c>
         <c ca="center">
            <p>7.8%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. guilliermondii</it>
            </p>
         </c>
         <c ca="center">
            <p>4558 (77.0%)</p>
         </c>
         <c ca="center">
            <p>473</p>
         </c>
         <c ca="center">
            <p>2.87</p>
         </c>
         <c ca="center">
            <p>10.5%</p>
         </c>
         <c ca="center">
            <p>4.5%</p>
         </c>
         <c ca="center">
            <p>1.6%</p>
         </c>
         <c ca="center">
            <p>6.4%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. lusitaniae</it>
            </p>
         </c>
         <c ca="center">
            <p>4871 (82.0%)</p>
         </c>
         <c ca="center">
            <p>390</p>
         </c>
         <c ca="center">
            <p>2.74</p>
         </c>
         <c ca="center">
            <p>9.3%</p>
         </c>
         <c ca="center">
            <p>3.0%</p>
         </c>
         <c ca="center">
            <p>1.6%</p>
         </c>
         <c ca="center">
            <p>4.1%</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The number of single- and multigene families in each <it>Candida </it>species displayed in CGOB.</p>
   </tblfn></tbl>
<p>The largest gene family shared by all species contains a transporter (<it>DIP5</it>) annotated as a putative dicarboxylic amino acid permease in CGD. This family was previously suggested as a potential antifungal target, as there are no homologs in humans <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. All <it>Candida </it>species have at least 20 members of this family (not shown). The <it>MEP </it>family, encoding three ammonium permeases in <it>C. albicans </it>SC5314, has also been suggested as an antifungal drug target <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. Three <it>MEP </it>genes are present in all <it>Candida </it>species except for <it>L. elongisporus</it>, which is missing the ortholog of one (<it>orf19.4446</it>, not shown). Drugs directed against these families should therefore be of broad specificity and target all <it>Candida </it>species, and are likely to have no undesired interactions with the human patient.</p>
<p>Approximately 20-25% of all <it>Candida </it>genes in CGOB belong to a multigene family (Table <tblr tid="T3">3</tblr>), similar to what has previously been reported for <it>C. albicans </it>SC5314 <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. This figure is lower than what has been observed for <it>S. cerevisiae </it>(~30%), which is unsurprising as <it>S. cerevisiae </it>has undergone a whole genome duplication <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp> while <it>Candida </it>species have not <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>.</p>
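<p>The proportion of genes in multigene families can be re-derived from Table 3 by summing the four family-size columns for a species. The short Python sketch below does this for three rows of the table; the variable names are illustrative only.</p>

```python
# Percentage of genes in families of each size class, copied from Table 3
# (columns: 2 members, 3 members, 4 members, last size class).
table3 = {
    "C. albicans SC5314": (10.0, 3.9, 2.7, 7.1),
    "C. tropicalis": (11.8, 3.7, 2.6, 8.3),
    "C. lusitaniae": (9.3, 3.0, 1.6, 4.1),
}

# Sum the size classes to get the share of genes in any multigene family.
multigene_pct = {species: round(sum(values), 1) for species, values in table3.items()}
print(multigene_pct)
```

<p>For these rows the sums are 23.7%, 26.4% and 18.0%, which complement the unique-gene percentages (76.3%, 73.6% and 82.0%) listed in the same table.</p>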
</sec>
<sec>
<st>
<p>Intron loss in <it>Candida </it>genes</p>
</st>
<p>Yeast genomes from the Saccharomycotina are known to be intron poor; introns are found in fewer than 5% of genes from most species <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>. The exact mechanisms of intron loss are not fully elucidated, but it is likely to occur via recombination of a chromosomal copy with a reverse transcript <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. Introns have been predicted with some accuracy in only three of the sequenced <it>Candida </it>genomes - <it>C. albicans</it>, <it>C. dubliniensis </it>and <it>D. hansenii</it>. We therefore restricted our analysis to these species. Where we observed differences in intron locations in tandem duplicates in any one species, the corresponding genomic sequence of the other two was manually inspected to confirm intron presence or absence.</p>
<p>
<it>C. albicans </it>SC5314 has at least 381 genes containing 415 introns <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. Of these genes, 79 (~21%) belong to a multigene family, and five are located in tandem clusters.</p>
<p>Cluster 5 contains three paralogs in <it>C. albicans </it>(<it>orf19.5194.1</it>, <it>orf19.6837 </it>(<it>FMA1</it>), and <it>orf19.6838</it>) (Additional file <supplr sid="S3">3</supplr> and Figure <figr fid="F3">3A</figr>). The first two of these genes contain introns, as do their orthologs in <it>C. dubliniensis</it>. There is a single homolog in <it>D. hansenii</it>, which has undergone an inversion relative to <it>C. albicans</it>, but still contains an intron. However, the third gene (<it>orf19.6838</it>) does not contain an intron in any of the species (Figure <figr fid="F3">3A</figr>). The most likely hypotheses are that either the progenitor copy contained an intron and was duplicated twice, followed by intron loss in one copy, or that the progenitor was first duplicated to generate a second intron-containing copy, and then duplicated again in an RNA-mediated event.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Gene order around tandem clusters 5, 144 and 61 (A, B, C respectively)</p></caption><text>
   <p><b>Gene order around tandem clusters 5, 144 and 61 (A, B, C respectively)</b>. The diagram is re-drawn from CGOB with some species omitted for clarity. Homologs are organized in pillars. Intron-containing genes are indicated with a tail. A) Cluster 5. <it>D. hansenii </it>does not contain an ortholog of <it>FMA1</it>. B) Cluster 144 is represented by a broken red line. <it>GTT1 </it>(<it>orf19.6998</it>) and <it>Cd36_85600 </it>from <it>C. albicans </it>and <it>C. dubliniensis </it>are not part of this cluster, and are located elsewhere in the genome. Diagonal lines indicate gaps of 2 and 4 genes in <it>C. albicans </it>and <it>C. dubliniensis</it>, respectively. C) Cluster 61. Diagonal lines indicate a gap of 23 genes in <it>P. stipitis</it>. D) Partial alignment around the <it>C. albicans HGT13 </it>intron. The position of the intron in <it>C. albicans </it>is indicated with an inverted triangle.</p>
</text><graphic file="1471-2164-11-290-3" hint_layout="double"/></fig>
<p>Cluster 144 (Additional file <supplr sid="S3">3</supplr>) contains members of a family of glutathione-S-transferases that are present in two adjacent copies in some species and three in others (Figure <figr fid="F3">3B</figr>). Two genes in <it>C. albicans</it>, <it>C. dubliniensis </it>and <it>D. hansenii </it>have introns, suggesting that they arose through tandem duplication (Figure <figr fid="F3">3B</figr>). However, a third member of the family (<it>GTT1</it>), which lies within the cluster in <it>C. parapsilosis</it>, <it>L. elongisporus </it>and <it>C. guilliermondii</it>, does not contain any introns in <it>C. albicans </it>or <it>C. dubliniensis </it>(there is no ortholog in <it>D. hansenii</it>). <it>GTT1 </it>may therefore have arisen via an RNA intermediate in an ancestral <it>Candida </it>species.</p>
<p>We also found evidence for a species-specific intron gain in a tandem duplicate. Cluster 61 (Additional file <supplr sid="S3">3</supplr>) contains two adjacent genes (<it>HGT12 </it>and <it>HGT13</it>) in <it>C. albicans</it>, <it>C. dubliniensis </it>and <it>P. stipitis </it>that belong to a large family of sugar transporters with at least 20 members in <it>Candida </it>species <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp> (Figure <figr fid="F3">3C</figr>). The <it>HGT13 </it>homolog is not adjacent to <it>HGT12 </it>in the other species (not shown). In <it>C. albicans</it>, <it>HGT13 </it>contains an intron, whereas its paralog <it>HGT12 </it>does not. The <it>HGT13 </it>intron lies within the coding sequence, which makes it easy to identify (Figure <figr fid="F3">3D</figr>). Interestingly, although this intron is present in <it>HGT13 </it>from both <it>C. albicans </it>isolates, it is absent from all of the homologs in the other species, whether syntenic or not (Figure <figr fid="F3">3C</figr>). Intron gain is very rare <abbrgrp>
<abbr bid="B36">36</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp>, but it appears more likely that <it>HGT13 </it>in <it>C. albicans </it>gained an intron than that the intron was independently lost from all the other species. At least one other member of the HGT family (<it>HGT9</it>) also contains introns in <it>C. albicans</it>, but these appear to be conserved in <it>C. dubliniensis </it>only.</p>
<p>Other gene families that are not tandemly arranged are also likely to have arisen through both DNA-based gene duplication and via an RNA intermediate, possibly including retrotransposition. For example, in <it>C. albicans</it>, the glycosylphosphatidylinositol-linked cell wall gene <it>ECM33 </it>
<abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp> has two paralogs, <it>orf19.4955 </it>and <it>orf19.4255 </it>(<it>ECM331</it>), that are not adjacent to each other (not shown). <it>ECM33 </it>and <it>orf19.4955 </it>both contain an intron, as do their orthologs in <it>C. dubliniensis </it>and <it>D. hansenii</it>. <it>ECM331 </it>does not contain an intron in any of the three species. This suggests that <it>ECM33 </it>and <it>orf19.4955 </it>may have arisen through duplication, and that <it>ECM331 </it>is the result of reverse transcription of one of the intron-containing paralogs in an ancestor of the three species.</p>
</sec>
<sec>
<st>
<p>Clustering of adjacent genes in metabolic pathways</p>
</st>
<p>In bacteria, a primary method of coordinating gene expression is the organization of genes into operons, each of which is transcribed into a single mRNA. Bacterial operons often contain genes from the same metabolic pathway. Operons are not usually found in eukaryotes, with the notable exception of nematodes <abbrgrp>
<abbr bid="B41">41</abbr>
<abbr bid="B42">42</abbr>
<abbr bid="B43">43</abbr>
</abbrgrp>. However, there is evidence for clustering of genes at the same genomic location belonging to the same metabolic pathways in fungi. For example, genes involved in secondary metabolism are clustered in the genomes of filamentous ascomycetes <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp>, and many of the genes involved in metabolism of allantoin and galactose are clustered in the genome of <it>S. cerevisiae </it>and related species <abbrgrp>
<abbr bid="B45">45</abbr>
<abbr bid="B46">46</abbr>
</abbrgrp>. Many functionally-related genes are co-expressed, even when they do not share sequence similarity <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. Lee and Sonnhammer <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp> found that there is a significant tendency for genes from the same metabolic pathway to cluster in the genomes of fungi, and in other organisms. However, their definition of proximity was very broad, and included genes that were separated by up to 400 other genes. Our analysis took a more focused approach, as we searched for evidence of genes involved in the same metabolic pathway lying up to 10 genes apart in <it>Candida </it>species.</p>
<p>Currently, 155 metabolic pathways have been manually curated by the <it>Candida </it>Genome Database. However, 23 of these contain only one gene, and a further 33 are redundant. For example, the lists of genes involved in the acrylonitrile and aldoxime degradation pathways are identical. Similarly, the tyrosol, tryptophan, phenylalanine and chorismate biosynthesis pathways are all subsets of the superpathway of phenylalanine, tyrosine and tryptophan biosynthesis. This leaves 99 unique pathways, containing 659 gene entries. Because some genes belong to more than one pathway, these entries correspond to 511 unique genes, representing 8.2% of the <it>C. albicans </it>SC5314 gene set.</p>
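<p>The filtering of single-gene and redundant pathways described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name <it>nonredundant_pathways</it> is hypothetical, redundancy is assumed to mean an identical gene list or a strict subset of another pathway, and the gene names are placeholders, not CGD data.</p>

```python
def nonredundant_pathways(pathways):
    """Drop single-gene pathways, duplicated gene lists and strict subsets."""
    # Keep only pathways with more than one gene.
    multi = {name: frozenset(genes) for name, genes in pathways.items() if len(genes) > 1}
    kept = {}
    for name in sorted(multi):
        genes = multi[name]
        if genes in kept.values():
            continue  # an identical gene list has already been kept
        is_subset = any(genes != other and genes.issubset(other) for other in multi.values())
        if not is_subset:
            kept[name] = genes
    return kept

# Toy input with placeholder gene names (g1..g6 are not real identifiers).
toy = {
    "acrylonitrile degradation": {"g1", "g2"},
    "aldoxime degradation": {"g1", "g2"},
    "tyrosol biosynthesis": {"g3", "g4"},
    "superpathway": {"g3", "g4", "g5"},
    "single-gene pathway": {"g6"},
}
kept = nonredundant_pathways(toy)
print(sorted(kept))

# Arithmetic stated in the text: 155 curated, 23 single-gene, 33 redundant.
remaining = 155 - 23 - 33
```

<p>With the toy input, only one of the two identical pathways and the superpathway are retained. The final line reproduces the count given in the text, 155 - 23 - 33 = 99.</p>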
<p>CGOB was interrogated for evidence of clustering of genes (i.e. lying within 10 genes of one another on the same chromosome) in the 99 nonredundant pathways. We identified 21 pathways that display evidence of gene clustering in at least one <it>Candida </it>species (Table <tblr tid="T4">4</tblr> and Additional file <supplr sid="S4">4</supplr>). Some metabolic pathway clusters result from tandem duplication; for example, <it>AOX1 </it>and <it>AOX2 </it>(encoding cyanide-insensitive enzymes required for an alternative pathway of aerobic respiration) in <it>C. albicans</it>, <it>C. dubliniensis</it>, <it>C. tropicalis </it>and <it>C. parapsilosis </it>were also identified as tandem cluster 9 (Additional file <supplr sid="S3">3</supplr>). There is also evidence of species-specific clusters of unrelated genes, such as those involved in lysine biosynthesis and glycine biosynthesis, which are clustered in one species only (<it>C. tropicalis </it>and <it>C. parapsilosis</it>, respectively; Table <tblr tid="T4">4</tblr>, Additional file <supplr sid="S4">4</supplr>).</p>
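<p>The proximity criterion used here (pathway genes lying within 10 genes of one another on the same chromosome) can be sketched in Python as shown below. This is a minimal illustration and not the CGOB implementation: the function name <it>clustered_pairs</it> is hypothetical, distance is assumed to be the difference in gene index along a chromosome, and the coordinates are invented.</p>

```python
from itertools import combinations

def clustered_pairs(positions, window=10):
    """Return pathway gene pairs within `window` genes on the same chromosome.

    positions maps a gene name to (chromosome, gene index along the chromosome).
    """
    pairs = []
    for a, b in combinations(sorted(positions), 2):
        chrom_a, idx_a = positions[a]
        chrom_b, idx_b = positions[b]
        # Same chromosome, and no more than `window` gene positions apart.
        if chrom_a == chrom_b and window >= abs(idx_a - idx_b):
            pairs.append((a, b))
    return pairs

# Hypothetical coordinates, for illustration only (not taken from CGOB).
toy_pathway = {
    "AOX1": ("chr1", 120),
    "AOX2": ("chr1", 121),
    "geneX": ("chr1", 400),
    "geneY": ("chr2", 121),
}
pairs = clustered_pairs(toy_pathway)
print(pairs)
```

<p>In this toy pathway only the adjacent pair is reported; the gene 280 positions away and the gene on a different chromosome are excluded.</p>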
<tbl id="T4"><title><p>Table 4</p></title><caption><p>CGD metabolic pathways that show evidence of gene clustering.</p></caption><tblbdy cols="12">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>#Genes</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>SC5314</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>WO1</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cdub</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Ctro</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cpar</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Lelo</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Dhan</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Psti</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Pgui</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Clus</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="12">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Histidine, purine and pyrimidine biosynthesis</p>
         </c>
         <c ca="center">
            <p>41</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Aerobic respiration (cyanide sensitive)</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Aerobic respiration (cyanide insensitive)</p>
         </c>
         <c ca="center">
            <p>8</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2-keto glutarate dehydrogenase complex</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>N-acetylglucosamine degradation</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Methylglyoxal pathway</p>
         </c>
         <c ca="center">
            <p>12</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Sphingolipid metabolism</p>
         </c>
         <c ca="center">
            <p>8</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Ergosterol biosynthesis</p>
         </c>
         <c ca="center">
            <p>21</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Biotin biosynthesis</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NAD salvage pathway</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tetrapyrrole biosynthesis</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pantothenate and coA biosynthesis</p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Starch degradation</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Lipid-linked oligosaccharide biosynthesis</p>
         </c>
         <c ca="center">
            <p>8</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Arginine degradation (arginase pathway)</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Lysine biosynthesis</p>
         </c>
         <c ca="center">
            <p>6</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Superpathway of glycine biosynthesis</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Biosynthesis of phe/tyr/trp</p>
         </c>
         <c ca="center">
            <p>13</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Superpathway of glycine biosynthesis</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Acrylonitrile degradation</p>
         </c>
         <c ca="center">
            <p>4</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Galactose degradation</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(2D)</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>tRNA charging pathway</p>
         </c>
         <c ca="center">
            <p>35</p>
         </c>
         <c ca="center">
            <p>13<sup>(6D)</sup></p>
         </c>
         <c ca="center">
            <p>13<sup>(6D)</sup></p>
         </c>
         <c ca="center">
            <p>13<sup>(6D)</sup></p>
         </c>
         <c ca="center">
            <p>10<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>7<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>9<sup>(6D)</sup></p>
         </c>
         <c ca="center">
            <p>9<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>13<sup>(6D)</sup></p>
         </c>
         <c ca="center">
            <p>9<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>5<sup>(4D)</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Fatty acid oxidation pathway</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="center">
            <p>4<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(4D)</sup></p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>2<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2D)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Glutathione-glutaredoxin redox reactions</p>
         </c>
         <c ca="center">
            <p>9</p>
         </c>
         <c ca="center">
            <p>5<sup>(3D, 3D)</sup></p>
         </c>
         <c ca="center">
            <p>5<sup>(2D,2T)</sup>*</p>
         </c>
         <c ca="center">
            <p>6<sup>(3D,3T)</sup></p>
         </c>
         <c ca="center">
            <p>3<sup>(3T)</sup></p>
         </c>
         <c ca="center">
            <p>4<sup>(4T)</sup></p>
         </c>
         <c ca="center">
            <p>3<sup>(3T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>3<sup>(3T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2D)</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Isoleucine &amp; phenylalanine degradation</p>
         </c>
         <c ca="center">
            <p>12</p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Removal of superoxide radicals</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>2<sup>(2T)</sup></p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
         <c ca="center">
            <p>-</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Superscripts in parentheses give the numbers of genes that are paralogs (<sup>D</sup>) or tandem duplicates (<sup>T</sup>).</p>
   </tblfn></tbl>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>List of CGD pathways and the corresponding genes that display evidence of clustering in each <it>Candida </it>species</b>. Numbers in parentheses are cluster numbers and are retained across species. Clusters marked "sig" in parentheses are significant when compared with randomized data.</p>
</text>
<file name="1471-2164-11-290-S4.DOC">
   <p>Click here for file</p>
</file>
</suppl>
<p>A high proportion (48%) of the clusters identified contain only two genes and may not be biologically significant, as they appear at a high frequency in randomized data (see Methods). However, the metabolic clusters discussed here are highly significant, particularly for the three pathways discussed below.</p>
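<p>The randomization test itself is described in the Methods and is not reproduced here. As a rough illustration only, the following Python sketch shows one generic way to ask whether a cluster of pathway genes is larger than expected by chance: shuffle the gene order repeatedly and count how often a cluster at least as large appears. The function names, the max_gap adjacency rule and the toy gene identifiers are assumptions made for this sketch and are not the procedure used for CGOB.</p>

```python
import random


def largest_cluster(order, pathway_genes, max_gap=1):
    """Size of the largest run of pathway genes along a chromosome.

    order: list of gene identifiers in chromosomal order (hypothetical input).
    Two pathway genes join the same run when their positions differ by at
    most max_gap (max_gap=1 means strictly adjacent).
    """
    positions = sorted(i for i, gene in enumerate(order) if gene in pathway_genes)
    best = run = 0
    previous = None
    for pos in positions:
        if previous is not None and pos - previous > max_gap:
            run = 0  # gap too large: start a new run
        run += 1
        best = max(best, run)
        previous = pos
    return best


def cluster_p_value(order, pathway_genes, trials=1000, max_gap=1, seed=0):
    """Empirical p-value for the observed largest cluster under shuffling."""
    rng = random.Random(seed)
    observed = largest_cluster(order, pathway_genes, max_gap)
    shuffled = list(order)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # randomized gene order
        if largest_cluster(shuffled, pathway_genes, max_gap) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Toy data: a four-gene cluster in a 200-gene chromosome,
# and a two-gene cluster in a 10-gene chromosome.
genome = [f"g{i}" for i in range(200)]
print(cluster_p_value(genome, {"g10", "g11", "g12", "g13"}))
small = [f"g{i}" for i in range(10)]
print(cluster_p_value(small, {"g0", "g1"}))
```

<p>With these toy inputs, a pair of adjacent pathway genes on a ten-gene chromosome is expected to recur in roughly one shuffle in five (9 of the 45 possible placements are adjacent), which illustrates why two-gene clusters carry little weight on their own.</p>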
</sec>
<sec>
<st>
<p>(i) The biotin biosynthesis pathway</p>
</st>
<p>Biotin (vitamin H) acts as a cofactor for enzymes that catalyze carboxylation, decarboxylation, and transcarboxylation reactions in a number of crucial metabolic processes <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. Most multicellular eukaryotes (except for plants) are biotin auxotrophs, whereas many bacterial species and some fungi (including <it>Aspergillus </it>and <it>Saccharomyces </it>species) are biotin prototrophs <abbrgrp>
<abbr bid="B49">49</abbr>
<abbr bid="B50">50</abbr>
</abbrgrp>.</p>
<p>In <it>S. cerevisiae</it>, six genes are involved in the production of biotin (<it>BIO1-6</it>). These are located in two clusters (<it>BIO1</it>/<it>BIO6 </it>and <it>BIO3</it>/<it>BIO4</it>/<it>BIO5</it>), with <it>BIO2 </it>at a different location <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. Hall and Dietrich <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp> showed that the original eukaryotic biotin pathway was lost in the last common ancestor of <it>Candida </it>and <it>Saccharomyces </it>species, and was later rebuilt through horizontal transfer of <it>BIO3 </it>from &#948;-proteobacteria and <it>BIO4 </it>from &#945;-proteobacteria, followed by gene duplication and neofunctionalization.</p>
<p>We identified a biotin cluster of four genes (orthologs of <it>S. cerevisiae BIO2</it>, <it>BIO3</it>, <it>BIO4 </it>and <it>BIO5</it>) in both <it>C. albicans </it>strains (Figure <figr fid="F4">4</figr>). There is, however, an inversion of the surrounding region between SC5314 and WO-1 (Figure <figr fid="F4">4</figr>); this appears to result from a rearrangement between two members of the oligopeptide transporter gene family, <it>OPT9 </it>(a pseudogene) and <it>OPT1</it>.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Gene order around the biotin cluster</p></caption><text>
   <p><b>Gene order around the biotin cluster</b>. The diagram is re-drawn from CGOB with some genes and species omitted for clarity. Blocks of color represent chromosomes. Homologs are organized in pillars. Changes in color indicate breaks in synteny. The grey triangles indicate an inversion between the <it>C. albicans </it>SC5314 and WO-1 isolates. Diagonal lines indicate local inversions. Genes shown in grey boxes are not adjacent to any other gene shown.</p>
</text><graphic file="1471-2164-11-290-4" hint_layout="double"/></fig>
<p>The cluster in <it>C. albicans </it>is larger than the equivalent region in <it>S. cerevisiae </it>because it includes <it>BIO2</it>. <it>BIO2 </it>orthologs are in the same chromosomal region in <it>C. albicans</it>, <it>D. hansenii</it>, <it>C. lusitaniae </it>and <it>C. guilliermondii </it>(Figure <figr fid="F4">4</figr>). <it>BIO3</it>, <it>BIO4 </it>and <it>BIO5 </it>are also adjacent to each other in <it>C. tropicalis</it>, and they appear to have been recruited to the <it>BIO2 </it>region in <it>C. albicans</it>. Almost the entire cluster, together with an adjacent <it>OPT </it>gene, is missing from <it>C. dubliniensis</it>. Only <it>BIO2 </it>remains, and this is located elsewhere in the genome. The absence of the biotin cluster in <it>C. dubliniensis </it>has previously been reported, and it was suggested that its presence in <it>C. albicans </it>may contribute to increased prevalence and virulence <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp>. The entire set of <it>BIO </it>genes is also absent from <it>C. parapsilosis </it>and <it>L. elongisporus</it>, and was probably lost in their last common ancestor (not shown). There is some conservation of synteny of the surrounding genes (not shown), suggesting the genes were lost together, as a cluster. Unlike those in <it>S. cerevisiae</it>, the biotin clusters in <it>C. albicans </it>and <it>C. tropicalis </it>are not sub-telomeric.</p>
<p>The remaining <it>Candida </it>species contain some genes involved in biotin synthesis. <it>BIO2 </it>is present in almost all species, suggesting it may play a role independent of biotin synthesis. <it>BIO4 </it>and <it>BIO5 </it>are clustered in <it>D. hansenii </it>with <it>BIO2 </it>elsewhere in the genome, whereas <it>BIO2</it>, <it>BIO3 </it>and <it>BIO4 </it>are present in <it>P. stipitis </it>but are not clustered (not shown). It is not clear why some components of the pathway are retained in some species. However, retaining them may enable these species to make biotin from some intermediates, as was described for <it>S. cerevisiae </it>
<abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>. It is generally assumed, however, that clustering of genes in biosynthetic pathways results from selection against toxic intermediates produced by incomplete pathways <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. It is likely that the ancestral <it>Candida </it>species was able to synthesize biotin, but there has been substantial gene loss in many species.</p>
<p>In <it>S. cerevisiae</it>, <it>BIO6 </it>is believed to have arisen through gene duplication of <it>BIO3 </it>followed by subfunctionalization <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. We cannot locate an ortholog of <it>BIO6 </it>in any <it>Candida </it>species. Similarly, we cannot locate any <it>Candida </it>ortholog of <it>BIO1 </it>(pimeloyl-CoA synthetase), the first enzyme involved in synthesizing biotin from pimelic acid. In <it>S. cerevisiae </it>S288C, <it>BIO1 </it>and <it>BIO6 </it>are pseudogenes <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, but there is no evidence of corresponding pseudogenes in any <it>Candida </it>species. It is therefore unlikely that the genes are present in other unsequenced isolates of the same species.</p>
<p>The CGD biotin pathway data suggest that <it>orf19.3567 </it>(<it>BIO32</it>) is involved in biotin synthesis. <it>BIO32 </it>has a top BLASTP hit to <it>BIO3 </it>in <it>S. cerevisiae</it>. However, <it>BIO3 </it>belongs to a multigene family that also contains <it>ARG8</it>, <it>CAR2 </it>and <it>UGA1</it>. To determine the origin of <it>BIO32</it>, we reconstructed a phylogenetic tree using the same sequences used by Hall and Dietrich <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, and included <it>ARG8</it>, <it>CAR2 </it>and <it>UGA1 </it>from <it>C. albicans </it>and <it>S. cerevisiae</it>. Our phylogeny places the <it>S. cerevisiae </it>and <it>C. albicans BIO3 </it>orthologs together with bacterial sequences, indicating that they originated from horizontal gene transfer as suggested by Hall and Dietrich <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. <it>S. cerevisiae BIO6 </it>is also grouped in this clade, supporting the hypothesis that it is a duplicate of <it>BIO3</it>. However, <it>BIO32 </it>from <it>C. albicans </it>is grouped with <it>S. cerevisiae </it>and <it>C. albicans </it>orthologs of <it>ARG8</it>, <it>CAR2 </it>and <it>UGA1 </it>in a separate clade (not shown). <it>BIO32 </it>is therefore most likely a duplicate of one of these genes, and is more likely to be involved in arginine or glutamate metabolism than in biotin synthesis.</p>
</sec>
<sec>
<st>
<p>(ii) The <it>N</it>-acetylglucosamine regulon</p>
</st>
<p>It has been proposed that the ability of pathogenic strains of <it>Candida </it>to utilize sugars such as glucosamine and <it>N</it>-acetylglucosamine (Nag) as alternative carbon sources is an important virulence factor <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>. <it>C. albicans </it>mutants incapable of utilizing Nag are less virulent than wild-type isolates in a murine model of systemic candidiasis <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. The three genes involved in the conversion of Nag to fructose-6-phosphate encode a hexokinase (<it>HXK1/orf19.2154</it>), Nag-6-phosphate deaminase (<it>NAG1/orf19.2156</it>) and Nag-6-phosphate deacetylase (<it>DAC1/orf19.2157</it>). These act sequentially on Nag and are present in <it>C. albicans </it>in a cluster termed the Nag regulon <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>.</p>
<p>Our clustering analysis shows that the Nag regulon is conserved in all <it>Candida </it>species, with the exception of <it>C. lusitaniae </it>(Figure <figr fid="F5">5</figr>, Additional file <supplr sid="S4">4</supplr>). In the latter species, 5 species-specific genes have been inserted between <it>HXK1 </it>and <it>NAG1</it>, producing an intervening region of 19,931 base pairs (bp), whereas the intergenic region in the other species is less than 516 bp. Several of the inserted genes belong to a family of cell wall genes related to <it>Flo1 </it>from <it>S. cerevisiae</it>. The Nag cluster is sub-telomeric in many of the <it>Candida </it>species, and repeats of cell wall genes are commonly found near telomeres <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>. The conservation of the Nag regulon in pathogens like <it>C. albicans </it>and nonpathogens such as <it>P. stipitis </it>suggests that the ability to utilize Nag is not a virulence factor. <it>NAG3</it>, <it>NAG4 </it>and <it>NAG6</it>, which lie close to the NAG cluster in many <it>Candida </it>species (Figure <figr fid="F5">5</figr>), are not involved in the conversion of Nag, but are more likely to encode drug efflux pumps <abbrgrp>
<abbr bid="B56">56</abbr>
<abbr bid="B57">57</abbr>
</abbrgrp>. <it>NAG3 </it>is a tandem duplicate of <it>NAG4 </it>(Additional file <supplr sid="S3">3</supplr>, cluster 59); the duplication occurred in the ancestor of <it>C. albicans</it>, <it>C. dubliniensis</it>, <it>C. tropicalis</it>, <it>C. parapsilosis </it>and <it>L. elongisporus </it>(Figure <figr fid="F1">1</figr>).</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Gene order around the <it>N</it>-acetylglucosamine (NAG) cluster</p></caption><text>
   <p><b>Gene order around the <it>N</it>-acetylglucosamine (NAG) cluster</b>. The diagram is re-drawn from CGOB. Blocks of color represent chromosomes, homologs are organized in pillars and genes shown in grey boxes are not adjacent to any other gene shown. Synteny of NAG enzymes is observed in all species except for <it>C. lusitaniae</it>, which has 5 intervening genes displayed as an insertion loop.</p>
</text><graphic file="1471-2164-11-290-5" hint_layout="double"/></fig>
<p>The phylogenomic distribution of the Nag regulon is intriguing. The cluster is found in the <it>Candida </it>species, but <it>NAG1 </it>and <it>DAC1 </it>are missing in the <it>Saccharomyces </it>lineage. Homologs are also absent from <it>Ashbya gossypii </it>and <it>Kluyveromyces waltii</it>, suggesting the cluster is missing from the entire Saccharomycetes lineage. However, the origin of the NAG cluster may be an ancient event. <it>DAC1 </it>and <it>NAG1 </it>are in close proximity (within two genes) in the <it>Aspergilli </it>and in <it>Neurospora crassa</it>, which belong to the Pezizomycotina, a sister clade to the Saccharomycotina. <it>DAC1 </it>and <it>NAG1 </it>also lie within 2 genes in the Basidiomycete, <it>Ustilago maydis</it>. If the cluster arose in an ancestor of the Ascomycota and the Basidiomycota, it is very ancient, and the genes have been subsequently lost from many species (including <it>Schizosaccharomyces</it>).</p>
</sec>
<sec>
<st>
<p>(iii) The Leloir galactose utilization pathway</p>
</st>
<p>Galactose is utilized by most organisms through its conversion to glucose-6-phosphate, which then enters glycolysis <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>. The GAL pathway is composed of both structural and regulatory elements <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. The galactose metabolism structural genes of <it>S. cerevisiae </it>and <it>C. albicans </it>are well conserved, whereas their regulatory components are distinct <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. In <it>C. albicans </it>the structural genes (<it>GAL1</it>, <it>GAL10 </it>and <it>GAL7</it>) are arranged in a cluster close to a hexose transporter <it>HGT2 </it>
<abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. This cluster, together with two additional uncharacterized genes which lie between <it>GAL10 </it>and <it>GAL7</it>, is conserved in <it>C. albicans</it>, <it>C. dubliniensis</it>, <it>C. parapsilosis </it>and <it>D. hansenii </it>
<abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. We show that the GAL pathway cluster is conserved in all <it>Candida </it>species present in CGOB (Figure <figr fid="F6">6</figr>).</p>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Gene order around the Galactose (GAL) cluster</p></caption><text>
   <p><b>Gene order around the Galactose (GAL) cluster</b>. The diagram is re-drawn from CGOB. Blocks of color represent chromosomes and homologs are organized in pillars. Diagonal lines indicate local inversions. Genes shown in grey boxes are not adjacent to any other gene shown. Dubious genes (<it>orf19.3671 </it>and <it>CLUG_02293</it>) from <it>C. albicans </it>and <it>C. lusitaniae </it>are not displayed.</p>
</text><graphic file="1471-2164-11-290-6" hint_layout="double"/></fig>
<p>Both <it>C. albicans </it>strains and <it>C. lusitaniae </it>have a gene insertion between <it>GAL1 </it>and <it>GAL10 </it>(not shown). The <it>C. albicans </it>gene (<it>orf19.3671</it>) is designated "dubious" by CGD, and is a pseudogene in WO-1, with no significant similarity to any gene known from any other organism. Similarly, the <it>C. lusitaniae </it>gene (<it>CLUG_02293</it>) has no significant homologs in either the GenBank or the CGOB Blast databases. The intergenic regions between <it>GAL1 </it>and <it>GAL10 </it>are 490 and 1362 nucleotides in <it>C. lusitaniae </it>and <it>C. albicans</it>, respectively, similar to the intergenic regions in all the other <it>Candida </it>species. It is therefore likely that both <it>orf19.3671 </it>and <it>CLUG_02293 </it>are annotation errors rather than real genes, and so they are not shown in Figure <figr fid="F6">6</figr>.</p>
<p>Expression of the hexose transporter <it>HGT2 </it>is strongly induced by galactose in <it>C. albicans </it>
<abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. An ortholog of <it>HGT2 </it>is also very close to <it>GAL1 </it>in <it>C. dubliniensis</it>, <it>C. tropicalis</it>, <it>C. guilliermondii </it>and <it>C. lusitaniae </it>(Figure <figr fid="F6">6</figr>). <it>HGT2 </it>belongs to a large gene family, and while multiple homologs were located in many species, there is no family member adjacent to the GAL cluster in <it>C. parapsilosis</it>, <it>L. elongisporus </it>and <it>P. stipitis</it>. A putative ortholog in <it>D. hansenii </it>was identified, although it resides on a different chromosome from the GAL genes (Figure <figr fid="F6">6</figr>). It is possible that, even though the relative position of the hexose transporter is not conserved, its co-expression with the GAL genes is.</p>
<p>Interestingly, <it>orf19.3674</it>, which lies between <it>GAL10 </it>and <it>GAL7</it>, appears to be a paralog of <it>GAL10</it>, and is conserved in all the <it>Candida </it>species. This was also noted in the <it>P. stipitis </it>genome <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Gal10 is more than twice the size of orf19.3674 (675 <it>vs</it>. 320 amino acids), and contains two recognized protein domains, an NAD dependent epimerase/dehydratase domain and an aldose 1-epimerase domain. Only the first domain is present in <it>orf19.3674 </it>and its orthologs. Expression of this gene is not influenced by galactose in <it>C. albicans </it>
<abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. <it>orf19.3674 </it>may therefore have undergone subfunctionalization after duplication. Alternatively, recombination between a <it>GAL10 </it>precursor and another gene may have led to a gene with a novel function. An ortholog of the adjacent conserved gene (<it>orf19.3673</it>) encodes a subunit of the transport protein particle (TRAPP) of the cis-Golgi in <it>S. cerevisiae </it>
<abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp> and is unlikely to be involved in galactose metabolism.</p>
</sec>
<sec>
<st>
<p>KEGG Analysis</p>
</st>
<p>We also assigned genes in each <it>Candida </it>species to individual metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. This approach permitted us to investigate species-specific metabolic pathways, as well as pathways that have been described only in <it>C. albicans </it>SC5314. Approximately 200 metabolic pathways were reconstructed for each <it>Candida </it>species (Table <tblr tid="T5">5</tblr>). However, ~30% of these were redundant (Table <tblr tid="T5">5</tblr>). For example, in <it>C. dubliniensis</it>, the inferred components of the pathways for peptidoglycan and alkaloid biosynthesis are completely contained in the alanine, aspartate and glutamate metabolism pathway (not shown). The number of gene assignments to the non-redundant pathways is approximately 2000 for each <it>Candida </it>species. On average, close to 50% of these are represented in multiple pathways (Table <tblr tid="T5">5</tblr>). Therefore, between 16% and 19% of genes from each <it>Candida </it>species have been successfully assigned to a unique KEGG metabolic pathway (Table <tblr tid="T5">5</tblr>), equating to 17.8% of all <it>Candida </it>genes represented in CGOB.</p>
<tbl id="T5"><title><p>Table 5</p></title><caption><p>KEGG metabolic pathways that show evidence of gene clustering in <it>Candida </it>species.</p></caption><tblbdy cols="6">
      <r>
         <c ca="left">
            <p>
               <b>Species</b>
            </p>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>Pathways</b>
            </p>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>Non-Redundant</b>
            </p>
            <p>
               <b>pathways</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Genes in</b>
            </p>
            <p>
               <b>Pathways</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Unique Genes</b>
            </p>
            <p>
               <b>in Pathways</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Metabolic</b>
            </p>
            <p>
               <b>clusters</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. albicans</it>
            </p>
         </c>
         <c ca="center">
            <p>190</p>
         </c>
         <c ca="center">
            <p>136</p>
         </c>
         <c ca="center">
            <p>1870</p>
         </c>
         <c ca="center">
            <p>991 (16.0%)</p>
         </c>
         <c ca="center">
            <p>38</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. dubliniensis</it>
            </p>
         </c>
         <c ca="center">
            <p>196</p>
         </c>
         <c ca="center">
            <p>139</p>
         </c>
         <c ca="center">
            <p>1857</p>
         </c>
         <c ca="center">
            <p>968 (16.3%)</p>
         </c>
         <c ca="center">
            <p>39</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. tropicalis</it>
            </p>
         </c>
         <c ca="center">
            <p>201</p>
         </c>
         <c ca="center">
            <p>139</p>
         </c>
         <c ca="center">
            <p>1864</p>
         </c>
         <c ca="center">
            <p>988 (15.9%)</p>
         </c>
         <c ca="center">
            <p>35</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. parapsilosis</it>
            </p>
         </c>
         <c ca="center">
            <p>204</p>
         </c>
         <c ca="center">
            <p>149</p>
         </c>
         <c ca="center">
            <p>2026</p>
         </c>
         <c ca="center">
            <p>1062 (18.3%)</p>
         </c>
         <c ca="center">
            <p>34</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L. elongisporus</it>
            </p>
         </c>
         <c ca="center">
            <p>202</p>
         </c>
         <c ca="center">
            <p>142</p>
         </c>
         <c ca="center">
            <p>1991</p>
         </c>
         <c ca="center">
            <p>1048 (18.4%)</p>
         </c>
         <c ca="center">
            <p>33</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>D. hansenii</it>
            </p>
         </c>
         <c ca="center">
            <p>209</p>
         </c>
         <c ca="center">
            <p>152</p>
         </c>
         <c ca="center">
            <p>2165</p>
         </c>
         <c ca="center">
            <p>1148 (18.2%)</p>
         </c>
         <c ca="center">
            <p>39</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>P. stipitis</it>
            </p>
         </c>
         <c ca="center">
            <p>205</p>
         </c>
         <c ca="center">
            <p>151</p>
         </c>
         <c ca="center">
            <p>2134</p>
         </c>
         <c ca="center">
            <p>1114 (19.1%)</p>
         </c>
         <c ca="center">
            <p>44</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. guilliermondii</it>
            </p>
         </c>
         <c ca="center">
            <p>210</p>
         </c>
         <c ca="center">
            <p>158</p>
         </c>
         <c ca="center">
            <p>2162</p>
         </c>
         <c ca="center">
            <p>1126 (19.3%)</p>
         </c>
         <c ca="center">
            <p>36</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C. lusitaniae</it>
            </p>
         </c>
         <c ca="center">
            <p>207</p>
         </c>
         <c ca="center">
            <p>146</p>
         </c>
         <c ca="center">
            <p>2069</p>
         </c>
         <c ca="center">
            <p>1091 (18.6%)</p>
         </c>
         <c ca="center">
            <p>34</p>
         </c>
      </r>
   </tblbdy></tbl>
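<p>The share of redundant pathways quoted in the text can be rechecked from the "Pathways" and "Non-Redundant pathways" columns of Table 5. The following Python sketch is illustrative only: the counts are copied from the table, while the names <b>TABLE5</b> and <b>redundant_share</b> are hypothetical and do not correspond to any part of CGOB.</p>
<p><![CDATA[
```python
# Pathway counts copied from Table 5: species -> (pathways, non-redundant pathways).
TABLE5 = {
    "C. albicans": (190, 136),
    "C. dubliniensis": (196, 139),
    "C. tropicalis": (201, 139),
    "C. parapsilosis": (204, 149),
    "L. elongisporus": (202, 142),
    "D. hansenii": (209, 152),
    "P. stipitis": (205, 151),
    "C. guilliermondii": (210, 158),
    "C. lusitaniae": (207, 146),
}


def redundant_share(total, non_redundant):
    """Fraction of reconstructed pathways that are redundant."""
    return (total - non_redundant) / total


# Per-species share and the unweighted mean across the nine species.
shares = {sp: redundant_share(t, n) for sp, (t, n) in TABLE5.items()}
mean_share = sum(shares.values()) / len(shares)

for species, share in shares.items():
    print(f"{species}: {share:.1%} redundant")
print(f"mean: {mean_share:.1%}")
```
]]></p>
<p>On these counts the per-species share lies between roughly 25% and 31%, which is consistent with the approximate figure of ~30% given in the text.</p>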
<p>We interrogated each <it>Candida </it>species in CGOB for evidence of clustering in the non-redundant KEGG pathways (Table <tblr tid="T5">5</tblr>). In total, we identified 62 pathways (33-44 per species) that display some evidence of gene clustering (Table <tblr tid="T5">5</tblr>). There are 767 KEGG clusters (KCs) shared amongst all species, and of these 32 have arisen through tandem duplication (Additional file <supplr sid="S5">5</supplr>). Most of the identified KCs are small, containing two or three genes. A high proportion (75%) of these may not be biologically significant (Additional file <supplr sid="S4">4</supplr>), as they appear at a high frequency in randomized data (see Methods).</p>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>List of KEGG pathways and the corresponding genes that display evidence of clustering in each <it>Candida </it>species</b>. Clusters have been assigned numbers (KC) so that a cluster present in one species but absent in another can be located. Clusters marked TD are inferred to have arisen through tandem duplication. Clusters that are significant relative to randomized data are highlighted with purple shading.</p>
</text>
<file name="1471-2164-11-290-S5.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>Overall, the observed KEGG metabolic pathway clusters are generally distinct from those located using the CGD pathways (Additional file <supplr sid="S4">4</supplr> and Additional file <supplr sid="S5">5</supplr>). There is a small degree of crossover, including the CGD tRNA charging pathway, which is analogous to KEGG's aminoacyl-tRNA biosynthesis pathway, the CGD aerobic respiration pathway, which is equivalent to oxidative phosphorylation in KEGG, and the galactose metabolism and histidine metabolism pathways in both. Several clusters, such as those of histone protein genes (ko05322) and ribosomal protein genes (ko03010), have been described previously. It is likely that other clusters will be identified when the assignments to pathways improve. For example, Jeffries and Van Vleet <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> identified some small clusters of functionally-related genes in <it>P. stipitis </it>by visual inspection. Our approach found some of these, but not all.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We describe here a unique tool for studying evolution and gene function in <it>Candida </it>species. During the development of CGOB we improved the existing annotations for several species, by identifying and removing partial open reading frames, and by manually assigning homology, based on sequence similarity and synteny. We also provide a detailed analysis of gene clusters in <it>Candida</it>, which will provide a basis for future investigation. We identified many of the clusters described in only one species <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B53">53</abbr>
<abbr bid="B59">59</abbr>
</abbrgrp>. However, we have also shown the benefits of a comparative approach; some clusters (such as NAG), although originally described only in <it>C. albicans</it>, are present in all <it>Candida </it>species, whereas others (such as CIP) are unique to one (<it>P. stipitis</it>). Our analysis provides an important resource that is now available for the <it>Candida </it>community.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Genome Data</p>
</st>
<p>The complete <it>C. albicans </it>(SC5314) genome (Assembly 21 <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>) was obtained from the <it>Candida </it>genome database (CGD) <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. Gene sets for <it>C. albicans </it>WO-1, <it>C. tropicalis</it>, <it>L. elongisporus</it>, <it>C. guilliermondii</it>, and <it>C. lusitaniae </it>
<abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> were obtained directly from the Broad Institute <abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp> and for <it>C. dubliniensis </it>
<abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> from GeneDB at the Wellcome Trust Sanger Institute <abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp>. The first assembly of the <it>C. parapsilosis </it>genome was downloaded from the Sanger Institute <abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp> and in-house gene annotations were called (as described in Fitzpatrick et al <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>). The resultant gene set contains 5,809 protein-coding genes. The <it>C. parapsilosis </it>genome was also automatically annotated by the Broad Institute <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>, and we use these gene names where possible.</p>
</sec>
<sec>
<st>
<p>Phylogenetic relationships</p>
</st>
<p>Phylogenetic relationships were determined using a supertree approach. All ten <it>Candida </it>genomes as well as two outgroups (<it>Saccharomyces cerevisiae </it>and <it>Candida glabrata</it>) were merged into a local Blast database. For a full description of the methodology used, please refer to Fitzpatrick et al <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Homology pillars and genome editing</p>
</st>
<p>Sets of homologous genes are stored in CGOB's pillars (Figure <figr fid="F2">2</figr>). Pillars are the core data structures used to store homology assignments across all species <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. All genes were integrated into homology pillars using an automated bi-directional best BLASTP hit strategy (E-value cut-off of 10<sup>-5</sup>) against <it>C. albicans </it>SC5314. A second round of automated searching merged singleton pillars using a BLASTP hit (E-value cut-off of 10<sup>-5</sup>) and synteny with at least one gene in an adjacent pillar. We then systematically edited CGOB by hand, browsing along each <it>Candida </it>chromosome to validate and refine homology pillars.</p>
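<p>The bi-directional best hit step can be illustrated with a short Python sketch. It is not the code used to build CGOB: it assumes that BLASTP results have already been parsed into (query, subject, E-value) tuples, the gene identifiers are invented, and the function names are hypothetical. The synteny-based second round and the manual curation are not represented.</p>
<p><![CDATA[
```python
E_VALUE_CUTOFF = 1e-5  # cut-off quoted in the text


def best_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query to its lowest-E-value subject among hits passing the cut-off."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > cutoff:
            continue  # hit too weak to be considered
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits, cutoff=E_VALUE_CUTOFF):
    """Keep pairs whose members are each other's best hit in both search directions."""
    forward = best_hits(forward_hits, cutoff)
    reverse = best_hits(reverse_hits, cutoff)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Invented example: species X genes searched against a reference, and back again.
forward = [("X_1", "ref_A", 1e-50), ("X_1", "ref_B", 1e-20), ("X_2", "ref_B", 1e-3)]
reverse = [("ref_A", "X_1", 1e-48), ("ref_B", "X_1", 1e-18)]
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```
]]></p>
<p>In this toy input only X_1 and ref_A are retained: X_2 fails the E-value cut-off, and ref_B is not the best hit of X_1. Genes left unpaired at this stage correspond to the singletons that the second, synteny-aware round attempts to place.</p>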
<p>Several potential genes in the automatically called open reading frame sets are incomplete or "partial". We merged partial ORFs where possible, by aligning them against their complete orthologs from the other <it>Candida </it>genomes using Muscle <abbrgrp>
<abbr bid="B66">66</abbr>
</abbrgrp>. The resultant alignments were manually checked and where appropriate, partial ORFs were merged and the resulting gene models were renamed, and added to CGOB's pillars. For completeness both the merged genes and the original partial ORFs have been retained in the CGOB Blast sequence database.</p>
</sec>
<sec>
<st>
<p>Duplications</p>
</st>
<p>Genes that have arisen through tandem duplication were located using bl2seq from the NCBI suite of Blast executables. Tandem duplicates were defined as adjacent genes matching at an E-value cut-off of 10<sup>-10</sup> with a highest scoring sequence pair (HSP) more than half the length of the shorter sequence. This approach filters out genes with similarity over short regions only. Tandem genes that are evolving rapidly or have low sequence complexity may not be located using sequence similarity. We therefore programmed CGOB to compare tandem duplicates in all genomes, and used synteny to locate fast-evolving tandems (or tandems with low complexity) in another genome.</p>
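<p>The tandem duplicate criterion can be written as a simple test on adjacent genes. The Python sketch below is a hedged illustration rather than the CGOB implementation: it assumes that the pairwise comparison (bl2seq in the text) has already been run and summarised as an E-value and an HSP length per gene pair, and all identifiers, lengths and function names are invented.</p>
<p><![CDATA[
```python
def is_tandem_pair(evalue, hsp_length, len_a, len_b, max_evalue=1e-10):
    """Apply the stated thresholds: E-value cut-off and HSP > half the shorter gene."""
    return evalue <= max_evalue and hsp_length > 0.5 * min(len_a, len_b)


def find_tandem_pairs(gene_order, lengths, pairwise_hits):
    """Scan neighbouring genes along a chromosome and report tandem pairs."""
    pairs = []
    for left, right in zip(gene_order, gene_order[1:]):
        hit = pairwise_hits.get(frozenset((left, right)))
        if hit is None:
            continue  # no alignment reported for this neighbouring pair
        evalue, hsp_length = hit
        if is_tandem_pair(evalue, hsp_length, lengths[left], lengths[right]):
            pairs.append((left, right))
    return pairs


# Invented example data: gene order, protein lengths and pairwise results.
gene_order = ["g1", "g2", "g3", "g4"]
lengths = {"g1": 400, "g2": 380, "g3": 900, "g4": 150}
pairwise_hits = {
    frozenset(("g1", "g2")): (1e-60, 300),  # long, strong match
    frozenset(("g3", "g4")): (1e-12, 60),   # significant but short match
}
tandems = find_tandem_pairs(gene_order, lengths, pairwise_hits)
print(tandems)
```
]]></p>
<p>The g3/g4 pair passes the E-value threshold but is rejected because its HSP covers less than half of the shorter gene, which is the situation the length filter is intended to exclude.</p>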
<p>Synonymous (d<sub>S</sub>) and nonsynonymous (d<sub>N</sub>) substitution rates for genes located in tandem clusters were estimated using the methods of Yang and Nielsen <abbrgrp>
<abbr bid="B67">67</abbr>
</abbrgrp> as implemented in yn00 in the PAML suite <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp>.</p>
<p>To identify multigene families, every gene in a particular <it>Candida </it>proteome was searched against every other gene in its cognate genome. Genes with a BLASTP E-value less than 10<sup>-30</sup> and an HSP more than 60% the length of the shorter sequence were considered to be members of the same family; this is the same strategy used by Braun <it>et al </it>
<abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>.</p>
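<p>One way to turn the pairwise family criterion into gene families is single-linkage grouping, sketched below in Python with a union-find structure. The text specifies only the pairwise thresholds, so the grouping rule is an assumption, as are the input format, the identifiers and the function names.</p>
<p><![CDATA[
```python
def same_family(evalue, hsp_length, len_a, len_b, max_evalue=1e-30, min_fraction=0.6):
    """Pairwise criterion from the text: E-value < 1e-30 and HSP > 60% of the shorter gene."""
    return evalue < max_evalue and hsp_length > min_fraction * min(len_a, len_b)


def build_families(lengths, hits):
    """Group genes by single linkage (an assumption) over qualifying pairwise hits."""
    parent = {gene: gene for gene in lengths}

    def find(gene):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, evalue, hsp_length in hits:
        if a != b and same_family(evalue, hsp_length, lengths[a], lengths[b]):
            parent[find(a)] = find(b)

    groups = {}
    for gene in lengths:
        groups.setdefault(find(gene), []).append(gene)
    # Report only groups with at least two members, in a stable order.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Invented example: protein lengths and all-against-all hits within one proteome.
lengths = {"gA": 300, "gB": 320, "gC": 310, "gD": 500}
hits = [("gA", "gB", 1e-80, 250), ("gB", "gC", 1e-45, 200), ("gC", "gD", 1e-40, 100)]
families = build_families(lengths, hits)
print(families)
```
]]></p>
<p>Under single linkage gA and gC fall in one family through gB even though they were not compared directly; a stricter rule requiring every pair to pass the thresholds would give smaller families.</p>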
</sec>
<sec>
<st>
<p>Locating clusters of adjacent genes in metabolic pathways</p>
</st>
<p>Metabolic pathways for <it>C. albicans </it>SC5314 were downloaded from the <it>Candida </it>Genome Database <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. The gene identifiers for each enzymatic step were mapped onto CGOB. Clusters were defined as identifiers belonging to a particular metabolic pathway that lie within a contiguous window of 10 genes. The presence or absence of <it>C. albicans </it>SC5314 pathway homologs was then scored in the remaining nine <it>Candida </it>genomes.</p>
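<p>The window-based cluster definition can be sketched as follows. This Python example reflects one possible reading of "a contiguous window of 10 genes" (all members of a cluster span at most 10 consecutive gene positions, and clusters are collected greedily from left to right); the minimum of two genes per cluster, the identifiers and the function name are assumptions made for illustration.</p>
<p><![CDATA[
```python
def find_clusters(gene_order, pathway_genes, window=10, min_genes=2):
    """Group pathway genes whose positions fit inside `window` consecutive genes."""
    positions = [i for i, gene in enumerate(gene_order) if gene in pathway_genes]
    clusters = []
    start = 0
    while start < len(positions):
        end = start
        # Extend the cluster while the next pathway gene is still inside the window
        # that begins at the first gene of the cluster.
        while end + 1 < len(positions) and positions[end + 1] - positions[start] < window:
            end += 1
        if end - start + 1 >= min_genes:
            clusters.append([gene_order[p] for p in positions[start:end + 1]])
        start = end + 1
    return clusters


# Invented chromosome of 30 genes with five members of one pathway.
chromosome = [f"gene{i}" for i in range(30)]
pathway = {"gene2", "gene5", "gene11", "gene20", "gene28"}
clusters = find_clusters(chromosome, pathway)
print(clusters)
```
]]></p>
<p>Here gene2, gene5 and gene11 occupy a span of exactly 10 genes and form one cluster, and gene20 and gene28 form a second. Running the same function on the gene order of each remaining species, with identifiers translated through the homology pillars, would correspond to the presence or absence scoring described above.</p>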
<p>For completeness we automatically inferred individual metabolic pathways for all <it>Candida </it>species using the KEGG automatic annotation server (KAAS) <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp>. KAAS is based on reciprocally best BLAST similarity hits against all KEGG orthology (KO) groups of functionally related genes assigned in the KEGG GENES database. KAAS assigned each <it>Candida </it>gene a KO number and these were subsequently mapped to one of KEGG's reference metabolic pathways. All <it>Candida </it>KO identifiers were mapped onto CGOB and we searched for metabolic clusters as described above.</p>
<p>The significance of metabolic clusters was tested using simulations in which gene order was randomized to give pseudogenomes. Both CGD and KEGG pathway components were mapped onto the randomized genome data and scored as described above. This process was repeated 10,000 times for each pathway in each <it>Candida </it>genome. Clusters are considered significant if the number of linked genes in the pseudogenome is less than that observed in the real genome in at least 95% of the simulations.</p>
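<p>The randomization test can be outlined in a few lines of Python. The sketch below is an approximation under stated assumptions: the genome is treated as a single ordered list (chromosome boundaries are ignored), a gene counts as "linked" when another member of the same pathway lies fewer than 10 positions away, and the function names, identifiers and fixed random seed are hypothetical.</p>
<p><![CDATA[
```python
import random


def count_linked_genes(gene_order, pathway_genes, window=10):
    """Count pathway genes with another pathway gene fewer than `window` positions away."""
    positions = [i for i, gene in enumerate(gene_order) if gene in pathway_genes]
    linked = set()
    for a, b in zip(positions, positions[1:]):
        if b - a < window:
            linked.update((a, b))
    return len(linked)


def fraction_below_observed(gene_order, pathway_genes, n_permutations=10000, seed=0):
    """Return the observed count and the fraction of pseudogenomes scoring below it."""
    rng = random.Random(seed)
    observed = count_linked_genes(gene_order, pathway_genes)
    pseudogenome = list(gene_order)
    below = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudogenome)  # randomize gene order
        if count_linked_genes(pseudogenome, pathway_genes) < observed:
            below += 1
    return observed, below / n_permutations


# Invented genome of 500 genes with five adjacent members of one pathway.
genome = [f"gene{i}" for i in range(500)]
clustered = {"gene100", "gene101", "gene102", "gene103", "gene104"}
# Fewer permutations than the 10,000 used in the text, to keep the example fast.
observed, fraction = fraction_below_observed(genome, clustered, n_permutations=200)
print(observed, fraction, fraction >= 0.95)
```
]]></p>
<p>A pathway would be called significant in this sketch when the returned fraction is at least 0.95. A pathway whose genes are scattered has an observed count of zero, so no pseudogenome can score below it and the fraction is zero.</p>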
</sec>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>
<b>CGD</b>: <it>Candida </it>genome database; <b>CGOB</b>: Candida Gene Order Browser; <b>YGOB</b>: Yeast Gene Order Browser; <b>HSP: </b>highest scoring sequence pair; <b>KEGG</b>: Kyoto Encyclopedia of Genes and Genomes; <b>KCs</b>: KEGG clusters; <b>KAAS</b>: KEGG automatic annotation server; <b>KO</b>: KEGG orthology; <b>ORF: </b>open reading frame; <b>PIPKc</b>: Phosphatidylinositol Phosphate Kinase; <b>Nag</b>: <it>N</it>-acetylglucosamine; <b>HXK1</b>: hexokinase kinase; <b>DAC1</b>
<it>: </it>Nag-6-phosphate deacetylase; <b>bp</b>: basepairs; <b>d<sub>S</sub>
</b>: synonymous substitution; <b>d<sub>N</sub>
</b>: nonsynonymous substitution.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>DAF and GB were involved in the design phase. KPB developed and installed software. POG installed software. POG and DAF sourced homologs. DAF merged partial genes and manually curated homology columns. DAF and GB examined synteny, duplication and cluster data. DAF and GB drafted the manuscript. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We would like to acknowledge the financial support of the Health Research Board of Ireland and Science Foundation Ireland (SFI 08/I.1/B1865), and the assistance of Ken Wolfe in adapting YGOB and for reading the manuscript.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Nosocomial bloodstream infections in US hospitals: analysis of 24,179 cases from a prospective nationwide surveillance study</p></title><aug><au><snm>Wisplinghoff</snm><fnm>H</fnm></au><au><snm>Bischoff</snm><fnm>T</fnm></au><au><snm>Tallent</snm><fnm>SM</fnm></au><au><snm>Seifert</snm><fnm>H</fnm></au><au><snm>Wenzel</snm><fnm>RP</fnm></au><au><snm>Edmond</snm><fnm>MB</fnm></au></aug><source>Clin Infect Dis</source><pubdate>2004</pubdate><volume>39</volume><fpage>309</fpage><lpage>317</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/421946</pubid><pubid idtype="pmpid" link="fulltext">15306996</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Epidemiology of invasive candidiasis: a persistent public health problem</p></title><aug><au><snm>Pfaller</snm><fnm>MA</fnm></au><au><snm>Diekema</snm><fnm>DJ</fnm></au></aug><source>Clin Microbiol Rev</source><pubdate>2007</pubdate><volume>20</volume><fpage>133</fpage><lpage>163</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/CMR.00029-06</pubid><pubid idtype="pmcid">1797637</pubid><pubid idtype="pmpid">17223626</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis</p></title><aug><au><snm>Jeffries</snm><fnm>TW</fnm></au><au><snm>Grigoriev</snm><fnm>IV</fnm></au><au><snm>Grimwood</snm><fnm>J</fnm></au><au><snm>Laplaza</snm><fnm>JM</fnm></au><au><snm>Aerts</snm><fnm>A</fnm></au><au><snm>Salamov</snm><fnm>A</fnm></au><au><snm>Schmutz</snm><fnm>J</fnm></au><au><snm>Lindquist</snm><fnm>E</fnm></au><au><snm>Dehal</snm><fnm>P</fnm></au><au><snm>Shapiro</snm><fnm>H</fnm></au><au><snm>Jin</snm><fnm>YS</fnm></au><au><snm>Passoth</snm><fnm>V</fnm></au><au><snm>Richardson</snm><fnm>PM</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2007</pubdate><volume>25</volume><fpage>319</fpage><lpage>326</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1290</pubid><pubid 
idtype="pmpid" link="fulltext">17334359</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Genome evolution in yeasts</p></title><aug><au><snm>Dujon</snm><fnm>B</fnm></au><au><snm>Sherman</snm><fnm>D</fnm></au><au><snm>Fischer</snm><fnm>G</fnm></au><au><snm>Durrens</snm><fnm>P</fnm></au><au><snm>Casaregola</snm><fnm>S</fnm></au><au><snm>Lafontaine</snm><fnm>I</fnm></au><au><snm>De Montigny</snm><fnm>J</fnm></au><au><snm>Marck</snm><fnm>C</fnm></au><au><snm>Neuveglise</snm><fnm>C</fnm></au><au><snm>Talla</snm><fnm>E</fnm></au><au><snm>Goffard</snm><fnm>N</fnm></au><au><snm>Frangeul</snm><fnm>L</fnm></au><au><snm>Aigle</snm><fnm>M</fnm></au><au><snm>Anthouard</snm><fnm>V</fnm></au><au><snm>Babour</snm><fnm>A</fnm></au><au><snm>Barbe</snm><fnm>V</fnm></au><au><snm>Barnay</snm><fnm>S</fnm></au><au><snm>Blanchin</snm><fnm>S</fnm></au><au><snm>Beckerich</snm><fnm>JM</fnm></au><au><snm>Beyne</snm><fnm>E</fnm></au><au><snm>Bleykasten</snm><fnm>C</fnm></au><au><snm>Boisrame</snm><fnm>A</fnm></au><au><snm>Boyer</snm><fnm>J</fnm></au><au><snm>Cattolico</snm><fnm>L</fnm></au><au><snm>Confanioleri</snm><fnm>F</fnm></au><au><snm>De 
Daruvar</snm><fnm>A</fnm></au><au><snm>Despons</snm><fnm>L</fnm></au><au><snm>Fabre</snm><fnm>E</fnm></au><au><snm>Fairhead</snm><fnm>C</fnm></au><au><snm>Ferry-Dumazet</snm><fnm>H</fnm></au><au><snm>Groppi</snm><fnm>A</fnm></au><au><snm>Hantraye</snm><fnm>F</fnm></au><au><snm>Hennequin</snm><fnm>C</fnm></au><au><snm>Jauniaux</snm><fnm>N</fnm></au><au><snm>Joyet</snm><fnm>P</fnm></au><au><snm>Kachouri</snm><fnm>R</fnm></au><au><snm>Kerrest</snm><fnm>A</fnm></au><au><snm>Koszul</snm><fnm>R</fnm></au><au><snm>Lemaire</snm><fnm>M</fnm></au><au><snm>Lesur</snm><fnm>I</fnm></au><au><snm>Ma</snm><fnm>L</fnm></au><au><snm>Muller</snm><fnm>H</fnm></au><au><snm>Nicaud</snm><fnm>JM</fnm></au><au><snm>Nikolski</snm><fnm>M</fnm></au><au><snm>Oztas</snm><fnm>S</fnm></au><au><snm>Ozier-Kalogeropoulos</snm><fnm>O</fnm></au><au><snm>Pellenz</snm><fnm>S</fnm></au><au><snm>Potier</snm><fnm>S</fnm></au><au><snm>Richard</snm><fnm>GF</fnm></au><au><snm>Straub</snm><fnm>ML</fnm></au><au><snm>Suleau</snm><fnm>A</fnm></au><au><snm>Swennen</snm><fnm>D</fnm></au><au><snm>Tekaia</snm><fnm>F</fnm></au><au><snm>Wesolowski-Louvel</snm><fnm>M</fnm></au><au><snm>Westhof</snm><fnm>E</fnm></au><au><snm>Wirth</snm><fnm>B</fnm></au><au><snm>Zeniou-Meyer</snm><fnm>M</fnm></au><au><snm>Zivanovic</snm><fnm>I</fnm></au><au><snm>Bolotin-Fukuhara</snm><fnm>M</fnm></au><au><snm>Thierry</snm><fnm>A</fnm></au><au><snm>Bouchier</snm><fnm>C</fnm></au><au><snm>Caudron</snm><fnm>B</fnm></au><au><snm>Scarpelli</snm><fnm>C</fnm></au><au><snm>Gaillardin</snm><fnm>C</fnm></au><au><snm>Weissenbach</snm><fnm>J</fnm></au><au><snm>Wincker</snm><fnm>P</fnm></au><au><snm>Souciet</snm><fnm>JL</fnm></au></aug><source>Nature</source><pubdate>2004</pubdate><volume>430</volume><fpage>35</fpage><lpage>44</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature02579</pubid><pubid idtype="pmpid" link="fulltext">15229592</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Evolution of pathogenicity and sexual 
reproduction in eight Candida genomes</p></title><aug><au><snm>Butler</snm><fnm>G</fnm></au><au><snm>Rasmussen</snm><fnm>MD</fnm></au><au><snm>Lin</snm><fnm>MF</fnm></au><au><snm>Santos</snm><fnm>MA</fnm></au><au><snm>Sakthikumar</snm><fnm>S</fnm></au><au><snm>Munro</snm><fnm>CA</fnm></au><au><snm>Rheinbay</snm><fnm>E</fnm></au><au><snm>Grabherr</snm><fnm>M</fnm></au><au><snm>Forche</snm><fnm>A</fnm></au><au><snm>Reedy</snm><fnm>JL</fnm></au><au><snm>Agrafioti</snm><fnm>I</fnm></au><au><snm>Arnaud</snm><fnm>MB</fnm></au><au><snm>Bates</snm><fnm>S</fnm></au><au><snm>Brown</snm><fnm>AJ</fnm></au><au><snm>Brunke</snm><fnm>S</fnm></au><au><snm>Costanzo</snm><fnm>MC</fnm></au><au><snm>Fitzpatrick</snm><fnm>DA</fnm></au><au><snm>de Groot</snm><fnm>PW</fnm></au><au><snm>Harris</snm><fnm>D</fnm></au><au><snm>Hoyer</snm><fnm>LL</fnm></au><au><snm>Hube</snm><fnm>B</fnm></au><au><snm>Klis</snm><fnm>FM</fnm></au><au><snm>Kodira</snm><fnm>C</fnm></au><au><snm>Lennard</snm><fnm>N</fnm></au><au><snm>Logue</snm><fnm>ME</fnm></au><au><snm>Martin</snm><fnm>R</fnm></au><au><snm>Neiman</snm><fnm>AM</fnm></au><au><snm>Nikolaou</snm><fnm>E</fnm></au><au><snm>Quail</snm><fnm>MA</fnm></au><au><snm>Quinn</snm><fnm>J</fnm></au><au><snm>Santos</snm><fnm>MC</fnm></au><au><snm>Schmitzberger</snm><fnm>FF</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Shah</snm><fnm>P</fnm></au><au><snm>Silverstein</snm><fnm>KA</fnm></au><au><snm>Skrzypek</snm><fnm>MS</fnm></au><au><snm>Soll</snm><fnm>D</fnm></au><au><snm>Staggs</snm><fnm>R</fnm></au><au><snm>Stansfield</snm><fnm>I</fnm></au><au><snm>Stumpf</snm><fnm>MP</fnm></au><au><snm>Sudbery</snm><fnm>PE</fnm></au><au><snm>Srikantha</snm><fnm>T</fnm></au><au><snm>Zeng</snm><fnm>Q</fnm></au><au><snm>Berman</snm><fnm>J</fnm></au><au><snm>Berriman</snm><fnm>M</fnm></au><au><snm>Heitman</snm><fnm>J</fnm></au><au><snm>Gow</snm><fnm>NA</fnm></au><au><snm>Lorenz</snm><fnm>MC</fnm></au><au><snm>Birren</snm><fnm>BW</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au>
<au><snm>Cuomo</snm><fnm>CA</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>459</volume><issue>7247</issue><fpage>657</fpage><lpage>62</lpage><xrefbib><pubid idtype="doi">10.1038/nature08064</pubid></xrefbib></bibl><bibl id="B6"><title><p>The diploid genome sequence of Candida albicans</p></title><aug><au><snm>Jones</snm><fnm>T</fnm></au><au><snm>Federspiel</snm><fnm>NA</fnm></au><au><snm>Chibana</snm><fnm>H</fnm></au><au><snm>Dungan</snm><fnm>J</fnm></au><au><snm>Kalman</snm><fnm>S</fnm></au><au><snm>Magee</snm><fnm>BB</fnm></au><au><snm>Newport</snm><fnm>G</fnm></au><au><snm>Thorstenson</snm><fnm>YR</fnm></au><au><snm>Agabian</snm><fnm>N</fnm></au><au><snm>Magee</snm><fnm>PT</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au><au><snm>Scherer</snm><fnm>S</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><fpage>7329</fpage><lpage>7334</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0401648101</pubid><pubid idtype="pmcid">409918</pubid><pubid idtype="pmpid">15123810</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Comparative genomics of the fungal pathogens Candida dubliniensis and C. 
albicans</p></title><aug><au><snm>Jackson</snm><fnm>AP</fnm></au><au><snm>Gamble</snm><fnm>JA</fnm></au><au><snm>Yeomans</snm><fnm>T</fnm></au><au><snm>Moran</snm><fnm>GP</fnm></au><au><snm>Saunders</snm><fnm>D</fnm></au><au><snm>Harris</snm><fnm>D</fnm></au><au><snm>Aslett</snm><fnm>M</fnm></au><au><snm>Barrell</snm><fnm>JF</fnm></au><au><snm>Butler</snm><fnm>G</fnm></au><au><snm>Citiulo</snm><fnm>F</fnm></au><au><snm>Coleman</snm><fnm>DC</fnm></au><au><snm>de Groot</snm><fnm>PW</fnm></au><au><snm>Goodwin</snm><fnm>TJ</fnm></au><au><snm>Quail</snm><fnm>MA</fnm></au><au><snm>McQuillan</snm><fnm>J</fnm></au><au><snm>Munro</snm><fnm>CA</fnm></au><au><snm>Pain</snm><fnm>A</fnm></au><au><snm>Poulter</snm><fnm>RT</fnm></au><au><snm>Rajandream</snm><fnm>MA</fnm></au><au><snm>Renauld</snm><fnm>H</fnm></au><au><snm>Spiering</snm><fnm>MJ</fnm></au><au><snm>Tivey</snm><fnm>A</fnm></au><au><snm>Gow</snm><fnm>NA</fnm></au><au><snm>Barrell</snm><fnm>B</fnm></au><au><snm>Sullivan</snm><fnm>DJ</fnm></au><au><snm>Berriman</snm><fnm>M</fnm></au></aug><source>Genome Res</source><pubdate>2009</pubdate><inpress/><xrefbib><pubidlist><pubid idtype="pmcid">2792176</pubid><pubid idtype="pmpid" link="fulltext">19745113</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>A Fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis</p></title><aug><au><snm>Fitzpatrick</snm><fnm>DA</fnm></au><au><snm>Logue</snm><fnm>ME</fnm></au><au><snm>Stajich</snm><fnm>JE</fnm></au><au><snm>Butler</snm><fnm>G</fnm></au></aug><source>BMC Evol Biol</source><pubdate>2006</pubdate><volume>6</volume><fpage>99</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2148-6-99</pubid><pubid idtype="pmcid">1679813</pubid><pubid idtype="pmpid">17121679</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Non-universal usage of the leucine CUG codon and the molecular phylogeny of the genus 
Candida</p></title><aug><au><snm>Sugita</snm><fnm>T</fnm></au><au><snm>Nakase</snm><fnm>T</fnm></au></aug><source>Syst Appl Microbiol</source><pubdate>1999</pubdate><volume>22</volume><fpage>79</fpage><lpage>86</lpage><xrefbib><pubid idtype="pmpid">10188281</pubid></xrefbib></bibl><bibl id="B10"><title><p>Evidence of recent interkingdom horizontal gene transfer between bacteria and Candida parapsilosis</p></title><aug><au><snm>Fitzpatrick</snm><fnm>DA</fnm></au><au><snm>Logue</snm><fnm>ME</fnm></au><au><snm>Butler</snm><fnm>G</fnm></au></aug><source>BMC Evol Biol</source><pubdate>2008</pubdate><volume>8</volume><fpage>181</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2148-8-181</pubid><pubid idtype="pmcid">2459174</pubid><pubid idtype="pmpid">18577206</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>The hyphal-associated adhesin and invasin Als3 of <it>Candida albicans </it>mediates iron acquisition from host ferritin</p></title><aug><au><snm>Almeida</snm><fnm>RS</fnm></au><au><snm>Brunke</snm><fnm>S</fnm></au><au><snm>Albrecht</snm><fnm>A</fnm></au><au><snm>Thewes</snm><fnm>S</fnm></au><au><snm>Laue</snm><fnm>M</fnm></au><au><snm>Edwards</snm><fnm>JE</fnm></au><au><snm>Filler</snm><fnm>SG</fnm></au><au><snm>Hube</snm><fnm>B</fnm></au></aug><source>PLoS Pathog</source><pubdate>2008</pubdate><volume>4</volume><fpage>e1000217</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.ppat.1000217</pubid><pubid idtype="pmcid">2581891</pubid><pubid idtype="pmpid">19023418</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Discovering the secrets of the <it>Candida albicans </it>agglutinin-like sequence (ALS) gene family--a sticky pursuit</p></title><aug><au><snm>Hoyer</snm><fnm>LL</fnm></au><au><snm>Green</snm><fnm>CB</fnm></au><au><snm>Oh</snm><fnm>SH</fnm></au><au><snm>Zhao</snm><fnm>X</fnm></au></aug><source>Med Mycol</source><pubdate>2008</pubdate><volume>46</volume><fpage>1</fpage><lpage>15</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1080/13693780701435317</pubid><pubid idtype="pmcid">2742883</pubid><pubid idtype="pmpid">17852717</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Temporal analysis of <it>Candida albicans </it>gene expression during biofilm development</p></title><aug><au><snm>Yeater</snm><fnm>KM</fnm></au><au><snm>Chandra</snm><fnm>J</fnm></au><au><snm>Cheng</snm><fnm>G</fnm></au><au><snm>Mukherjee</snm><fnm>PK</fnm></au><au><snm>Zhao</snm><fnm>X</fnm></au><au><snm>Rodriguez-Zas</snm><fnm>SL</fnm></au><au><snm>Kwast</snm><fnm>KE</fnm></au><au><snm>Ghannoum</snm><fnm>MA</fnm></au><au><snm>Hoyer</snm><fnm>LL</fnm></au></aug><source>Microbiology</source><pubdate>2007</pubdate><volume>153</volume><fpage>2373</fpage><lpage>2385</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1099/mic.0.2007/006163-0</pubid><pubid idtype="pmpid" link="fulltext">17660402</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Genome-wide identification of fungal GPI proteins</p></title><aug><au><snm>De Groot</snm><fnm>PW</fnm></au><au><snm>Hellingwerf</snm><fnm>KJ</fnm></au><au><snm>Klis</snm><fnm>FM</fnm></au></aug><source>Yeast</source><pubdate>2003</pubdate><volume>20</volume><fpage>781</fpage><lpage>796</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/yea.1007</pubid><pubid idtype="pmpid" link="fulltext">12845604</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>The <it>Candida albicans HYR1 </it>gene, which is activated in response to hyphal development, belongs to a gene family encoding yeast cell wall proteins</p></title><aug><au><snm>Bailey</snm><fnm>DA</fnm></au><au><snm>Feldmann</snm><fnm>PJ</fnm></au><au><snm>Bovey</snm><fnm>M</fnm></au><au><snm>Gow</snm><fnm>NA</fnm></au><au><snm>Brown</snm><fnm>AJ</fnm></au></aug><source>J Bacteriol</source><pubdate>1996</pubdate><volume>178</volume><fpage>5353</fpage><lpage>5360</lpage><xrefbib><pubidlist><pubid idtype="pmcid">178351</pubid><pubid 
idtype="pmpid">8808922</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>CandidaDB: a multi-genome database for Candida species and related Saccharomycotina</p></title><aug><au><snm>Rossignol</snm><fnm>T</fnm></au><au><snm>Lechat</snm><fnm>P</fnm></au><au><snm>Cuomo</snm><fnm>C</fnm></au><au><snm>Zeng</snm><fnm>Q</fnm></au><au><snm>Moszer</snm><fnm>I</fnm></au><au><snm>d&apos;Enfert</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D557</fpage><lpage>561</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm1010</pubid><pubid idtype="pmcid">2238939</pubid><pubid idtype="pmpid">18039716</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Sequence resources at the Candida Genome Database</p></title><aug><au><snm>Arnaud</snm><fnm>MB</fnm></au><au><snm>Costanzo</snm><fnm>MC</fnm></au><au><snm>Skrzypek</snm><fnm>MS</fnm></au><au><snm>Shah</snm><fnm>P</fnm></au><au><snm>Binkley</snm><fnm>G</fnm></au><au><snm>Lane</snm><fnm>C</fnm></au><au><snm>Miyasato</snm><fnm>SR</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D452</fpage><lpage>456</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl899</pubid><pubid idtype="pmcid">1669745</pubid><pubid idtype="pmpid">17090582</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>CGOB</p></title><url>http://cgob.ucd.ie/</url></bibl><bibl id="B19"><title><p>Visualizing syntenic relationships among the hemiascomycetes with the Yeast Gene Order Browser</p></title><aug><au><snm>Byrne</snm><fnm>KP</fnm></au><au><snm>Wolfe</snm><fnm>KH</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><fpage>D452</fpage><lpage>455</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj041</pubid><pubid idtype="pmcid">1347404</pubid><pubid idtype="pmpid">16381909</pubid></pubidlist></xrefbib></bibl><bibl 
id="B20"><title><p>The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species</p></title><aug><au><snm>Byrne</snm><fnm>KP</fnm></au><au><snm>Wolfe</snm><fnm>KH</fnm></au></aug><source>Genome Res</source><pubdate>2005</pubdate><volume>15</volume><fpage>1456</fpage><lpage>1461</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.3672305</pubid><pubid idtype="pmcid">1240090</pubid><pubid idtype="pmpid">16169922</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Gene Ontology annotations at SGD: new data sources and annotation methods</p></title><aug><au><snm>Hong</snm><fnm>EL</fnm></au><au><snm>Balakrishnan</snm><fnm>R</fnm></au><au><snm>Dong</snm><fnm>Q</fnm></au><au><snm>Christie</snm><fnm>KR</fnm></au><au><snm>Park</snm><fnm>J</fnm></au><au><snm>Binkley</snm><fnm>G</fnm></au><au><snm>Costanzo</snm><fnm>MC</fnm></au><au><snm>Dwight</snm><fnm>SS</fnm></au><au><snm>Engel</snm><fnm>SR</fnm></au><au><snm>Fisk</snm><fnm>DG</fnm></au><au><snm>Hirschman</snm><fnm>JE</fnm></au><au><snm>Hitz</snm><fnm>BC</fnm></au><au><snm>Krieger</snm><fnm>CJ</fnm></au><au><snm>Livstone</snm><fnm>MS</fnm></au><au><snm>Miyasato</snm><fnm>SR</fnm></au><au><snm>Nash</snm><fnm>RS</fnm></au><au><snm>Oughtred</snm><fnm>R</fnm></au><au><snm>Skrzypek</snm><fnm>MS</fnm></au><au><snm>Weng</snm><fnm>S</fnm></au><au><snm>Wong</snm><fnm>ED</fnm></au><au><snm>Zhu</snm><fnm>KK</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Cherry</snm><fnm>JM</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D577</fpage><lpage>581</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm909</pubid><pubid idtype="pmcid">2238894</pubid><pubid idtype="pmpid">17982175</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Identification of peroxisomal acyl-CoA thioesterases in yeast and 
humans</p></title><aug><au><snm>Jones</snm><fnm>JM</fnm></au><au><snm>Nau</snm><fnm>K</fnm></au><au><snm>Geraghty</snm><fnm>MT</fnm></au><au><snm>Erdmann</snm><fnm>R</fnm></au><au><snm>Gould</snm><fnm>SJ</fnm></au></aug><source>J Biol Chem</source><pubdate>1999</pubdate><volume>274</volume><fpage>9216</fpage><lpage>9223</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1074/jbc.274.14.9216</pubid><pubid idtype="pmpid" link="fulltext">10092594</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Transcriptional response of Candida albicans upon internalization by macrophages</p></title><aug><au><snm>Lorenz</snm><fnm>MC</fnm></au><au><snm>Bender</snm><fnm>JA</fnm></au><au><snm>Fink</snm><fnm>GR</fnm></au></aug><source>Eukaryot Cell</source><pubdate>2004</pubdate><volume>3</volume><fpage>1076</fpage><lpage>1087</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/EC.3.5.1076-1087.2004</pubid><pubid idtype="pmcid">522606</pubid><pubid idtype="pmpid">15470236</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Peroxisomal fatty acid beta-oxidation is not essential for virulence of Candida albicans</p></title><aug><au><snm>Piekarska</snm><fnm>K</fnm></au><au><snm>Mol</snm><fnm>E</fnm></au><au><snm>Berg</snm><mnm>van den</mnm><fnm>M</fnm></au><au><snm>Hardy</snm><fnm>G</fnm></au><au><snm>Burg</snm><mnm>van den</mnm><fnm>J</fnm></au><au><snm>van Roermund</snm><fnm>C</fnm></au><au><snm>MacCallum</snm><fnm>D</fnm></au><au><snm>Odds</snm><fnm>F</fnm></au><au><snm>Distel</snm><fnm>B</fnm></au></aug><source>Eukaryot Cell</source><pubdate>2006</pubdate><volume>5</volume><fpage>1847</fpage><lpage>1856</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/EC.00093-06</pubid><pubid idtype="pmcid">1694795</pubid><pubid idtype="pmpid">16963628</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Ammonia pulses and metabolic oscillations guide yeast colony 
development</p></title><aug><au><snm>Palkova</snm><fnm>Z</fnm></au><au><snm>Devaux</snm><fnm>F</fnm></au><au><snm>Icicova</snm><fnm>M</fnm></au><au><snm>Minarikova</snm><fnm>L</fnm></au><au><snm>Le Crom</snm><fnm>S</fnm></au><au><snm>Jacq</snm><fnm>C</fnm></au></aug><source>Mol Biol Cell</source><pubdate>2002</pubdate><volume>13</volume><fpage>3901</fpage><lpage>3914</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1091/mbc.E01-12-0149</pubid><pubid idtype="pmcid">133602</pubid><pubid idtype="pmpid">12429834</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Pichia stipitis genomics, transcriptomics, and gene clusters</p></title><aug><au><snm>Jeffries</snm><fnm>TW</fnm></au><au><snm>Van Vleet</snm><fnm>JR</fnm></au></aug><source>FEMS Yeast Res</source><pubdate>2009</pubdate><volume>9</volume><fpage>793</fpage><lpage>807</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1567-1364.2009.00525.x</pubid><pubid idtype="pmcid">2784038</pubid><pubid idtype="pmpid">19659741</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Recurrent tandem gene duplication gave rise to functionally divergent genes in Drosophila</p></title><aug><au><snm>Fan</snm><fnm>C</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Long</snm><fnm>M</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2008</pubdate><volume>25</volume><fpage>1451</fpage><lpage>1458</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msn089</pubid><pubid idtype="pmcid">2878002</pubid><pubid idtype="pmpid" link="fulltext">18408233</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Duplicated paralogous genes subject to positive selection in the genome of Trypanosoma brucei</p></title><aug><au><snm>Emes</snm><fnm>RD</fnm></au><au><snm>Yang</snm><fnm>Z</fnm></au></aug><source>PLoS ONE</source><pubdate>2008</pubdate><volume>3</volume><fpage>e2295</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0002295</pubid><pubid idtype="pmcid">2386149</pubid><pubid 
idtype="pmpid">18509460</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Transcription profiling of Candida albicans cells undergoing the yeast-to-hyphal transition</p></title><aug><au><snm>Nantel</snm><fnm>A</fnm></au><au><snm>Dignard</snm><fnm>D</fnm></au><au><snm>Bachewich</snm><fnm>C</fnm></au><au><snm>Harcus</snm><fnm>D</fnm></au><au><snm>Marcil</snm><fnm>A</fnm></au><au><snm>Bouin</snm><fnm>AP</fnm></au><au><snm>Sensen</snm><fnm>CW</fnm></au><au><snm>Hogues</snm><fnm>H</fnm></au><au><snm>van het Hoog</snm><fnm>M</fnm></au><au><snm>Gordon</snm><fnm>P</fnm></au><au><snm>Rigby</snm><fnm>T</fnm></au><au><snm>Benoit</snm><fnm>F</fnm></au><au><snm>Tessier</snm><fnm>DC</fnm></au><au><snm>Thomas</snm><fnm>DY</fnm></au><au><snm>Whiteway</snm><fnm>M</fnm></au></aug><source>Mol Biol Cell</source><pubdate>2002</pubdate><volume>13</volume><fpage>3452</fpage><lpage>3465</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1091/mbc.E02-05-0272</pubid><pubid idtype="pmcid">129958</pubid><pubid idtype="pmpid">12388749</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Gene duplication and the structure of eukaryotic genomes</p></title><aug><au><snm>Friedman</snm><fnm>R</fnm></au><au><snm>Hughes</snm><fnm>AL</fnm></au></aug><source>Genome Res</source><pubdate>2001</pubdate><volume>11</volume><fpage>373</fpage><lpage>381</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.155801</pubid><pubid idtype="pmcid">311031</pubid><pubid idtype="pmpid">11230161</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>A human-curated annotation of the Candida albicans genome</p></title><aug><au><snm>Braun</snm><fnm>BR</fnm></au><au><snm>van Het 
Hoog</snm><fnm>M</fnm></au><au><snm>d&apos;Enfert</snm><fnm>C</fnm></au><au><snm>Martchenko</snm><fnm>M</fnm></au><au><snm>Dungan</snm><fnm>J</fnm></au><au><snm>Kuo</snm><fnm>A</fnm></au><au><snm>Inglis</snm><fnm>DO</fnm></au><au><snm>Uhl</snm><fnm>MA</fnm></au><au><snm>Hogues</snm><fnm>H</fnm></au><au><snm>Berriman</snm><fnm>M</fnm></au><au><snm>Lorenz</snm><fnm>M</fnm></au><au><snm>Levitin</snm><fnm>A</fnm></au><au><snm>Oberholzer</snm><fnm>U</fnm></au><au><snm>Bachewich</snm><fnm>C</fnm></au><au><snm>Harcus</snm><fnm>D</fnm></au><au><snm>Marcil</snm><fnm>A</fnm></au><au><snm>Dignard</snm><fnm>D</fnm></au><au><snm>Iouk</snm><fnm>T</fnm></au><au><snm>Zito</snm><fnm>R</fnm></au><au><snm>Frangeul</snm><fnm>L</fnm></au><au><snm>Tekaia</snm><fnm>F</fnm></au><au><snm>Rutherford</snm><fnm>K</fnm></au><au><snm>Wang</snm><fnm>E</fnm></au><au><snm>Munro</snm><fnm>CA</fnm></au><au><snm>Bates</snm><fnm>S</fnm></au><au><snm>Gow</snm><fnm>NA</fnm></au><au><snm>Hoyer</snm><fnm>LL</fnm></au><au><snm>Kohler</snm><fnm>G</fnm></au><au><snm>Morschhauser</snm><fnm>J</fnm></au><au><snm>Newport</snm><fnm>G</fnm></au><au><snm>Znaidi</snm><fnm>S</fnm></au><au><snm>Raymond</snm><fnm>M</fnm></au><au><snm>Turcotte</snm><fnm>B</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Costanzo</snm><fnm>M</fnm></au><au><snm>Ihmels</snm><fnm>J</fnm></au><au><snm>Berman</snm><fnm>J</fnm></au><au><snm>Sanglard</snm><fnm>D</fnm></au><au><snm>Agabian</snm><fnm>N</fnm></au><au><snm>Mitchell</snm><fnm>AP</fnm></au><au><snm>Johnson</snm><fnm>AD</fnm></au><au><snm>Whiteway</snm><fnm>M</fnm></au><au><snm>Nantel</snm><fnm>A</fnm></au></aug><source>PLoS Genet</source><pubdate>2005</pubdate><volume>1</volume><fpage>36</fpage><lpage>57</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.0010001</pubid><pubid idtype="pmcid">1183520</pubid><pubid idtype="pmpid">16103911</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Molecular evidence for an ancient duplication of the entire 
yeast genome</p></title><aug><au><snm>Wolfe</snm><fnm>KH</fnm></au><au><snm>Shields</snm><fnm>DC</fnm></au></aug><source>Nature</source><pubdate>1997</pubdate><volume>387</volume><fpage>708</fpage><lpage>713</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/42711</pubid><pubid idtype="pmpid" link="fulltext">9192896</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Comparative analysis indicates regulatory neofunctionalization of yeast duplicates</p></title><aug><au><snm>Tirosh</snm><fnm>I</fnm></au><au><snm>Barkai</snm><fnm>N</fnm></au></aug><source>Genome Biol</source><pubdate>2007</pubdate><volume>8</volume><fpage>R50</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2007-8-4-r50</pubid><pubid idtype="pmcid">1895995</pubid><pubid idtype="pmpid">17411427</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns</p></title><aug><au><snm>Bon</snm><fnm>E</fnm></au><au><snm>Casaregola</snm><fnm>S</fnm></au><au><snm>Blandin</snm><fnm>G</fnm></au><au><snm>Llorente</snm><fnm>B</fnm></au><au><snm>Neuveglise</snm><fnm>C</fnm></au><au><snm>Munsterkotter</snm><fnm>M</fnm></au><au><snm>Guldener</snm><fnm>U</fnm></au><au><snm>Mewes</snm><fnm>HW</fnm></au><au><snm>Van Helden</snm><fnm>J</fnm></au><au><snm>Dujon</snm><fnm>B</fnm></au><au><snm>Gaillardin</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>1121</fpage><lpage>1135</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg213</pubid><pubid idtype="pmcid">150231</pubid><pubid idtype="pmpid">12582231</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>A role for reverse transcripts in gene conversion</p></title><aug><au><snm>Derr</snm><fnm>LK</fnm></au><au><snm>Strathern</snm><fnm>JN</fnm></au></aug><source>Nature</source><pubdate>1993</pubdate><volume>361</volume><fpage>170</fpage><lpage>173</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1038/361170a0</pubid><pubid idtype="pmpid" link="fulltext">8380627</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans</p></title><aug><au><snm>Stajich</snm><fnm>JE</fnm></au><au><snm>Dietrich</snm><fnm>FS</fnm></au></aug><source>Eukaryot Cell</source><pubdate>2006</pubdate><volume>5</volume><fpage>789</fpage><lpage>793</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/EC.5.5.789-793.2006</pubid><pubid idtype="pmcid">1459680</pubid><pubid idtype="pmpid">16682456</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Computational and experimental approaches double the number of known introns in the pathogenic yeast <it>Candida albicans</it></p></title><aug><au><snm>Mitrovich</snm><fnm>QM</fnm></au><au><snm>Tuch</snm><fnm>BB</fnm></au><au><snm>Guthrie</snm><fnm>C</fnm></au><au><snm>Johnson</snm><fnm>AD</fnm></au></aug><source>Genome Res</source><pubdate>2007</pubdate><volume>17</volume><fpage>492</fpage><lpage>502</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.6111907</pubid><pubid idtype="pmcid">1832096</pubid><pubid idtype="pmpid">17351132</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Identification and phylogenetic analysis of a glucose transporter gene family from the human pathogenic yeast Candida albicans</p></title><aug><au><snm>Fan</snm><fnm>J</fnm></au><au><snm>Chaturvedi</snm><fnm>V</fnm></au><au><snm>Shen</snm><fnm>SH</fnm></au></aug><source>J Mol Evol</source><pubdate>2002</pubdate><volume>55</volume><fpage>336</fpage><lpage>346</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00239-002-2330-4</pubid><pubid idtype="pmpid" link="fulltext">12187386</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>On the incidence of intron loss and gain in paralogous gene families</p></title><aug><au><snm>Roy</snm><fnm>SW</fnm></au><au><snm>Penny</snm><fnm>D</fnm></au></aug><source>Mol Biol 
Evol</source><pubdate>2007</pubdate><volume>24</volume><fpage>1579</fpage><lpage>1581</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msm082</pubid><pubid idtype="pmpid" link="fulltext">17470438</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Candida albicans Ecm33p is important for normal cell wall architecture and interactions with host cells</p></title><aug><au><snm>Martinez-Lopez</snm><fnm>R</fnm></au><au><snm>Park</snm><fnm>H</fnm></au><au><snm>Myers</snm><fnm>CL</fnm></au><au><snm>Gil</snm><fnm>C</fnm></au><au><snm>Filler</snm><fnm>SG</fnm></au></aug><source>Eukaryot Cell</source><pubdate>2006</pubdate><volume>5</volume><fpage>140</fpage><lpage>147</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/EC.5.1.140-147.2006</pubid><pubid idtype="pmcid">1360258</pubid><pubid idtype="pmpid">16400176</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Operons as a common form of chromosomal organization in C. elegans</p></title><aug><au><snm>Zorio</snm><fnm>DA</fnm></au><au><snm>Cheng</snm><fnm>NN</fnm></au><au><snm>Blumenthal</snm><fnm>T</fnm></au><au><snm>Spieth</snm><fnm>J</fnm></au></aug><source>Nature</source><pubdate>1994</pubdate><volume>372</volume><fpage>270</fpage><lpage>272</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/372270a0</pubid><pubid idtype="pmpid" link="fulltext">7969472</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Gene clusters and polycistronic transcription in eukaryotes</p></title><aug><au><snm>Blumenthal</snm><fnm>T</fnm></au></aug><source>Bioessays</source><pubdate>1998</pubdate><volume>20</volume><fpage>480</fpage><lpage>487</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/(SICI)1521-1878(199806)20:6&lt;480::AID-BIES6&gt;3.0.CO;2-Q</pubid><pubid idtype="pmpid" link="fulltext">9699460</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>A global analysis of Caenorhabditis elegans 
operons</p></title><aug><au><snm>Blumenthal</snm><fnm>T</fnm></au><au><snm>Evans</snm><fnm>D</fnm></au><au><snm>Link</snm><fnm>CD</fnm></au><au><snm>Guffanti</snm><fnm>A</fnm></au><au><snm>Lawson</snm><fnm>D</fnm></au><au><snm>Thierry-Mieg</snm><fnm>J</fnm></au><au><snm>Thierry-Mieg</snm><fnm>D</fnm></au><au><snm>Chiu</snm><fnm>WL</fnm></au><au><snm>Duke</snm><fnm>K</fnm></au><au><snm>Kiraly</snm><fnm>M</fnm></au><au><snm>Kim</snm><fnm>SK</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>417</volume><fpage>851</fpage><lpage>854</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature00831</pubid><pubid idtype="pmpid" link="fulltext">12075352</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Metabolic Pathway Gene Clusters in Filamentous Fungi</p></title><aug><au><snm>Keller</snm><fnm>NP</fnm></au><au><snm>Hohn</snm><fnm>TM</fnm></au></aug><source>Fungal Genet Biol</source><pubdate>1997</pubdate><volume>21</volume><fpage>17</fpage><lpage>29</lpage><xrefbib><pubid idtype="doi">10.1006/fgbi.1997.0970</pubid></xrefbib></bibl><bibl id="B45"><title><p>Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts</p></title><aug><au><snm>Hittinger</snm><fnm>CT</fnm></au><au><snm>Rokas</snm><fnm>A</fnm></au><au><snm>Carroll</snm><fnm>SB</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><fpage>14144</fpage><lpage>14149</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0404319101</pubid><pubid idtype="pmcid">521130</pubid><pubid idtype="pmpid">15381776</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Birth of a metabolic gene cluster in yeast by adaptive gene relocation</p></title><aug><au><snm>Wong</snm><fnm>S</fnm></au><au><snm>Wolfe</snm><fnm>KH</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>777</fpage><lpage>782</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1584</pubid><pubid 
idtype="pmpid" link="fulltext">15951822</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Genomic gene clustering analysis of pathways in eukaryotes</p></title><aug><au><snm>Lee</snm><fnm>JM</fnm></au><au><snm>Sonnhammer</snm><fnm>EL</fnm></au></aug><source>Genome Res</source><pubdate>2003</pubdate><volume>13</volume><fpage>875</fpage><lpage>882</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.737703</pubid><pubid idtype="pmcid">430880</pubid><pubid idtype="pmpid">12695325</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>The mechanism of biotin-dependent enzymes</p></title><aug><au><snm>Knowles</snm><fnm>JR</fnm></au></aug><source>Annu Rev Biochem</source><pubdate>1989</pubdate><volume>58</volume><fpage>195</fpage><lpage>221</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.bi.58.070189.001211</pubid><pubid idtype="pmpid" link="fulltext">2673009</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>The Reacquisition of Biotin Prototrophy in Saccharomyces cerevisiae Involved Horizontal Gene Transfer, Gene Duplication and Gene Clustering</p></title><aug><au><snm>Hall</snm><fnm>C</fnm></au><au><snm>Dietrich</snm><fnm>FS</fnm></au></aug><source>Genetics</source><pubdate>2007</pubdate><volume>177</volume><fpage>2293</fpage><lpage>2307</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1534/genetics.107.074963</pubid><pubid idtype="pmcid">2219469</pubid><pubid idtype="pmpid">18073433</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>[Biotin formation by the fungus Rhizopus delemar]</p></title><aug><au><snm>Shchelokova</snm><fnm>EV</fnm></au><au><snm>Vorob&apos;eva</snm><fnm>LI</fnm></au></aug><source>Prikl Biokhim Mikrobiol</source><pubdate>1982</pubdate><volume>18</volume><fpage>630</fpage><lpage>635</lpage><xrefbib><pubid idtype="pmpid">7145874</pubid></xrefbib></bibl><bibl id="B51"><title><p>Comparative genomics using Candida albicans DNA microarrays reveals absence and divergence of virulence-associated 
genes in Candida dubliniensis</p></title><aug><au><snm>Moran</snm><fnm>G</fnm></au><au><snm>Stokes</snm><fnm>C</fnm></au><au><snm>Thewes</snm><fnm>S</fnm></au><au><snm>Hube</snm><fnm>B</fnm></au><au><snm>Coleman</snm><fnm>DC</fnm></au><au><snm>Sullivan</snm><fnm>D</fnm></au></aug><source>Microbiology</source><pubdate>2004</pubdate><volume>150</volume><fpage>3363</fpage><lpage>3382</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1099/mic.0.27221-0</pubid><pubid idtype="pmpid" link="fulltext">15470115</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Characterization of the biotin biosynthesis pathway in Saccharomyces cerevisiae and evidence for a cluster containing BIO5, a novel gene involved in vitamer uptake</p></title><aug><au><snm>Phalip</snm><fnm>V</fnm></au><au><snm>Kuhn</snm><fnm>I</fnm></au><au><snm>Lemoine</snm><fnm>Y</fnm></au><au><snm>Jeltsch</snm><fnm>JM</fnm></au></aug><source>Gene</source><pubdate>1999</pubdate><volume>232</volume><fpage>43</fpage><lpage>51</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0378-1119(99)00117-1</pubid><pubid idtype="pmpid" link="fulltext">10333520</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>The inducible N-acetylglucosamine catabolic pathway gene cluster in Candida albicans: discrete N-acetylglucosamine-inducible factors interact at the promoter of NAG1</p></title><aug><au><snm>Kumar</snm><fnm>MJ</fnm></au><au><snm>Jamaluddin</snm><fnm>MS</fnm></au><au><snm>Natarajan</snm><fnm>K</fnm></au><au><snm>Kaur</snm><fnm>D</fnm></au><au><snm>Datta</snm><fnm>A</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2000</pubdate><volume>97</volume><fpage>14218</fpage><lpage>14223</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.250452997</pubid><pubid idtype="pmcid">18898</pubid><pubid idtype="pmpid">11114181</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Attenuation of virulence and changes in morphology in Candida albicans by disruption of the 
N-acetylglucosamine catabolic pathway</p></title><aug><au><snm>Singh</snm><fnm>P</fnm></au><au><snm>Ghosh</snm><fnm>S</fnm></au><au><snm>Datta</snm><fnm>A</fnm></au></aug><source>Infect Immun</source><pubdate>2001</pubdate><volume>69</volume><fpage>7898</fpage><lpage>7903</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/IAI.69.12.7898-7903.2001</pubid><pubid idtype="pmcid">98888</pubid><pubid idtype="pmpid">11705974</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>Review: the dominant flocculation genes of Saccharomyces cerevisiae constitute a new subtelomeric gene family</p></title><aug><au><snm>Teunissen</snm><fnm>AW</fnm></au><au><snm>Steensma</snm><fnm>HY</fnm></au></aug><source>Yeast</source><pubdate>1995</pubdate><volume>11</volume><fpage>1001</fpage><lpage>1013</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/yea.320111102</pubid><pubid idtype="pmpid">7502576</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>Two membrane proteins located in the Nag regulon of Candida albicans confer multidrug resistance</p></title><aug><au><snm>Sengupta</snm><fnm>M</fnm></au><au><snm>Datta</snm><fnm>A</fnm></au></aug><source>Biochem Biophys Res Commun</source><pubdate>2003</pubdate><volume>301</volume><fpage>1099</fpage><lpage>1108</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0006-291X(03)00094-9</pubid><pubid idtype="pmpid" link="fulltext">12589826</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>Characterization of the CaNAG3, CaNAG4, and CaNAG6 genes of the pathogenic fungus Candida albicans: possible involvement of these genes in the susceptibilities of cytotoxic agents</p></title><aug><au><snm>Yamada-Okabe</snm><fnm>T</fnm></au><au><snm>Yamada-Okabe</snm><fnm>H</fnm></au></aug><source>FEMS Microbiol Lett</source><pubdate>2002</pubdate><volume>212</volume><fpage>15</fpage><lpage>21</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1574-6968.2002.tb11238.x</pubid><pubid idtype="pmpid" 
link="fulltext">12076781</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>The enzymatic transformation of uridine diphosphate glucose into a galactose derivative</p></title><aug><au><snm>Leloir</snm><fnm>LF</fnm></au></aug><source>Arch Biochem</source><pubdate>1951</pubdate><volume>33</volume><fpage>186</fpage><lpage>190</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0003-9861(51)90096-3</pubid><pubid idtype="pmpid">14885999</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>Transcriptional rewiring of fungal galactose-metabolism circuitry</p></title><aug><au><snm>Martchenko</snm><fnm>M</fnm></au><au><snm>Levitin</snm><fnm>A</fnm></au><au><snm>Hogues</snm><fnm>H</fnm></au><au><snm>Nantel</snm><fnm>A</fnm></au><au><snm>Whiteway</snm><fnm>M</fnm></au></aug><source>Curr Biol</source><pubdate>2007</pubdate><volume>17</volume><fpage>1007</fpage><lpage>1013</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cub.2007.05.017</pubid><pubid idtype="pmpid" link="fulltext">17540568</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>Identification and characterization of five new subunits of TRAPP</p></title><aug><au><snm>Sacher</snm><fnm>M</fnm></au><au><snm>Barrowman</snm><fnm>J</fnm></au><au><snm>Schieltz</snm><fnm>D</fnm></au><au><snm>Yates</snm><fnm>JR</fnm></au><au><snm>Ferro-Novick</snm><fnm>S</fnm></au></aug><source>Eur J Cell Biol</source><pubdate>2000</pubdate><volume>79</volume><fpage>71</fpage><lpage>80</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1078/S0171-9335(04)70009-6</pubid><pubid idtype="pmpid">10727015</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>KEGG: Kyoto Encyclopedia of Genes and Genomes</p></title><aug><au><snm>Ogata</snm><fnm>H</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Sato</snm><fnm>K</fnm></au><au><snm>Fujibuchi</snm><fnm>W</fnm></au><au><snm>Bono</snm><fnm>H</fnm></au><au><snm>Kanehisa</snm><fnm>M</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>1999</pubdate><volume>27</volume><fpage>29</fpage><lpage>34</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.29</pubid><pubid idtype="pmcid">148090</pubid><pubid idtype="pmpid">9847135</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>The Candida Genome Database</p></title><url>http://www.candidagenome.org</url></bibl><bibl id="B63"><title><p>The Candida group at the Broad Institute</p></title><url>http://www.broad.mit.edu/annotation/genome/candida_group/MultiHome.html</url></bibl><bibl id="B64"><title><p>GeneDB</p></title><url>http://www.genedb.org</url></bibl><bibl id="B65"><title><p>The Wellcome Trust Sanger Institute</p></title><url>http://www.sanger.ac.uk/</url></bibl><bibl id="B66"><title><p>MUSCLE: multiple sequence alignment with high accuracy and high throughput</p></title><aug><au><snm>Edgar</snm><fnm>RC</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2004</pubdate><volume>32</volume><fpage>1792</fpage><lpage>1797</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh340</pubid><pubid idtype="pmcid">390337</pubid><pubid idtype="pmpid">15034147</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models</p></title><aug><au><snm>Yang</snm><fnm>Z</fnm></au><au><snm>Nielsen</snm><fnm>R</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2000</pubdate><volume>17</volume><fpage>32</fpage><lpage>43</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">10666704</pubid></xrefbib></bibl><bibl id="B68"><title><p>PAML: a program package for phylogenetic analysis by maximum likelihood</p></title><aug><au><snm>Yang</snm><fnm>Z</fnm></au></aug><source>Comput Appl Biosci</source><pubdate>1997</pubdate><volume>13</volume><fpage>555</fpage><lpage>556</lpage><xrefbib><pubid idtype="pmpid">9367129</pubid></xrefbib></bibl><bibl id="B69"><title><p>KAAS: an automatic 
genome annotation and pathway reconstruction server</p></title><aug><au><snm>Moriya</snm><fnm>Y</fnm></au><au><snm>Itoh</snm><fnm>M</fnm></au><au><snm>Okuda</snm><fnm>S</fnm></au><au><snm>Yoshizawa</snm><fnm>AC</fnm></au><au><snm>Kanehisa</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>W182</fpage><lpage>185</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm321</pubid><pubid idtype="pmcid">1933193</pubid><pubid idtype="pmpid">17526522</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>