<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-10-r223</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Comparative genomic analysis of fungal genomes reveals intron-rich ancestors</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Stajich</snm>
               <mi>E</mi>
               <fnm>Jason</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jason_stajich@berkeley.edu</email>
            </au>
            <au id="A2">
               <snm>Dietrich</snm>
               <mi>S</mi>
               <fnm>Fred</fnm>
               <insr iid="I1"/>
               <email>dietr003@mc.duke.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Roy</snm>
               <mi>W</mi>
               <fnm>Scott</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>royscott@ncbi.nlm.nih.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Genetics and Microbiology, Center for Genome Technology, Institute for Genome Science and Policy, Duke University, Durham, NC 27710, USA</p>
            </ins>
            <ins id="I2">
               <p>Miller Institute for Basic Research and Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California, Berkeley, CA 94720-3102, USA</p>
            </ins>
            <ins id="I3">
               <p>National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>10</issue>
         <fpage>R223</fpage>
         <url>http://genomebiology.com/2007/8/10/R223</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17949488</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-10-r223</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>12</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>19</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Stajich et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Intron evolution in fungal genomes</p>
      </shorttitle>
      <shortabs>
         <p>Analysis of intron gain and loss in fungal genomes provides support for an intron-rich fungus-animal ancestor.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from transcripts before protein translation. Many facets of spliceosomal intron evolution, including age, mechanisms of origin, the role of natural selection, and the causes of the vast differences in intron number between eukaryotic species, remain debated. Genome sequencing and comparative analysis have made possible whole-genome analyses of intron evolution that can address these questions.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>We analyzed intron positions in 1,161 sets of orthologous genes across 25 eukaryotic species. We find strong support for an intron-rich fungus-animal ancestor, with more than four introns per kilobase, comparable to the highest known modern intron densities. Indeed, the fungus-animal ancestor is estimated to have had more introns than any of the extant fungi in this study. Thus, subsequent fungal evolution has been characterized by widespread and recurrent intron loss occurring in all fungal clades. These results reconcile three previously proposed methods for estimating ancestral intron number, which had given very different estimates when applied to eight eukaryotic species, as well as a fourth, more recent method. We do not find a clear inverse correspondence between rates of intron loss and gain, contrary to the predictions of selection-based proposals for interspecific differences in intron number.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results underscore the high intron density of eukaryotic ancestors and the widespread importance of intron loss through eukaryotic evolution.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Unlike those of bacteria, the protein-coding genes of eukaryotes are typically interrupted by spliceosomal introns, which are removed from gene transcripts before translation into proteins. Eukaryotic species vary dramatically in their number of introns, ranging from a few introns per genome to several introns per gene. The reasons for these vast differences, as well as the explanation for the particular pattern of intron number across species, remain obscure. The first genomes with characterized intron densities suggested the possibility of a close association between intron number and organismal complexity. The initial animal and land plant species studied had high intron densities, for instance, <it>Homo sapiens </it>with 8.1 introns per gene <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, <it>Caenorhabditis elegans </it>with 4.7 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, <it>Drosophila melanogaster </it>with 3.4 <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and <it>Arabidopsis thaliana </it>with 4.4 <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. By contrast, many unicellular species were found to have few introns <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. However, further studies have revealed high intron densities in a variety of single-celled species <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, with great variation in intron density within eukaryotic kingdoms.</p>
          <p>The case of fungi is particularly striking. The first fungal genomes characterized, those of the yeasts <it>Schizosaccharomyces pombe </it>(0.9 introns per gene) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and <it>Saccharomyces cerevisiae </it>(0.05 per gene) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, have low intron densities. However, the euascomycete fungi <it>Neurospora crassa </it>and <it>Aspergillus nidulans </it>have much higher intron densities (2-3 per gene) <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, and intron densities in basidiomycete and zygomycete fungi are among the highest known in eukaryotes (4-6 per gene) <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Gene structures are known to differ both between closely related <it>Cryptococcus </it>species <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and between more distantly related euascomycete species <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. However, conservation of intron positions between deeply diverged fungal groups has not been systematically evaluated, and it is not known whether the large numbers of introns in these major fungal lineages are due primarily to retention of introns present in fungal ancestors or to intron gain into ancestrally intron-poor genes.</p>
          <p>Many intron positions are shared between eukaryotic kingdoms. In particular, many intron positions are shared between plants and animals but not with the intron-sparse fungi <it>S. pombe </it>and <it>S. cerevisiae</it>, a pattern that is due to some combination of loss in fungi <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> and homoplastic insertion in plants and animals <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Separate analyses have supported different pictures: either moderate ancestral intron densities followed by a tripling of intron number in vertebrates and plants <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B19">19</abbr></abbrgrp>, or high ancestral intron density and massive intron loss in <it>S. pombe</it>, <it>S. cerevisiae</it>, and a variety of other species <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>. This study represents the first multi-kingdom comparative analysis to include multiple diverse and intron-rich fungi, permitting a more accurate reconstruction of intron evolution through fungal history.</p>
          <p>We used comparative genomic analysis of the gene structures of 1,161 sets of orthologs among 21 fungal species and four outgroups. We found that the studied fungal species share many intron positions with distantly related species, and that both the fungal ancestor and the fungus-animal (Opisthokont) ancestor were very intron-rich, with intron densities matching or exceeding the highest known average densities in modern fungi and approaching the highest known across eukaryotes. Fungal evolution has been dominated by intron loss: in addition to an overall pattern of loss, we identify nearly complete intron loss that occurred independently in three distinct fungal lineages.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Intron position data set</p>
            </st>
             <p>To study fungal intron evolution, we identified 1,161 orthologs among 21 fungal species and 4 outgroups (Figure <figr fid="F1">1</figr>; see Materials and methods). We aligned the amino acid sequences and mapped the corresponding intron positions onto the alignments. There were a total of 7,535 intron positions in 4.15 Megabases of conserved regions of alignment (hereafter 'conserved orthologous regions' (CORs)). Intron densities in CORs ranged from 0.001 introns per kilobase (kb) (in <it>S. cerevisiae</it>, with 7 introns in total) to 6.7 introns per kb (2,737 introns in humans; Figure <figr fid="F1">1</figr>). Figure <figr fid="F2">2</figr> summarizes the average number of introns per kb of coding sequence versus median intron length. In general, major lineages are clearly separated by intron density. One exception is <it>Ustilago maydis</it>, a basidiomycete fungus that has many fewer introns than other members of its clade. Median intron length is inversely and significantly correlated with the average number of introns per kb (R<sup>2</sup> = 0.23, <it>P </it>= 10<sup>-4</sup>; Spearman correlation coefficient), although the trend is not significant when the hemiascomycete fungi are excluded (R<sup>2</sup> = 0.18, <it>P </it>= 0.06). This finding of much longer introns in the very intron-poor hemiascomycetes is intriguing, particularly in light of other peculiarities of evolution in very intron-poor lineages <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. In particular, very intron-poor lineages, including hemiascomycetes (see below), have more regular 5' intronic sequences (that is, a stronger consensus sequence at the beginning of introns). Presumably, this conservation of 5' boundaries facilitates intron splicing, in which case increased intron length might be better tolerated. Comparisons between other very intron-poor species and their more intron-rich relatives should yield further insight into the peculiarities of evolution in very intron-poor lineages. Additional data file 4 provides summary statistics of coding sequence, intron length, and intron density for the sampled fungal genomes.</p>
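             <p>The intron densities above are intron counts divided by aligned length in kilobases, and the length-density relationship is tested with a rank correlation. A minimal Python sketch of both computations follows. The per-species values are hypothetical placeholders rather than data from this study, and Spearman's coefficient is computed by hand as the Pearson correlation of ranks, assuming no tied values (a statistics library would also handle ties and report a <it>P </it>value).</p>
             <p>
```python
def introns_per_kb(n_introns, aligned_bases):
    """Intron density: introns per 1,000 aligned coding bases."""
    return n_introns / (aligned_bases / 1000.0)


def ranks(values):
    """Ranks 1..n in ascending order of value (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical species: intron counts, aligned bases and median intron length (bp).
counts = [5, 60, 110, 300, 520]
aligned = [100_000] * 5
median_length = [300, 90, 80, 60, 55]

density = [introns_per_kb(c, b) for c, b in zip(counts, aligned)]
print(density)                                     # [0.05, 0.6, 1.1, 3.0, 5.2]
print(round(spearman(density, median_length), 3))  # -1.0: perfectly inverse ranks
```
             </p>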
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>This figure depicts a phylogenetic tree of the species used for this analysis</p>
               </caption>
               <text>
                   <p>This figure depicts a phylogenetic tree of the species used for this analysis. The tree is based on Bayesian phylogenetic reconstruction of 30 aligned orthologous proteins from the 25 species. The numbers after the species names give the total number of introns present in the CORs for each species. <it>U. maydis </it>is colored purple to indicate that it has a different intron pattern from the rest of the basidiomycete fungi sampled. Numbers in boxes are node numbers used in the tables in Additional data files 4 and 5.</p>
               </text>
               <graphic file="gb-2007-8-10-r223-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Intron length versus average number of introns per kilobase</p>
               </caption>
               <text>
                  <p>Intron length versus average number of introns per kilobase. Colored boxes indicate the fungal clade as shown in Figure 1: red, Hemiascomycota; yellow, Archiascomycota; green, Euascomycota; orange, Zygomycota; blue, Basidiomycota; purple, basidiomycete <it>U. maydis</it>. Bars indicating standard deviation in intron length are drawn but only visible for the intron-poor species. CDS, coding sequence.</p>
               </text>
               <graphic file="gb-2007-8-10-r223-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Patterns of intron sharing</p>
            </st>
             <p>Patterns of intron position sharing vary across fungal species. Excluding the extremely intron-poor Hemiascomycota clade, between 3.7% and 38.7% of each species' intron positions are species-specific, between 32.0% and 76.5% are shared with a species outside its clade (different colors in Figure <figr fid="F1">1</figr>), and between 20.5% and 60.1% are shared with a non-fungal species. Figure <figr fid="F3">3</figr> summarizes the pattern of species-specific and shared intron positions across the CORs. Of the 7,535 intron positions, 3,307 are species-specific, 1,602 of which are specific to <it>A. thaliana</it>. Of the 501 intron positions shared between plants and animals, the fraction also present in a given fungal species ranges from 2.76% in <it>U. maydis </it>to 43.2% in <it>Phanerochaete chrysosporium </it>(Figure <figr fid="F4">4</figr>). In all, 60.7% of shared plant-animal positions are also present in at least one fungal species.</p>
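             <p>These sharing statistics reduce to set operations on intron positions, where a position is identified by an orthologous gene and an alignment column. The Python sketch below assumes that representation (the gene identifiers, column numbers and species names are hypothetical). For each fungus, it computes the fraction of plant-animal shared positions that the fungus also carries, and then the fraction of those positions found in at least one fungus.</p>
             <p>
```python
# Hypothetical intron positions: (orthologous gene ID, alignment column).
plant = {("g1", 10), ("g1", 55), ("g2", 7), ("g3", 120)}
animal = {("g1", 10), ("g2", 7), ("g3", 120), ("g3", 300)}
fungi = {
    "fungus_X": {("g1", 10), ("g3", 120), ("g2", 99)},
    "fungus_Y": {("g2", 7)},
}


def shared_fraction(reference, positions):
    """Fraction of the reference positions that also occur in `positions`."""
    return len(reference.intersection(positions)) / len(reference)


# Positions shared by plant and animal: candidates for ancestral introns.
plant_animal = plant.intersection(animal)

per_species = {name: shared_fraction(plant_animal, pos) for name, pos in fungi.items()}
in_any_fungus = plant_animal.intersection(set().union(*fungi.values()))

print(per_species)                             # fungus_X: 2/3, fungus_Y: 1/3
print(len(in_any_fungus) / len(plant_animal))  # 1.0
```
             </p>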
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Pattern of intron sharing of fungal species</p>
               </caption>
               <text>
                  <p>Pattern of intron sharing of fungal species. Fractions of intron positions that are shared with animal or plant (A+P), plant, animal, with another fungal clade (Euascomycota, Hemiascomycota, or Basidiomycota), or specific to the species or clade.</p>
               </text>
               <graphic file="gb-2007-8-10-r223-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Fraction of shared plant-animal intron positions in each fungal species</p>
               </caption>
               <text>
                   <p>Fraction of shared plant-animal intron positions in each fungal species. Among the 501 intron positions that are shared between <it>A. thaliana </it>and a vertebrate (and thus likely present in the fungus-animal ancestor), the fraction that is shared with each fungal species is given. Color coding: lavender, introns found only within the clade or in a single species; maroon, introns shared only with other fungi; pink, introns shared with animals; green, introns shared with plants (<it>A. thaliana</it>); brown, introns shared with animals or plants.</p>
               </text>
               <graphic file="gb-2007-8-10-r223-4"/>
            </fig>
             <p>Species within a clade share more intron positions with each other than with species in other clades. Another way to visualize this is with a phylogenetic tree derived from a parsimony analysis in which each intron position is a binary character (Additional data file 1). We constructed a phylogenetic tree using Dollo parsimony <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp> from the intron presence/absence matrix for the CORs. Dollo parsimony assumes that a 0 to 1 transition (intron gain) can occur only once across the tree at each site, and then infers the minimum number of 1 to 0 transitions (intron losses) needed to explain each phylogenetic pattern. Surprisingly, our species tree and the parsimony tree inferred from the intron position matrix are nearly identical, with two exceptions: the unresolved hemiascomycetes, which have few intron presence characters; and the positions of <it>U. maydis </it>and <it>S. pombe</it>, presumably due to a high degree of intron loss in those lineages. Previous unsuccessful attempts to reconstruct phylogeny by applying parsimony analysis to intron positions encountered a similar phenomenon, with intron-poor taxa artificially grouping together <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Intron positions may therefore be good phylogenetic characters in slowly evolving taxa, but they are likely to be misleading in cases of widespread intron loss.</p>
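             <p>To make the Dollo assumption concrete, the Python sketch below scores a single intron position on a fixed, rooted tree. The intron is gained once, at the most recent common ancestor of all species that carry it. Each branch below that node that leads to a clade with no carriers then contributes one loss. The four-taxon tree and taxon names are hypothetical. The sketch also does not perform the tree search itself, which would choose the tree minimizing the total number of losses over all positions.</p>
             <p>
```python
def leaves(tree):
    """Set of leaf names below a node; leaves are strings, internal nodes tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_site(tree, present):
    """Score one intron position under Dollo parsimony on a fixed rooted tree.

    Returns (gain_node, n_losses); gain_node is None if no species has the intron.
    """
    if not present:
        return None, 0
    # Single gain: descend from the root while all carriers lie in one child clade.
    node = tree
    while not isinstance(node, str):
        carriers = [c for c in node if leaves(c).intersection(present)]
        if len(carriers) != 1:
            break
        node = carriers[0]

    def count_losses(n):
        # One loss on every branch leading to a clade that contains no carriers.
        if isinstance(n, str):
            return 0
        return sum(count_losses(c) if leaves(c).intersection(present) else 1 for c in n)

    return node, count_losses(node)


def total_losses(tree, sites):
    """Dollo score of a tree: minimum total losses summed over intron positions."""
    return sum(dollo_site(tree, present)[1] for present in sites)


tree = ((("A", "B"), "C"), "D")  # hypothetical rooted four-taxon tree
print(dollo_site(tree, {"A", "C"}))  # gained in the A+B+C ancestor, lost once (in B)
print(dollo_site(tree, {"A", "D"}))  # gained at the root, lost twice (in B and C)
print(total_losses(tree, [{"A", "C"}, {"A", "D"}, {"B"}]))  # 1 + 2 + 0 = 3
```
             </p>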
         </sec>
         <sec>
            <st>
               <p>High ancestral intron number and ongoing loss and gain</p>
            </st>
             <p>We next studied intron loss and gain in fungi in the CORs of 1,161 genes. Four previously proposed methods gave very similar pictures, with large numbers of introns present in ancestral genomes and widespread subsequent reduction in intron number along various fungal lineages (Figure <figr fid="F5">5</figr>, and tables in Additional data files 4 and 5). We find that the fungal ancestor was at least as intron-rich as any modern fungal species and that the fungus-animal ancestor was 25% more intron-rich than any modern fungus, with at least three-quarters as many introns as modern vertebrates.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Estimated number of introns per kilobase in CORs through fungal history using the EREM method</p>
               </caption>
               <text>
                   <p>Estimated number of introns per kilobase in CORs through fungal history using the EREM method. Numbers in ovals give estimated ancestral values normalized by the total number of aligned bases in the CORs (4.15 Mb). Numbers in black boxes are the node numbers referenced in the tables in Additional data files 4 and 5. Blue branches indicate two or more estimated losses for each estimated gain; red branches, more than 1.5 gains per loss. <b>(a) </b>Summarized fungal tree. Triangles indicate clades, with values for the clade ancestor indicated. <b>(b) </b>Introns per kilobase through Euascomycota history, for the clade indicated by the grey box in (a).</p>
               </text>
               <graphic file="gb-2007-8-10-r223-5"/>
            </fig>
             <p>Intron number reduction has been a general feature of fungal evolution (Figure <figr fid="F5">5</figr>). We estimate that at least half of the studied fungal lineages (excluding hemiascomycetes) experienced at least 50% more losses than gains, while only between three and six lineages, depending on the method used, experienced 50% more gains than losses (Figure <figr fid="F5">5</figr>; see Additional data file 5). Dramatic intron reduction has occurred within each fungal clade. The 0.21 introns per kb in <it>U. maydis </it>represent a 94% reduction in intron number relative to the basidiomycete ancestor. Since the ascomycete ancestor (with at least 2.77 introns per kb), hemiascomycete species (0.01-0.07 introns per kb) have reduced their intron number by at least 94%, <it>S. pombe </it>(0.52 introns per kb) by 81%, and even the relatively intron-rich euascomycete species (0.81-1.16 introns per kb) by roughly 60% or more. Interestingly, following dramatic intron number reduction in the euascomycete ancestor, intron number has remained relatively unchanged within the clade (Figure <figr fid="F5">5b</figr>), consistent with previous results <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B24">24</abbr></abbrgrp>.</p>
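             <p>The reductions above are relative drops in intron density from an ancestral node to its descendants, that is, 1 minus (descendant density / ancestral density). The short Python sketch below reproduces this arithmetic for the ascomycete values quoted in the text. Because 2.77 introns per kb is a lower bound on the ancestral density, the resulting reductions are lower bounds as well.</p>
             <p>
```python
def percent_reduction(ancestral, descendant):
    """Relative drop in intron density (introns per kb), as a percentage."""
    return 100.0 * (1.0 - descendant / ancestral)


ASCOMYCETE_ANCESTOR = 2.77  # introns per kb; a lower bound according to the text

descendants = {
    "S. pombe": 0.52,
    "hemiascomycetes (highest)": 0.07,
    "euascomycetes (highest)": 1.16,
    "euascomycetes (lowest)": 0.81,
}
for name, density in descendants.items():
    print(f"{name}: {percent_reduction(ASCOMYCETE_ANCESTOR, density):.0f}% reduction")
```
             </p>
             <p>With these inputs, the sketch gives 81% for <it>S. pombe</it>, 97% for the most intron-rich hemiascomycete, and 58% to 71% across the euascomycete range.</p>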
             <p>On the other hand, our results also attest to ongoing intron gain. Since the fungal ancestor, most species have experienced hundreds of intron gains in CORs (although many of these introns have subsequently been lost), and nearly every studied species is estimated to have gained more than one intron per kb. Differences in intron gain are sometimes the central determinant of modern differences in intron number. For instance, <it>S. pombe </it>retains as many of the 507 intron positions shared between plants and animals (most of which are likely ancestral) as most euascomycetes do; the 50-100% more introns found in euascomycete species than in <it>S. pombe </it>are thus primarily due not to greater retention of ancestral introns but to recent gain. Likewise, <it>Cryptococcus neoformans </it>retains fewer shared plant-animal introns than does <it>Rhizopus oryzae</it>, yet has 70% more introns, apparently owing to more intron gain.</p>
         </sec>
         <sec>
            <st>
               <p>Intron evolution in hemiascomycetes</p>
            </st>
             <p>Intron evolution within hemiascomycetes provides insights into the evolution of nearly intronless lineages. The extensive loss of introns in hemiascomycetes coincides with a significant shift in intron structure at the same point in the fungal phylogeny. Splicing in hemiascomycetes requires a six-base consensus sequence at the 5' splice site and a seven-base consensus sequence at the branch point <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, whereas the other sampled fungi require only a limited consensus at the 5' splice site and branch point. Previous results have shown that this correspondence between greatly reduced intron number and stronger conservation of intron boundaries is a general trend across eukaryotes <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Two explanations have been proposed. Irimia <it>et al</it>. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> suggested that mutations that lead to stricter sequence requirements by the spliceosome might be favored in intron-poor but not intron-rich species, in which case widespread intron loss would lead to increased strictness of splicing requirements (and thus of intron boundaries). Another possibility <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> is that a shift in splicing mechanism, requiring more extensive conserved sequences at the branch point and 5' splice junction, would make introns more deleterious because of the additional sequence constraint necessary for splicing. In this case, increased strictness of splicing requirements (and thus of intron boundaries) would drive intron loss.</p>
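             <p>The strict hemiascomycete splicing signals can be illustrated with the canonical <it>S. cerevisiae </it>motifs: GUAUGU at the 5' splice site and UACUAAC at the branch point (written below in the DNA alphabet). The Python sketch checks a candidate intron for exact matches to these motifs and for a terminal AG. The example sequence is invented, and real yeast introns can deviate from the canonical motifs, so exact matching is a deliberate simplification rather than a splice-site predictor.</p>
             <p>
```python
FIVE_PRIME_SITE = "GTATGT"  # canonical S. cerevisiae 5' splice site (GUAUGU in RNA)
BRANCH_POINT = "TACTAAC"    # canonical S. cerevisiae branch point (UACUAAC in RNA)


def check_intron(seq):
    """Check a candidate intron for the canonical yeast splicing signals.

    Returns (starts_with_5ss, branch_point_offset, ends_with_ag), where the
    offset is the index of the first exact branch-point match, or -1 if none.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    return (
        seq.startswith(FIVE_PRIME_SITE),
        seq.find(BRANCH_POINT),
        seq.endswith("AG"),
    )


# Invented example sequence, not a real yeast intron.
intron = "GTATGTTTACTTCCTTAATTTACTAACAATTTTGCAG"
print(check_intron(intron))          # (True, 20, True)
print(check_intron("GCAAGTTTTTAG"))  # (False, -1, True)
```
             </p>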
             <p>Why, then, have hemiascomycete species not lost all of their introns? Some <it>S. cerevisiae </it>introns encode functional elements such as small nucleolar RNAs (snoRNAs) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> or contain promoter elements <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. snoRNAs located in the introns of ribosomal protein genes are found in orthologous loci of basidiomycetes and ascomycetes (for example, snR39 in RPL7A of <it>S. cerevisiae</it>), indicating their conservation since divergence from the fungal ancestor. However, only 8 of 76 snoRNAs are found in the 275 nuclear introns in <it>S. cerevisiae </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Introns also play a role in the regulation of RNA and proteins <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, perhaps by recruiting factors that mediate splicing-dependent export <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Some of the remaining introns in hemiascomycetes may therefore be retained because they contain <it>cis</it>-regulatory elements or encode factors necessary for post-transcriptional regulation, but they may also persist by chance owing to low rates of loss.</p>
             <p>On the other hand, our results show that hemiascomycete intron positions are not, in general, widely shared. Only one of the seven intron positions in the examined hemiascomycete species other than <it>Yarrowia lipolytica </it>is shared with any species more distant than the euascomycetes. However, six of the seven are broadly shared within the hemiascomycete lineage, suggesting either that the remaining introns are very hard to lose or that loss rates have greatly diminished within the lineage. By contrast, 14 of the 23 introns present in <it>Y. lipolytica </it>but in no other hemiascomycete are shared with a non-euascomycete, and 10 are shared with plants and/or animals; thus, widely shared introns have been preferentially lost in the other hemiascomycetes since their divergence from the <it>Y. lipolytica </it>lineage.</p>
         </sec>
         <sec>
            <st>
               <p>Selection and intron evolution</p>
            </st>
             <p>Eukaryotic species vary in their numbers of introns by orders of magnitude. These differences have traditionally been attributed to putative differences in the intensity of selection against introns across eukaryotes <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Alternatively, it has been proposed that the intensity of selection against introns could be similar across species, with differences in population size determining intron number <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Under these models, lineages with strong selection against introns (or large population size) should experience low rates of intron gain and high rates of intron loss, whereas lineages with weaker selection (or smaller population size) should experience more intron gain and less intron loss. Both models thus predict a strong inverse correlation between rates of intron gain and loss. However, the data presented here show no clear pattern of inverse correlation (Figure <figr fid="F5">5</figr>).</p>
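             <p>The predicted inverse relationship can be checked by correlating per-branch rates of gain and loss. The Python sketch below first normalizes hypothetical per-branch event counts by hypothetical branch lengths, because longer branches accumulate more of both kinds of events. It then computes a Pearson correlation. It illustrates the logic of such a test and is not a reproduction of the analysis summarized in Figure 5.</p>
             <p>
```python
def rates(counts, branch_lengths):
    """Events per unit branch length for each branch."""
    return [c / t for c, t in zip(counts, branch_lengths)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical branches (counts and lengths are illustrative only).
lengths = [1.0, 2.0, 1.0, 2.0]
gains = [1, 4, 3, 8]

# Pattern predicted by selection-based models: high gain goes with low loss.
losses_predicted = [8, 12, 4, 4]
# A pattern with no inverse relationship.
losses_other = [2, 16, 3, 14]

gain_rates = rates(gains, lengths)
r_pred = pearson(gain_rates, rates(losses_predicted, lengths))
r_other = pearson(gain_rates, rates(losses_other, lengths))
print(round(r_pred, 3), round(r_other, 3))  # -1.0 0.439
```
             </p>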
         </sec>
         <sec>
            <st>
               <p>On the reconstruction of intron evolution</p>
            </st>
             <p>These results provide an excellent opportunity to compare previously proposed methods for reconstructing intron evolution, of which there are five. Dollo parsimony minimizes the number of changes under the assumption that each intron position is gained only once, so that once an intron is lost at a position, it is never regained <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Roy and Gilbert's method ('RG') <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp> assumes that all intron positions shared between species represent retained ancestral introns, while the methods of Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and of Nguyen and coauthors ('NYK') <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> allow multiple intron insertions into the same site, so-called 'parallel insertion'. The method of Carmel and coauthors <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> additionally allows for heterogeneity in the rates of both intron loss and gain across sites.</p>
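             <p>A deliberately simplified model shows why shared positions carry information about ancestral intron number. This model is not the RG method or any of the other published methods. Assume an ancestor with N introns, two descendant lineages that retain each intron independently with probabilities r<sub>A</sub> and r<sub>B</sub> that are the same at every site, and no intron gain. The expected counts are then n<sub>A</sub> = N r<sub>A</sub>, n<sub>B</sub> = N r<sub>B</sub> and n<sub>AB</sub> = N r<sub>A</sub> r<sub>B</sub>, so N can be estimated as n<sub>A</sub> n<sub>B</sub> / n<sub>AB</sub>, analogous to the Lincoln-Petersen capture-recapture estimator. The counts in the Python sketch below are hypothetical.</p>
             <p>
```python
def ancestral_estimate(n_a, n_b, n_shared):
    """Toy estimate of ancestral intron number from two descendant lineages.

    Assumes no intron gain after the split and independent retention with the
    same probability at every site, so that E[n_a] = N*r_a, E[n_b] = N*r_b and
    E[n_shared] = N*r_a*r_b.
    """
    if n_shared == 0:
        raise ValueError("no shared intron positions; the estimate is undefined")
    retention_a = n_shared / n_b  # share of B's (ancestral) introns also kept in A
    retention_b = n_shared / n_a  # share of A's (ancestral) introns also kept in B
    return n_a * n_b / n_shared, retention_a, retention_b


# Hypothetical counts: 600 introns in A, 400 in B, 240 at positions shared by both.
print(ancestral_estimate(600, 400, 240))  # (1000.0, 0.6, 0.4)
```
             </p>
             <p>Intron gain after the split, parallel insertion, and site-to-site variation in loss rates all violate these assumptions and bias such estimates. The published methods differ chiefly in how they handle these effects.</p>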
            <p>Previously, application of four methods (Dollo, RG, Cs&#369;r&#246;s, and NYK) to intron positions in conserved regions of 684 sets of orthologs showed very different pictures of early eukaryotic evolution. Roy and Gilbert estimated the animal-fungus and plant-animal ancestors had some three-fifths as many introns as vertebrates (among the most intron-dense known modern species) <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, while Rogozin and collaborators <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, and Nguyen and collaborators <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> all concluded that these ancestors had only half that many introns, and that higher intron densities in plants and vertebrates were due to dramatic increases in intron number. This difference has repeatedly been attributed to overestimation by the RG method <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, and the RG estimates have been called 'drastic' and 'generous' <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The rationale for this conclusion has been that if a significant number of matching intron positions represent parallel insertion, the RG method will clearly overestimate ancestral intron number.</p>
             <p>We used all five methods to reconstruct intron evolution for the current data set. In contrast to the previous discordance, all methods other than Dollo parsimony now provide similar estimates for the number of introns in the animal-fungus ancestor; Dollo parsimony gave estimates very different from the rest for deep nodes in the tree. The Carmel and NYK methods show the most striking agreement, with less than 2% difference across all nodes except the Opisthokont ancestor (3.3% difference). The NYK and Cs&#369;r&#246;s methods also show striking agreement, giving estimates within 2% of each other for 13 of 18 non-hemiascomycete nodes and within 10% for 17 of 18. The RG method agreed with the other three methods to within 15% for all but six nodes and was no more than 30% higher than any of the other methods for any node other than the Ascomycete node. Notably, the three nodes for which RG gave comparatively the highest estimates on the current data set are deep nodes near very long branches in this tree; further taxonomic sampling would therefore likely bring even these nodes into better agreement (see below). Numbers of intron losses and gains in CORs along each branch were also estimated using all four methods. Although the absolute numbers of estimated losses and gains along each branch varied more considerably between methods, there was striking agreement in the relative incidence of intron loss and gain, with Cs&#369;r&#246;s (2.03 losses per gain), evolutionary reconstruction by expectation-maximization (EREM; 2.14) and NYK (2.12) nearly identical and RG only 21% higher (2.66). Notably, the overall estimated numbers of gains were very similar, with only 19 more gains estimated by RG than by NYK. Results for all methods are given in Additional data files 4 and 5.</p>
            <p>Strikingly, all four methods other than Dollo parsimony now estimate that the fungus-animal ancestor had at least 70% as many introns as vertebrates, more than 15% above the Roy and Gilbert estimate and more than twice the density previously estimated by Cs&#369;r&#246;s and NYK. Thus, the previous difference in estimated intron density for the animal-fungal ancestor appears to have been due not to overestimation by the RG method, but to a 2.5-fold underestimation by the other methods. Indeed, even the estimates of Roy and Gilbert appear to have been conservative <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
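            <p>As a rough check on these comparisons, express ancestral intron density as a fraction of vertebrate density: about 0.60 for the earlier Roy and Gilbert estimate (three-fifths), about 0.30 for the earlier Cs&#369;r&#246;s and NYK estimates (half of that), and at least 0.70 for the present estimates. Using these rounded fractions, the implied fold differences are:</p>
```latex
\frac{0.70}{0.60} \approx 1.17, \qquad \frac{0.70}{0.30} \approx 2.33
```
            <p>Because the inputs are rounded approximations and 0.70 is a lower bound, these ratios are indicative rather than exact.</p>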
            <p>Why should this be? Following Roy and Gilbert <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, we suggest that this pattern may be due to unrecognized variation in rates of intron loss across sites. Clear differences in rates of intron loss across sites (that is, different rates of loss for introns at different positions along the same lineage) have been observed over both short <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp> and long <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp> evolutionary timescales; however, three of the four methods do not take such differences in loss rate into account. Given the recurrent finding of differences in intron loss rates in a variety of studies, it is interesting that the recent work of Carmel and coauthors did not find significant differences in rates, and that their method so closely matches the findings of the other methods described here. Clearly, further study of possible differences in rates of evolution across sites, and of their effects on current methods, is needed.</p>
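            <p>To illustrate why ignoring rate variation biases ancestral estimates downward, consider a deliberately simplified two-lineage estimator. It is hypothetical and is not the RG, NYK, Cs&#369;r&#246;s or EREM method. If every intron is lost at the same rate and there are no gains, the expected number retained in both sisters is n_A n_B / N, so N can be recovered as n_A n_B / n_AB. If instead each position has its own loss rate, introns that survive in one lineage tend to sit at slowly evolving sites and are therefore more likely to survive in the other as well. This inflates n_AB and deflates the estimate. The sketch assumes gamma-distributed loss rates with mean 1 (shape and rate both alpha), for which E[exp(-k r t)] = (1 + k t / alpha)^(-alpha). Because the branch length is tuned to reach a target retention, the choice of gamma scale does not affect the result.</p>
```python
def gamma_moment(k, t, alpha):
    """E[exp(-k * r * t)] for a per-site loss rate r ~ Gamma(shape=alpha, rate=alpha),
    i.e. a gamma distribution with mean 1 (an assumed parameterization)."""
    return (1.0 + k * t / alpha) ** (-alpha)


def branch_length(retained, alpha):
    """Branch length t at which, on average, a fraction `retained` of introns survives."""
    return alpha * (retained ** (-1.0 / alpha) - 1.0)


def naive_pairwise_estimate(n_ancestral, retained, alpha):
    """Hypothetical homogeneous-rate estimator N_hat = n_A * n_B / n_AB, applied to
    expected counts generated with gamma-distributed loss rates and no gains."""
    t = branch_length(retained, alpha)
    n_a = n_ancestral * gamma_moment(1, t, alpha)   # expected introns kept in lineage A
    n_b = n_a                                       # lineage B has the same branch length
    # Kept in both lineages: the same site rate applies on both sides, so E[p**2].
    n_ab = n_ancestral * gamma_moment(2, t, alpha)
    return n_a * n_b / n_ab


if __name__ == "__main__":
    for alpha in (2.0, 10.0):
        for retained in (0.7, 0.3):
            est = naive_pairwise_estimate(1000, retained, alpha)
            print(f"alpha={alpha:4.1f} retained={retained:.1f} estimate={est:6.1f}")
```
            <p>Under these assumptions, the estimate falls further below the true value of 1,000 as alpha decreases (more rate variation) or as retention falls (longer branches). This qualitatively mirrors the simulation trends described below.</p>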
            <p>We performed simulations of intron evolution that included variation in intron loss rate across sites, and reconstructed intron loss and gain on each simulated data set using four of the five methods (Dollo, RG, Cs&#369;r&#246;s, EREM). We considered a four-taxon case in which taxa A and B are sisters and taxa C and D are sisters (Additional data file 2), with 1,000 introns in CORs in the common ancestor and loss rates allowed to vary between intron positions (Figure <figr fid="F6">6</figr>). No parallel gain was allowed in these simulated data sets.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Performance of Cs&#369;r&#246;s, RG, Dollo parsimony, and EREM methods for the four-taxa case under intron loss rate variation with loss rates given by a standard gamma distribution with indicated alpha value, in which 30% or 70% of introns are lost along each external branch</p>
               </caption>
               <text>
                  <p>Performance of Cs&#369;r&#246;s, RG, Dollo parsimony, and EREM methods for the four-taxa case under intron loss rate variation with loss rates given by a standard gamma distribution with indicated alpha value, in which 30% or 70% of introns are lost along each external branch. The true number of ancestral introns in the simulations is 1,000; thus, both the Cs&#369;r&#246;s and Dollo methods underestimate ancestral density in all cases. The relevant phylogeny is given in Additional data file 2.</p>
               </text>
               <graphic file="gb-2007-8-10-r223-6"/>
            </fig>
            <p>Four clear patterns emerged. First, all methods underestimated ancestral intron density. Second, for every data set RG was closest to the true value, followed by EREM, then Cs&#369;r&#246;s, then Dollo parsimony. Third, the Cs&#369;r&#246;s and EREM methods consistently estimated substantial numbers of parallel insertions even though none were included in the simulations; that is, both methods overestimated parallel insertion. These first three patterns held for all sets of parameters. Fourth, these trends typically became more pronounced with increasing branch length, although EREM showed no clear dependence on branch length.</p>
            <p>Together, these observations suggest the following explanation for the discrepancy between previous and current estimates. In the previous data sets <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, the fungi were represented only by <it>S. pombe </it>and <it>S. cerevisiae</it>, both of which have lost the vast majority of their ancestral introns (that is, the fungal branch was very long). Under such long-branch conditions, the RG method somewhat underestimated ancestral intron density, while the other methods underestimated intron density considerably and overestimated parallel insertion. In the new data set, the inclusion of fungal species that retain many more of their ancestral introns shortened the fungal branch, leading the four methods to converge on better estimates (with reduced or no overestimation of parallel gain by NYK and Cs&#369;r&#246;s).</p>
            <p>Indeed, the NYK estimates of the incidence of parallel gain differ strikingly between the present and previous data sets. According to the NYK method, our data set shows very little evidence for parallel intron gain: the method estimated 93.08 parallel gains in total, so only 2.2% of the 4,228 shared introns were attributed to parallel gain. This is much less than the previous estimate that 18.5% of shared positions in the Rogozin data set were due to parallel gains, even though both the overall number of estimated intron gains and the number of estimated gains per kb were higher in our data set than in the Rogozin data set. Thus, parallel gains appear to have been overestimated previously, and given the near identity of the Cs&#369;r&#246;s and NYK results, the same is very likely true of the Cs&#369;r&#246;s method.</p>
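            <p>The 2.2% figure follows directly from the NYK estimate of parallel gains and the number of shared introns:</p>
```latex
\frac{93.08}{4228} \approx 0.0220 = 2.2\%
```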
            <p>This decrease in the estimated incidence of parallel gain is all the more striking given the larger number of taxa in the current data set, which presumably brings with it more real gains and more real parallel gains. The implications are not entirely clear, however, because the species in the current data set are not a superset of those in the previous set. Our simulations suggest that greater taxonomic sampling will have countervailing effects: a decrease in the overestimation of parallel gains due to long-branch effects, coinciding with an increase in the overall number of true parallel gains. The decrease in estimated incidence of parallel gain seen here implies that the former effect currently dominates; however, as sampling improves, the latter effect may come to dominate in future data sets. More thorough simulation studies will be needed to understand this issue fully.</p>
            <p>What of other ancestral nodes of key biological interest for which the different methods gave very different estimates? The previous estimates from the three methods based on the Rogozin data set also differed significantly for the fungus-animal-plant ancestor and the bilaterian ancestor. In the previous data set, both ancestors were flanked by at least one very long branch, suggesting that all methods might have underestimated intron densities at these nodes. The discovery of intron-rich protostomes and apicomplexans should make it possible to resolve this issue in the near future. This line of argument suggests that intron density was very high even in very early eukaryotic ancestors.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>These results resolve a debate over the intron density of the fungal-animal ancestor. All of the recently proposed reconstruction methods now agree that this ancestor was very intron rich, and that all modern fungi have experienced more intron loss than gain since their divergence. These results underscore that intron evolution in eukaryotes often defies common assumptions about organismal and gene structure complexity, and that new models of intron loss and gain are required.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Genome data and annotation</p>
            </st>
            <p>Annotated genomes of many of the fungi analyzed were obtained from GenBank or directly from sequencing centers and are listed in Additional data file 3. For unannotated genomes, gene predictions were generated using a combination of <it>ab initio </it>and evidence-based gene predictions, which were combined into a single composite gene call with the tool GLEAN <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The <it>ab initio </it>gene prediction programs SNAP <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, AUGUSTUS <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, and Genezilla <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> were first trained for each genome on a set of genes derived from alignments of conserved fungal proteins to the genome using Genewise <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and Exonerate <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. At the start of this study, high-quality annotations of <it>Aspergillus fumigatus</it>, <it>Aspergillus terreus</it>, <it>Coprinus cinereus</it>, <it>Podospora anserina </it>and <it>Rhizopus oryzae </it>were not available, so we generated automated annotations to allow these species to be included. We generated a new annotation of the v1 <it>P. chrysosporium </it>genome <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> because, based on multiple sequence alignments with other fungal proteins, the previously published gene structures were not of sufficient quality. Prediction parameters derived from the closest annotated species were used with at least one round of retraining, as previously described <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Frozen versions of the genome sequences, annotations in GFF format, a Genome Browser <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and Web BLAST <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> interface to the genomes, and predicted coding sequences and proteins are available for download from the authors' site <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Ortholog processing and intron to alignment mapping</p>
            </st>
            <p>The predicted proteins from the 21 fungal genome annotations (Additional data file 3) were combined with the <it>A. thaliana </it>annotations (Feb 2005) <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> available from GenBank and with the <it>Fugu rubripes </it>(Ensembl 30.2e, assembly 2) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, <it>Mus musculus </it>(Ensembl 30.33f) <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, and <it>H. sapiens </it>(Ensembl 30.35c) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> annotations. The longest transcript was used for genes with multiple isoforms. The protein set was masked for low-complexity sequence with pseg <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> and searched in an all-against-all fashion using FASTP <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> with an expectation value cutoff of 1 &#215; 10<sup>-5</sup>. The output was processed with a custom Perl script to identify, for each pair of species, pairwise orthologs as best mutual FASTP hits. These pairwise orthologs were then combined by single-linkage clustering across all species, and a cluster was retained as a multi-way ortholog set only if it contained exactly one protein from each species.</p>
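            <p>The ortholog-building logic described above (best mutual hits, single-linkage clustering, and a one-protein-per-species filter) can be sketched in a few lines of Python. This is an illustration, not the authors' Perl script. It assumes that hits have already been filtered at the E-value cutoff and reduced to the single best-scoring hit per target species. The input layout and all function names are hypothetical.</p>
```python
from collections import defaultdict


def reciprocal_best_hits(best_hit):
    """best_hit maps a protein, written (species, id), to {target_species: best hit there}.
    Returns the unordered pairs of proteins that are each other's best hit."""
    pairs = set()
    for query, hits in best_hit.items():
        for target in hits.values():
            if best_hit.get(target, {}).get(query[0]) == query:
                pairs.add(frozenset((query, target)))
    return pairs


def single_linkage(pairs):
    """Union-find: proteins joined by any chain of pairs end up in the same cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for x in list(parent):
        clusters[find(x)].add(x)
    return list(clusters.values())


def one_to_one_orthologs(best_hit, species):
    """Keep only clusters that contain exactly one protein from every species."""
    kept = []
    for cluster in single_linkage(reciprocal_best_hits(best_hit)):
        if sorted(sp for sp, _ in cluster) == sorted(species):
            kept.append(sorted(cluster))
    return kept
```
            <p>Single linkage can merge chains that contain two proteins from the same species, for example paralogs each reached through a different intermediate. The final filter discards such clusters rather than trying to split them.</p>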
            <p>The protein sequences for these orthologs were then aligned using the multiple sequence alignment program MUSCLE <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The protein alignments were used as a guide to align the genes' coding sequences, and intron positions were mapped into both the protein and coding sequence alignments using Perl modules from BioPerl <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The 5' and 3' ends of most genes could not be aligned, so many introns in these regions, in particular most hemiascomycete introns, which tend to lie within the first few codons of a gene, could not be considered in this study. Alignments of the orthologs are provided as Additional data file 8.</p>
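            <p>Mapping an intron onto a protein alignment amounts to counting the coding nucleotides upstream of it. Integer division by 3 gives the codon (residue) it precedes or interrupts, and the remainder gives its phase, matching the 0/1/2 phase convention described for Additional data file 8. The Python sketch below is a stand-in for the BioPerl-based mapping, not a reproduction of it. It assumes that coding exon lengths are listed in transcript order, that '-' is the gap character, and that an intron is anchored to the column of the residue whose codon it precedes (phase 0) or interrupts (phase 1 or 2). All names are hypothetical.</p>
```python
def cds_intron_offsets(exon_lengths):
    """Coding nucleotides upstream of each intron, given coding exon lengths in order."""
    offsets, total = [], 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        total += length
        offsets.append(total)
    return offsets


def intron_phase(offset):
    """Return (codon index, phase): phase 0 lies between codons, 1 or 2 inside a codon."""
    return offset // 3, offset % 3


def intron_columns(aligned_protein, exon_lengths):
    """Map each intron of one gene to (protein alignment column, phase).

    The column is that of the residue whose codon the intron precedes (phase 0)
    or interrupts (phase 1 or 2); introns after the final codon are skipped."""
    residue_cols = [col for col, aa in enumerate(aligned_protein) if aa != "-"]
    placed = []
    for offset in cds_intron_offsets(exon_lengths):
        codon, phase = intron_phase(offset)
        if codon >= len(residue_cols):
            continue
        placed.append((residue_cols[codon], phase))
    return placed
```
            <p>For example, for the aligned row "MK-LVA" with coding exons of 3, 7 and 8 nucleotides, the introns sit after 3 and 10 coding nucleotides. The function maps them to (column 1, phase 0) and (column 4, phase 1). Under this representation, introns from different species would be treated as occupying the same position when they share both column and phase.</p>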
            <p>The alignments were then scanned for these intron positions to build a matrix of all intron positions. As in previous work <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, each intron-containing column in the alignment was classified according to which species shared that intron position. In addition, following previous studies <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, an intron position was classified as 'gapped' and removed from the final data matrix if it lay within six nucleotides of an alignment column containing gaps. The resulting matrix of intron positions is available in Additional data file 7.</p>
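            <p>The classification and gap filter can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' code. Intron positions are given as columns of the coding-sequence alignment, each intron lying immediately before its column. "Within six nucleotides" is read as six alignment columns on either side, and '-' is the gap character. The function names and input layout are hypothetical.</p>
```python
def has_nearby_gap(alignment, column, window=6):
    """True if any sequence has a gap within `window` columns on either side of an
    intron that lies immediately before `column` in the coding-sequence alignment."""
    length = len(next(iter(alignment.values())))
    start, stop = max(0, column - window), min(length, column + window)
    return any("-" in row[start:stop] for row in alignment.values())


def intron_matrix(alignment, introns, window=6):
    """alignment: {species: aligned CDS string}; introns: {species: set of columns}.
    Returns {column: frozenset of species sharing an intron there}, with gapped
    positions removed."""
    matrix = {}
    for col in sorted(set().union(*introns.values())):
        if has_nearby_gap(alignment, col, window):
            continue  # 'gapped' position: excluded from the data matrix
        matrix[col] = frozenset(sp for sp, cols in introns.items() if col in cols)
    return matrix
```
            <p>Each retained entry records the phylogenetic distribution (the set of species sharing the position) that the reconstruction methods take as input.</p>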
         </sec>
         <sec>
            <st>
               <p>Phylogenetic analyses</p>
            </st>
            <p>A random sample of 30 of the protein alignments was used to generate a species tree by concatenating the aligned sequences and removing all gap columns from the alignment. The tree was computed and bootstrapped with MrBayes <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. The fungal species tree topology was constrained so that <it>Stagonospora nodorum </it>is basal to the euascomycetes, for consistency with more exhaustive phylogenetic analyses using larger taxon sampling <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. Apart from this constraint, the phylogenetic reconstruction was consistent with other studies that used a larger sample of orthologous gene sequences <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
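            <p>The concatenation step can be sketched in Python as below. The sketch assumes that "gap columns" means any column in which at least one species has a gap ('-'), and that every alignment contains the same species. The random seed and function name are hypothetical. Writing the result out for MrBayes is not shown.</p>
```python
import random


def concatenate_without_gaps(alignments, n=30, seed=1):
    """alignments: list of {species: aligned protein string}, all with the same species.
    Randomly picks n alignments, concatenates each species' rows, and then drops
    every column in which any species has a gap ('-')."""
    rng = random.Random(seed)
    chosen = rng.sample(alignments, min(n, len(alignments)))
    species = sorted(chosen[0])
    rows = {sp: "".join(aln[sp] for aln in chosen) for sp in species}
    width = len(rows[species[0]])
    keep = [i for i in range(width) if all(rows[sp][i] != "-" for sp in species)]
    return {sp: "".join(rows[sp][i] for i in keep) for sp in species}
```
            <p>Removing every gap-containing column discards some informative sites, but it yields a matrix in which all species are scored at every position.</p>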
            <p>Dollo parsimony trees were computed with dollop from the PHYLIP package <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> using default parameters. We generated 1,000 bootstrap replicates with seqboot and recomputed Dollo parsimony on each replicate. A strict consensus tree was then computed from these trees with consense, also from PHYLIP.</p>
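            <p>For readers unfamiliar with seqboot, the nonparametric bootstrap it performs on a discrete character matrix resamples characters (here, intron positions) with replacement. The Python sketch below illustrates that idea only and is not a substitute for PHYLIP. It assumes a {species: '0'/'1' string} matrix layout, and the function name and seed are hypothetical.</p>
```python
import random


def bootstrap_replicates(matrix, n_replicates=1000, seed=0):
    """matrix: {species: string of '0'/'1' intron absence/presence characters}.
    Yields replicate matrices in which character columns are resampled with
    replacement, keeping each column's pattern across species intact."""
    rng = random.Random(seed)
    species = sorted(matrix)
    n_chars = len(matrix[species[0]])
    for _ in range(n_replicates):
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        yield {sp: "".join(matrix[sp][c] for c in cols) for sp in species}
```
            <p>Each replicate would then be analyzed with Dollo parsimony, and the resulting trees summarized by a consensus.</p>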
         </sec>
         <sec>
            <st>
               <p>Ancestral intron density reconstruction</p>
            </st>
            <p>The resulting matrix of classified intron positions was analyzed with the RG method along the species tree to compute intron densities, numbers of intron gains and losses, and the fraction of introns present at each internal node of the tree. The NYK method was also used to estimate intron loss and gain rates and ancestral intron densities, after modification of the authors' C code. The modified RG Perl code and the NYK C code are available in Additional data file 7. The Cs&#369;r&#246;s method, implemented in the intronRates.jar program <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B61">61</abbr></abbrgrp>, was applied to the data set and allowed to find the optimal number of unobserved all-zero sites. EREM and Dollo parsimony values were computed with the EREM program <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B62">62</abbr></abbrgrp>, with EREM values computed under a homogeneous model. The values reported in Figure <figr fid="F5">5</figr> are the maximum likelihood estimates from the EREM program of the numbers of introns, gains and losses. Reconstructed values from all five methods are reported in the tables in Additional data files 4 and 5. No overall comparison between methods was made for the 'Crown' node or for branches leading from it, because not all methods estimate ancestral density or rates without an outgroup.</p>
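            <p>Of the five approaches, Dollo parsimony is the simplest to state. Each intron position is assumed to have been gained exactly once, at the most recent common ancestor (MRCA) of all species that carry it, and then lost the minimum number of times. It therefore never infers parallel gain. The Python sketch below counts Dollo-reconstructed introns at each internal node of a rooted tree. It is illustrative only, since the Dollo values in this study were computed with the EREM program. It assumes a hypothetical nested-tuple tree format with species names as leaves, and the function names are likewise hypothetical.</p>
```python
from collections import Counter


def leaves(node):
    """Leaf names below a node; a tree is nested tuples with species names as leaves."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def gain_node(tree, present):
    """Deepest node whose clade contains every species carrying the intron (its MRCA)."""
    node = tree
    while not isinstance(node, str):
        child = next((c for c in node if present.issubset(leaves(c))), None)
        if child is None:
            break
        node = child
    return node


def dollo_internal_nodes(tree, present):
    """Internal nodes carrying the intron under a single-gain (Dollo) reconstruction:
    the MRCA of the carriers and every node below it with at least one carrier."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return
        if leaves(node).intersection(present):
            found.append(frozenset(leaves(node)))
        for child in node:
            walk(child)

    walk(gain_node(tree, present))
    return found


def dollo_ancestral_counts(tree, patterns):
    """Count, for each internal node (keyed by its leaf set), the intron positions
    that a Dollo reconstruction places there. `patterns` lists carrier sets."""
    counts = Counter()
    for present in patterns:
        counts.update(dollo_internal_nodes(tree, set(present)))
    return counts
```
            <p>For example, on the tree (((A,B),(C,D)),E), an intron found only in A and E is placed at the root, at the (A,B,C,D) ancestor and at the (A,B) ancestor, with a single loss on the branch leading to (C,D). Because independent gains are never considered, any true parallel gain is instead explained as an older gain followed by losses.</p>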
         </sec>
         <sec>
            <st>
               <p>Simulations</p>
            </st>
            <p>We simulated a four-taxon case in which taxa A and B are sisters and taxa C and D are sisters, with 1,000 introns in CORs in the common ancestor (Additional data file 2). Different introns were assigned different loss rates drawn from a standard gamma distribution, with varying values of the shape parameter alpha. The internal branch was set to length zero (neither intron loss nor gain along the internal branch). All external branches were of equal length, with the length chosen for each alpha value such that, on average, a given fraction (70% or 30%) of all introns present at the ancestral node was retained in each descendant taxon. We generated data sets for alpha values from 2.0 (most variation in intron loss rate) to 10.0 (least) in increments of 0.5. No insertion, parallel or otherwise, was allowed. For each set of parameters we calculated the expected number of introns with each phylogenetic distribution, and used these values, rounded to the nearest integer, as inputs for all four methods.</p>
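            <p>The expected pattern counts described above can be computed in closed form rather than by sampling. Given its loss rate r, an intron survives independently in each taxon with probability p = exp(-r t), because the internal branch has length zero. The expected frequency of a pattern in which k taxa keep the intron and the rest lose it is therefore E[p^k (1 - p)^(4 - k)], which expands into gamma moments. The Python sketch below follows that reasoning. It assumes a mean-1 gamma (shape and rate both alpha). Because the branch length is tuned to the target retention, the scale of the gamma does not change the resulting counts. Function names are hypothetical, and this is not the authors' simulation code.</p>
```python
import math
from itertools import product


def gamma_moment(m, t, alpha):
    """E[exp(-m * r * t)] for a loss rate r ~ Gamma(shape=alpha, rate=alpha) (mean 1)."""
    return (1.0 + m * t / alpha) ** (-alpha)


def branch_length(retained, alpha):
    """External branch length at which a mean fraction `retained` of introns survives."""
    return alpha * (retained ** (-1.0 / alpha) - 1.0)


def expected_patterns(n_ancestral=1000, retained=0.7, alpha=2.0, taxa="ABCD"):
    """Expected number of ancestral introns, rounded to the nearest integer, showing
    each presence/absence pattern across the taxa. Assumes no gains, a zero-length
    internal branch, and equal external branches, so that given its rate an intron
    survives independently in each taxon with probability p = exp(-r * t).
    The "none" pattern (lost in every taxon) would be unobservable in real data."""
    t = branch_length(retained, alpha)
    counts = {}
    for pattern in product((0, 1), repeat=len(taxa)):
        kept = sum(pattern)
        lost = len(taxa) - kept
        # E[p**kept * (1 - p)**lost], expanded with the binomial theorem.
        prob = sum(math.comb(lost, j) * (-1) ** j * gamma_moment(kept + j, t, alpha)
                   for j in range(lost + 1))
        name = "".join(tx for tx, bit in zip(taxa, pattern) if bit) or "none"
        counts[name] = round(n_ancestral * prob)
    return counts
```
            <p>With strong rate variation (small alpha), the count of introns retained in all four taxa is well above the homogeneous-rate expectation of 1,000 x 0.7^4, or about 240. This excess of shared introns is the signal that methods assuming a single loss rate misread as a smaller ancestral intron complement.</p>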
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>COR, conserved orthologous region; EREM, evolutionary reconstruction by expectation-maximization; NYK, Nguyen, Yoshihama, and Kenmochi method of intron reconstruction; RG, Roy-Gilbert method of intron reconstruction.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JES conceived of the project, collected the genome data and annotations, annotated genomes, wrote and ran necessary software, performed analyses, created the figures and tables, and wrote the paper. FSD contributed to the content and writing of the paper, provided access to computational resources, and supervised JES. SWR provided intron reconstruction methods, conceived and performed the simulations, analyzed the data, created figures and tables, and wrote the paper.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a comparison of two cladograms for the 25 species. Additional data file <supplr sid="S2">2</supplr> shows the phylogenetic tree used in the simulation data analysis. Additional data file <supplr sid="S3">3</supplr> provides the genomes and annotations used for this analysis with the source and version of the annotation indicated, with references for previously published annotations. Additional data file <supplr sid="S4">4</supplr> lists the intron reconstruction values for each node on the tree using the five methods from Nguyen <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, Roy and Gilbert <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>, EREM from Carmel <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and Dollo parsimony as computed by EREM. Additional data file <supplr sid="S5">5</supplr> lists the rates and numbers of gains and losses for each branch on the tree using the four methods from Nguyen <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, Roy and Gilbert <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>, EREM from Carmel <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and Dollo parsimony as computed by EREM. Additional data file <supplr sid="S6">6</supplr> provides summary statistics for intron length, intron frequency, total length per genome, total intron count, total length of coding sequence, and genome size. Additional data file <supplr sid="S7">7</supplr> is a zip file containing data for the matrix of intron positions used for this analysis and a phylogenetic tree representing the species; the file also contains the customized software for running NYK and the RG intron calculations. 
Additional data file <supplr sid="S8">8</supplr> is a zip file containing multi-FASTA alignments of orthologous genes with introns inserted into protein alignments.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Comparison of two cladograms for the 25 species</p>
            </caption>
            <text>
               <p>The left tree was built with MrBayes using 30 orthologous proteins, with the position of <it>S. nodorum </it>constrained based on previously published phylogenies. The tree on the right is the strict consensus tree of 116 MP trees built using Dollo parsimony and the matrix of presence or absence of intron positions. Nodes that are not present in all 116 trees are collapsed. Species groups are colored so that Euascomycota are in dark green, Hemiascomycota in red, archiascomycete <it>S. pombe </it>in yellow, Basidiomycota excluding <it>U. maydis </it>in blue, <it>U. maydis </it>in purple, zygomycete <it>R. oryzae </it>in orange, vertebrates in pink, and green plant <it>A. thaliana </it>in light green.</p>
            </text>
            <file name="gb-2007-8-10-r223-S1.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>The phylogenetic tree used in the simulation data analysis</p>
            </caption>
            <text>
               <p>The phylogenetic tree used in the simulation data analysis.</p>
            </text>
            <file name="gb-2007-8-10-r223-S2.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Genomes and annotations used for this analysis with the source and version of the annotation indicated, with references for previously published annotations</p>
            </caption>
            <text>
               <p>Genomes and annotations used for this analysis with the source and version of the annotation indicated, with references for previously published annotations.</p>
            </text>
            <file name="gb-2007-8-10-r223-S3.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>The intron reconstruction values for each node on the tree using the five methods from Nguyen <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, Roy and Gilbert <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>, EREM from Carmel <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and Dollo parsimony as computed by EREM</p>
            </caption>
            <text>
               <p>Not all methods reconstruct a value for the Crown ancestor as this requires an outgroup and additional assumptions.</p>
            </text>
            <file name="gb-2007-8-10-r223-S4.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>The rates and numbers of gains and losses for each branch on the tree using the four methods from Nguyen <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, Cs&#369;r&#246;s <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, Roy and Gilbert <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>, EREM from Carmel <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and Dollo parsimony as computed by EREM</p>
            </caption>
            <text>
               <p>Not all methods estimate gain and loss from Crown ancestor as this requires an outgroup and additional assumptions.</p>
            </text>
            <file name="gb-2007-8-10-r223-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Summary statistics for intron length, intron frequency, total length per genome, total intron count, total length of coding sequence, and genome size</p>
            </caption>
            <text>
               <p>Summary statistics for intron length, intron frequency, total length per genome, total intron count, total length of coding sequence, and genome size.</p>
            </text>
            <file name="gb-2007-8-10-r223-S6.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Data for the matrix of intron positions used for this analysis and phylogenetic tree representing the species</p>
            </caption>
            <text>
               <p>The file also contains the customized software for running NYK and the RG intron calculations.</p>
            </text>
            <file name="gb-2007-8-10-r223-S7.zip">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>Multi-FASTA alignments of orthologous genes with introns inserted into protein alignments</p>
            </caption>
            <text>
               <p>Introns are represented by numbers in the alignment indicating the phase of the intron (0,1,2) as defined by the position in the codon the intron falls within. Coding sequence alignments and unaligned sequences are available from the authors.</p>
            </text>
            <file name="gb-2007-8-10-r223-S8.tgz">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank BA Friedman, MW Hahn, TY James, V Maselli, MK Uyenoyama, and M Yandell for helpful discussion of this work. SWR thanks Walter Gilbert, Daniel Hartl, and David Penny for financial and intellectual support over the course of the project. We also thank HD Nguyen for kindly providing the C code for their method, M Cs&#369;r&#246;s for making the Java implementation of his approach available on his website, and L Carmel for assistance in getting EREM to run. Computational analyses and genome annotation pipelines were run on the Duke Shared Cluster Resource in the Center for Computational Science, Engineering, and Medicine. Website hosting for <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> is provided by the Duke Institute for Genome Science and Policy. JES was supported by an NSF graduate research fellowship and FSD was supported by NIH grant NS042263-03.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Finishing the euchromatic sequence of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>931</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02945</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>WormBase: better software, richer content.</p>
            </title>
            <aug>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Antoshechkin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bastiani</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bieri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Blasiar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Canaran</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Database</issue>
            <fpage>D475</fpage>
            <lpage>D478</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347424</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381915</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>FlyBase: genes and gene models.</p>
            </title>
            <aug>
               <au>
                  <snm>Drysdale</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <issue>33 Database</issue>
            <fpage>D390</fpage>
            <lpage>D395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540000</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608223</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Complete reannotation of the <it>Arabidopsis </it>genome: methods, tools, protocols and the final release.</p>
            </title>
            <aug>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ronning</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Hannick</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>RK</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Maiti</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Farzad</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>BMC Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1082884</pubid>
                  <pubid idtype="pmpid" link="fulltext">15784138</pubid>
                  <pubid idtype="doi">10.1186/1741-7007-3-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Molecular evolution: recent cases of spliceosomal intron gain?</p>
            </title>
            <aug>
               <au>
                  <snm>Logsdon</snm>
                  <fnm>JM</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Stoltzfus</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>R560</fpage>
            <lpage>R563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(07)00361-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9707398</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The chaperonin genes of jakobid and jakobid-like flagellates: implications for eukaryotic evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Archibald</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>O'Kelly</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>422</fpage>
            <lpage>431</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11919283</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Analysis of <it>Chlamydomonas reinhardtii </it>genome structure using large-scale sequencing of regions on linkage groups I and III.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jia</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Roe</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Dutcher</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>J Eukaryot Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>50</volume>
            <fpage>145</fpage>
            <lpage>155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1550-7408.2003.tb00109.x</pubid>
                  <pubid idtype="pmpid">12836870</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The genome sequence of <it>Schizosaccharomyces pombe</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Wood</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gwilliam</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Lyne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lyne</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sgouros</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Peat</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hayles</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>871</fpage>
            <lpage>880</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature724</pubid>
                  <pubid idtype="pmpid" link="fulltext">11859360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the <it>Saccharomyces cerevisiae </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Hirschman</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Balakrishnan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Christie</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Costanzo</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Fisk</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Livstone</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Database</issue>
            <fpage>D442</fpage>
            <lpage>D445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347479</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381907</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The genome sequence of the filamentous fungus <it>Neurospora crassa</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Galagan</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Borkovich</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Selker</snm>
                  <fnm>EU</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Smirnov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Purcell</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>422</volume>
            <fpage>859</fpage>
            <lpage>868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01554</pubid>
                  <pubid idtype="pmpid" link="fulltext">12712197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Sequencing of <it>Aspergillus nidulans </it>and comparative analysis with <it>A</it>. <it>fumigatus </it>and <it>A</it>. <it>oryzae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Galagan</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Cuomo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>SI</fnm>
               </au>
               <au>
                  <snm>Basturkmen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spevak</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Clutterbuck</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>438</volume>
            <fpage>1105</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04341</pubid>
                  <pubid idtype="pmpid" link="fulltext">16372000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The genome of the basidiomycetous yeast and human pathogen <it>Cryptococcus neoformans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Loftus</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Fung</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Roncaglia</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Amedeo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Vamathevan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miranda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>JA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>307</volume>
            <fpage>1321</fpage>
            <lpage>1324</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1103773</pubid>
                  <pubid idtype="pmpid" link="fulltext">15653466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Genome sequence of the lignocellulose degrading fungus <it>Phanerochaete chrysosporium </it>strain RP78.</p>
            </title>
            <aug>
               <au>
                  <snm>Martinez</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Larrondo</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gelpke</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Helfenbein</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Ramaiya</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Detter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Larimer</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2004</pubdate>
            <volume>22</volume>
            <fpage>695</fpage>
            <lpage>700</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt967</pubid>
                  <pubid idtype="pmpid" link="fulltext">15122302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Evidence of mRNA-mediated intron loss in the human-pathogenic fungus <it>Cryptococcus neoformans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Dietrich</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>Eukaryot Cell</source>
            <pubdate>2006</pubdate>
            <volume>5</volume>
            <fpage>789</fpage>
            <lpage>793</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1459680</pubid>
                  <pubid idtype="pmpid" link="fulltext">16682456</pubid>
                  <pubid idtype="doi">10.1128/EC.5.5.789-793.2006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Patterns of intron gain and loss in fungi.</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Galagan</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>e422</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">532390</pubid>
                  <pubid idtype="pmpid" link="fulltext">15562318</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020422</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Likely scenarios of intron evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Cs&#369;r&#246;s</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proceedings of the Third RECOMB Satellite Workshop on Comparative Genomics</source>
            <publisher>Dublin, IE: Springer LNBI</publisher>
            <editor>McLysaght A, Huson D</editor>
            <pubdate>2005</pubdate>
            <volume>3678</volume>
            <fpage>47</fpage>
            <lpage>60</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>New maximum likelihood estimators for eukaryotic intron evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Yoshihama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kenmochi</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e79</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1323467</pubid>
                  <pubid idtype="pmpid" link="fulltext">16389300</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0010079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Complex early genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>1986</fpage>
            <lpage>1991</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548548</pubid>
                  <pubid idtype="pmpid" link="fulltext">15687506</pubid>
                  <pubid idtype="doi">10.1073/pnas.0408355101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Sorokin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Mirkin</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1512</fpage>
            <lpage>1517</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(03)00558-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12956953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Rates of intron loss and gain: implications for early eukaryotic evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>5773</fpage>
            <lpage>5778</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">556292</pubid>
                  <pubid idtype="pmpid" link="fulltext">15827119</pubid>
                  <pubid idtype="doi">10.1073/pnas.0500383102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Coevolution of genomic intron number and splice sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Irimia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>321</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2007.04.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17442445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Phylogenetic analysis under Dollo's law.</p>
            </title>
            <aug>
               <au>
                  <snm>Farris</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Syst Zool</source>
            <pubdate>1977</pubdate>
            <volume>26</volume>
            <fpage>77</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2412867</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The uniquely evolved character concept and its cladistic application.</p>
            </title>
            <aug>
               <au>
                  <snm>Le Quesne</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Syst Zool</source>
            <pubdate>1974</pubdate>
            <volume>23</volume>
            <fpage>513</fpage>
            <lpage>517</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2412469</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Fungal Genome Initiative</p>
            </title>
            <url>http://www.broad.mit.edu/annotation/fgi/</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns.</p>
            </title>
            <aug>
               <au>
                  <snm>Bon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Casaregola</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Blandin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Llorente</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Neuveglise</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Munsterkotter</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guldener</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dujon</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>1121</fpage>
            <lpage>1135</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">150231</pubid>
                  <pubid idtype="pmpid" link="fulltext">12582231</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg213</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The evolution of spliceosomal introns.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>AO</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>701</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00360-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12433585</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The small nucleolar RNAs.</p>
            </title>
            <aug>
               <au>
                  <snm>Maxwell</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Fournier</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1995</pubdate>
            <volume>64</volume>
            <fpage>897</fpage>
            <lpage>934</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bi.64.070195.004341</pubid>
                  <pubid idtype="pmpid" link="fulltext">7574504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The intron of the yeast actin gene contains the promoter for an antisense RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson-Jager</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Domdey</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Curr Genet</source>
            <pubdate>1990</pubdate>
            <volume>17</volume>
            <fpage>269</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00312620</pubid>
                  <pubid idtype="pmpid">1692772</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Introns regulate RNA and protein abundance in yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Juneau</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miranda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hillenmeyer</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Nislow</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2006</pubdate>
            <volume>174</volume>
            <fpage>511</fpage>
            <lpage>518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1569799</pubid>
                  <pubid idtype="pmpid" link="fulltext">16816425</pubid>
                  <pubid idtype="doi">10.1534/genetics.106.058560</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>A new view of mRNA export: separating the wheat from the chaff.</p>
            </title>
            <aug>
               <au>
                  <snm>Reed</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magni</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nat Cell Biol</source>
            <pubdate>2001</pubdate>
            <volume>3</volume>
            <fpage>E201</fpage>
            <lpage>E204</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ncb0901-e201</pubid>
                  <pubid idtype="pmpid" link="fulltext">11533670</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Genes in pieces: were they ever together?</p>
            </title>
            <aug>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1978</pubdate>
            <volume>272</volume>
            <fpage>581</fpage>
            <lpage>582</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1038/272581a0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The exon theory of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Cold Spring Harb Symp Quant Biol</source>
            <pubdate>1987</pubdate>
            <volume>52</volume>
            <fpage>901</fpage>
            <lpage>905</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2456887</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Intron evolution as a population-genetic process.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>6118</fpage>
            <lpage>6123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">122912</pubid>
                  <pubid idtype="pmpid" link="fulltext">11983904</pubid>
                  <pubid idtype="doi">10.1073/pnas.092595699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The origins of genome complexity.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>1401</fpage>
            <lpage>1404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089370</pubid>
                  <pubid idtype="pmpid" link="fulltext">14631042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Three distinct modes of intron dynamics in the evolution of eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Carmel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1034</fpage>
            <lpage>1044</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1899114</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495008</pubid>
                  <pubid idtype="doi">10.1101/gr.6438607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Conservation versus parallel gains in intron evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Sverdlov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Babenko</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>1741</fpage>
            <lpage>1748</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1069513</pubid>
                  <pubid idtype="pmpid" link="fulltext">15788746</pubid>
                  <pubid idtype="doi">10.1093/nar/gki316</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Analysis of evolution of exon-intron structure of eukaryotic genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Sverdlov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Babenko</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>118</fpage>
            <lpage>134</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/6.2.118</pubid>
                  <pubid idtype="pmpid" link="fulltext">15975222</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p><it>Caenorhabditis </it>phylogeny predicts convergence of hermaphroditism and extensive intron loss.</p>
            </title>
            <aug>
               <au>
                  <snm>Kiontke</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gavin</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Raynes</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Roehrig</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Piano</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>9003</fpage>
            <lpage>9008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">428462</pubid>
                  <pubid idtype="pmpid" link="fulltext">15184656</pubid>
                  <pubid idtype="doi">10.1073/pnas.0403094101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Frequent intron loss in the white gene: a cautionary tale for phylogeneticists.</p>
            </title>
            <aug>
               <au>
                  <snm>Krzywinski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Besansky</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>362</fpage>
            <lpage>366</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11861897</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The pattern of intron loss.</p>
            </title>
            <aug>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>713</fpage>
            <lpage>718</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545554</pubid>
                  <pubid idtype="pmpid" link="fulltext">15642949</pubid>
                  <pubid idtype="doi">10.1073/pnas.0408274102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Preferential loss and gain of introns in 3' portions of genes suggests a reverse-transcription mechanism of intron insertion.</p>
            </title>
            <aug>
               <au>
                  <snm>Sverdlov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Babenko</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2004</pubdate>
            <volume>338</volume>
            <fpage>85</fpage>
            <lpage>91</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.05.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">15302409</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Creating a honey bee consensus gene set.</p>
            </title>
            <aug>
               <au>
                  <snm>Elsik</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Mackey</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Reese</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Milshina</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R13</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1839126</pubid>
                  <pubid idtype="pmpid" link="fulltext">17241472</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-1-r13</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Gene finding in novel genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">421630</pubid>
                  <pubid idtype="pmpid" link="fulltext">15144565</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-59</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Gene prediction with a hidden Markov model and a new intron submodel.</p>
            </title>
            <aug>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Waack</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 2</issue>
            <fpage>II215</fpage>
            <lpage>II225</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14534192</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>TigrScan and GlimmerHMM: two open source <it>ab initio </it>eukaryotic gene-finders.</p>
            </title>
            <aug>
               <au>
                  <snm>Majoros</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2878</fpage>
            <lpage>2879</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth315</pubid>
                  <pubid idtype="pmpid" link="fulltext">15145805</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>GeneWise and Genomewise.</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>988</fpage>
            <lpage>995</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479130</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123596</pubid>
                  <pubid idtype="doi">10.1101/gr.1865504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Automated generation of heuristics for biological sequence comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Slater</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>31</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">553969</pubid>
                  <pubid idtype="pmpid" link="fulltext">15713233</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The generic genome browser: a building block for a model organism system database.</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Caudy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mangone</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nickerson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Arva</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1599</fpage>
            <lpage>1610</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187535</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368253</pubid>
                  <pubid idtype="doi">10.1101/gr.403602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Resources for Fungal Comparative Genomics</p>
            </title>
            <url>http://fungal.genome.duke.edu</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Whole-genome shotgun assembly and analysis of the genome of <it>Fugu rubripes</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stupka</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Dehal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rash</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hoon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>1301</fpage>
            <lpage>1310</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1072104</pubid>
                  <pubid idtype="pmpid" link="fulltext">12142439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Statistics of local complexity in amino acid sequences and sequence databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Comput Chem</source>
            <pubdate>1993</pubdate>
            <volume>17</volume>
            <fpage>149</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0097-8485(93)85006-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Improved tools for biological sequence comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid" link="fulltext">3162770</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.8.2444</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>MUSCLE: multiple sequence alignment with high accuracy and high throughput.</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1792</fpage>
            <lpage>1797</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390337</pubid>
                  <pubid idtype="pmpid" link="fulltext">15034147</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>The Bioperl toolkit: Perl modules for the life sciences.</p>
            </title>
            <aug>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boulez</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Chervitz</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Dagdigian</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fuellen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1611</fpage>
            <lpage>1618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187536</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368254</pubid>
                  <pubid idtype="doi">10.1101/gr.361602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>MrBayes 3: Bayesian phylogenetic inference under mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1572</fpage>
            <lpage>1574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg180</pubid>
                  <pubid idtype="pmpid" link="fulltext">12912839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Reconstructing the early evolution of Fungi using a six-gene phylogeny.</p>
            </title>
            <aug>
               <au>
                  <snm>James</snm>
                  <fnm>TY</fnm>
               </au>
               <au>
                  <snm>Kauff</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Schoch</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Matheny</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Hofstetter</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Celio</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gueidan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fraker</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Miadlikowska</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <fpage>818</fpage>
            <lpage>822</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05110</pubid>
                  <pubid idtype="pmpid" link="fulltext">17051209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Fitzpatrick</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Logue</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>99</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1679813</pubid>
                  <pubid idtype="pmpid" link="fulltext">17121679</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-6-99</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PHYLIP (Phylogeny Inference Package)</source>
            <publisher>Seattle, WA: Department of Genome Sciences, University of Washington</publisher>
            <edition>3.6</edition>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Intron Evolution: In Search of Lost Introns</p>
            </title>
            <url>http://www.iro.umontreal.ca/~csuros/introns/</url>
         </bibl>
         <bibl id="B62">
            <title>
               <p>EREM: Evolutionary Reconstruction by Expectation-Maximization</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Carmel/software/EREM/erem.html</url>
         </bibl>
      </refgrp>
   </bm>
</art>
