<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2001-2-5-research0016</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Identification of conserved C2H2 zinc-finger gene families in the Bilateria</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Knight</snm>
               <mi>D</mi>
               <fnm>Robert</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
            </au>
            <au id="A2" ca="yes">
               <snm>Shimeld</snm>
               <mi>M</mi>
               <fnm>Sebastian</fnm>
               <insr iid="I1"/>
               <email>s.m.shimeld@reading.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Animal and Microbial Sciences, University of Reading, Whiteknights, Reading, RG6 6AJ, UK</p>
            </ins>
            <ins id="I2">
               <p>Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2001</pubdate>
         <volume>2</volume>
         <issue>5</issue>
         <fpage>research0016.1</fpage>
         <lpage>research0016.8</lpage>
         <url>http://genomebiology.com/2001/2/5/research/0016</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2001-2-5-research0016</pubid>
               <pubid idtype="pmpid">11387037</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>8</day>
               <month>12</month>
               <year>2000</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>6</day>
               <month>2</month>
               <year>2001</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>5</day>
               <month>3</month>
               <year>2001</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2001</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2001</year>
         <collab>Knight and Shimeld, licensee BioMed Central Ltd</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Identification of orthologous relationships between genes from widely divergent taxa allows partial reconstruction of the gene complement of ancestral genomes. C2H2 zinc-finger genes are one of the largest and most complex gene superfamilies in metazoan genomes, with hundreds of members in the human genome. Here we analyze C2H2 zinc-finger genes from three taxa - <it>Drosophila, Caenorhabditis elegans</it> and human - from which near-complete genome sequence data are available.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Our analyses conclusively identify 39 families of genes, of which 38 can be defined as orthology groups in that they are descended from single ancestral genes in the common ancestor of <it>Drosophila, C. elegans</it> and humans.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>On the basis of current metazoan phylogeny, these 39 groups represent the minimum complement of C2H2 zinc-finger genes present in the genome of the bilaterian common ancestor.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Model organisms such as the fruit fly <it>Drosophila melanogaster</it> and the nematode <it>Caenorhabditis elegans</it> are commonly used to investigate gene function. Frequently, genes with similar sequence can be identified in the human genome, allowing prediction of human gene function by extrapolation from <it>Drosophila</it> and/or <it>C. elegans</it>. Implicit in such extrapolations is the assumption that the genes being compared are orthologous, that is, that they derive from the same ancestral gene in the common ancestor of the model organism and humans [<abbr bid="B1">1</abbr>]. Correct identification of such relationships is therefore essential if extrapolation of function is to be fully exploited. In the simplest approach, such identifications typically rely on database comparisons with algorithms such as BLAST, with the highest-scoring sequences inferred to be orthologs [<abbr bid="B2">2</abbr>,<abbr bid="B3">3</abbr>]. Additional criteria can then be applied to confirm orthologous relationships, including checking that orthologs have similar domain structures, and ensuring that no sequence from a more distantly related taxon is more closely related to one proposed ortholog than to another. In more complex analyses, molecular phylogenetic reconstruction of gene family history is employed. Such reconstructions help distinguish speciation from gene duplication, thereby revealing orthologous and paralogous relationships.</p>
         <p>With the near-completion of the human, <it>C. elegans</it> and <it>Drosophila</it> genome sequences, it is becoming possible to extend the identification of such relationships to analyses of large, complex gene superfamilies in the Metazoa. Such an exercise essentially reconstructs the minimum gene complement, for a particular superfamily, that would have been present in the last common ancestor of these three taxa and, given their phylogenetic relationship [<abbr bid="B4">4</abbr>], gives insight into the genome complexity of the bilaterian common ancestor. Here we present an analysis of the C2H2 zinc-finger (C2H2 ZNF) genes: a superfamily that, with over 600 members in humans, accounts for 1-2% of all human genes. C2H2 ZNF genes primarily encode DNA- and chromatin-binding transcription factors, and include familiar and well-studied developmental genes such as <it>Krox-20, snail, Gli, Kruppel</it> and <it>hunchback</it>, as well as numerous genes whose function is yet to be established. By defining orthologous relationships within this superfamily, we aim to reconstruct the minimum complement of C2H2 ZNFs present in the bilaterian common ancestor.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>The organization of a typical C2H2 ZNF includes two features that make inference of evolutionary history complicated (Figure <figr fid="F1">1</figr>). The first is the conservation in almost all C2H2 ZNFs of a number of key residues critical for the structure of the domain. This means all C2H2 ZNFs have a high baseline of identity. The second is repetition of the C2H2 ZNF motif in individual genes. This makes BLAST scores unreliable indicators of evolutionary relationships, as the score depends on the length of matching sequence and will be misleadingly high for genes that have independently evolved multiple contiguous fingers. Finger repetition also means that molecular phylogenetics can only be employed where the relationships of individual fingers between genes can be determined. This is only possible for subgroups where a robust phylogenetic framework has already been established, and is consequently of little use in defining such subgroups.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Schematic diagram of a C2H2 zinc-finger motif</p>
            </caption>
            <text>
               <p>Schematic diagram of a C2H2 zinc-finger motif. The paired cysteines (C) and histidines (H) that bind the zinc ion are shown in yellow and blue, respectively. The linker sequence, shown in green with its consensus sequence in the single-letter amino acid code, frequently joins adjacent fingers. This is apparent in the lower panel, which shows the typical arrangement of fingers in a C2H2 ZNF protein. The two large hydrophobic residues, which are also structurally important, are shown in red. The black residues are not structurally important and include those responsible for contacting DNA during sequence-specific binding [16]. The precise number of 'black' residues between the cysteines, between the histidines and on the loop may vary [10].</p>
            </text>
            <graphic file="gb-2001-2-5-research0016-1"/>
         </fig>
         <p>The limitations of BLAST and molecular phylogenetics led us to seek alternative criteria for defining orthology of C2H2 ZNF genes. We used percentage amino-acid sequence identity over the ZNF region, as determined by FASTA [<abbr bid="B5">5</abbr>], as a preliminary indicator of relationships. First, we compiled datasets of all <it>Drosophila, C. elegans</it> and human proteins that contained C2H2 ZNFs. For a preliminary view of the levels of identity between species, we used FASTA to compare the <it>Drosophila</it> and <it>C. elegans</it> datasets to the human dataset and recorded the highest identity match in the human dataset for each <it>Drosophila</it> and <it>C. elegans</it> gene. To visualize the results, we combined identity scores (which potentially range from 0 to 100%) into 5% intervals and plotted the proportion of each dataset that had its highest match in each interval (Figure <figr fid="F2">2</figr>). The results were essentially the same for <it>Drosophila</it> and <it>C. elegans</it>, with a peak of highest identity centered at about 40% and a tail of genes with matches higher than 50%. A large majority of invertebrate genes had their highest identity matches to human genes within the peak in the 25-50% range.</p>
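         <p>As a minimal sketch of the binning step described above (not the procedure actually used to produce the plot), the following Python fragment groups each gene's highest identity match into 5% intervals and reports the proportion of the dataset that falls in each interval. The function names, gene identifiers and identity values are hypothetical, and the scores are assumed to have been parsed from FASTA output beforehand.</p>

```python
from collections import Counter

BIN_WIDTH = 5  # percent, matching the 5% intervals described in the text


def bin_start(identity):
    """Return the lower edge of the 5% interval holding an identity score (0-100)."""
    # A score of exactly 100% is folded into the top interval (95-100).
    return min(int(identity // BIN_WIDTH) * BIN_WIDTH, 100 - BIN_WIDTH)


def best_match_histogram(best_identity):
    """Map each interval's lower edge to the proportion of genes whose best match falls in it."""
    counts = Counter(bin_start(score) for score in best_identity.values())
    total = len(best_identity)
    return {start: counts[start] / total for start in sorted(counts)}


# Hypothetical gene names and identity values, for illustration only.
example = {"geneA": 38.2, "geneB": 41.0, "geneC": 42.5, "geneD": 67.9}
histogram = best_match_histogram(example)
for start, proportion in histogram.items():
    print(f"{start}-{start + BIN_WIDTH}%: {proportion:.2f}")
```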
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Highest percentage-identity match in 5% intervals for the E &lt; 10 datasets of <it>Drosophila</it> and <it>C. elegans</it> compared to the human dataset</p>
            </caption>
            <text>
               <p>Highest percentage-identity match in 5% intervals for the E &lt; 10 datasets of <it>Drosophila</it> and <it>C. elegans</it> compared to the human dataset. Baseline identity between typical C2H2 ZNF domains is between 20 and 44%, and this is where most genes show their highest identity. Values higher than this range are strongly suggestive of orthology. We also examined the difference between this analysis and an analysis of more stringent datasets (E &lt; 1). All but one of the sequences detected at E &lt; 10 but excluded from E &lt; 1 had maximum identity matches below 40%.</p>
            </text>
            <graphic file="gb-2001-2-5-research0016-2"/>
         </fig>
         <p>In a typical C2H2 ZNF motif, between 20 and 44% of the amino acids are structurally important and highly conserved, with variation within this range mostly arising from the presence or absence of a six-residue linker sequence that frequently joins adjacent fingers (Figure <figr fid="F1">1</figr>). Therefore the peak centered at 40% in Figure <figr fid="F2">2</figr> can be largely explained by the baseline of identity that occurs between most C2H2 ZNF sequences. An identity score of 45% or above indicates a closer relationship and therefore possible orthology. These values, however, cannot be used on their own to establish or exclude orthology definitively, because of the limited but significant variation in baseline identity. We therefore examined highest matches by eye to judge whether they indicated orthology. As a criterion, we used the presence of conserved amino acids in the zinc fingers other than those important for structure. Specifically, we did not include the paired cysteines and histidines that bind the zinc ion (Figure <figr fid="F1">1</figr>). The consensus linker, where present, was also excluded. We also compared all <it>Drosophila</it> and <it>C. elegans</it> C2H2 ZNF sequences to available human genome and expressed sequence tag (EST) sequences to detect potential orthologs absent from our human C2H2 ZNF dataset. This step was essential because the incomplete cataloguing of human protein data means that our human C2H2 ZNF protein dataset is certain to be incomplete. With these analyses we defined a total of 39 families of genes (Table <tblr tid="T1">1</tblr>), which we propose represent 'orthology groups', as we infer that each group is descended from a single ancestral gene in the most recent common ancestor of <it>Drosophila, C. elegans</it> and humans. Multiple genes from one species within a group are therefore paralogs. To our knowledge, 17 of these groups have not previously been defined. 
As an additional check of orthology we also compared our C2H2 ZNF datasets to a yeast C2H2 ZNF dataset [<abbr bid="B6">6</abbr>]. No yeast sequence was more closely related to a single member of an orthology group than to the other members of that group, which supports our group definitions. Each orthology group typically contains genes with the same number and arrangement of fingers. This fulfilled another standard prediction of orthology (similar domain structure), and allowed us to use molecular phylogenetics to examine, where relevant, the pattern of evolution within a group and to determine whether our assumption of descent from a single gene in the most recent common ancestor was supported (Figure <figr fid="F3">3</figr>). In all but one case, molecular phylogenetics produced trees that either supported our inference of orthology or were too poorly resolved to confirm or disprove it. The exception was the KLF family (Table <tblr tid="T1">1</tblr>), for which the tree topology suggested that more than one orthology group might be present; data from additional taxa will be necessary to further resolve this family. All sequences that showed an identity score &gt;55% were in orthology groups. Conversely, we consider some sequences with scores of &lt;44% to be in orthology groups.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Phylogenies of the gene families identified in our analysis for which more than three family members were present</p>
            </caption>
            <text>
               <p>Phylogenies of the gene families identified in our analysis for which more than three family members were present. <b>(a)</b> SP and KLF families; <b>(b)</b> Odd-like family; <b>(c)</b> Spalt family; <b>(d)</b> YY1 family; <b>(e)</b> Disco family; <b>(f)</b> IA-1 family; <b>(g)</b> Zep family; <b>(h)</b> Zic and Gli families; <b>(i)</b> Evi-1 family; <b>(j)</b> Snail family; <b>(k)</b> Ovo family; <b>(l)</b> Egr family. In each tree, the scale bar indicates a maximum likelihood branch length of 0.1 inferred substitutions per site and the numbers next to relevant branches are percentage quartet-puzzling support values. Genes and branches are color coded according to species: human genes are red, <it>Drosophila</it> genes are blue and <it>C. elegans</it> genes are green. Most trees are unrooted and built with members of only a single orthology group, as in only two cases could sequences from separate groups be confidently aligned. One of these exceptions is the SP and KLF families (a), which were analyzed together as their similar ZNF number and structure suggest relatively recent common ancestry. The other is the Zic and Gli families (h), which have a similar number and arrangement of C2H2 fingers. This tree also includes two 'orphan' <it>Drosophila</it> genes that have a similar finger arrangement. The phylogenetic analyses, with the exception of the KLF group, either failed to resolve relationships sufficiently to confirm or disprove orthology or showed that each group was descended from a single gene present in the common ancestor of humans, <it>C. elegans</it> and <it>Drosophila</it>. We therefore call these families 'orthology groups', implying that genes from different species within each family are orthologs. Consequently, genes from one species within a family are paralogs. 
For the KLF and SP genes, the tree topology shows monophyly of the SP genes and suggests that multiple KLF orthology groups may be present, although the poor resolution does not allow definition of these.</p>
            </text>
            <graphic file="gb-2001-2-5-research0016-3"/>
         </fig>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>The 39 groups of orthologous C2H2 ZNF genes defined by our analyses</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>Gene family</p>
                  </c>
                  <c ca="left">
                     <p>Human</p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>Drosophila</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <it>C. elegans</it>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>1 Sp</p>
                  </c>
                  <c ca="left">
                     <p>Sp1 (SP:P08047)</p>
                  </c>
                  <c ca="left">
                     <p>Btd (CT35305; SPTR:Q24266)</p>
                  </c>
                  <c ca="left">
                     <p>T22C8.5 (SPTR:Q22678)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sp2 (SP:Q02086)</p>
                  </c>
                  <c ca="left">
                     <p>DSp1 (CT2914; SPTR:Q9U1K4)</p>
                  </c>
                  <c ca="left">
                     <p>Y40B1A.4 (SPTR:Q9XW26)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sp3 (SP:Q02447)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sp4 (SP:Q02446)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>2 Zic</p>
                  </c>
                  <c ca="left">
                     <p>Zic1 (SP:Q15915)</p>
                  </c>
                  <c ca="left">
                     <p>Opa (CT1819; SPTR:P39768)</p>
                  </c>
                  <c ca="left">
                     <p>C47C12.3 (SPTR:Q94178)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Zic2 (SP:O95409)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Zic3 (SP:O60481)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>3 Ovo</p>
                  </c>
                  <c ca="left">
                     <p>Ovol1 (SP:O14753)</p>
                  </c>
                  <c ca="left">
                     <p>Ovo (CT21113, CT36311; SPTR:P51521)</p>
                  </c>
                  <c ca="left">
                     <p>F34D10.5 (SPTR:Q19996)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SPTR:O00110</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SPTR:Q9Y4M0</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>4 Snail</p>
                  </c>
                  <c ca="left">
                     <p>Slug (SP:O43623)</p>
                  </c>
                  <c ca="left">
                     <p>Snail (CT13146; SPTR:P08044)</p>
                  </c>
                  <c ca="left">
                     <p>C55C2.1 (SPTR:O01830)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Snai1 (SP:O95863)</p>
                  </c>
                  <c ca="left">
                     <p>Escargot (CT12561; SPTR:P25932)</p>
                  </c>
                  <c ca="left">
                     <p>F43G9.11 (SPTR:Q93721)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SnaiP1 (Snai1 pseudogene)</p>
                  </c>
                  <c ca="left">
                     <p>Worniu (CT13175; SPTR:Q9NK88)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Scratch (CT1817; SPTR:Q24140)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CT33426 (SPTR:Q9W0P9)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CT34835 (SPTR:Q9VZK3)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>5 Gli</p>
                  </c>
                  <c ca="left">
                     <p>Gli/Gli1 (SP:P08151)</p>
                  </c>
                  <c ca="left">
                     <p>Ci (CT6641; SPTR:P19538)</p>
                  </c>
                  <c ca="left">
                     <p>Tra-1 (Y47D3A.6; SPTR:Q9U2C0)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Gli2 (SP:P10070)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Gli3 (SP:P10071)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>6 Egr/Krox</p>
                  </c>
                  <c ca="left">
                     <p>Egr1/Krox-24 (SP:P18146)</p>
                  </c>
                  <c ca="left">
                     <p>Stripe (CT23724; SPTR:Q24163)</p>
                  </c>
                  <c ca="left">
                     <p>C27C12.2 (SPTR:Q18250)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Egr2/Krox-20 (SP:P11161)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Y55F3AM.7 (SPTR:Q9N374)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Egr3 (SP:Q06889)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Egr4 (SP:Q05215)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>7 KLF</p>
                  </c>
                  <c ca="left">
                     <p>EZF/GKLF (SPTR:Q9UNP3)</p>
                  </c>
                  <c ca="left">
                     <p>CT2144 (SPTR:Q9VZN4)</p>
                  </c>
                  <c ca="left">
                     <p>F56F11.3 (SPTR:Q9TZ64)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>LKLF (SPTR:Q9UKR6)</p>
                  </c>
                  <c ca="left">
                     <p>CT27882 (SPTR:Q9W1W2)</p>
                  </c>
                  <c ca="left">
                     <p>F53F8.1 (SPTR:O62259)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>UKLF (SP:O75840)</p>
                  </c>
                  <c ca="left">
                     <p>CT14096 (SPTR:Q9VPQ5)</p>
                  </c>
                  <c ca="left">
                     <p>mua1/F54H5.4 (SPTR:P91329)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BKLF (SP:P57682)</p>
                  </c>
                  <c ca="left">
                     <p>CT9920 (SPTR:O77251)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>EKLF (SP:Q13351)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>KKLF (SPTR:Q9UIH9)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ZNF741 (SPTR:O95600)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>NSLP1 (SPTR:Q9Y356)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BTEB1 (SP:Q13886)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ZF9/CPBP (SP:Q99612)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BTEB2/CKLF (SP:Q13887)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>AP-2REP (SPTR:Q9UHZ0)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>TIEG1 (SP:Q13118)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>TIEG2 (SP:O14901)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>8 Zfh-1</p>
                  </c>
                  <c ca="left">
                     <p>SPTR:O60315</p>
                  </c>
                  <c ca="left">
                     <p>Zfh-1 (CT2773; SPTR:P28166)</p>
                  </c>
                  <c ca="left">
                     <p>F28F9.1 (SPTR:Q94196)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>NIL-2-A (SP:P37275)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>9 Zfh-2</p>
                  </c>
                  <c ca="left">
                     <p>ATBF1 (SPTR:Q13719)</p>
                  </c>
                  <c ca="left">
                     <p>Zfh-2 (CT3397; SPTR:P28167)</p>
                  </c>
                  <c ca="left">
                     <p>ZC123.3 (SPTR:O45019)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>10 Odd-like</p>
                  </c>
                  <c ca="left">
                     <p>EM:AI126171</p>
                  </c>
                  <c ca="left">
                     <p>Odd (CT12867; SPTR:P23803)</p>
                  </c>
                  <c ca="left">
                     <p>YKC4 (B0280.4; SPTR:P41995)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sob (CT10899; SPTR:Q24571)</p>
                  </c>
                  <c ca="left">
                     <p>C34H3.2 (SPTR:Q9N5X6)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Bowl (CT9648, CT37221, CT40018; SPTR:Q9VQU9)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>11 Spalt</p>
                  </c>
                  <c ca="left">
                     <p>HSAL1 (SPTR:Q99881)</p>
                  </c>
                  <c ca="left">
                     <p>Spalt-major (CT20082; SPTR:P39770)</p>
                  </c>
                  <c ca="left">
                     <p>SEM-4 (SPTR:Q17396)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>HSAL2 (SPTR:Q9Y467)</p>
                  </c>
                  <c ca="left">
                     <p>Spalt-related (CT15643; SPTR:Q24163)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SALL3 (SPTR:Q9UGH1)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>12 Disco</p>
                  </c>
                  <c ca="left">
                     <p>Basonuclin (SPTR:Q01954)</p>
                  </c>
                  <c ca="left">
                     <p>Disco (CT27904; SPTR:P23792)</p>
                  </c>
                  <c ca="left">
                     <p>F55C5 (SPTR:Q20815)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SPTR:Q9NXV0</p>
                  </c>
                  <c ca="left">
                     <p>CT26340 (SPTR:Q9VXJ5)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>13 GFI</p>
                  </c>
                  <c ca="left">
                     <p>GFI-1, GFI-1B (SPTR:Q99684)</p>
                  </c>
                  <c ca="left">
                     <p>CT31381 (SPTR:Q9VM77)</p>
                  </c>
                  <c ca="left">
                     <p>F45B8.4 (SPTR:O02265)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>14 YY1</p>
                  </c>
                  <c ca="left">
                     <p>TYY1 (SPTR:P25490)</p>
                  </c>
                  <c ca="left">
                     <p>Pho (CT39329; SPTR:O76247)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>SPTR:O15391</p>
                  </c>
                  <c ca="left">
                     <p>CT11601 (SPTR:Q9VSZ3)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>15 BLIMP-1</p>
                  </c>
                  <c ca="left">
                     <p>BLIMP-1 (SPTR:O95914)</p>
                  </c>
                  <c ca="left">
                     <p>CT16759 (SPTR:Q9VRN4)</p>
                  </c>
                  <c ca="left">
                     <p>F25D7.3 (SPTR:Q93560)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>16 Zep</p>
                  </c>
                  <c ca="left">
                     <p>Zep1 (SP:P15822)</p>
                  </c>
                  <c ca="left">
                     <p>Schnurri (CT23537; PIR:A56922)</p>
                  </c>
                  <c ca="left">
                     <p>T05A10.1 (SPTR:Q22190)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Zep2 (SP:P31629)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>KBP-1 (SPTR:Q99302)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>17 IA-1<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>IA-1 (SP:Q01101)</p>
                  </c>
                  <c ca="left">
                     <p>CT31935 (SPTR:Q9VH29)</p>
                  </c>
                  <c ca="left">
                     <p>K11G9.4 (SPTR:Q23011)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Nerfin-1 (CT33443; SPTR:Q9V3B8)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>18 Evi-1<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>Evi-1 (SP:Q03112)</p>
                  </c>
                  <c ca="left">
                     <p>CT29074 (SPTR:Q9VJ55)</p>
                  </c>
                  <c ca="left">
                     <p>R53.3 (A and B) (SPTR:Q22024)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CT29650 (SPTR:Q9VJ52)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>19 SAP61</p>
                  </c>
                  <c ca="left">
                     <p>SAP 61 (SPTR:Q12874)</p>
                  </c>
                  <c ca="left">
                     <p>Noisette (CT7078; SPTR:O46106)</p>
                  </c>
                  <c ca="left">
                     <p>T13H5.4 (SPTR:Q22469)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>20 SP62<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>SP62 (SP:Q15428)</p>
                  </c>
                  <c ca="left">
                     <p>CT30142 (SPTR:Q9VU15)</p>
                  </c>
                  <c ca="left">
                     <p>F11A10.2 (SPTR:Q19335)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>21 Kin-17</p>
                  </c>
                  <c ca="left">
                     <p>KIN-17 (SPTR:O60870)</p>
                  </c>
                  <c ca="left">
                     <p>Kin-17 (CT17834; SPTR:O76926)</p>
                  </c>
                  <c ca="left">
                     <p>Y52B11A.9 (SPTR:Q9XWF2)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>22 Hindsight</p>
                  </c>
                  <c ca="left">
                     <p>FinB (SPTR:Q9Y474)</p>
                  </c>
                  <c ca="left">
                     <p>Hindsight (CT11247; PIR:T13594)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>RREB-1 (SP:Q92766)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>23 MTF</p>
                  </c>
                  <c ca="left">
                     <p>MTF-1 (SPTR:Q14872)</p>
                  </c>
                  <c ca="left">
                     <p>CT12477 (SPTR:Q9NFS1)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>24 ZNF207<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>ZNF207 (SP:O43670)</p>
                  </c>
                  <c ca="left">
                     <p>CT39886 (SPTR:Q9VJI6)</p>
                  </c>
                  <c ca="left">
                     <p>B0035.1 (SPTR:Q93156)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>25 ZNF277<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>ZNF277 (SP:Q9NRM2)</p>
                  </c>
                  <c ca="left">
                     <p>CT27874 (SPTR:Q9W1V7)</p>
                  </c>
                  <c ca="left">
                     <p>F46B6.7 (SPTR:Q20448)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>26 Fez</p>
                  </c>
                  <c ca="left">
                     <p>SPTR:Q9NWB9</p>
                  </c>
                  <c ca="left">
                     <p>CT22557 (SPTR:Q9VQ56)</p>
                  </c>
                  <c ca="left">
                     <p>Y38H8A.5 (SPTR:O62425)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>27 OAZ<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>OAZ (SPTR:Q9NZ13)</p>
                  </c>
                  <c ca="left">
                     <p>CT33481 (SPTR:Q9V724)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>28 Zfam 1<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>HSPC038 (SPTR:Q9Y5V0)</p>
                  </c>
                  <c ca="left">
                     <p>CT35941 (SPTR:Q9VUU8)</p>
                  </c>
                  <c ca="left">
                     <p>C01F6.9 (SPTR:O62023)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>CT40578 (SPTR:Q9VUU7)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>29 Zfam 2<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>SPTR:Q9NWA7</p>
                  </c>
                  <c ca="left">
                     <p>CT27270 (SPTR:Q9W3S1)</p>
                  </c>
                  <c ca="left">
                     <p>F13H6.1 (SPTR:O16350)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>30 Zfam 3<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>SPTR:Q9NTN4</p>
                  </c>
                  <c ca="left">
                     <p>CT15069 (SPTR:Q9VCS3)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>31 Zfam 4<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:AI907237</p>
                  </c>
                  <c ca="left">
                     <p>CT17352 (SPTR:Q9U9A8)</p>
                  </c>
                  <c ca="left">
                     <p>Lin29 (SPTR:Q9N6B5)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>32 Zfam 5<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:Z64553</p>
                  </c>
                  <c ca="left">
                     <p>CT21013 (SPTR:Q9VX08)</p>
                  </c>
                  <c ca="left">
                     <p>C16A3.4 (SPTR:Q18036)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>33 Zfam 6<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>Ptg-12 (EM:X97303)</p>
                  </c>
                  <c ca="left">
                     <p>CT36542 (SPTR:Q9VZF0)</p>
                  </c>
                  <c ca="left">
                     <p>ZK686.4 (SP:P34670)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>34 Zfam 7<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:AC005606</p>
                  </c>
                  <c ca="left">
                     <p>CT31867 (SPTR:Q9W149)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>35 Zfam 8<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:AK000711</p>
                  </c>
                  <c ca="left">
                     <p>CT4004 (SPTR:Q9V9Z6)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>36 Zfam 9<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:HS626B19</p>
                  </c>
                  <c ca="left">
                     <p>CT32584 (SPTR:Q9VRV0)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>37 Zfam 10<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>Br140/BRPF1 (SP:P55201)</p>
                  </c>
                  <c ca="left">
                     <p>CT5659 (SPTR:Q9V4J4)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Br140-like (SP:O95696)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>38 Zfam 11<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>EM:AI077328</p>
                  </c>
                  <c ca="left">
                     <p>CT32574 (SPTR:Q9VRQ6)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>39 Zfam 12<sup>*</sup></p>
                  </c>
                  <c ca="left">
                     <p>HPCMF (SPTR:Q9P0J7)</p>
                  </c>
                  <c ca="left">
                     <p>CT32121 (SPTR:Q9VHI5)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p><sup>*</sup>Families we believe not to have been defined previously. Human genes are identified by gene name and, where names have not yet been given, by database accession number. <it>Drosophila</it> sequences are identified by gene name and corresponding Gadfly protein symbol in brackets [<abbr bid="B14">14</abbr>], or just by symbol where no name has been ascribed. <it>C. elegans</it> sequences are identified by name where possible and by coding sequence identifier [<abbr bid="B15">15</abbr>]. All sequences are also identified by accession number: where possible these are SWISSPROT TREMBL accession numbers (designated SPTR). In a few cases only SWISSPROT accession numbers (designated SP) could be identified. For a minority of human genes no protein database entries have been made. These derive from EST or genomic sequences and the corresponding EMBL nucleotide database accession number (designated EM) is given.</p>
            </tblfn>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The 39 families identified above represent the conservative minimum of C2H2 ZNF genes present in the common ancestor of <it>Drosophila, C. elegans</it> and humans. They have, however, essentially been defined on one criterion - sequence identity at defined sites. It is possible that other features of zinc-finger genes could indicate orthology in the absence of sequence conservation, including similarities in the spacing between the paired histidines and cysteines, finger number, finger organization, intron/exon structure, the presence of other conserved domains and similarity of function. An example of this is the invertebrate <it>hunchback</it> and vertebrate <it>Ikaros</it>-related genes (<it>Ikaros, Helios, Eos</it> and <it>Aiolos</it>), which have low levels of sequence identity but a similar unusual arrangement of zinc fingers. Such examples may also represent orthology groups; their definition is, however, more subjective and we have not included them in our 39 groups.</p>
         <p>Even including speculative orthology groups such as <it>hunchback/Ikaros,</it> genes for which orthology can be determined represent less than 25% of the C2H2 ZNF gene complement of each genome. This suggests that many orthologous relationships may not have been identified using our criteria. Whereas lineage-specific gene loss may account for our inability to identify orthologs for a proportion of the remaining 'nonassignable' genes, for most genes orthology is presumably cryptic to the point that it can no longer be recognized. This is presumably a result of high rates of sequence divergence. A key question, then, is how many orthology groups are hidden in this remaining approximately 75% of genes? Direct extrapolation from our finding that 39 orthology groups contain about 25% of genes would suggest that another 117 orthology groups remain undetected. Evidence from human and <it>Xenopus</it> genomes, however, suggests that the number may be much lower, as in both taxa a considerable number of C2H2 ZNF genes (<it>KRAB</it> C2H2 ZNF genes in humans and <it>FAX</it> and <it>FAR</it> C2H2 ZNF genes in <it>Xenopus</it>) have been reported to have evolved by separate mass gene duplications [<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>,<abbr bid="B9">9</abbr>]. Such lineage-restricted gene duplication suggests that a considerable proportion of the nonassignable genes may have evolved from a comparatively small number of ancestral genes. We therefore suggest that our 39 orthology groups represent a much larger proportion of the total existing groups than the 25% of genes they contain would suggest. Identifying precisely how many other groups there are, however, is a major bioinformatic challenge that will require data from other, phylogenetically well-placed taxa.</p>
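         <p>The direct extrapolation mentioned above can be written out as a short calculation. This is only a minimal sketch: the function name is hypothetical, and it assumes that orthology groups are spread evenly across genes, which the duplication evidence discussed above argues against.</p>
```python
def extrapolate_hidden_groups(known_groups, gene_fraction):
    """Naive linear extrapolation of undetected orthology groups.

    Assumes the known groups cover `gene_fraction` of all genes and that
    the remaining genes hold groups at the same density (an assumption
    made for illustration only).
    """
    if not (gene_fraction > 0 and 1 >= gene_fraction):
        raise ValueError("gene_fraction must lie in (0, 1]")
    # Total groups implied if the known groups cover this share of genes.
    total_groups = known_groups / gene_fraction
    # Groups that would remain undetected under this assumption.
    return total_groups - known_groups


# 39 groups covering about 25% of genes.
print(extrapolate_hidden_groups(39, 0.25))
```
         <p>With 39 groups and a gene fraction of 0.25 the sketch returns 117, the figure quoted in the text; it is an upper-end estimate if many nonassignable genes arose by lineage-restricted duplication.</p>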
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have conclusively identified 39 families of C2H2 ZNF genes by comparing <it>Drosophila</it> and <it>C. elegans</it> sequences with human sequences. Of these, 17 have not been previously defined, and we propose that 38 represent definitive groups of orthologous genes, each deriving from a single gene in the common ancestor of these three organisms. Therefore, on the basis of current metazoan phylogeny [<abbr bid="B4">4</abbr>], a member of each of these groups was primitively present in all triploblast bilaterian taxa, and they represent the minimum C2H2 ZNF complement in the bilaterian common ancestor.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p><it>Drosophila</it> and <it>C. elegans</it> sequences were identified by searching the complete predicted protein sets (Gadfly and Wormpep 24, respectively) with a hidden Markov model profile generated from the PFAM C2H2 ZNF seed alignment [<abbr bid="B10">10</abbr>]. We searched at two stringencies, E &lt; 10 and E &lt; 1, identifying 394 and 332 <it>Drosophila</it> and 220 and 156 <it>C. elegans</it> sequences, respectively. Examination of the datasets showed that the E &lt; 10 datasets contained some other types of zinc fingers (for example RING fingers), and that the E &lt; 1 datasets excluded some genuine C2H2 ZNFs. We used this method rather than relying on previous identifications of C2H2 ZNF genes (see for example [<abbr bid="B11">11</abbr>]) as we wanted to be confident we had identified all members of this superfamily. Such stringent criteria could not be applied to identification of human sequences, where many genes are currently represented only by short or fragmented sequences in genomic or EST databases. Inclusion of such sequences in the dataset could potentially have biased our preliminary analyses because of their short length. Instead, human sequences were identified using the listing provided by the SMART database [<abbr bid="B12">12</abbr>], and edited to remove short sequences (&lt;100 amino acids). This provided a sufficiently large and diverse dataset of long sequences for our preliminary analyses, but raised the possibility that human orthologs of <it>Drosophila</it> and <it>C. elegans</it> sequences might be missed because of their exclusion from our human dataset. We circumvented this by using FASTA comparisons of all <it>Drosophila</it> and <it>C. elegans</it> C2H2 ZNF protein sequences against all available human genomic DNA and EST sequences to identify orthologs absent from our human dataset.</p>
         <p>Molecular phylogenetic analyses were performed using the maximum likelihood method with one fixed and eight gamma-distributed rates, implemented by Puzzle [<abbr bid="B13">13</abbr>].</p>
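         <p>The two filtering steps described above (an E-value cutoff for the profile search and a minimum length for human sequences) can be illustrated with a minimal sketch. It assumes the search results have already been parsed into (identifier, E-value) pairs; the function names and toy values are hypothetical and do not reproduce the software or data actually used.</p>
```python
def filter_hits(hits, max_evalue):
    """Return identifiers of hits whose E-value is below max_evalue."""
    return [seq_id for seq_id, evalue in hits if max_evalue > evalue]


def drop_short(sequences, min_length=100):
    """Keep only sequences with at least min_length residues."""
    return {name: seq for name, seq in sequences.items()
            if len(seq) >= min_length}


# Toy profile-search results: (identifier, E-value). Values are invented.
hits = [("seqA", 1e-20), ("seqB", 0.5), ("seqC", 4.0), ("seqD", 25.0)]

relaxed = filter_hits(hits, 10)   # permissive cutoff, may admit other finger types
strict = filter_hits(hits, 1)     # stringent cutoff, may drop genuine members
# Hits that appear only under the permissive cutoff merit manual inspection.
borderline = [s for s in relaxed if s not in strict]

# Toy protein sequences; only the length matters here.
proteins = {"long_protein": "M" * 150, "fragment": "M" * 40}
kept = drop_short(proteins)

print(relaxed, strict, borderline, sorted(kept))
```
         <p>Comparing the two cutoffs in this way isolates the borderline hits, which correspond to the sequences that had to be examined to separate genuine C2H2 ZNFs from other zinc-finger types.</p>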
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data files are available: <supplr sid="S1">alignments</supplr> for all orthology groups and <supplr sid="S40">datasets</supplr>.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Alignments for all orthology groups</p>
            </caption>
            <text>
               <p>Alignments for all orthology groups</p>
            </text>
            <file name="gb-2001-2-5-research0016-S1.doc">
               <p>SP family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S2.doc">
               <p>Zic family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S3.doc">
               <p>Ovo family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S4.doc">
               <p>Snail family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S5.doc">
               <p>Gli family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S6.doc">
               <p>Egr family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S7.doc">
               <p>KLF family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S8.doc">
               <p>Zfh1 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S9.doc">
               <p>Zfh2 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S10.doc">
               <p>Odd-like family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S11.doc">
               <p>Spalt family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S12.doc">
               <p>Disco family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S13.doc">
               <p>GFI family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S14.doc">
               <p>YY1 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S15.doc">
               <p>BLIMP-1 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S16.doc">
               <p>Zep family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S17.doc">
               <p>Insulinoma Associated family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S18.doc">
               <p>Evi-1 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S19.doc">
               <p>SAP61 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S20.doc">
               <p>SP62 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S21.doc">
               <p>Kin-17 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S22.doc">
               <p>Hindsight family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S23.doc">
               <p>MTF family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S24.doc">
               <p>ZNF207 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S25.doc">
               <p>ZNF277 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S26.doc">
               <p>Fez family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S27.doc">
               <p>OAZ family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S28.doc">
               <p>Zfam1 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S29.doc">
               <p>Zfam2 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S30.doc">
               <p>Zfam3 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S31.doc">
               <p>Zfam4 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S32.doc">
               <p>Zfam5 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S33.doc">
               <p>Zfam6 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S34.doc">
               <p>Zfam7 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S35.doc">
               <p>Zfam8 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S36.doc">
               <p>Zfam9 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S37.doc">
               <p>Zfam10 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S38.doc">
               <p>Zfam11 family</p>
            </file>
            <file name="gb-2001-2-5-research0016-S39.doc">
               <p>Zfam12 family</p>
            </file>
         </suppl>
         <suppl id="S40">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Datasets</p>
            </caption>
            <text>
               <p>Datasets</p>
            </text>
            <file name="gb-2001-2-5-research0016-S40.txt">
               <p>Drosophila E1 dataset</p>
            </file>
            <file name="gb-2001-2-5-research0016-S41.txt">
               <p>Drosophila E10 dataset</p>
            </file>
            <file name="gb-2001-2-5-research0016-S42.txt">
               <p>Human dataset</p>
            </file>
            <file name="gb-2001-2-5-research0016-S43.txt">
               <p>Human over 100aa</p>
            </file>
            <file name="gb-2001-2-5-research0016-S44.txt">
               <p>Worm E1 dataset</p>
            </file>
            <file name="gb-2001-2-5-research0016-S45.txt">
               <p>Worm E10 dataset</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Homology, a personal view on some of the problems.</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>227</fpage>
            <lpage>231</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10782117</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A genomic approach to protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <fpage>631</fpage>
            <lpage>637</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9381173</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The COG database: a tool for genome-scale analysis of protein functions and evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>33</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102395</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The new animal phylogeny: Reliability and implications.</p>
            </title>
            <aug>
               <au>
                  <snm>Adoutte</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Balavoine</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lartillot</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lespinet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Prud'homme</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>de Rosa</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>4453</fpage>
            <lpage>4456</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">34321</pubid>
                  <pubid idtype="pmpid" link="fulltext">10781043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Improved tools for biological sequence comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3162770</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Bohm</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Frishman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>2464</fpage>
            <lpage>2469</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146766</pubid>
                  <pubid idtype="pmpid" link="fulltext">9171100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The evolutionarily conserved Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Bellefroid</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Poncelet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lecocq</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Revelant</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Martial</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1991</pubdate>
            <volume>88</volume>
            <fpage>3608</fpage>
            <lpage>3612</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">51501</pubid>
                  <pubid idtype="pmpid" link="fulltext">2023909</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Evolutionary conserved modules associated with zinc fingers in <it>Xenopus laevis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Knochel</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Poting</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Koster</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>el Baradi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nietfeld</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bouwmeester</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pieler</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1989</pubdate>
            <volume>86</volume>
            <fpage>6097</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2503827</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The FAR domain defines a new <it>Xenopus laevis</it> zinc finger protein subfamily with specific RNA homopolymer binding activity.</p>
            </title>
            <aug>
               <au>
                  <snm>Klocke</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Koster</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hille</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bouwmeester</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bohm</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pieler</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Knochel</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>1994</pubdate>
            <volume>1217</volume>
            <fpage>81</fpage>
            <lpage>89</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7506934</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The Pfam Protein Families Database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>263</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102420</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592242</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Zinc fingers in <it>Caenorhabditis elegans</it>: finding families and probing pathways.</p>
            </title>
            <aug>
               <au>
                  <snm>Clarke</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2018</fpage>
            <lpage>2022</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2018</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851917</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>SMART: A Web-based tool for the study of genetically mobile domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Doerks</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>231</fpage>
            <lpage>234</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102444</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies.</p>
            </title>
            <aug>
               <au>
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>964</fpage>
            <lpage>969</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The Berkeley <it>Drosophila</it> Genome Project</p>
            </title>
            <url>http://www.fruitfly.org/</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The <it>C. elegans</it> Protein Database Wormpep</p>
            </title>
            <url>http://www.sanger.ac.uk/Projects/C_elegans/wormpep/</url>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Crystal structure of a five-finger GLI-DNA complex: new perspectives on zinc fingers.</p>
            </title>
            <aug>
               <au>
                  <snm>Pavletich</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pabo</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1993</pubdate>
            <volume>261</volume>
            <fpage>1701</fpage>
            <lpage>1707</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8378770</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
