<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-6-83</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>(TG/CA)<sub>n </sub>repeats in human gene families: abundance and selective patterns of distribution according to function and gene length</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Sharma</snm>
               <mi>K</mi>
               <fnm>Vineet</fnm>
               <insr iid="I1"/>
               <email>vsharma@igib.res.in</email>
            </au>
            <au id="A2">
               <snm>Brahmachari</snm>
               <mi>K</mi>
               <fnm>Samir</fnm>
               <insr iid="I1"/>
               <email>skb@igib.res.in</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Ramachandran</snm>
               <fnm>Srinivasan</fnm>
               <insr iid="I1"/>
               <email>ramu@igib.res.in</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>83</fpage>
         <url>http://www.biomedcentral.com/1471-2164/6/83</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15935094</pubid>
               <pubid idtype="doi">10.1186/1471-2164-6-83</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>22</day>
               <month>10</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>03</day>
               <month>6</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>03</day>
               <month>6</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Sharma et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Creation of human gene families was facilitated significantly by gene duplication and diversification. The (TG/CA)<sub>n </sub>repeats exhibit length variability, display genome-wide distribution, and are abundant in the human genome. Accumulating evidence for their multiple functional roles, including regulation of transcription and stimulation of recombination and splicing, marks them as functional elements. Here, we report an analysis of the distribution of (TG/CA)<sub>n </sub>repeats in human gene families.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The 1,317 human gene families were classified into six functional classes. The distribution of (TG/CA)<sub>n </sub>repeats was analyzed both from a global perspective and from a stratified perspective based on their biological properties. The number of genes with repeats decreased with increasing repeat length, and many genes (53%) had repeats of multiple types in various combinations. Repeats were positively associated with the class of Signaling and communication, whereas they were negatively associated with the classes of Immune and related functions and of Information. The proportion of genes with (TG/CA)<sub>n </sub>repeats in each class was proportional to the corresponding average gene length. The repeat distribution pattern in large gene families generally mirrored the global distribution pattern but differed particularly for the <it>Collagen </it>gene family, which was rich in repeats. The position and flanking sequences of the repeats of <it>Collagen </it>genes showed high conservation in the Chimpanzee genome. However, the majority of these repeats displayed length polymorphism.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Positive association of repeats with genes of Signaling and communication points to their role in modulation of transcription. Negative association of repeats in genes of Information relates to the smaller gene length, higher expression and fundamental role in cellular physiology. In genes of Immune and related functions, negative association of repeats perhaps relates to the smaller gene length and the directional nature of the recombinogenic processes that generate immune diversity. Thus, multiple factors including gene length, function and directionality of recombinogenic processes steered the observed distribution of (TG/CA)<sub>n </sub>repeats. Furthermore, the distribution of repeat patterns is consistent with the current model that long repeats tend to contract more than expand, whereas the reverse dynamics operate in short repeats.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The evolution of organisms with increasing complexity was significantly facilitated by duplication of genes and genomes followed by diversification <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Gene duplication <it>per se </it>produces two identical copies. Subsequently, one of the copies may either accumulate beneficial changes to give rise to a functionally diversified gene or accrue deleterious mutations to end up as a pseudogene, while the other copy retains its original function. The former mechanism leads to the creation of 'gene families' capable of carrying out diverse functions <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>The classification of genes into gene families by the Human Gene Nomenclature Committee (HGNC) on the basis of sequence similarity of the encoded proteins <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and the availability of the human genome sequence <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> allow us to carry out a comprehensive survey of an important class of functional elements, namely the (TG/CA)<sub>n </sub>repeats. Analysis of the distribution of (TG/CA)<sub>n </sub>repeats within genes in 'present day' gene families holds the potential to provide insights into the factors steering their abundance and selective distribution. Although the length polymorphism characteristic of (TG/CA)<sub>n </sub>repeats has been widely used in genetic mapping <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, a growing body of evidence accumulated over several years points to their multiple functional roles in various biological processes.</p>
         <p>The (TG/CA)<sub>n </sub>repeats have a propensity to undergo structural transitions <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> and have been shown to modulate transcription in several genes including rat <it>&#945;-lactalbumin </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, rat <it>prolactin </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, <it>MMP-9 </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, <it>IFN-&#947; </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, <it>EGFR </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, <it>HSD11B2 </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, tilapia <it>prolactin1 </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and human housekeeping genes <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Furthermore, the (TG)<sub>n </sub>tracts have been observed to act as stimulators of recombination and of mRNA splicing <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>In the current study, we analyse the distribution of (TG/CA)<sub>n </sub>repeats in human gene families and assess whether these repeats are positively or negatively associated with gene length and function.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Characteristics of human gene families and their functional classification</p>
            </st>
            <p>Each of the 1,317 gene families included members with similar functional roles. The family sizes varied widely, from 2 to 223 members (Figure <figr fid="F1">1</figr>). The number of gene families was found to bear an inverse exponential relation to family size. About two-fifths of the gene families were duplex (two-member) families. Only three gene families had more than 100 members per family: Immunoglobulin heavy chain (162 genes), Zinc finger proteins (200 genes) and Solute carrier (223 genes).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution pattern of human gene families with respect to family sizes</p>
               </caption>
               <text>
                  <p>Distribution pattern of human gene families with respect to family sizes. X axis: family size (number of genes in each gene family). Y axis: number of gene families corresponding to various family sizes. Note the inverse exponential relationship.</p>
               </text>
               <graphic file="1471-2164-6-83-1" hint_layout="double"/>
            </fig>
            <p>The functional classification of the 1,317 gene families, comprising 7,928 genes, into the six functional classes showed that the Signaling and communication class is the largest, with 529 families and 3,072 genes (Figure <figr fid="F2">2</figr>). The Cell cycle class is the smallest, with 82 families and 470 genes.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Global distribution of gene families, genes and proportion of genes containing (TG/CA)<sub>n </sub>repeats classified into the six functional classes</p>
               </caption>
               <text>
                  <p>Global distribution of gene families, genes and proportion of genes containing (TG/CA)<sub>n </sub>repeats classified into the six functional classes. The numbers correspond to the height of the vertical bars in each group.</p>
               </text>
               <graphic file="1471-2164-6-83-2" hint_layout="double"/>
            </fig>
            <p>Of the 1,317 gene families, 131 were entirely intrachromosomal. Chromosome 1 had the largest number with 17 families, followed by chromosomes 19 and 11 with 13 and 12 families respectively. The remaining chromosomes had fewer than 10 intrachromosomal gene families per chromosome. The functional classification of these 131 intrachromosomal gene families revealed that the highest number (45) belonged to the class of 'Immune and related functions', closely followed by the class of Signaling and communication with 40 families. The remaining classes had the following distribution of gene families: Metabolism (24), Information (15), Structure and motility (5) and Cell cycle (2). These observations indicate that the creation of intrachromosomal human gene families was driven by a large number of duplications followed by divergence in selected functional classes.</p>
         </sec>
         <sec>
            <st>
               <p>Global distribution of (TG/CA)<sub>n </sub>repeats (n &#8805; 6 units) in gene families</p>
            </st>
            <p>Of the 1,317 gene families, 732 families had (TG/CA)<sub>n </sub>repeats in at least one of their members and 326 families had repeats in all their members. Of the 7,928 genes in the 1,317 families, 3,986 genes had intragenic (TG/CA)<sub>n </sub>repeats of length greater than or equal to 6 units. All 3,986 genes had repeats in their introns. Only 277 genes had (TG/CA)<sub>n </sub>repeats in exons, indicating that these repeats are mainly present in introns.</p>
            <p>The distribution of genes with (TG/CA)<sub>n </sub>repeats in the six functional classes is displayed in Figure <figr fid="F2">2</figr>. It is apparent that the class of Signaling and communication had the highest number of genes with (TG/CA)<sub>n </sub>repeats. Comparison of the proportion of genes with repeats in each class with the global proportion showed that the class of Signaling and communication had a significantly higher proportion than expected (p &lt; 0.0001, Binomial test). In contrast, the classes of Immune and related functions and Information had significantly lower proportions of genes with repeats than expected (p &lt; 0.0001 and p &lt; 0.0002 respectively). The proportion of genes with repeats was not significantly different from the global proportion in the Cell cycle, Metabolism and Structure and motility classes. These observations show that the (TG/CA)<sub>n </sub>repeats exhibit positive association with the genes belonging to Signaling and communication, whereas they are negatively associated with the genes belonging to Immune and related functions and Information.</p>
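            <p>The comparison of a class proportion with the global proportion can be illustrated with a minimal sketch of a one-sided exact binomial test. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the global proportion is taken as 3,986/7,928, and the number of Signaling and communication genes with repeats is back-calculated from the rounded proportion reported for that class (61.23% of 3,072 genes), so it is approximate.</p>

```python
import math

def binomial_upper_tail(k_obs, n, p0):
    """P(X >= k_obs) for X ~ Binomial(n, p0), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p0), math.log(1.0 - p0)
    total = 0.0
    for k in range(k_obs, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

# Global proportion of genes with intragenic repeats (counts from the text).
GLOBAL_P = 3986 / 7928

# Signaling and communication: 3,072 genes. The count of genes with repeats is
# an assumption reconstructed from the rounded 61.23% figure.
N_SC = 3072
K_SC = round(0.6123 * N_SC)

p_value = binomial_upper_tail(K_SC, N_SC, GLOBAL_P)
print(f"genes with repeats (approx.): {K_SC}, one-sided p = {p_value:.3g}")
```

            <p>For a class suspected of depletion, the lower tail would be summed instead; whether the original analysis used one-sided or two-sided tests is not stated in this section.</p>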
            <p>It has been shown that the human genome has an isochore structure that varies in GC content <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. This variation raises the possibility that the observed selective distribution of (TG/CA)<sub>n </sub>repeats might have arisen due to fluctuations in the local %(G+C) content of the genomic region as opposed to function. We examined this by comparing the average %(G+C) content of the genes in the six functional classes with the corresponding proportions of genes with repeats. The average %(G+C) content was observed to lie in a narrow range (47&#8211;49%) in the six functional classes, whereas the proportion of genes with repeats varied widely, in the range 29.6&#8211;61%. These observations indicate that the proportion of genes with repeats is determined to a significant extent by function rather than by small fluctuations in %(G+C) content.</p>
         </sec>
         <sec>
            <st>
               <p>Correlation between gene length, function and global distribution of (TG/CA)<sub>n </sub>repeats</p>
            </st>
            <p>Comparison of the proportion of genes containing (TG/CA)<sub>n </sub>repeats with the average lengths of genes in each of the six functional classes revealed a linear relationship (Figure <figr fid="F3">3</figr>, correlation coefficient R = 0.93, p &lt; 0.007). The Signaling and communication class had the longest average gene length (74.07 kb) along with the highest proportion of genes with (TG/CA)<sub>n </sub>repeats (61.23%). The class of Immune and related functions had the shortest average gene length (21.26 kb) along with the lowest proportion of genes with (TG/CA)<sub>n </sub>repeats (29.65%). These observations show that the proportion of genes with (TG/CA)<sub>n </sub>repeats bears a linear relationship to the length of genes.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Relationship between proportion of genes with (TG/CA)<sub>n </sub>repeats in each functional class and the average gene length in the corresponding functional classes</p>
               </caption>
               <text>
                  <p>Relationship between proportion of genes with (TG/CA)<sub>n </sub>repeats in each functional class and the average gene length in the corresponding functional classes. X axis: proportion of genes with (TG/CA)<sub>n </sub>repeats (%); Y axis: average gene length (kb). CC: Cell cycle; IN: Information; IR: Immune and related functions; MET: Metabolism; SC: Signaling and communication; STM: Structure and motility.</p>
               </text>
               <graphic file="1471-2164-6-83-3" hint_layout="double"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Trinity of (TG/CA)<sub>n </sub>repeats in gene families</p>
            </st>
            <p>To examine the distribution of (TG/CA)<sub>n </sub>repeats with respect to their multiple functional roles, which are principally governed by repeat length, we analysed the repeats stratified into three categories: type I (6 &#8804; n &lt; 12), type II (12 &#8804; n &lt; 23) and type III (n &#8805; 23). The results are displayed in Figure <figr fid="F4">4</figr>. The number of genes containing (TG/CA)<sub>n </sub>repeats decreases with increasing repeat length. It is also apparent that many genes (53% of the total) have multiple types of repeats in various combinations.</p>
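            <p>The three length categories can be sketched in code as follows. This is a hedged illustration only: the repeat-detection rule (perfect, uninterrupted runs matched by a regular expression on one strand), the function names and the toy sequence are all assumptions made for the example and are not taken from the study's methods. Because (CA)<sub>n </sub>is the reverse complement of (TG)<sub>n</sub>, scanning one strand for both motifs covers both strands.</p>

```python
import re

# Perfect, uninterrupted (TG)n or (CA)n runs with n >= 6 (assumed definition).
REPEAT_RE = re.compile(r"(?:TG){6,}|(?:CA){6,}")

def repeat_type(n_units):
    """Map a repeat length in units to the type I/II/III categories."""
    if n_units >= 23:
        return "III"
    if n_units >= 12:
        return "II"
    if n_units >= 6:
        return "I"
    return None  # shorter than the 6-unit threshold

def find_repeats(sequence):
    """Return (start, n_units, type) for each repeat found in the sequence."""
    hits = []
    for match in REPEAT_RE.finditer(sequence.upper()):
        n_units = (match.end() - match.start()) // 2
        hits.append((match.start(), n_units, repeat_type(n_units)))
    return hits

# Toy sequence invented for illustration; it is not real genomic data.
toy = "AAT" + "TG" * 7 + "CCGA" + "CA" * 12 + "GGT" + "TG" * 25 + "A"
for start, n_units, kind in find_repeats(toy):
    print(start, n_units, kind)
```

            <p>A gene would then be counted under every type for which it has at least one repeat, which is how a single gene can fall into several categories at once.</p>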
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>A Venn diagram of the genes with trinity of intragenic (TG/CA)<sub>n </sub>repeats (type I, II and III)</p>
               </caption>
               <text>
                  <p>A Venn diagram of the genes with trinity of intragenic (TG/CA)<sub>n </sub>repeats (type I, II and III). Note that several genes (shaded area) have multiple types of repeats in various combinations.</p>
               </text>
               <graphic file="1471-2164-6-83-4" hint_layout="double"/>
            </fig>
            <p>The distribution of genes with (TG/CA)<sub>n </sub>repeats, stratified into the three repeat categories and classified into the six functional classes, is shown in Figure <figr fid="F5">5</figr>. It is evident that the proportion of genes containing repeats decreases in the order I > II > III in all classes. The proportion of genes containing (TG/CA)<sub>n </sub>repeats in Signaling and communication was significantly higher than the expected proportion in all three categories of repeats (p &lt; 0.0001, type I, II and III). On the other hand, the proportions of genes with (TG/CA)<sub>n </sub>repeats in Immune and related functions and in Information were significantly lower than the expected proportion in all three categories: Immune and related functions (p &lt; 0.0001, type I, II and III), Information (p &lt; 0.0001, type I and II, p &lt; 0.004, type III). The proportion of genes with type III repeats was marginally lower than the expected proportion in the Metabolism class (p &lt; 0.01) and marginally higher than the expected proportion in the Structure and motility class (p &lt; 0.02). The proportion of genes with repeats in the three categories was not significantly different from the expected value in the class of Cell cycle. These observations show that repeats of all types are positively associated with the genes of Signaling and communication, whereas they are negatively associated with the genes of Immune and related functions and Information.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Distribution of proportion of genes with three types of (TG/CA)<sub>n </sub>repeats in the six functional classes</p>
               </caption>
               <text>
                  <p>Distribution of proportion of genes with three types of (TG/CA)<sub>n </sub>repeats in the six functional classes.</p>
               </text>
               <graphic file="1471-2164-6-83-5" hint_layout="double"/>
            </fig>
            <p>The distribution of the average number of (TG/CA)<sub>n </sub>repeats per gene in the three categories in the six functional classes is displayed in Figure <figr fid="F6">6</figr>. Comparison of the average number of repeats per gene in the three categories with the global distribution pattern revealed that in most cases the observed number was significantly lower than the expected value, except for the genes belonging to Signaling and communication and Structure and motility, which had a significantly higher average number of repeats per gene than the expected value (p &lt; 0.0004 in all three categories, both classes). The average number of type III repeats per gene in the class of Cell cycle was not significantly different from the expected value. These observations show that the repeat densities were higher in the genes belonging to the Signaling and communication and Structure and motility classes, whereas the genes belonging to the other classes had lower repeat densities.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Distribution of the densities of three types of (TG/CA)<sub>n </sub>repeats in the genes of six functional classes</p>
               </caption>
               <text>
                  <p>Distribution of the densities of three types of (TG/CA)<sub>n </sub>repeats in the genes of six functional classes.</p>
               </text>
               <graphic file="1471-2164-6-83-6" hint_layout="double"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Large gene families</p>
            </st>
            <p>As a special case of this study, we examined the distribution of (TG/CA)<sub>n </sub>repeats in the largest 2% of families (27 families). The family sizes in this category varied widely, from 32 to 223 members. Functional classification of these large families revealed the following distribution: Immune and related functions (9), Signaling and communication (8), Information (6), Metabolism (2), Structure and motility (1) and Cell cycle (1).</p>
            <p>The proportion of genes with (TG/CA)<sub>n </sub>repeats in large families is displayed in Table <tblr tid="T1">1</tblr>. Comparison with the global distribution showed that the proportion of genes with repeats was significantly higher than the expected value in the Signaling and communication and Structure and motility classes (p &lt; 0.0001, Binomial test). There was no significant difference between the observed and the expected proportion of genes with repeats in the class of Metabolism. In the remaining classes, the proportion of genes with repeats was significantly lower than the expected value (p &lt; 0.0001, Binomial test). As with all gene families, a linear relationship was observed between gene lengths and the proportion of genes with (TG/CA)<sub>n </sub>repeats (correlation coefficient R = 0.79, p &lt; 0.0001).</p>
            <tbl id="T1" hint_layout="double">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Distribution of (TG/CA)<sub>n </sub>repeats in large gene families</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Functional class and gene families</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Chromosomal Distribution</b>
                           <sup>a</sup>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Average Gene Length (kb)</b>
                           <sup>b</sup>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Genes in the family</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Proportion of Genes with (TG/CA)<sub>n </sub>repeats</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Number of Genes with (TG/CA)<sub>n </sub>repeats in three categories</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Average number of repeats per gene in three categories</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type I</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type II</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type III</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type I</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type II</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type III</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Cell cycle class</b>
                           <sup>c</sup>
                           <b>(1)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Histone proteins family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>3.55</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>7.9</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>1.7</p>
                     </c>
                     <c ca="right">
                        <p>0.7</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Immune and related functions class (9)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Interleukins</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>14.67</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>32.6</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>1.3</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Serine (or cysteine) proteinase inhibitor family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>17.87</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>50</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>1.4</p>
                     </c>
                     <c ca="right">
                        <p>0.8</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tumor necrosis factor (ligand) superfamily</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>23.49</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>63.2</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>1.9</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CD antigens</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>26.96</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>46.3</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>1.9</p>
                     </c>
                     <c ca="right">
                        <p>1.3</p>
                     </c>
                     <c ca="right">
                        <p>0.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Immunoglobulin heavy chains</p>
                     </c>
                     <c ca="center">
                        <p>Intrachromosomal</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>162</p>
                     </c>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Immunoglobulin kappa chains</p>
                     </c>
                     <c ca="center">
                        <p>Intrachromosomal</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>5.5</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.3</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Immunoglobulin lambda chains</p>
                     </c>
                     <c ca="center">
                        <p>Intrachromosomal</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Interleukin receptors family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>29.77</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>59.4</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>3.4</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>T cell receptor beta chains</p>
                     </c>
                     <c ca="center">
                        <p>84 Intrachromosomal, 9 Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>9.6</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0.8</p>
                     </c>
                     <c ca="right">
                        <p>0.3</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Information class (6)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Homeo box</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>5.48</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>1.2</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Eukaryotic translation initiation factor</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>36.54</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>45.5</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>2.1</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Zinc finger protein family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>30.33</p>
                     </c>
                     <c ca="center">
                        <p>200</p>
                     </c>
                     <c ca="center">
                        <p>42.5</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>2.2</p>
                     </c>
                     <c ca="right">
                        <p>1.1</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptides</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>43.44</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>62.5</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>1.5</p>
                     </c>
                     <c ca="right">
                        <p>0.6</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ribosomal protein genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>4.94</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>6.3</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mitochondrial ribosomal protein genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>16.92</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>1.8</p>
                     </c>
                     <c ca="right">
                        <p>0.9</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Metabolism class (2)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cytochrome P450 superfamily</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>31.34</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="center">
                        <p>46.7</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>1.9</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Proteasome subunit genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>25.64</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>32.5</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>1.5</p>
                     </c>
                     <c ca="right">
                        <p>0.8</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Signaling and Communication class (8)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>G protein-coupled receptor family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>24.71</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>33.7</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>2.3</p>
                     </c>
                     <c ca="right">
                        <p>1.5</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tripartite motif-containing family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>29.29</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>1.6</p>
                     </c>
                     <c ca="right">
                        <p>0.8</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Solute carrier family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>59.19</p>
                     </c>
                     <c ca="center">
                        <p>223</p>
                     </c>
                     <c ca="center">
                        <p>62.8</p>
                     </c>
                     <c ca="center">
                        <p>134</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="right">
                        <p>2.9</p>
                     </c>
                     <c ca="right">
                        <p>1.5</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RAS oncogene family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>39.92</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>1.8</p>
                     </c>
                     <c ca="right">
                        <p>0.7</p>
                     </c>
                     <c ca="right">
                        <p>0.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ATP-binding cassette transporters gene family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>73.85</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>68.2</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>3.6</p>
                     </c>
                     <c ca="right">
                        <p>2.5</p>
                     </c>
                     <c ca="right">
                        <p>0.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Guanine nucleotide binding protein (G protein) polypeptide genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>58.83</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>59.4</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>3.3</p>
                     </c>
                     <c ca="right">
                        <p>1.9</p>
                     </c>
                     <c ca="right">
                        <p>0.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Potassium voltage-gated channel genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>104.95</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>57.9</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>8.6</p>
                     </c>
                     <c ca="right">
                        <p>4.4</p>
                     </c>
                     <c ca="right">
                        <p>0.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein phosphatase subunit genes</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>65.62</p>
                     </c>
                     <c ca="center">
                        <p>59</p>
                     </c>
                     <c ca="center">
                        <p>57.6</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>2.9</p>
                     </c>
                     <c ca="right">
                        <p>1.8</p>
                     </c>
                     <c ca="right">
                        <p>0.3</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11" ca="left">
                        <p>
                           <b>Structure and motility class (1)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Collagen family</p>
                     </c>
                     <c ca="center">
                        <p>Dispersed</p>
                     </c>
                     <c ca="center">
                        <p>132.83</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>86.5</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>5.7</p>
                     </c>
                     <c ca="right">
                        <p>2.3</p>
                     </c>
                     <c ca="right">
                        <p>0.4</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>: Chromosomal distribution of the members of gene families. 'Dispersed' indicates that members are distributed on different chromosomes.</p>
                  <p><sup>b</sup>: Average gene length (in kb) for each gene family.</p>
                  <p><sup>c</sup>: Numbers in parentheses indicate the number of large-sized gene families in each functional class.</p>
               </tblfn>
            </tbl>
            <p>The large <it>Collagen </it>gene family, belonging to the Structure and motility class, had the highest proportion of genes containing repeats (86.5%). To analyze this further, we examined the sequence conservation of the repeats together with the 200 bases flanking them upstream and downstream, comparing the human sequence with the available genome sequence of the chimpanzee (<it>Pan troglodytes</it>), the closest living relative of humans <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. We observed that, of the 268 sequence segments including repeats from human <it>Collagen </it>genes, 244 were conserved with greater than 92% identity in the chimpanzee. Of these 244 repeats in human <it>Collagen </it>genes, 73 were identical in length, 142 displayed length polymorphism in the chimpanzee, 27 had point mutations, and in 2 cases there were no repeats in the corresponding chimpanzee segments. These observations show that both human and chimpanzee <it>Collagen </it>genes have high repeat content and high conservation of the position and flanking sequences of the repeats. However, the majority of repeats exhibited length polymorphism, which is consistent with their characteristic property <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The inverse relationship between the number of gene families and their corresponding sizes, which results in a large number of small-sized gene families, suggests that several duplicated copies may have been lost during the first round of genome duplication itself, under the hypothesis of two rounds of genome duplication in vertebrate evolution <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The non-uniform distribution of the number of gene families across the six functional classes suggests that widespread gene duplication across gene families spanning a wide range of functions may have been less productive in attaining higher levels of complexity. An alternative course, involving a large number of duplications followed by divergence that produced a wide range of functions in selected classes, might have been favorable. Support for the latter hypothesis emerges from the fact that large-sized gene families, inherently low in number, mainly belong to the Immune and related functions class (required to tackle a wide range of infections), the Signaling and communication class (required to respond to diverse environmental stimuli) and the Information class (required to implement complex molecular processes through supramolecular assemblies or organelles). The few large-sized gene families of the Metabolism class function in bioenergetics and xenobiotic metabolism, and those of the Cell cycle class function in the packaging of nuclear DNA. Similarly, the large <it>Collagen </it>gene family of the Structure and motility class offers a useful repertoire for the formation of multiple tissues <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
         <p>It is apparent that short repeats are abundant in human genes and long repeats are rare. Our findings are consistent with the observations by Whittaker et al. (2003), who showed that longer repeats are more likely to contract than expand <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Accordingly, contraction of long repeats in time would result in accumulation of higher number of short repeats.</p>
         <p>Of the six functional classes, the Signaling and communication class was the richest in repeats including the proportion of genes with repeats and repeat densities. Many of the genes belonging to this class function at the interface between the body and its environment that appears to be a distinct feature of eukaryotes <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> to confer species-specific advantages <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B41">41</abbr></abbrgrp>. The positive association of (TG/CA)<sub>n </sub>repeats associated with genes of this class strongly argues for a positive temporal regulatory role that could provide for variations in gene expression to complement the enormous diversity characteristic of this class. Compared to this class, the anciently evolved gene families of Information and Cell cycle are poor in repeats. Considering the fact that these genes are highly conserved <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp> and are involved in implementing the molecular processes acting at the core of cellular physiology, these observations suggest that repeats are negatively associated with these genes to avoid unpredictable consequences for the normal functioning of the cell.</p>
         <p>Another argument in favor of these inferences stems from the linear relationship between the average gene length of gene families belonging to the respective functional classes and the proportion of genes with repeats in these classes. The average length of genes belonging to Information class was short and this factor aids in obtaining high levels of expression of these genes <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. This requirement, however, generates a space constraint to accommodate additional elements. This situation contrasts to that of genes of Signaling and communication class with higher average gene length offering more space for accommodating other regulatory elements. The analysis of <it>Collagen </it>gene family belonging to large sized families presents itself as an interesting case. Most of the members of this family have (TG/CA)<sub>n </sub>repeats. Sequence comparisons of repeat containing regions of human <it>Collagen </it>genes with the nearest ancestor to humans, the Chimpanzee, revealed that although there is high conservation in terms of content and position of repeats, majority of repeats were polymorphic, which is consistent with their characteristic property <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Among repeats that displayed polymorphism between human and Chimpanzee, nearly equal proportions of human repeats were either contracted or expanded in Chimpanzee. These results are also consistent with the Whittaker's model <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         <p>Strikingly, the genes of Immune and related functions class are poor in (TG/CA)<sub>n </sub>repeats in general and in type III repeats in particular. A characteristic trend of this class is to have large sized families with their genes arranged juxtaposed on the same chromosomal locations. This arrangement increases the possibility of these gene families to display more uniform sequence characteristics <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Further, these genes have the smallest average gene length indicating a compact arrangement, which is likely to act as a space constraint in the accommodation of (TG/CA)<sub>n </sub>repeats. In addition, the negative association of type III (TG/CA)<sub>n </sub>repeats in these genes may have a directional role. The immunoglobulin genes use the 7 bp and 9 bp repeats for generation of variants through VDJ recombination <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Accommodation of type III (TG/CA)<sub>n </sub>repeats (n &#8805; 23) might introduce variations in this process and could result in loss of directional recombination essential to generate diversity in immunoglobulins and T cell receptor chains in an ordered manner.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The (TG/CA)<sub>n </sub>repeat distribution pattern observed in human gene families is consistent with Whittaker's model of repeat expansion and contraction. It appears that multiple factors including gene length, function and directionality of recombination processes steered the observed selective patterns of distribution of (TG/CA)<sub>n </sub>repeats in human gene families.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Sequence retrieval and mapping of (TG/CA)<sub>n </sub>repeats</p>
            </st>
            <p>Sequences of 35,114 human genes (build number 33) were retrieved from LocusLink <url>http://www.ncbi.nlm.nih.gov/LocusLink/</url><abbrgrp><abbr bid="B43">43</abbr></abbrgrp> using a JavaScript program. A total of 192 genes could not be retrieved, either because the LocusLink page was inaccessible or because the link for retrieving the gene sequence was absent. A gene in this analysis is defined as the nucleotide sequence from the start of the first exon to the end of the last exon. If alternative splicing was reported, the gene length was taken from the start of the first exon to the end of the last known exon, including all alternatively spliced products of that gene.</p>
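            <p>The gene definition above can be illustrated with a minimal Python sketch. It is not the authors' retrieval program; the function name <it>gene_span</it> and the input layout (one list of exon coordinate pairs per transcript) are assumptions made only for illustration.</p>
            <p><![CDATA[
```python
def gene_span(transcripts):
    """Return (start, end, length) covering every exon of every transcript.

    transcripts: list of transcripts, each a list of (exon_start, exon_end)
    pairs in 1-based inclusive coordinates (hypothetical input layout).
    """
    # Pool the exons of all alternatively spliced products of the gene.
    exons = [exon for transcript in transcripts for exon in transcript]
    start = min(s for s, _ in exons)   # start of the first exon
    end = max(e for _, e in exons)     # end of the last known exon
    return start, end, end - start + 1
```
]]></p>
            <p>For a gene with two hypothetical transcripts, the span runs from the earliest exon start to the latest exon end, so introns and all alternative exons fall inside the scored region.</p>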
            <p>Two Perl scripts, '<it>SimRep</it>' and '<it>RepGene</it>', were written to identify and map perfect intragenic (TG/CA)<sub>n </sub>repeats of length n &#8805; 6 units in genes <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Throughout this work, n &#8805; 6 units was used as the minimum cut-off for identifying (TG/CA)<sub>n </sub>repeats. All repeats were scored in the intragenic region (exons and introns only).</p>
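            <p>The identification step can be sketched as follows. This Python example is only an illustration under stated assumptions and does not reproduce <it>SimRep</it> or <it>RepGene</it>, whose internals are not described here; the function name <it>find_repeats</it> and the use of a regular expression are hypothetical choices. It scans one strand for uninterrupted runs of TG or CA of at least 6 units.</p>
            <p><![CDATA[
```python
import re

MIN_UNITS = 6  # minimum cut-off stated in the text

# A perfect tract is an uninterrupted run of TG or of CA; a CA run on the
# scanned strand corresponds to a TG run on the complementary strand.
_PATTERN = re.compile(r"(?:TG){%d,}|(?:CA){%d,}" % (MIN_UNITS, MIN_UNITS))

def find_repeats(sequence):
    """Return a list of (start, end, unit, n) for each perfect repeat.

    start and end are 0-based, end-exclusive offsets into sequence;
    n is the number of dinucleotide units in the tract.
    """
    hits = []
    for match in _PATTERN.finditer(sequence.upper()):
        tract = match.group(0)
        hits.append((match.start(), match.end(), tract[:2], len(tract) // 2))
    return hits
```
]]></p>
            <p>Runs shorter than 6 units are ignored, and any interruption ends a tract, which matches the restriction to perfect repeats.</p>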
         </sec>
         <sec>
            <st>
               <p>Categorization of (TG/CA)<sub>n </sub>repeats</p>
            </st>
            <p>We grouped (TG/CA)<sub>n </sub>repeats into three categories (types I, II and III) according to their length and biological properties. Type I (TG/CA)<sub>n </sub>repeats, in the range 6 &#8804; n &lt; 12 units, are short repeats; this range reflects the observation that a length of 8 units (n = 8) is the minimum at which a repeat is likely to be polymorphic <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. Type II (TG/CA)<sub>n </sub>repeats comprise 12 &#8804; n &lt; 23 units, a range based on the observation that more than 93% of (CA)<sub>n </sub>repeats of n &#8805; 12 units are polymorphic <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Repeats of this length have also been shown to bind nuclear factors preferentially compared with short repeats <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and can stimulate mRNA splicing <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Type III repeats consist of relatively long reiterations of (TG/CA)<sub>n </sub>(n &#8805; 23 units) and have a propensity to adopt structures such as Z-DNA <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B37">37</abbr></abbrgrp>. Other studies have shown that (TG/CA)<sub>n </sub>repeats longer than 22.5 units can stimulate recombination <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>.</p>
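            <p>The three length categories can be expressed as a small lookup. The Python sketch below is illustrative only; the function name <it>repeat_type</it> is hypothetical, and the thresholds are the ones given in the paragraph above.</p>
            <p><![CDATA[
```python
def repeat_type(n):
    """Map a repeat length in units to the category defined in the text."""
    if n >= 23:
        return "III"   # long tracts, n of 23 units or more
    if n >= 12:
        return "II"    # 12 to 22 units
    if n >= 6:
        return "I"     # 6 to 11 units
    return None        # below the minimum cut-off, not scored
```
]]></p>
            <p>Checking the thresholds from the longest category downward keeps the boundaries at 12 and 23 units unambiguous.</p>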
         </sec>
         <sec>
            <st>
               <p>Clustering of genes into gene families</p>
            </st>
            <p>The functional roles of a large number of human genes are not well known, and these genes currently carry hypothetical annotations. Genes labeled as 'LOC', 'DKFZP', 'FLJ', 'HSPC', 'HSU', 'HT', 'KIAA', 'ORF', 'hypothetical', 'PRO' and 'pseudogenes', which lack clear functional details, were filtered out. A total of 22,688 genes were removed in this filtering step. Of the remaining 12,426 genes, 8,778 genes (25% of the total) were clustered into gene families based on their gene root symbols as defined in the guidelines of the HUGO Gene Nomenclature Committee (2002) <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The remaining 3,648 genes could not be clustered into gene families and are treated as solitary.</p>
            <p>The HGNC guidelines consider the sequence and functional similarity of the proteins encoded by genes when grouping them into gene families <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. A root symbol signifies a gene family. Family members are designated by Arabic numerals placed immediately after the gene root symbol, for example <it>GPR1</it>, <it>GPR2</it>, <it>GPR3 </it>for genes of the G protein-coupled receptor family. A Perl script, <it>Clustergene</it>, was written to cluster the 8,778 human genes into 1,556 gene families. A second Perl script, <it>ChromoCluster</it>, was written to report gene families located on the same chromosome. These gene families were then classified into the six functional classes as described below.</p>
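            <p>Root-symbol clustering can be sketched in Python as below. This is not <it>Clustergene</it>, whose rules are not described in detail here; the function name <it>cluster_by_root</it>, the trailing-numeral rule and the requirement of at least two members per family are simplifying assumptions. Under this rule a symbol with an internal numeral keeps it as part of the root, so real nomenclature would need a more careful treatment.</p>
            <p><![CDATA[
```python
import re
from collections import defaultdict

# Root symbol followed by trailing Arabic numerals, e.g. GPR + 12.
_TRAILING_NUMBER = re.compile(r"^(.*\D)(\d+)$")

def cluster_by_root(symbols):
    """Group symbols of the form ROOT+number; return (families, solitary)."""
    groups = defaultdict(list)
    solitary = []
    for symbol in symbols:
        match = _TRAILING_NUMBER.match(symbol)
        if match:
            groups[match.group(1)].append(symbol)
        else:
            solitary.append(symbol)  # no trailing numeral, no root to share
    families = {}
    for root, members in groups.items():
        if len(members) >= 2:
            families[root] = sorted(members)
        else:
            solitary.extend(members)  # a lone member does not form a family
    return families, sorted(solitary)
```
]]></p>
            <p>With the example symbols from the text, <it>GPR1</it>, <it>GPR2 </it>and <it>GPR3 </it>fall into one family under the root GPR, while symbols that share a root with no other gene are reported as solitary.</p>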
         </sec>
         <sec>
            <st>
               <p>Functional Classification of gene families for comparative analysis</p>
            </st>
            <p>The gene families were classified into six functional classes namely, 'Information', 'Cell cycle', 'Metabolism', 'Signaling and communication', 'Immune and related functions' and 'Structure and motility' based on the scheme defined by Adams et al. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. We combined the functional classes of replication, transcription, RNA processing and translation into 'Information' class based on Andrade et al. <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
            <p>'Cell cycle' includes cell cycle, apoptosis, chromosomal structure and DNA repair; 'Immune and related functions' includes immunology, homeostasis, carrier proteins/membrane transport and stress response; 'Information' includes protein synthesis, translation factors, ribosomal proteins, post-translational modification/targeting, protein degradation, tRNA synthesis/metabolism, RNA synthesis, transcription factors, RNA polymerase, RNA processing, RNA degradation, DNA synthesis/replication and DNA repair; 'Metabolism' includes amino acids, nucleotides, sugars, lipids, cofactors, protein modification, energy and carrier proteins/membrane transport; 'Signaling and communication' includes receptors, hormone/growth factors, intracellular transducers, effectors/modulators, metabolism, cell adhesion and channels/transport proteins; 'Structure and motility' includes cytoskeletal, microtubule-associated proteins/motors and extracellular matrix.</p>
            <p>Gene families were assigned to the functional classes according to their annotations in the Gene Ontology <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> and LocusLink <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> databases. Of the 1,556 gene families, 1,317 could be classified into one of the six functional classes. The remaining 239 families could not be classified unambiguously because of limited information on gene function. The subsequent analysis of functional classification and distribution of (TG/CA)<sub>n </sub>repeats presented here is based on the 1,317 gene families, comprising 7,928 genes.</p>
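            <p>The counts reported in this section can be checked for internal consistency with a short Python snippet. All numbers are taken from the text; the variable names are chosen only for illustration.</p>
            <p><![CDATA[
```python
# Counts as reported in the Methods text.
retrieved = 35114
filtered_out = 22688
clustered = 8778
solitary = 3648
families_total = 1556
families_classified = 1317
families_unclassified = 239

# Genes left after removing hypothetical and unannotated entries.
remaining = retrieved - filtered_out
assert remaining == 12426
assert clustered + solitary == remaining

# Classified and unclassified families add up to all families.
assert families_classified + families_unclassified == families_total

# Share of all retrieved genes that were clustered into families.
clustered_percent = round(100 * clustered / retrieved)
```
]]></p>
            <p>The clustered share rounds to the 25% quoted above.</p>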
         </sec>
         <sec>
            <st>
               <p>Alignment of human (TG/CA)<sub>n </sub>repeats and flanking sequences with Chimpanzee genome sequence</p>
            </st>
            <p>The repeats present in human <it>Collagen </it>genes were aligned with the chimpanzee (<it>Pan troglodytes</it>) genome using the 'BLAT' software available at the UCSC Genome Bioinformatics site <url>http://www.genome.ucsc.edu/cgi-bin/hgBlat</url><abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. For each (TG/CA)<sub>n </sub>repeat in human <it>Collagen </it>genes, a nucleotide segment was extracted that comprised the repeat, the 200 nucleotides upstream of its start and the 200 nucleotides downstream of its end <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. These segments were aligned with the chimpanzee genome (Build 1, version 1, Nov 2003) using BLAT. Only segments that showed more than 92% identity were recorded as conserved.</p>
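            <p>Segment extraction and the conservation cut-off can be sketched in Python as follows. The function names <it>segment_with_flanks </it>and <it>is_conserved </it>are hypothetical, truncating a flank at the end of the available sequence is an assumption, and the identity calculation is a plain ratio of matching to aligned positions rather than a reproduction of how BLAT reports identity.</p>
            <p><![CDATA[
```python
FLANK = 200          # nucleotides taken on each side, as stated in the text
MIN_IDENTITY = 92.0  # percent; segments above this were noted as conserved

def segment_with_flanks(sequence, start, end, flank=FLANK):
    """Return the repeat at [start, end) plus up to `flank` bases per side."""
    left = max(0, start - flank)               # assumption: clip at sequence start
    right = min(len(sequence), end + flank)    # assumption: clip at sequence end
    return sequence[left:right]

def is_conserved(matches, aligned_length, threshold=MIN_IDENTITY):
    """True when percent identity over the aligned positions exceeds threshold."""
    return 100.0 * matches / aligned_length > threshold
```
]]></p>
            <p>A segment aligned at exactly 92% identity would not count as conserved under this reading of "more than 92%".</p>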
         </sec>
         <sec>
            <st>
               <p>Statistical methods</p>
            </st>
            <p>The significance of the differences between each of the six functional classes and the global distribution, in the proportion of genes containing repeats and in repeat density, was tested using the binomial proportions test. The observed proportion in each class was tested against the expected proportion, which was computed assuming no preference with respect to function. The correlation coefficient (R) was computed to examine the relationship between the average gene length of the gene families in a functional class and the proportion of genes with (TG/CA)<sub>n </sub>repeats in that class. The 'Interactive Statistical Calculation Pages' website <url>http://members.aol.com/johnp71/javastat.html</url> was used to perform the statistical tests.</p>
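            <p>The two statistics can be sketched in Python as below. The exact form of the test used on the website is not stated here, so the normal-approximation z-test for a single proportion is an assumption, and the function names <it>proportion_z_test </it>and <it>pearson_r </it>are hypothetical.</p>
            <p><![CDATA[
```python
import math

def proportion_z_test(successes, n, expected):
    """Two-sided z-test of an observed proportion against an expected one.

    Uses the normal approximation to the binomial (an assumption here).
    Returns (z, p_value).
    """
    observed = successes / n
    se = math.sqrt(expected * (1.0 - expected) / n)
    z = (observed - expected) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area
    return z, p_value

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```
]]></p>
            <p>Here <it>successes </it>would be the number of genes with repeats in a class, <it>n </it>the number of genes in that class, and <it>expected </it>the global proportion; <it>pearson_r </it>would take the six class-level average gene lengths and the six corresponding proportions.</p>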
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>VKS conceived of the idea, developed algorithms in Perl, carried out the analysis and wrote the manuscript. SKB gave scientific suggestions for improving the quality of the work and participated in manuscript preparation. SR is the group leader, gave scientific suggestions, helped in the statistical analysis, critical examination, presentation, writing and manuscript preparation.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>VKS is a recipient of a Senior Research Fellowship from CSIR. We thank Pankaj Bhatnagar for help in writing programs and the anonymous reviewers for their insightful comments. SKB and SR thank CSIR for funding support in the form of a grant (CMM0017) under the Task Force on "In Silico Biology for Drug target development".</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schartl</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Current Opinion in Cell Biology</source>
            <pubdate>1999</pubdate>
            <volume>11</volume>
            <fpage>699</fpage>
            <lpage>704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0955-0674(99)00039-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10600714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Selection and gene duplication: a view from the genome</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>5</issue>
            <note>reviews1012</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2002-3-5-reviews1012</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A Dictionary of Genetics</p>
            </title>
            <aug>
               <au>
                  <snm>King</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Stansfield</snm>
                  <fnm>WD</fnm>
               </au>
            </aug>
            <publisher>Oxford University Press</publisher>
            <pubdate>1990</pubdate>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Guidelines for human gene nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Wain</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Bruford</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Lovering</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Lush</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Povey</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2002</pubdate>
            <volume>79</volume>
            <issue>4</issue>
            <fpage>464</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/geno.2002.6748</pubid>
                  <pubid idtype="pmpid" link="fulltext">11944974</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Initial sequencing and analysis of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A comprehensive genetic map of the human genome based on 5,264 microsatellites</p>
            </title>
            <aug>
               <au>
                  <snm>Dib</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Faure</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fizames</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Samson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Drouot</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Vignal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Millasseau</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marc</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hazan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Seboun</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lathrop</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gyapay</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Morissette</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1996</pubdate>
            <volume>380</volume>
            <fpage>152</fpage>
            <lpage>154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/380152a0</pubid>
                  <pubid idtype="pmpid">8600387</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Simple repetitive sequences in the genome: structure and functional significance</p>
            </title>
            <aug>
               <au>
                  <snm>Brahmachari</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Meera</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sarkar</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Balagurumoorthy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tripathi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Raghavan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shaligram</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Pataskar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Electrophoresis</source>
            <pubdate>1995</pubdate>
            <volume>16</volume>
            <issue>9</issue>
            <fpage>1705</fpage>
            <lpage>1714</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/elps.11501601283</pubid>
                  <pubid idtype="pmpid">8582360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The sequence (dC-dA)n X (dG-dT)n forms left-handed Z-DNA in negatively supercoiled plasmids</p>
            </title>
            <aug>
               <au>
                  <snm>Nordheim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rich</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci</source>
            <pubdate>1983</pubdate>
            <volume>80</volume>
            <fpage>1821</fpage>
            <lpage>1825</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">393701</pubid>
                  <pubid idtype="pmpid" link="fulltext">6572943</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Zintrons in rat &#945;-lactalbumin gene</p>
            </title>
            <aug>
               <au>
                  <snm>Meera</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ramesh</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Brahmachari</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1989</pubdate>
            <volume>251</volume>
            <fpage>245</fpage>
            <lpage>249</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0014-5793(89)81463-2</pubid>
                  <pubid idtype="pmpid">2753162</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Distribution of simple repetitive (TG/CA)n and (CT/AG)n sequences in human and rodent genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Tripathi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brahmachari</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>J Biomol Struct Dyn</source>
            <pubdate>1991</pubdate>
            <volume>9</volume>
            <issue>2</issue>
            <fpage>387</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1741969</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>d(TG)n.d(CA)n sequences upstream of the rat prolactin gene form Z-DNA and inhibit gene transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Naylor</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>1595</fpage>
            <lpage>1601</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">330531</pubid>
                  <pubid idtype="pmpid">2158081</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Shortened microsatellite d(CA)21 sequence down-regulates promoter activity of matrix metalloproteinase 9 gene</p>
            </title>
            <aug>
               <au>
                  <snm>Shimajiri</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arima</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tanimoto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Murata</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hamada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Sasaguri</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1999</pubdate>
            <volume>455</volume>
            <fpage>70</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(99)00863-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">10428474</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>In vitro production of IFN-gamma correlates with CA repeat polymorphism in the human IFN-gamma gene</p>
            </title>
            <aug>
               <au>
                  <snm>Pravica</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Asderakis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Perrey</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hajeer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sinnott</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Hutchinson</snm>
                  <fnm>IV</fnm>
               </au>
            </aug>
            <source>Eur J Immunogenet</source>
            <pubdate>1999</pubdate>
            <volume>26</volume>
            <fpage>1</fpage>
            <lpage>3</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2370.1999.00122.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10068907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1</p>
            </title>
            <aug>
               <au>
                  <snm>Gebhardt</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zanker</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Brandt</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1999</pubdate>
            <volume>274</volume>
            <fpage>13176</fpage>
            <lpage>13180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.274.19.13176</pubid>
                  <pubid idtype="pmpid" link="fulltext">10224073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>CA-Repeat polymorphism in intron 1 of <it>HSD11B2</it>: effects on gene expression and salt sensitivity</p>
            </title>
            <aug>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Giacchetti</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lavery</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Nikkila</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Palermo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ricketts</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McTernan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bianchi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Manunta</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Strazzullo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mantero</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Hypertension</source>
            <pubdate>2000</pubdate>
            <volume>36</volume>
            <fpage>187</fpage>
            <lpage>194</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10948076</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Microsatellite variation associated with prolactin expression and growth of salt-challenged tilapia</p>
            </title>
            <aug>
               <au>
                  <snm>Streelman</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Kocher</snm>
                  <fnm>TD</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>1</fpage>
            <lpage>4</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11948285</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>(TG/CA)<sub>n </sub>repeats in human housekeeping genes</p>
            </title>
            <aug>
               <au>
                  <snm>Sharma</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>B-Rao</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sharma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brahmachari</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Ramachandran</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Biomol Struct Dyn</source>
            <pubdate>2003</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>303</fpage>
            <lpage>310</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12956614</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The preference for GT-rich DNA by the yeast Rad51 protein defines a set of universal pairing sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Tracy</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Baumohl</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Kowalczykowski</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1997</pubdate>
            <volume>11</volume>
            <issue>24</issue>
            <fpage>3423</fpage>
            <lpage>3431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316816</pubid>
                  <pubid idtype="pmpid" link="fulltext">9407034</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>(GT)<sub>n </sub>repetitive tracts affect several stages of RecA-promoted recombination</p>
            </title>
            <aug>
               <au>
                  <snm>Dutreix</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>273</volume>
            <issue>1</issue>
            <fpage>105</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1293</pubid>
                  <pubid idtype="pmpid" link="fulltext">9367750</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>GT repeats are associated with recombination on human chromosome 22</p>
            </title>
            <aug>
               <au>
                  <snm>Majewski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ott</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>8</issue>
            <fpage>1108</fpage>
            <lpage>1114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310935</pubid>
                  <pubid idtype="pmpid" link="fulltext">10958629</pubid>
                  <pubid idtype="doi">10.1101/gr.10.8.1108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A polymorphic GT repeat from the human cardiac Na<sup>+</sup>Ca<sup>2+</sup> exchanger intron 2 activates splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Gabellini</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Eur J Biochem</source>
            <pubdate>2001</pubdate>
            <volume>268</volume>
            <issue>4</issue>
            <fpage>1076</fpage>
            <lpage>1083</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1432-1327.2001.01974.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11179974</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>HnRNP L stimulates splicing of the eNOS gene by binding to variable-length CA repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Hui</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stangl</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lane</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Bindereif</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <issue>1</issue>
            <fpage>33</fpage>
            <lpage>37</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb875</pubid>
                  <pubid idtype="pmpid" link="fulltext">12447348</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Evolution of cellular DNA content in teleost fishes</p>
            </title>
            <aug>
               <au>
                  <snm>Hinegardner</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Am Nat</source>
            <pubdate>1968</pubdate>
            <volume>102</volume>
            <fpage>517</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1086/282564</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Selection in the evolution of gene duplications</p>
            </title>
            <aug>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>2</issue>
            <note>RESEARCH0008</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2002-3-2-research0008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Collagens &#8211; structure, function, and biosynthesis</p>
            </title>
            <aug>
               <au>
                  <snm>Gelse</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Poschl</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Aigner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Adv Drug Deliv Rev</source>
            <pubdate>2003</pubdate>
            <volume>55</volume>
            <issue>12</issue>
            <fpage>1531</fpage>
            <lpage>1546</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.addr.2003.08.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">14623400</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Conserved noncoding sequences are reliable guides to regulatory elements</p>
            </title>
            <aug>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>369</fpage>
            <lpage>372</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02081-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Embryonic epsilon and gamma globin genes of a prosimian primate (<it>Galago crassicaudatus</it>). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints</p>
            </title>
            <aug>
               <au>
                  <snm>Tagle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Koop</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Slightom</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>203</volume>
            <fpage>439</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90011-3</pubid>
                  <pubid idtype="pmpid">3199442</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Genomic evidence for two functionally distinct gene classes</p>
            </title>
            <aug>
               <au>
                  <snm>Rivera</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Jain</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Lake</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>6239</fpage>
            <lpage>6244</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">27643</pubid>
                  <pubid idtype="pmpid" link="fulltext">9600949</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.11.6239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Selection for short introns in highly expressed genes</p>
            </title>
            <aug>
               <au>
                  <snm>Castillo-Davis</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Mekhedov</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <fpage>415</fpage>
            <lpage>418</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12134150</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Functional conservation between the human, nematode, and yeast CK2 cell cycle genes</p>
            </title>
            <aug>
               <au>
                  <snm>Dotan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ziv</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dafni</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Beckman</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>McCann</snm>
                  <fnm>RO</fnm>
               </au>
               <au>
                  <snm>Glover</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>Canaani</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Biochem Biophys Res Commun</source>
            <pubdate>2001</pubdate>
            <volume>288</volume>
            <fpage>603</fpage>
            <lpage>609</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/bbrc.2001.5804</pubid>
                  <pubid idtype="pmpid" link="fulltext">11676486</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Evolution of genetic redundancy for advanced players</p>
            </title>
            <aug>
               <au>
                  <snm>Dover</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>1993</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>902</fpage>
            <lpage>910</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0959-437X(93)90012-E</pubid>
                  <pubid idtype="pmpid">8118216</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Structure of collagen</p>
            </title>
            <aug>
               <au>
                  <snm>Ramachandran</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Sasisekharan</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1961</pubdate>
            <volume>190</volume>
            <fpage>1004</fpage>
            <lpage>1005</lpage>
            <xrefbib>
               <pubid idtype="pmpid">13739287</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>VDJ recombination</p>
            </title>
            <aug>
               <au>
                  <snm>Alt</snm>
                  <fnm>FW</fnm>
               </au>
               <au>
                  <snm>Oltz</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gorman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Taccioli</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Immunol Today</source>
            <pubdate>1992</pubdate>
            <volume>13</volume>
            <issue>8</issue>
            <fpage>306</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0167-5699(92)90043-7</pubid>
                  <pubid idtype="pmpid">1510813</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Computerized polymorphic marker identification: experimental validation and a predicted human polymorphism catalog</p>
            </title>
            <aug>
               <au>
                  <snm>Fondon</snm>
                  <fnm>JW</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Mele</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Brezinschek</snm>
                  <fnm>RI</fnm>
               </au>
               <au>
                  <snm>Cummings</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Pande</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wren</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Kupfer</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Lerman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Minna</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Garner</snm>
                  <fnm>HR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>7514</fpage>
            <lpage>7519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">22669</pubid>
                  <pubid idtype="pmpid" link="fulltext">9636181</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.13.7514</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Abundant raw material for <it>cis</it>-regulatory evolution in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Rockman</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Wray</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>1991</fpage>
            <lpage>2004</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12411608</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>On simple repetitive DNA sequences and complex diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Epplen</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Maueler</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Epplen</snm>
                  <fnm>JT</fnm>
               </au>
            </aug>
            <source>Electrophoresis</source>
            <pubdate>1997</pubdate>
            <volume>18</volume>
            <issue>9</issue>
            <fpage>1577</fpage>
            <lpage>1585</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/elps.1150180916</pubid>
                  <pubid idtype="pmpid">9378125</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Paranemic structures of DNA and their role in DNA unwinding</p>
            </title>
            <aug>
               <au>
                  <snm>Yagil</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Crit Rev Biochem Mol Biol</source>
            <pubdate>1991</pubdate>
            <volume>26</volume>
            <issue>5</issue>
            <fpage>475</fpage>
            <lpage>559</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1662125</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>P450 superfamily: update on new sequences, gene mapping, accession numbers and nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Nelson</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Koymans</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kamataki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stegeman</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Feyereisen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Waxman</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Gotoh</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Coon</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Estabrook</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Gunsalus</snm>
                  <fnm>IC</fnm>
               </au>
               <au>
                  <snm>Nebert</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Pharmacogenetics</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>1</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8845856</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The UDP glycosyltransferase gene superfamily: recommended nomenclature update based on evolutionary divergence</p>
            </title>
            <aug>
               <au>
                  <snm>Mackenzie</snm>
                  <fnm>PI</fnm>
               </au>
               <au>
                  <snm>Owens</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Burchell</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bock</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Belanger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fournel-Gigleux</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hum</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Iyanagi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Louisot</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Magdalou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chowdhury</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ritter</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Schachter</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tephly</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Tipton</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Nebert</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Pharmacogenetics</source>
            <pubdate>1997</pubdate>
            <volume>7</volume>
            <fpage>255</fpage>
            <lpage>269</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9295054</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Fuldner</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Cline</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Earle-Hughes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fine</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Fritchman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NSM</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gnehm</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Hanna</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Hedblom</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hinkle</snm>
                  <fnm>PS</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Klimek</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Marmaros</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Moreno-Palanques</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Pellegrino</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Saudek</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Shirley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Small</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Spriggs</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Weidman</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Barthlow</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bednarik</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cepeda</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Coleman</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dimke</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ferrie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hastings</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Huddleston</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Greene</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gruber</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hudson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kozak</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Kunsch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ji</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Meissner</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Raymond</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wing</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ruben</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Dillon</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Fannon</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Rosen</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Haseltine</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1995</pubdate>
            <volume>377</volume>
            <fpage>3</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7566098</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Functional classes in the three domains of life</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tamames</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1999</pubdate>
            <volume>49</volume>
            <fpage>551</fpage>
            <lpage>557</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10552036</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The Gene Ontology (GO) database and informatics resource</p>
            </title>
            <aug>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ireland</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lomax</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Foulger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eilbeck</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dolan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Drabkin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Ni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Balakrishnan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Christie</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Costanzo</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fisk</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Hirschman</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Sethuraman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Theesfeld</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Feierbach</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Berardini</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mundodi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rhee</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Camon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gaudet</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kibbe</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kishore</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gwinn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hannick</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Berriman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tonellato</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jaiswal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Seigfried</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <cnm>Gene Ontology Consortium</cnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database</issue>
            <fpage>D258</fpage>
            <lpage>D261</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308770</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681407</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Database resources of the National Center for Biotechnology Information: update</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Helmberg</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Pontius</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Schriml</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Sequeira</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Suzek</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database</issue>
            <fpage>D35</fpage>
            <lpage>D40</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308807</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681353</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Duplicate genes increase gene expression diversity within and between species</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Rifkin</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>577</fpage>
            <lpage>579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1355</pubid>
                  <pubid idtype="pmpid" link="fulltext">15122255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Gene family evolution and homology: genomics meets phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>DeSalle</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2000</pubdate>
            <volume>1</volume>
            <fpage>41</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.1.1.41</pubid>
                  <pubid idtype="pmpid" link="fulltext">11701624</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Geneticists study chimp-human divergence</p>
            </title>
            <aug>
               <au>
                  <snm>Check</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>428</volume>
            <issue>6980</issue>
            <fpage>242</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15029156</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Likelihood-based estimation of microsatellite mutation rates</p>
            </title>
            <aug>
               <au>
                  <snm>Whittaker</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Harbord</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Boxall</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mackay</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dawson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sibly</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2003</pubdate>
            <volume>164</volume>
            <fpage>781</fpage>
            <lpage>787</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12807796</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Microsatellite mutation models: insights from a comparison of humans and chimpanzees</p>
            </title>
            <aug>
               <au>
                  <snm>Sainudiin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Durrett</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Aquadro</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>168</volume>
            <issue>1</issue>
            <fpage>383</fpage>
            <lpage>395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1534/genetics.103.022665</pubid>
                  <pubid idtype="pmpid" link="fulltext">15454551</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>The UCSC Genome Browser Database</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <cnm>University of California Santa Cruz</cnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>51</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165576</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519945</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
