<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-393</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A probabilistic classifier for olfactory receptor pseudogenes</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Menashe</snm>
               <fnm>Idan</fnm>
               <insr iid="I1"/>
               <email>idan.menashe@weizmann.ac.il</email>
            </au>
            <au id="A2">
               <snm>Aloni</snm>
               <fnm>Ronny</fnm>
               <insr iid="I1"/>
               <email>ronny.aloni@weizmann.ac.il</email>
            </au>
            <au id="A3">
               <snm>Lancet</snm>
               <fnm>Doron</fnm>
               <insr iid="I1"/>
               <email>doron.lancet@weizmann.ac.il</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Genetics and the Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100, Israel</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>393</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/393</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16939646</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-393</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>29</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>29</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Menashe et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Olfactory receptors (ORs) constitute the largest mammalian gene superfamily (900&#8211;1400 genes), and more than 50% of human ORs are pseudogenes. While most of these inactive genes are identified by coding frame (nonsense) disruptions, seemingly intact genes may also be inactive due to other deleterious (missense) mutations. A definitive assessment of the size of the functional human OR repertoire therefore requires an accurate distinction between genes and pseudogenes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To characterize inactive ORs with an intact open reading frame, we have developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP). This algorithm is based on deviations from a functionally crucial consensus of sixty highly conserved positions, identified by comparing two evolutionarily constrained OR repertoires with a small pseudogene fraction (mouse and dog). We used logistic regression analysis to assign appropriate coefficients to the conserved positions, thus achieving maximal separation between active and inactive ORs. The algorithm misclassified only 5% of mouse functional ORs as pseudogenes, setting an upper limit of 0.05 on the false positive rate. Finally, we used this algorithm to classify the 384 purportedly intact human OR genes. Of these, an estimated 135 are likely to encode non-functional proteins, and an estimated 38 segregate between active and inactive forms due to missense polymorphisms.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We demonstrated that the CORP algorithm is capable of distinguishing between functional and non-functional OR genes with high precision, even when the encoded proteins differ by a single amino acid. Using the CORP algorithm, we predict that ~70% of human OR genes are likely non-functional pseudogenes, a much higher fraction than hitherto suspected. The method we present may also be employed for better annotation of inactive members of other gene families.</p>
               <p>The CORP algorithm is available at: <url>http://bioportal.weizmann.ac.il/HORDE/CORP/</url></p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Pseudogenes, non-functional gene relics, are highly abundant genome-wide, with an estimated 20,000 or more in the human genome <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. A majority of these (~70%) are processed pseudogenes, generated by reverse transcription of mRNAs followed by random genomic integration, and therefore lack a promoter region. The remaining, non-processed pseudogenes result from gene duplication followed by mutational inactivation of one of the redundant copies. Pseudogenes are typically underrepresented and poorly annotated in public genome databases, for three reasons: they are considered less interesting; their short open reading frames make them harder for gene-finding programs to detect; and some are minimally disrupted and are mistaken for intact genes. Although pseudogenes have no evident molecular function, accurate annotation of these genomic loci is valuable for many evolutionary and genetic studies. New methods are therefore needed for better pseudogene classification.</p>
         <p>Olfaction, the sense of smell, is a versatile and sensitive mechanism for detecting volatile odorous molecules (odorants). Many organisms rely on olfactory cues for a wide range of activities such as food acquisition, reproduction, migration and predator detection. Accordingly, the olfactory system is capable of detecting and discriminating thousands of low molecular mass compounds. The remarkable sensitivity and specificity of the olfactory system are mediated by olfactory receptors (ORs) <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp> expressed in the olfactory epithelium. OR genes constitute the largest gene superfamily in the mammalian genome, comprising 900&#8211;1400 genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. This large repertoire, which is essential for chemosensory acuity, has evolved through genomic duplications followed by sequence diversification. In the course of recent primate evolution (~10&#8211;20 million years), considerable loss of OR genes has taken place, probably as a result of relaxed selective pressure as species became less dependent on olfactory cues <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. As a result, more than half of the human ORs are currently annotated as non-processed pseudogenes, containing 1&#8211;20 frame-disrupting mutations <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr></abbrgrp>. However, some seemingly intact genes are likely also non-functional, owing to deleterious missense mutations that arose through the same evolutionary process. In this article, the term OR pseudogene denotes any OR gene that cannot produce a functional protein, irrespective of its transcription status. Some OR pseudogenes may have a stable transcript <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, whereas in other cases their mRNA may undergo nonsense-mediated decay <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>Precise characterization of these inactive OR genes is essential for various genetic and functional studies <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. To this end, we have developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP), which quantitatively assesses the probability that an OR gene is inactive, based on the deviation of its inferred protein sequence from a functionally crucial consensus. CORP shows strong discrimination and predicts that more than one third of the human ORs hitherto considered intact are likely non-functional, some of which segregate between intact and pseudogene forms in the human population.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>CORP is a probabilistic algorithm that distinguishes between functional and non-functional OR genes (<supplr sid="S1">Additional file 1</supplr>). It is based on the notion that functionally crucial residues are subject to strong selective constraints in active genes, and hence are evolutionarily highly conserved. Once the constraint is removed, deleterious mutations accumulate at these critical positions by neutral drift. Thus, the extent of deviation from a consensus sequence may be a good indicator of the functional status of an OR gene. The algorithm consists of three consecutive modules: (i) construction of a conservation matrix for constrained OR positions from a learning set of intact mouse and dog ORs, based on the SIFT algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>; (ii) modification of the conservation matrix by assigning greater weights to positions that better distinguish human OR pseudogenes from intact ORs of the other mammals; and (iii) for a test set of intact human genes, computation of the cumulative deviation from the weighted conservation matrix, which yields the probability that each of these OR genes is non-functional.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p>The CORP algorithm webpage. The most recent version is available at <url>http://bioportal.weizmann.ac.il/HORDE/CORP/</url></p>
            </text>
            <file name="1471-2105-7-393-S1.htm">
               <p>Click here for file</p>
            </file>
         </suppl>
         <sec>
            <st>
               <p>Identifying putative functionally crucial residues in OR genes</p>
            </st>
            <p>To characterize the functionally crucial consensus, we assumed (i) that all ORs have a common ancestral origin <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B10">10</abbr></abbrgrp>, and hence the most conserved residues in a multiple alignment would be those subject to the strongest functional constraints; and (ii) that residue conservation likely reflects functional or structural features shared by all ORs, such as signal transduction processes or disulphide bridges. Accordingly, we analyzed the OR genes of dog and mouse, two macrosmatic mammals that still rely heavily on their sense of smell for survival. Intact OR genes in these species are likely to be under strong evolutionary pressure, which would tend to eliminate deleterious mutations. We identified residues that were significantly more conserved than expected (<it>P </it>&lt; 0.05) in both orthologs and paralogs (Fig. <figr fid="F1">1</figr>). This requirement was intended to eliminate positions conserved only in orthologous pairs and not among paralogs, which are candidate contact residues for odorant ligands <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. It also eliminated residues that are highly conserved in only one of the species (species-specific conservation).</p>
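            <p>The position-selection step can be illustrated with a minimal Python sketch. It assumes that a P value per alignment position has already been computed separately for the ortholog-pair and paralog-pair comparisons (Fig. 1 refers to a chi-square test with FDR correction). The sketch applies a Benjamini-Hochberg correction to each set, which is an assumed choice of FDR procedure, and keeps only positions that remain significant in both. The function names and toy P values are hypothetical and are not taken from CORP.</p>
```python
def bh_significant(pvalues: dict[int, float], alpha: float = 0.05) -> set[int]:
    """Positions that pass a Benjamini-Hochberg FDR correction at level alpha."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    # Largest rank k whose P value does not exceed (k / m) * alpha.
    for k, (_, p) in enumerate(ranked, start=1):
        if k * alpha / m >= p:
            cutoff_rank = k
    return {pos for pos, _ in ranked[:cutoff_rank]}


def conservation_core(ortholog_p: dict[int, float],
                      paralog_p: dict[int, float],
                      alpha: float = 0.05) -> set[int]:
    """Positions significantly conserved among BOTH ortholog and paralog pairs."""
    return bh_significant(ortholog_p, alpha).intersection(bh_significant(paralog_p, alpha))


# Toy P values keyed by alignment position (hypothetical).
ortho = {10: 0.001, 20: 0.03, 30: 0.2, 40: 0.004}
para = {10: 0.002, 20: 0.5, 30: 0.01, 40: 0.003}
print(sorted(conservation_core(ortho, para)))  # [10, 40]
```
            <p>Position 20 passes only in the ortholog set and position 30 only in the paralog set, so both are excluded, mirroring the intersection rule described above.</p>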
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Positional conservation along OR sequences</p>
               </caption>
               <text>
                  <p><b>Positional conservation along OR sequences</b>. Positions above the horizontal line exceed the conservation score threshold (<it>P </it>&lt; 0.05, chi-square test with FDR correction) for orthologous pairs (75 positions, pink) and for paralogous pairs (96 positions, blue). The overlapping conservation core set includes 65 positions. Conservation scores are normalized to the statistical significance threshold of each of the two sets.</p>
               </text>
               <graphic file="1471-2105-7-393-1"/>
            </fig>
            <p>Based on this analysis (Fig. <figr fid="F1">1</figr>), 65 positions were found to be highly conserved among both orthologs and paralogs, thus constituting a conservation core that may be related to functions shared by all or most ORs. Inspection of the locations of these conserved residues along the inferred transmembrane helix topology of the OR protein <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> revealed a trend towards intracellular localization (Fig. <figr fid="F2">2</figr>). This finding is consistent with other studies reporting higher amino acid conservation in the intracellular portions of OR proteins <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B21">21</abbr></abbrgrp>, which has been related to interaction with downstream transduction molecules.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Localization of conservation core residues in the framework of a 2-dimensional OR diagram</p>
               </caption>
               <text>
                  <p><b>Localization of conservation core residues in the framework of a 2-dimensional OR diagram</b>. The inferred locations of the 65 conservation core residues are shown in blue. For reference, the 22 highly variable putative OR complementarity determining residues (CDRs) [19] are shown in red.</p>
               </text>
               <graphic file="1471-2105-7-393-2"/>
            </fig>
            <p>To distinguish between tolerated and deleterious amino acids at the conservation core positions, we employed the SIFT algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, a sequence homology-based tool that predicts whether an amino acid substitution would have a phenotypic effect. The same dog and mouse intact OR dataset was used, and SIFT was applied separately to class I and class II ORs <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr></abbrgrp> to accommodate class-specific amino acid preferences. The result was a class-specific SIFT matrix for OR genes (Fig. <figr fid="F3">3</figr>).</p>
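            <p>One way to hold such a class-specific tolerance table is a mapping from (OR class, alignment position) to the set of residues scored as tolerated. The Python sketch below is hypothetical: the positions and residue sets are placeholders rather than values read from Fig. 3, and raising an error for non-core positions is an illustrative choice.</p>
```python
# Hypothetical excerpt of a class-specific SIFT tolerance table:
# (OR class, alignment position) -> residues scored as tolerated.
TOLERANT: dict[tuple[str, int], frozenset[str]] = {
    ("I", 100): frozenset("DE"),
    ("II", 100): frozenset("D"),
    ("I", 200): frozenset("KR"),
    ("II", 200): frozenset("R"),
}


def is_tolerant(or_class: str, position: int, residue: str) -> bool:
    """True if the residue is tolerated at this core position for this OR class."""
    allowed = TOLERANT.get((or_class, position))
    if allowed is None:
        raise KeyError(f"no entry for class {or_class} at position {position}")
    return residue.upper() in allowed


print(is_tolerant("I", 100, "e"))   # True: E is tolerated in class I
print(is_tolerant("II", 100, "E"))  # False: only D is tolerated in class II
```
            <p>Keying the table by class as well as position mirrors the separate SIFT runs for class I and class II ORs.</p>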
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>A SIFT matrix for the OR conservation core residues</p>
               </caption>
               <text>
                  <p><b>A SIFT matrix for the OR conservation core residues</b>. Amino acid indicators (rows) for each of the 65 conservation core positions (columns) are colored if they were characterized by the SIFT algorithm as tolerant for class I ORs (blue), class II ORs (orange) or both classes (black).</p>
               </text>
               <graphic file="1471-2105-7-393-3"/>
            </fig>
            <p>To validate the SIFT matrix, we examined its agreement with the pairwise conservation core analysis described above. For each of the 65 conserved positions, SIFT-intolerant mutations were counted in both mouse and dog intact ORs, and the counts were compared to the corresponding pairwise conservation scores. Overall, the agreement was good (r = 0.73, <it>P </it>&lt; 10<sup>-5</sup>) (Fig. <figr fid="F4">4A</figr>). At five positions we observed large differences in intolerant mutation frequency between the two methods (Fig. <figr fid="F4">4A</figr>), and we conservatively removed these positions from further analysis, leaving 60 conserved positions.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>SIFT matrix validation</p>
               </caption>
               <text>
                  <p><b>SIFT matrix validation</b>. (A) The number of SIFT-intolerant mutations at each conserved position (X axis) is plotted against the corresponding pairwise mismatch count used in the positional conservation score calculation. Except for large differences at five positions (49, 123, 149, 185 and 262; circled), the two methods agree closely (r = 0.73, <it>P </it>&lt; 10<sup>-5</sup>). (B) The number of SIFT-intolerant mutations (X axis) in each of 245 human OR pseudogenes is plotted against the number of coding frame disruptions in its sequence (filled squares). A high positive correlation is observed, even when only frame disruptions between TM1 and TM7 are considered (empty circles) (r = 0.75, <it>P </it>&lt; 10<sup>-5</sup> and r = 0.73, <it>P </it>&lt; 10<sup>-5</sup>, respectively). Together, these plots suggest that the SIFT matrix is a good indicator of deviation from selective constraints.</p>
               </text>
               <graphic file="1471-2105-7-393-4"/>
            </fig>
            <p>A key hypothesis of the present study is that a protein's cumulative SIFT intolerance score predicts its pseudogene status. To test this notion, we used the set of 245 non-redundant (&lt; 80% similarity) full-length human OR pseudogenes. For each OR pseudogene, the total number of SIFT-intolerant amino acids was compared to the number of frame disruptions (nonsense and in/del mutations). We observed a high correlation between these two types of deleterious sequence alteration (r = 0.75, <it>P </it>&lt; 10<sup>-5</sup>) (Fig. <figr fid="F4">4B</figr>), suggesting that the cumulative SIFT deviation is an adequate indicator of pseudogenizing sequence disruptions. We also examined whether the position of the frame disruption matters by reanalyzing the data considering only frame disruptions between TM1 and TM7, since the amino and carboxy termini are the regions most likely to be dispensable in a shortened open reading frame (Fig. <figr fid="F4">4B</figr>). The correlation was not significantly changed (r = 0.73, <it>P </it>&lt; 10<sup>-5</sup>).</p>
         </sec>
         <sec>
            <st>
               <p>Discriminating functional from non-functional ORs</p>
            </st>
            <p>The central goal of this study is to predict whether an OR is functional according to its deviation from a protein consensus sequence. To this end, we used the resulting SIFT matrix to generate a binary vector for each OR gene, indicating a tolerant (0) or intolerant (1) amino acid at each of the 60 conserved positions. To achieve the best separation between presumed active and inactive OR genes, these vectors were subjected to logistic regression analysis <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, a supervised classification method.</p>
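            <p>The encoding into a binary vector can be sketched as follows. The sketch assumes each OR protein is already aligned, so that its residues can be looked up by conservation core position, and it scores an alignment gap as intolerant; both the data layout and the gap rule are assumptions rather than details given in the text.</p>
```python
def intolerance_vector(residues: dict[int, str],
                       core_positions: list[int],
                       tolerant: dict[int, frozenset[str]]) -> list[int]:
    """Encode an aligned OR protein as 0 (tolerant) or 1 (intolerant) per core position.

    residues maps alignment position -> amino acid; a missing position is treated as
    a gap and, by assumption, scored as intolerant.
    """
    vector = []
    for pos in core_positions:
        aa = residues.get(pos, "-")
        vector.append(0 if aa in tolerant[pos] else 1)
    return vector


# Hypothetical three-position core and tolerance sets.
core = [1, 2, 3]
tol = {1: frozenset("C"), 2: frozenset("DE"), 3: frozenset("W")}
print(intolerance_vector({1: "C", 2: "K"}, core, tol))  # [0, 1, 1]
```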
            <p>The training step of the logistic regression procedure computes weighting coefficients for the predictor variables, which are then used to produce a likelihood score for membership in one of the two classes. Here, functional genes were represented by a non-redundant set (&lt; 80% similarity) of 598 mouse intact OR genes, and non-functional genes by 295 human OR pseudogenes. We considered genes with a pseudogene likelihood score <it>&#968;</it><sub><it>L </it></sub>&#8804; 0.5 as functional and genes with <it>&#968;</it><sub><it>L </it></sub>> 0.5 as pseudogenes. The resulting logistic model correctly classified 96.3% of the learning set intact ORs and 69.8% of its human OR pseudogenes (<it>&#967;</it><sup>2 </sup>= 541.5, <it>P </it>&lt; 10<sup>-8</sup>). We further performed a cross-validation (jackknife) analysis, in which the training set of 893 genes (both intact and inactive) was divided into 8 equal groups; the model was trained on each 7/8 subset and tested on the remaining 1/8. Correct identification was obtained for 65% &#177; 7% of the human pseudogenes and 95% &#177; 3% of mouse intact genes (<it>P </it>&lt; 10<sup>-8</sup>) (see <supplr sid="S2">Additional file 2</supplr>).</p>
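            <p>The training and cross-validation steps can be illustrated with a minimal, dependency-free Python sketch. It fits an unregularized logistic model by batch gradient descent, computes <it>&#968;</it><sub><it>L</it></sub> for each binary vector, calls a gene a pseudogene when <it>&#968;</it><sub><it>L</it></sub> > 0.5, and reports per-class accuracy over an 8-fold split. The optimizer, learning rate, number of epochs, random fold assignment and toy data are all assumptions; the text does not state how the published model was fitted.</p>
```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def psi_l(x: list[int], w: list[float], b: float) -> float:
    """Pseudogene likelihood score for one binary intolerance vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = psi_l(xi, w, b) - yi  # log-loss gradient w.r.t. the linear term
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def k_fold_accuracy(X, y, k=8, seed=0):
    """(fraction of intact genes, fraction of pseudogenes) classified correctly
    when each of k folds is held out in turn; label 1 = pseudogene."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct, total = [0, 0], [0, 0]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            predicted = 1 if psi_l(X[i], w, b) > 0.5 else 0  # psi_L > 0.5 means pseudogene
            total[y[i]] += 1
            correct[y[i]] += int(predicted == y[i])
    return correct[0] / total[0], correct[1] / total[1]


# Toy data: intact genes carry at most one intolerant residue, pseudogenes four or more.
intact = [[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]] * 4
pseudo = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1], [1, 0, 1, 1, 1, 1]] * 4
X = intact + pseudo
y = [0] * len(intact) + [1] * len(pseudo)
acc_intact, acc_pseudo = k_fold_accuracy(X, y)
print(acc_intact, acc_pseudo)
```
            <p>In practice a standard statistics package would perform the fit; the hand-written optimizer only makes each step explicit.</p>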
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>An Excel table with the logistic regression results for the non-redundant sets of 598 intact mouse ORs and 245 human OR pseudogenes, as well as for the 384 human and 279 chimpanzee ORs with an intact ORF. Each species is given in a separate worksheet. Intact human genes with missense SNPs at their highly conserved positions are highlighted in orange, and the results for their inferred alleles (marked with a or b at the end of the gene ID) are also indicated. The first column gives the gene ID as it appears in the HORDE database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The next 60 columns indicate the tolerant (0) and intolerant (1) residues of each gene according to the SIFT matrix (Fig. <figr fid="F3">3</figr>). The last three columns give the results of the logistic regression analysis.</p>
               </text>
               <file name="1471-2105-7-393-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Subsequently, we used the derived logistic model weights to assess the propensity of human OR genes nominally classified as functional to encode non-functional proteins. The test set consisted of 384 human OR genes with a full-length open reading frame (> 280 amino acids, including all 7TM regions) labeled as intact in the HORDE database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The procedure reclassified 98 purportedly intact OR genes as non-functional (see <supplr sid="S2">Additional file 2</supplr>). Taking into account the false positive (~5%) and false negative (~35%) rates of our logistic model (see above), the extrapolated number of intact human ORs that likely encode non-functional proteins is ~135. This brings the potential fraction of human OR pseudogenes to ~70%. Recently, Gilad et al <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> examined the evolution of OR genes in primates by a genomic comparison of human and chimpanzee (<it>Pan troglodytes</it>). Among other things, they assessed the rate at which neutral gene disruptions accumulate in human OR genes. This led to an estimate that ~135 intact human OR genes evolve under no selective constraint (<abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and Y. Gilad, private communication), in agreement with our inference. Thus, both analyses suggest that while a large group of pseudogenes carry both missense disruptions and in-frame stop codons, a substantial number of genes, in agreement with statistically based expectations, have lost function through missense mutations alone, without coding frame disruptions.</p>
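            <p>One standard way to correct an observed count for classifier error rates is the Rogan-Gladen estimator, which solves observed = sensitivity &#215; P + false positive rate &#215; (N &#8722; P) for the true count P. The Python sketch below applies it to the rounded figures quoted above (98 of 384, sensitivity 0.65, false positive rate 0.05) and yields roughly 131, in line with the ~135 reported. It is an illustration, not necessarily the exact computation used here, and the result depends on the precise rates.</p>
```python
def corrected_positive_count(observed: float, total: int,
                             sensitivity: float, false_positive_rate: float) -> float:
    """Rogan-Gladen estimate of the true number of positives in a tested set.

    Solves observed = sensitivity * P + false_positive_rate * (total - P) for P.
    """
    denominator = sensitivity - false_positive_rate
    if denominator == 0:
        raise ValueError("sensitivity must differ from the false positive rate")
    return (observed - false_positive_rate * total) / denominator


# Rounded rates quoted in the text: ~35% false negatives, ~5% false positives.
estimate = corrected_positive_count(98, 384, sensitivity=0.65, false_positive_rate=0.05)
print(round(estimate))  # 131
```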
            <p>We also used CORP to analyze the recently published chimpanzee OR subgenome <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Of 279 full-length intact chimpanzee OR genes, 59 were predicted to be non-functional. The fraction of putatively inactive ORs with an intact open reading frame in chimpanzee (0.204) was slightly smaller than in human (0.255), consistent with a similar difference in the fraction of frame-disrupted pseudogenes in these species <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Such differences suggest that chimpanzee OR genes have evolved under stronger selective constraints than human OR genes since the speciation of these two higher apes 5&#8211;10 million years ago.</p>
         </sec>
         <sec>
            <st>
               <p>Identifying OR segregating pseudogenes</p>
            </st>
            <p>CORP can also be used to predict the functional status of allelic variants of OR genes. Nonsynonymous SNPs that switch a highly conserved position between a tolerant and an intolerant amino acid may cause segregation between active and inactive alleles. To examine this, we searched databases for missense SNPs at the 60 highly conserved positions of the OR conservation core in all intact genes. A total of 91 such SNPs were identified in 71 intact OR genes. To assess the potential functional impact of these polymorphisms, we applied CORP to all possible allelic states of these ORs. As a result, 30 human OR genes were predicted to segregate between active and inactive states in the human population, owing to variants at highly conserved positions. Another 16 ORs, among the 98 already classified as non-functional because of other, fixed intolerant amino acid substitutions, did not change their functional status with these SNPs. Conversely, 25 ORs were predicted to be functional in all their allelic states despite the potentially damaging SNPs in their sequences. Notably, given the false negative rate of pseudogene identification of our algorithm (~0.3), it likely failed to detect functional segregation in about 8 additional genes. This brings the current estimate of missense-segregating OR genes to 38, more than doubling the known count of such important human genetic variation loci <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
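            <p>Applying CORP to all allelic states can be sketched by enumerating every combination of alleles at the SNP positions and classifying each combination. In this hypothetical Python sketch, classify stands in for the CORP scoring function, and treating SNPs as independent (all combinations rather than observed haplotypes) is an assumption.</p>
```python
from itertools import product


def segregation_status(reference: dict[int, str],
                       snps: dict[int, tuple[str, ...]],
                       classify) -> str:
    """Classify every combination of SNP alleles and summarize the gene's status.

    reference: conservation-core position -> reference residue.
    snps: position -> residues observed at that position (reference allele included).
    classify: hypothetical stand-in for CORP; returns True if a residue mapping is
        predicted to be non-functional.
    """
    positions = sorted(snps)
    calls = set()
    for alleles in product(*(snps[p] for p in positions)):
        variant = dict(reference)
        variant.update(zip(positions, alleles))
        calls.add(classify(variant))
    if calls == {True, False}:
        return "segregating"
    return "non-functional" if True in calls else "functional"


# Toy classifier: non-functional when two or more core residues differ from the consensus.
CONSENSUS = {1: "C", 2: "D", 3: "W"}


def toy_classify(variant):
    return sum(1 for p, aa in variant.items() if aa != CONSENSUS[p]) >= 2


print(segregation_status(CONSENSUS, {2: ("D", "K")}, toy_classify))  # functional
print(segregation_status(CONSENSUS, {2: ("D", "K"), 3: ("W", "R")}, toy_classify))  # segregating
print(segregation_status({1: "A", 2: "K", 3: "W"}, {3: ("W", "F")}, toy_classify))  # non-functional
```
            <p>The three calls correspond to the three outcomes described above: functional in all allelic states, segregating, and non-functional regardless of the SNP.</p>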
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In this paper we present the CORP algorithm, designed to address an important question in many functional genetic studies: is a gene with an intact ORF necessarily functional? The answer is clearly no, as mutations in promoters or other regulatory regions, as well as changes in crucial protein residues, may impair a gene's activity without any obvious sequence disruption. This issue is particularly relevant for human OR genes, a majority of which lost their function in recent human history <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Using CORP, we evaluated the probability that an OR gene encodes an active protein by examining its deviation from an OR functionally crucial sequence consensus. It is important to note that CORP does not consider the functional consequence of each amino acid substitution in isolation, but rather the overall number and conservation level of the positions with intolerant mutations. Thus, the resulting pseudogene likelihood score (<it>&#968;</it><sub><it>L</it></sub>) reflects the evolutionary status of the OR gene, and is shown here to be a very good predictor of its functional status. Since both the conservation core and the SIFT matrix were characterized probabilistically, some false positive signals may arise. Such inaccuracies are mitigated by the logistic regression analysis, which takes many variables into account. An exception could be an OR that deviates from the conservation core because it has accumulated so-called intolerant mutations while acquiring another, not yet identified function.</p>
         <p>A key parameter in this algorithm is the accurate characterization of potentially deleterious amino acid substitutions at highly constrained positions along the OR protein sequence. Conserved sequence motifs of OR genes were previously characterized in various studies <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B21">21</abbr></abbrgrp>. However, these conservation elements were delineated from human OR genes, many of which have evolved under minimal selective constraints <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B25">25</abbr></abbrgrp>. These sequence motifs may therefore not accurately reflect the functionally crucial positions. Another study, which compared two genome assemblies of the mouse to characterize conserved motifs in OR genes <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, is also inadequate for the present purpose, since its invariant motifs are likely masked by species-specific conservation. In contrast, we constructed an OR gene conservation profile by comparing both OR orthologs and paralogs of mouse and dog. These two species still rely on their sense of smell for survival, which increases the likelihood of positional conservation. Moreover, the two species are sufficiently divergent (~100 Mya) to allow a clear distinction between conserved and variable residues. It is therefore likely that the resulting conservation core is a good reflection of the functionally crucial mammalian OR positions.</p>
         <p>The CORP algorithm is better at correctly identifying functional genes (95% success) than at predicting the inactivation of frame-disrupted pseudogenes (65% success). The failure to identify ~1/3 of the pseudogenes as non-functional can be explained by the observation that most (>95%) of the misclassified pseudogenes had at most 3 frame disruptions in their sequences, suggesting that they are recently formed pseudogenes (Fig. <figr fid="F5">5</figr>). Such recently formed pseudogenes may not yet have had time to deviate sufficiently from the conservation core.</p>
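         <p>The cumulative frequency curves of Fig. 5, and the fraction of misclassified pseudogenes with at most three frame disruptions, can be computed as in the following Python sketch. The input list holds hypothetical per-gene disruption counts.</p>
```python
from collections import Counter


def cumulative_fraction(disruption_counts: list[int]) -> list[tuple[int, float]]:
    """(n, fraction of genes with at most n frame disruptions) for each observed n."""
    tally = Counter(disruption_counts)
    total = len(disruption_counts)
    running = 0
    curve = []
    for n in sorted(tally):
        running += tally[n]
        curve.append((n, running / total))
    return curve


def fraction_at_most(disruption_counts: list[int], n: int) -> float:
    """Fraction of genes carrying n or fewer frame disruptions."""
    return sum(1 for c in disruption_counts if n >= c) / len(disruption_counts)


misclassified = [1, 1, 2, 3, 5]  # hypothetical per-gene disruption counts
print(cumulative_fraction(misclassified))  # [(1, 0.4), (2, 0.6), (3, 0.8), (5, 1.0)]
print(fraction_at_most(misclassified, 3))  # 0.8
```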
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Frame disruption counts of human OR pseudogenes</p>
            </caption>
            <text>
               <p><b>Frame disruption counts of human OR pseudogenes</b>. The cumulative frequencies of OR pseudogenes with respect to their number of coding frame disruptions. Continuous line, ORs that are annotated by CORP as 'functional'; Broken line, ORs that are annotated by CORP as 'non-functional'.</p>
            </text>
            <graphic file="1471-2105-7-393-5"/>
         </fig>
         <p>A previous study <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> assessed the conservation level of a gene via the Ka/Ks ratio, computed from its divergence from an inferred ancestral sequence. The sequence in question is compared to its two closest homologs (one ortholog and one paralog). A low Ka/Ks value is taken as indicative of purifying selection, and hence of functional importance. When applied to our training dataset, this method correctly identified 77% of the human pseudogenes and 74% of mouse intact genes. Although this method performs better in detecting true pseudogenes (67% in our method), it was considerably worse in identifying intact genes (95% in our method). We also compared the receiver operating characteristic (ROC) curves of the two methods (Fig. <figr fid="F6">6</figr>), which indicate a clear advantage for our method.</p>
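         <p>The area under a ROC curve equals the probability that a randomly chosen pseudogene receives a higher score than a randomly chosen intact gene (the Mann-Whitney formulation). The Python sketch below computes it directly. For CORP the score would be <it>&#968;</it><sub><it>L</it></sub>, and for the Ka/Ks method the ratio itself, on the assumption that higher values indicate relaxed constraint. The toy scores are hypothetical.</p>
```python
def roc_auc(pseudogene_scores: list[float], intact_scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Fraction of (pseudogene, intact) pairs in which the pseudogene scores higher;
    ties count one half. Higher scores are assumed to indicate a pseudogene.
    """
    wins = 0.0
    for p in pseudogene_scores:
        for q in intact_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pseudogene_scores) * len(intact_scores))


# Hypothetical psi_L scores for three pseudogenes and three intact genes.
print(roc_auc([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))  # 8 of 9 pairs correctly ordered
```
         <p>A random classifier gives an expected area of 0.5, which corresponds to the x = y diagonal in Fig. 6.</p>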
         <fig id="F6">
            <title>
               <p>Figure 6</p>
            </title>
            <caption>
               <p>Receiver operating characteristic curves for CORP and Ka/Ks</p>
            </caption>
            <text>
               <p><b>Receiver operating characteristic curves for CORP and Ka/Ks</b>. Each curve plots the true positive rate against the false positive rate of OR pseudogene classification. The larger area under the CORP curve (continuous line; 93.7% vs. 84.4%) suggests that our method performs better than the Ka/Ks method in OR pseudogene classification. A classifier that picks pseudogenes at random would produce a line along the x = y diagonal.</p>
            </text>
            <graphic file="1471-2105-7-393-6"/>
         </fig>
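<p>The areas quoted in Figure 6 can be computed without drawing the curve, because the area under a ROC curve equals the Mann-Whitney probability that a randomly chosen pseudogene scores higher than a randomly chosen intact gene. The Python sketch below is illustrative only: the scores are hypothetical, and it assumes that higher values indicate a pseudogene (CORP's pseudogene likelihood score, or the Ka/Ks ratio itself, since a higher Ka/Ks indicates weaker purifying selection).</p>

```python
def roc_auc(pseudogene_scores, intact_scores):
    """Area under the ROC curve, computed as the Mann-Whitney statistic.

    It is the probability that a randomly chosen pseudogene scores higher than
    a randomly chosen intact gene; ties count as one half.
    """
    wins = 0.0
    for p in pseudogene_scores:
        for q in intact_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pseudogene_scores) * len(intact_scores))


def roc_points(pseudogene_scores, intact_scores):
    """(false positive rate, true positive rate) pairs as the threshold is lowered."""
    thresholds = sorted(set(pseudogene_scores) | set(intact_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(1 for s in pseudogene_scores if s >= t) / len(pseudogene_scores)
        fpr = sum(1 for s in intact_scores if s >= t) / len(intact_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical scores (higher = more likely to be a pseudogene).
pseudo = [0.9, 0.8, 0.4]
intact = [0.1, 0.3, 0.85]
print(roc_auc(pseudo, intact))
print(roc_points(pseudo, intact))
```

<p>A random classifier yields an area of 0.5, which corresponds to the x = y diagonal in the figure.</p>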
         <p>Another method <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> compares the query sequence to a consensus motif from the Pfam database <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and calculates whether the deviations from the motif are consistent with a neutral drift model. This algorithm (PSILC), like ours, is based on sequence conservation signals. However, because it relies on a particular Pfam domain (7TM1) from which ORs deviate considerably, it classifies a large majority of intact ORs as pseudogenes. This situation could potentially be improved by a future definition of an OR-specific 7TM Pfam domain. Another potential problem with applying PSILC to OR sequences is that OR genes are subject to positive selection <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B29">29</abbr></abbrgrp>, which may lead to the misclassification of functional genes as pseudogenes <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. A new version of PSILC that addresses this issue (R. Durbin, personal communication) could alleviate this problem. In summary, we have demonstrated that the CORP algorithm is an effective means for <it>in silico </it>OR pseudogene identification. The same procedure is likely to be applicable to other gene families with similar evolutionary features (e.g. taste or vomeronasal receptor genes). In contrast, for small gene families or single genes it might be preferable to use one of the other existing pseudogene annotation methods.</p>
         <p>The ultimate validation of CORP would be an experimental examination of the activity of putatively active and inactive OR genes by expression methodologies. Recently, Gaillard et al. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> demonstrated that single amino acid substitutions can abolish the function of a particular OR gene. They examined the activity of OR 912&#8211;93 in several species (OR5G1P in human) and found it to be inactive in orangutan and human despite an intact open reading frame (in human, after correction of the single existing in-frame stop codon). When the OR sequences of these two species were submitted to CORP, both were predicted to be non-functional, with <it>&#968;</it><sub><it>L </it></sub>= 0.76 and <it>&#968;</it><sub><it>L </it></sub>= 0.67 for human and orangutan, respectively. In contrast, the sequences of the active ORs from the other 6 species in that study received very low pseudogene likelihood scores from our method (average <it>&#968;</it><sub><it>L </it></sub>= 0.06), suggesting that they are functional. Interestingly, the function of the two inactive receptors was restored by reintroducing the highly conserved arginine of the DRY motif (located at the interface of TM3 and the 2<sup>nd </sup>intracellular loop), which is common to many GPCRs and is one of the 60 conserved residues in our conservation matrix. When we introduced the same His&#8594;Arg (orangutan) and Cys&#8594;Arg (human) corrections into the OR sequences of these species, both were predicted to be functional by our algorithm, with pseudogene likelihood scores of <it>&#968;</it><sub><it>L </it></sub>= 0.15 and <it>&#968;</it><sub><it>L </it></sub>= 0.10 for human and orangutan, respectively. This demonstrates the ability of CORP to distinguish between functional and non-functional ORs even when they differ by a single amino acid residue, and provides a limited experimental validation. Despite this supporting evidence, further studies would help to assess and improve the prediction efficacy of the algorithm.</p>
         <p>The validation of functional activity could be based on any of several roles ascribed to OR proteins. The most widely used assay measures odorant responses <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>, but other functional roles include plasma membrane targeting <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, the protein-mediated negative feedback mechanism that underlies clonal exclusion of OR expression <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>, and axonal guidance in olfactory bulb glomerular targeting <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. An OR may be inactivated by mutations at sites related to one or more of these functions; in other words, an inactive OR may still show undisturbed odorant binding. An advantage of the presently proposed sequence-based functional classification is that it is global, namely it is expected to assign a high value of <it>&#968;</it><sub><it>L </it></sub>irrespective of the site or mode of inactivation.</p>
         <p>Another major benefit of CORP is its ability to differentiate between functional and non-functional alleles. Here we used this capacity to predict a dichotomous functional status for 30 OR allele pairs in the human population. These more than double the known count of OR segregating pseudogenes (SPGs) in the human genome <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, providing additional ground for future genetic studies <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Interestingly, 15 of these segregating OR loci included a polymorphism at the conserved Arg<sub>130</sub>, which is part of the highly conserved MAYDRY motif <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The relatively high number of polymorphisms at Arg<sub>130 </sub>has previously been attributed to this residue being less functionally important than its neighboring conserved residues (e.g. A<sub>129 </sub>and M<sub>126</sub>), and hence less constrained by selection <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. However, in our logistic regression analysis this residue received the highest coefficient weight in the comparison of functional and non-functional OR genes, suggesting that other biological mechanisms are responsible for the highly polymorphic nature of this residue.</p>
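<p>Calling a locus a segregating pseudogene amounts to checking whether its two alleles fall on opposite sides of the pseudogene likelihood cutoff (&#968;<sub>L</sub> > 0.5 is called non-functional, as described in the Methods). The Python sketch below assumes the per-allele scores have already been obtained from CORP; the locus names and scores are hypothetical placeholders.</p>

```python
PSI_THRESHOLD = 0.5  # psi_L above this value is called non-functional (Methods)


def functional_call(psi_l):
    """Dichotomous call from a pseudogene likelihood score."""
    return "non-functional" if psi_l > PSI_THRESHOLD else "functional"


def is_segregating_pseudogene(psi_allele_1, psi_allele_2):
    """True when the two alleles of a locus fall on opposite sides of the threshold."""
    return functional_call(psi_allele_1) != functional_call(psi_allele_2)


def segregating_loci(allele_scores):
    """Loci whose allele pair receives discordant calls.

    allele_scores maps a locus name to (psi_L of allele 1, psi_L of allele 2).
    """
    return sorted(
        locus for locus, (a, b) in allele_scores.items() if is_segregating_pseudogene(a, b)
    )


# Hypothetical loci and scores, for illustration only.
scores = {"OR_locus_A": (0.08, 0.81), "OR_locus_B": (0.12, 0.30), "OR_locus_C": (0.66, 0.21)}
print(segregating_loci(scores))
```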
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper we present CORP, a probabilistic algorithm that distinguishes between functional and non-functional OR genes with high precision (<supplr sid="S1">Additional file 1</supplr>). This novel method suggests that the diminution of the human OR repertoire is considerably greater than has been suspected thus far. We demonstrate that CORP can, in some cases, predict the functional consequences of single amino acid substitutions, information that is crucial for genetic and functional studies. The method and data presented here contribute to improved annotation of pseudogenes, and thus to a better understanding of these important genomic relics.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>OR gene dataset construction</p>
            </st>
            <p>The initial data set comprised 1913 full-length (>280 amino acids) OR protein sequences: 753 human genes from build 40 of the HORDE database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, and 1039 mouse <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and 121 dog <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> OR genes, all with a full-length open reading frame containing up to two frame disruptions. Each of these sequences was aligned to a well-curated multiple alignment of mouse and human ORs <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, and the amino acid positions were numbered from 1 to 301 as in the original multiple alignment. Pseudogenes were defined as genes with at least one frame-disrupting mutation between the initiating methionine and amino acid position 280. The alignments were constructed using Clustal X <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> in profile alignment mode with default parameters. Sequences too disrupted to be aligned were discarded. The final data set contained aligned sequences of 1039 mouse, 753 human and 83 dog ORs.</p>
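<p>The pseudogene definition above (at least one frame-disrupting mutation before amino acid 280) can be partially illustrated by scanning a coding sequence for premature in-frame stop codons. The Python sketch below is a simplified stand-in, not the procedure used in this study: it assumes the sequence starts at the initiating ATG and it does not detect frameshifting insertions or deletions, which would require comparison with the reference alignment.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
CUTOFF_CODONS = 280  # only disruptions up to amino acid 280 are counted (as in the text)


def in_frame_stops(cds, cutoff=CUTOFF_CODONS):
    """Count in-frame stop codons within the first `cutoff` codons of a coding sequence.

    Assumes `cds` starts at the initiating ATG and is read in frame 1.
    Frameshifting insertions/deletions are NOT detected by this simplified check.
    """
    cds = cds.upper()
    n_codons = min(cutoff, len(cds) // 3)
    return sum(1 for k in range(n_codons) if cds[3 * k:3 * k + 3] in STOP_CODONS)


def is_pseudogene(cds):
    """Flag a sequence as a pseudogene if it carries at least one premature stop codon."""
    return in_frame_stops(cds) > 0


print(in_frame_stops("ATGGCTGCTTAAGCTTGAGCT"))  # hypothetical sequence
```

<p>Because the normal terminal stop codon of a full-length OR lies beyond codon 280, the cutoff keeps it from being counted as a disruption.</p>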
            <p>We used a previously described mutual-best-hit definition <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> to identify 83 pairs of full-length orthologous mouse-dog OR sequences. The 433 mouse OR paralog pairs were retrieved from <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
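<p>A mutual best hit pairs a mouse and a dog sequence only when each is the other's highest-scoring match. The Python sketch below shows the idea under stated assumptions: the similarity measure (e.g. percent identity from pairwise alignment), the tie handling and the identifiers are hypothetical, and the exact scoring used in the cited definition is not reproduced here.</p>

```python
def mutual_best_hits(similarity):
    """Return (mouse_id, dog_id) pairs in which each sequence is the other's best hit.

    `similarity` maps (mouse_id, dog_id) to a score where higher means more similar,
    e.g. percent identity. Ties are resolved in favour of the pair seen first.
    """
    best_for_mouse = {}
    best_for_dog = {}
    for (mouse, dog), score in similarity.items():
        if mouse not in best_for_mouse or score > similarity[(mouse, best_for_mouse[mouse])]:
            best_for_mouse[mouse] = dog
        if dog not in best_for_dog or score > similarity[(best_for_dog[dog], dog)]:
            best_for_dog[dog] = mouse
    return sorted(
        (mouse, dog) for mouse, dog in best_for_mouse.items() if best_for_dog.get(dog) == mouse
    )


# Hypothetical identity scores.
scores = {
    ("m1", "d1"): 0.90, ("m1", "d2"): 0.70,
    ("m2", "d1"): 0.95, ("m2", "d2"): 0.60,
    ("m3", "d2"): 0.80,
}
print(mutual_best_hits(scores))
```

<p>Here m1 is not paired: its best hit d1 prefers m2, which illustrates how the criterion discards one-sided matches.</p>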
         </sec>
         <sec>
            <st>
               <p>Calculation of positional conservation scores</p>
            </st>
            <p>Positional conservation scores were calculated separately for pairwise alignments of the 83 mouse-dog OR ortholog pairs and for similar alignments of the 433 mouse OR paralog pairs. Only residues between TM1 and TM7 (positions 32&#8211;301) of the OR sequences were considered, owing to poor alignment at the N- and C-terminal ends of the protein. The conservation score at each residue position <it>i </it>was calculated as:</p>
            <p>
               <m:math name="1471-2105-7-393-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>C</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>i</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mo>&#8722;</m:mo>
                        <m:mi>log</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>m</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>i</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>n</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>i</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGdbWqcqGGOaakcqWGPbqAcqGGPaqkcqGH9aqpcqGHsislcyGGSbaBcqGGVbWBcqGGNbWzdaWcaaqaaiabd2gaTjabcIcaOiabdMgaPjabcMcaPaqaaiabd6gaUjabcIcaOiabdMgaPjabcMcaPaaacaWLjaGaaCzcamaabmGabaGaeGymaedacaGLOaGaayzkaaaaaa@438A@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>m(i) </it>is the number of mismatches between pairs at position <it>i </it>and <it>n(i) </it>is the total number of pairwise comparisons at that position. The statistical significance of each positional conservation score (<it>C(i)</it>) was assessed by a one-sided chi-square test with one degree of freedom, using the dataset average conservation score as the reference. Finally, we applied a false discovery rate (FDR) analysis <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> to control for false positives arising from multiple testing.</p>
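<p>Equation 1 and the subsequent testing steps can be sketched as follows. This Python code rests on several assumptions not stated in the text: the natural logarithm is used in C(i); the reference is expressed as a pooled mismatch rate rather than an average score; the one-sided chi-square test is implemented as a Pearson goodness-of-fit test whose two-sided p-value, erfc(sqrt(chi2/2)) for 1 degree of freedom, is halved in the conserved direction; and the FDR step uses the Benjamini-Hochberg procedure. The counts are hypothetical.</p>

```python
import math


def conservation_score(mismatches, comparisons):
    """C(i) = -log(m(i) / n(i)) from Equation 1; natural log assumed.

    A position with zero mismatches gets an infinite score.
    """
    if mismatches == 0:
        return math.inf
    return -math.log(mismatches / comparisons)


def one_sided_chi2_p(mismatches, comparisons, background_rate):
    """One-sided p-value for position i having FEWER mismatches than the background rate.

    Pearson chi-square goodness of fit with 1 degree of freedom. The two-sided
    p-value erfc(sqrt(chi2 / 2)) is halved when the deviation is towards conservation.
    """
    expected_mis = comparisons * background_rate
    expected_match = comparisons - expected_mis
    observed_match = comparisons - mismatches
    chi2 = ((mismatches - expected_mis) ** 2 / expected_mis
            + (observed_match - expected_match) ** 2 / expected_match)
    two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    return two_sided / 2.0 if expected_mis > mismatches else 1.0 - two_sided / 2.0


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of positions declared significant at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    cutoff_rank = 0
    for rank, k in enumerate(order, start=1):
        if alpha * rank / m >= p_values[k]:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical counts: (mismatches, comparisons) at four alignment positions.
counts = [(0, 83), (5, 83), (25, 83), (40, 83)]
background = sum(mm for mm, _ in counts) / sum(n for _, n in counts)
p_vals = [one_sided_chi2_p(mm, n, background) for mm, n in counts]
print([conservation_score(mm, n) for mm, n in counts])
print(benjamini_hochberg(p_vals))
```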
            <p>To distinguish between tolerated and deleterious amino acid substitutions along the OR protein sequence, we applied the SIFT algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> to the dog and mouse OR datasets. Class I and class II ORs were analyzed by SIFT separately to avoid spurious classifications of amino acid substitutions due to the high divergence between these two OR classes <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Logistic regression analysis</p>
            </st>
            <p>Logistic regression was used to distinguish between functional and non-functional OR genes according to their deviations from a functionally crucial consensus. This statistical approach, which seeks the most parsimonious model that separates the classes, uses a set of variables with known class labels to fit a model of weighted predictor variables that provides the best separation between classes. Logistic regression was preferred to other statistical methods (e.g. linear discriminant analysis) because it makes no assumptions about the distribution of the independent variables. All logistic regression analyses in this study were performed with the JMP logistic regression package <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> using its default parameters.</p>
            <p>To build the logistic model, i.e. to assign the appropriate weight to each conserved position, we used a non-redundant training set (&lt; 80% similarity) of 598 mouse intact OR genes and 295 human OR pseudogenes, representing active and inactive OR proteins, respectively. The resulting model assigns each sequence a pseudogene likelihood score (<it>&#968;</it><sub><it>L</it></sub>), the estimated probability of belonging to the inactive class, ranging from 0 (active) to 1 (inactive). Genes with <it>&#968;</it><sub><it>L </it></sub>&#8804; 0.5 were considered functional and genes with <it>&#968;</it><sub><it>L </it></sub>>0.5 non-functional.</p>
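<p>The following Python sketch illustrates how a logistic model turns deviations at conserved positions into a pseudogene likelihood score and a dichotomous call at the 0.5 cutoff. It is not the JMP procedure used in this study: it fits an unregularized maximum-likelihood model by plain batch gradient ascent, and the binary encoding (1 = deviation from the consensus at a position), the three positions and the training examples are all hypothetical.</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def psi_l(x, weights, bias):
    """Pseudogene likelihood: modelled probability that x is non-functional."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Unregularised maximum-likelihood fit by batch gradient ascent.

    labels: 1 = pseudogene (inactive), 0 = intact gene (active).
    """
    n = len(labels)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = y - psi_l(x, weights, bias)  # gradient term of the log-likelihood
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias += learning_rate * grad_b / n
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def classify(x, weights, bias, threshold=0.5):
    """Functional if psi_L is at most the threshold, non-functional otherwise."""
    return "non-functional" if psi_l(x, weights, bias) > threshold else "functional"


# Hypothetical toy data: 1 marks a deviation from the consensus at one of three
# conserved positions; the study used 60 positions and 893 training sequences.
intact = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
pseudo = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
w, b = fit_logistic(intact + pseudo, [0] * 4 + [1] * 4)
print(classify([0, 0, 0], w, b), classify([1, 0, 0], w, b))
```

<p>In this toy example the first position perfectly separates the classes and therefore receives the largest weight, mirroring how a highly informative residue would dominate the fitted coefficients.</p>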
         </sec>
         <sec>
            <st>
               <p>Database acquisition of OR SNPs</p>
            </st>
            <p>Missense SNPs at the 60 highly conserved positions were obtained from the HORDE database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> for all human intact OR genes. SNP information in HORDE is extracted from the NCBI dbSNP database <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> based on the OR genomic locations. HORDE currently contains 3395 SNPs, of which 1089 are missense SNPs in intact ORs and hence potentially have functional consequences.</p>
         </sec>
         <sec>
            <st>
               <p>Ka/Ks calculation</p>
            </st>
            <p>For each human and mouse OR, we identified the closest ortholog and paralog by sequence identity. We then inferred the ancestral sequence of the gene as the consensus of this triplet. Finally, a Ka/Ks value was calculated for each gene from the sequence divergence between the gene and its inferred ancestral sequence. This procedure was performed using the GCG package <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
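<p>The consensus step of this procedure can be sketched as a column-wise majority vote over the aligned query, ortholog and paralog. The Python code below is a hypothetical illustration: it assumes the three sequences are already aligned nucleotide sequences of equal length, and it marks columns where all three bases differ as ambiguous, a choice the text does not specify. The subsequent Ka/Ks calculation (performed with GCG in this study) is not reproduced.</p>

```python
from collections import Counter


def triplet_consensus(query, ortholog, paralog, ambiguous="N"):
    """Majority-rule consensus of three aligned nucleotide sequences.

    A column where all three bases differ is marked `ambiguous` (an assumption;
    the text does not say how such columns were handled).
    """
    if not (len(query) == len(ortholog) == len(paralog)):
        raise ValueError("sequences must be aligned to equal length")
    ancestor = []
    for column in zip(query, ortholog, paralog):
        base, count = Counter(column).most_common(1)[0]
        ancestor.append(base if count >= 2 else ambiguous)
    return "".join(ancestor)


# Hypothetical aligned fragments.
print(triplet_consensus("ATGCTA", "ATGCTG", "ATACTG"))
```

<p>In this example the query differs from the inferred ancestor only at its last position, which is the kind of divergence the Ka/Ks ratio then partitions into synonymous and non-synonymous changes.</p>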
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>IM &#8211; Conceived and designed the study, carried out the statistical analyses and drafted the ms</p>
         <p>RA &#8211; Carried out the bioinformatics analyses and helped to draft the ms</p>
         <p>DL &#8211; Is the PI of the study and prepared the ms.</p>
         <p>All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Dr. Tsviya Olender for help with the CORP webpage. This work was supported by the National Institutes of Health grant DC000298 and by the Crown Human Genome Center. IM has been supported by the Eshkol fellowship from the Ministry of Science &#8211; Israel. DL holds the Ralph and Lois Silver Chair in Human Genomics.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Hegyi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Balasubramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Bertone</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Echols</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>272</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155275</pubid>
                  <pubid idtype="pmpid" link="fulltext">11827946</pubid>
                  <pubid idtype="doi">10.1101/gr.207102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A genome-wide survey of human pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Torrents</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Suyama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>12</issue>
            <fpage>2559</fpage>
            <lpage>2567</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403797</pubid>
                  <pubid idtype="pmpid" link="fulltext">14656963</pubid>
                  <pubid idtype="doi">10.1101/gr.1455503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A novel multigene family may encode odorant receptors: a molecular basis for odor recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Buck</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Axel</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1991</pubdate>
            <volume>65</volume>
            <issue>1</issue>
            <fpage>175</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(91)90418-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">1840504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Vertebrate olfactory reception</p>
            </title>
            <aug>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Annu Rev Neurosci</source>
            <pubdate>1986</pubdate>
            <volume>9</volume>
            <fpage>329</fpage>
            <lpage>355</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.ne.09.030186.001553</pubid>
                  <pubid idtype="pmpid" link="fulltext">2423007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Molecular biology of odorant receptors in vertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Mombaerts</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Annu Rev Neurosci</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>487</fpage>
            <lpage>509</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1146/annurev.neuro.22.1.487</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The canine olfactory subgenome</p>
            </title>
            <aug>
               <au>
                  <snm>Olender</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fuchs</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Linhart</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Kalush</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Khen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2004</pubdate>
            <volume>83</volume>
            <fpage>361</fpage>
            <lpage>372</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2003.08.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962662</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The complete human olfactory subgenome</p>
            </title>
            <aug>
               <au>
                  <snm>Glusman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <issue>5</issue>
            <fpage>685</fpage>
            <lpage>702</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.171001</pubid>
                  <pubid idtype="pmpid" link="fulltext">11337468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The olfactory receptor gene superfamily of the mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Firestein</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nat Neurosci</source>
            <pubdate>2002</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>124</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11802173</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The human olfactory receptor repertoire</p>
            </title>
            <aug>
               <au>
                  <snm>Zozulya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Echeverri</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <issue>6</issue>
            <fpage>RESEARCH0018</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33394</pubid>
                  <pubid idtype="pmpid" link="fulltext">11423007</pubid>
                  <pubid idtype="doi">10.1186/gb-2001-2-6-research0018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Different evolutionary processes shaped the mouse and human olfactory receptor gene families</p>
            </title>
            <aug>
               <au>
                  <snm>Young</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Tonnes-Priddy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Trask</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>14</issue>
            <fpage>1683</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/hmg/11.14.1683</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Human specific loss of olfactory receptor genes</p>
            </title>
            <aug>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Man</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">152291</pubid>
                  <pubid idtype="pmpid" link="fulltext">12612342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates</p>
            </title>
            <aug>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wiebe</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Przeworski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>1</issue>
            <fpage>E5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314465</pubid>
                  <pubid idtype="pmpid" link="fulltext">14737185</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Widespread ectopic expression of olfactory receptor genes</p>
            </title>
            <aug>
               <au>
                  <snm>Feldmesser</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Olender</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Khen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ophir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>121</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1508154</pubid>
                  <pubid idtype="pmpid" link="fulltext">16716209</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Nonsense-mediated mRNA decay in mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Maquat</snm>
                  <fnm>LE</fnm>
               </au>
            </aug>
            <source>J Cell Sci</source>
            <pubdate>2005</pubdate>
            <volume>118</volume>
            <issue>Pt 9</issue>
            <fpage>1773</fpage>
            <lpage>1776</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/jcs.01701</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860725</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Identification of ligands for olfactory receptors by functional expression of a receptor library</p>
            </title>
            <aug>
               <au>
                  <snm>Krautwurst</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yau</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <issue>7</issue>
            <fpage>917</fpage>
            <lpage>926</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0092-8674(00)81716-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Different noses for different people</p>
            </title>
            <aug>
               <au>
                  <snm>Menashe</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Man</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>143</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1160</pubid>
                  <pubid idtype="pmpid" link="fulltext">12730696</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Functional expression of a mammalian odorant receptor</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ivic</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Otaki</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Hashimoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mikoshiba</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Firestein</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>279</volume>
            <issue>5348</issue>
            <fpage>237</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.279.5348.237</pubid>
                  <pubid idtype="pmpid" link="fulltext">9422698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>SIFT: Predicting amino acid changes that affect protein function</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3812</fpage>
            <lpage>3814</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168916</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824425</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Prediction of the odorant binding site of olfactory receptor proteins by human-mouse comparisons</p>
            </title>
            <aug>
               <au>
                  <snm>Man</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>240</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.03296404</pubid>
                  <pubid idtype="pmpid" link="fulltext">14691239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The variable and conserved interfaces of modeled olfactory receptor proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pilpel</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1999</pubdate>
            <volume>8</volume>
            <fpage>969</fpage>
            <lpage>977</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10338007</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Motif-based construction of a functional map for mammalian olfactory receptors</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Stolovitzky</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Califano</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Firestein</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2003</pubdate>
            <volume>81</volume>
            <issue>5</issue>
            <fpage>443</fpage>
            <lpage>456</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0888-7543(03)00022-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12706103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Applied Logistic Regression</p>
            </title>
            <aug>
               <au>
                  <snm>Hosmer</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Lemeshow</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <publisher>John Wiley &amp; Sons, Inc</publisher>
            <edition>2</edition>
            <pubdate>2000</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10886529</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The Human Olfactory Receptor Data Exploratorium</p>
            </title>
            <url>http://bip.weizmann.ac.il/HORDE/</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A comparison of the human and chimpanzee olfactory receptor gene repertoires</p>
            </title>
            <aug>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Man</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Glusman</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>2</issue>
            <fpage>224</fpage>
            <lpage>230</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">546523</pubid>
                  <pubid idtype="pmpid" link="fulltext">15687286</pubid>
                  <pubid idtype="doi">10.1101/gr.2846405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Natural selection on the olfactory receptor gene family in humans and chimpanzees</p>
            </title>
            <aug>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bustamante</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>2003</pubdate>
            <volume>73</volume>
            <issue>3</issue>
            <fpage>489</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1180675</pubid>
                  <pubid idtype="pmpid" link="fulltext">12908129</pubid>
                  <pubid idtype="doi">10.1086/378132</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Odorant and vomeronasal receptor genes in two mouse genome assemblies</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Mombaerts</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Firestein</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2004</pubdate>
            <volume>83</volume>
            <issue>5</issue>
            <fpage>802</fpage>
            <lpage>811</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2003.10.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">15081110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Improved techniques for the identification of pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>Suppl 1</issue>
            <fpage>I94</fpage>
            <lpage>I100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth942</pubid>
                  <pubid idtype="pmpid" link="fulltext">15262786</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Pfam database</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/Pfam</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios</p>
            </title>
            <aug>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Glanowski</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Kejariwal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Todd</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Tanenbaum</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Civello</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5652</issue>
            <fpage>1960</fpage>
            <lpage>1963</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1088821</pubid>
                  <pubid idtype="pmpid" link="fulltext">14671302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Amino-acid changes acquired during evolution by olfactory receptor 912-93 modify the specificity of odorant recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Gaillard</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rouquier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chavanieu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mollard</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Giorgi</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <issue>7</issue>
            <fpage>771</fpage>
            <lpage>780</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddh086</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962981</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Structural determinants for membrane trafficking and G protein selectivity of a mouse olfactory receptor</p>
            </title>
            <aug>
               <au>
                  <snm>Katada</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Touhara</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Neurochem</source>
            <pubdate>2004</pubdate>
            <volume>90</volume>
            <issue>6</issue>
            <fpage>1453</fpage>
            <lpage>1463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1471-4159.2004.02619.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15341529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A feedback mechanism regulates monoallelic odorant receptor expression</p>
            </title>
            <aug>
               <au>
                  <snm>Lewcock</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>4</issue>
            <fpage>1069</fpage>
            <lpage>1074</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">327152</pubid>
                  <pubid idtype="pmpid" link="fulltext">14732684</pubid>
                  <pubid idtype="doi">10.1073/pnas.0307986100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Serizawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miyamichi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakatani</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yoshihara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sakano</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5653</issue>
            <fpage>2088</fpage>
            <lpage>2094</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089122</pubid>
                  <pubid idtype="pmpid" link="fulltext">14593185</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Axon guidance of mouse olfactory sensory neurons by odorant receptors and the beta2 adrenergic receptor</p>
            </title>
            <aug>
               <au>
                  <snm>Feinstein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bozza</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vassalli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mombaerts</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>117</volume>
            <issue>6</issue>
            <fpage>833</fpage>
            <lpage>846</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.05.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">15186782</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The genetic basis of olfactory deficits</p>
            </title>
            <aug>
               <au>
                  <snm>Menashe</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Feldmesser</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Humana Press Inc</source>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The structural basis of G-protein-coupled receptor function and dysfunction in human diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Schoneberg</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schulz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gudermann</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Rev Physiol Biochem Pharmacol</source>
            <pubdate>2002</pubdate>
            <volume>144</volume>
            <fpage>143</fpage>
            <lpage>227</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11987825</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jeanmougin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <issue>24</issue>
            <fpage>4876</fpage>
            <lpage>4882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147148</pubid>
                  <pubid idtype="pmpid" link="fulltext">9396791</pubid>
                  <pubid idtype="doi">10.1093/nar/25.24.4876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J R Stat Soc B Met</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Design of experiment and data analysis by JMP (SAS institute) in analytical method validation</p>
            </title>
            <aug>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Okafo</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Pharm Biomed Anal</source>
            <pubdate>2000</pubdate>
            <volume>23</volume>
            <issue>2&#8211;3</issue>
            <fpage>581</fpage>
            <lpage>589</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0731-7085(00)00335-6</pubid>
                  <pubid idtype="pmpid">10933552</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>NCBI Single-Nucleotide Polymorphism Database</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/SNP/</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>GCG: The Wisconsin Package of sequence analysis programs</p>
            </title>
            <aug>
               <au>
                  <snm>Womble</snm>
                  <fnm>DD</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>132</volume>
            <fpage>3</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10547828</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
