<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2006-7-10-r96</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Mining the <it>Arabidopsis thaliana </it>genome for highly-divergent seven transmembrane receptors</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Moriyama</snm>
               <mi>N</mi>
               <fnm>Etsuko</fnm>
               <insr iid="I1"/>
               <email>emoriyama2@unl.edu</email>
            </au>
            <au id="A2">
               <snm>Strope</snm>
               <mi>K</mi>
               <fnm>Pooja</fnm>
               <insr iid="I1"/>
               <email>poojastrope@gmail.com</email>
            </au>
            <au id="A3">
               <snm>Opiyo</snm>
               <mi>O</mi>
               <fnm>Stephen</fnm>
               <insr iid="I2"/>
               <email>sopiyo@unlserve.unl.edu</email>
            </au>
            <au id="A4">
               <snm>Chen</snm>
               <fnm>Zhongying</fnm>
               <insr iid="I3"/>
               <email>zchen@email.unc.edu</email>
            </au>
            <au id="A5">
               <snm>Jones</snm>
               <mi>M</mi>
               <fnm>Alan</fnm>
               <insr iid="I3"/>
               <email>alanjones@bio.unc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Biological Sciences and Plant Science Initiative, University of Nebraska-Lincoln, Lincoln, NE 68588-0660, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583-0915, USA</p>
            </ins>
            <ins id="I3">
               <p>Departments of Biology and Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>10</issue>
         <fpage>R96</fpage>
         <url>http://genomebiology.com/2006/7/10/R96</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17064408</pubid>
               <pubid idtype="doi">10.1186/gb-2006-7-10-r96</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>6</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>24</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>25</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Moriyama et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p><it>Arabidopsis </it>putative seven transmembrane proteins</p>
      </shorttitle>
      <shortabs>
         <p>A combination of multiple protein classification methods is described and used to identify a minimum set of 54 candidate seven transmembrane receptors in <it>Arabidopsis thaliana</it>.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>To identify divergent seven-transmembrane receptor (7TMR) candidates from the <it>Arabidopsis thaliana </it>genome, multiple protein classification methods were combined, including both alignment-based and alignment-free classifiers. This resolved problems in optimally training individual classifiers using limited and divergent samples, and increased stringency for candidate proteins. We identified 394 proteins as 7TMR candidates and highlighted 54 with corresponding expression patterns for further investigation.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010019">Plant biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Seven-transmembrane (7TM) region-containing proteins constitute the largest receptor superfamily in vertebrates and other metazoans. These cell-surface receptors are activated by a diverse array of ligands, and are involved in various signaling processes, such as cell proliferation, neurotransmission, metabolism, smell, taste, and vision. They are the central players in eukaryotic signal transduction. They are commonly referred to as G protein-coupled receptors (GPCRs) because most transduce extracellular signals into cellular physiological responses through the activation of heterotrimeric guanine nucleotide binding proteins (G proteins) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. However, an increasing number of alternative 'G protein-independent' signaling mechanisms have been associated with groups of these 7TM proteins <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Thus, for precision and clarity, we refer to these proteins simply as 7TM receptors (7TMRs), and candidate proteins in organisms greatly divergent from humans are designated here as 7TM putative receptors (7TMpRs).</p>
         <p>The human genome encodes 800 or more 7TMRs, both with and without known cognate ligands (the latter are so-called orphan GPCRs); they thus constitute >1% of the gene complement <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. More than 1,000 genes, or 5% of the <it>Caenorhabditis elegans </it>genome, are predicted to encode 7TMRs; the majority of them appear to be chemoreceptors <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Approximately 300 7TMR-encoding genes (about 1% to 2% of the genome) have been recognized in the <it>Drosophila melanogaster </it>genome <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Compared to such large numbers of 7TMRs found in animal genomes, very few 7TMpRs have been reported in plants and fungi. Only 22 <it>Arabidopsis </it>7TMpRs have been described so far. Fifteen of them constitute the 'mildew resistance locus O' (MLO) family, whose direct interaction with the G-protein &#945; subunit (G&#945;) has not been shown <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Although another 7TMpR, GCR1 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, directly interacts with the plant G&#945; subunit GPA1 <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, it has been shown that GCR1 can also act independently of the heterotrimeric G-protein complex <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Hsieh and Goodman <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> recently reported five expressed proteins predicted to have 7TM regions (heptahelical transmembrane proteins (HHPs) 1 to 5) but these, like the other 16, do not have candidate ligands. Finally, an unusual Regulator of G Signaling (RGS) protein (designated AtRGS1) has been predicted to have 7TM regions <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. RGS proteins function as GTPase activating proteins (GAPs) that de-sensitize signaling by de-activating the G&#945; subunits of the heterotrimeric complex. 
Because <it>Arabidopsis </it>seedlings lacking AtRGS1 have reduced sensitivity to D-glucose <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, the possibility exists that AtRGS1 is a novel D-glucose receptor having an agonist-regulated GAP function. Although we designate them 7TMpRs here, it should be noted that neither a ligand nor a full signaling cascade has been demonstrated yet for any of these plant proteins, and only for a barley MLO protein has the 7TM topology been experimentally confirmed <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>None of the reported <it>Arabidopsis </it>7TMpR proteins share substantial sequence similarity with known metazoan GPCRs constituting six different subfamilies. It appears that plant 7TMpRs dramatically diverged from known metazoan GPCRs over the 1.6 billion years since the plant and metazoan lineages bifurcated. It should be noted that <it>Arabidopsis </it>GCR1 shares weak but significant similarity with the cyclic AMP receptor, CAR1, found in the slime mold <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B11">11</abbr><abbr bid="B16">16</abbr></abbrgrp>. There is also very weak similarity to the Class B Secretin family GPCRs. However, other than GCR1, currently used search methods have not robustly identified plant 7TMpR proteins as candidate GPCRs. This great sequence divergence highlights the need for new approaches to identify divergent 7TMR candidates in non-metazoan genomes.</p>
         <p>The human genome contains 16 G&#945;, 5 G&#946;, and 12 G&#947; genes. In stark contrast, both fungi and plants have much simpler G-protein coupled signaling systems. For example, the <it>Arabidopsis </it>genome contains one canonical G&#945;, one G&#946;, and two G&#947; genes <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Similarly, a small number of G-proteins are found in fungi; there are two G&#945;, one G&#946;, and one G&#947; in <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, while <it>Neurospora crassa </it>and some other fungi have more genes encoding each subunit <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Therefore, it may be reasonable to assume that plants and fungi have fewer GPCRs than humans, and while approximately 200 <it>Arabidopsis </it>proteins were predicted to have 7TM regions, sequence divergence precludes unequivocal assignment of any as an orphan GPCR <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. However, at least 61 7TMpRs have been recently predicted from the plant pathogenic fungus <it>Magnaporthe grisea </it>genome <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, raising the possibility that more divergent groups of 7TMpR proteins remain undiscovered in non-metazoan taxa.</p>
         <p>In this report, we describe our comprehensive computational strategy for identifying 7TMpR candidates from the entire protein sequence set predicted from the <it>A. thaliana </it>genome, and compile their tissue-specific expression and co-expression patterns with G-proteins. To take advantage of different approaches, we combined multiple protein classification methods, including more specific (conservative) alignment-based classifiers and more sensitive alignment-free classifiers, to predict candidate 7TMpRs in divergent genomes more effectively.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Identifying 7TMpR candidates using various protein classification methods</p>
            </st>
            <p>Among the many protein classification methods in common use, the current state of the art, and the most widely used, is the profile hidden Markov model (profile HMM) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. It is used to construct protein family databases such as Pfam <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, SMART <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>, and Superfamily <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. However, profile HMMs and other currently used classification methods such as PROSITE <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> and PRINTS <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> share an important weakness. These methods rely on multiple alignments for generating their models (patterns, profile HMMs, and so on). Generating robust multiple alignments is difficult or impossible when extremely diverged sequences are included in the analysis; 7TMRs are one such protein family, in which sequence similarities between subgroups can be lower than 25%. Furthermore, alignments are generated only from known related proteins (positive samples), and, therefore, no information from negative samples (unrelated protein sequences) is directly incorporated in the model building process. Identifiable 'hits' are, therefore, constrained by initial sampling bias, which becomes reinforced when models are iteratively rebuilt from accumulated sequences. Consequently, the predictive power, especially the sensitivity, of these classifiers decreases when they are applied to extremely diverged protein families.</p>
            <p>To overcome this disadvantage and to increase sensitivity to such non-alignable similarities, several 'alignment-free' methods have been proposed recently. These methods quantify various properties of amino acid sequences and convert them into a descriptor array. Once sequences of different lengths are transformed into fixed-length descriptors that together form a uniform matrix, various multivariate analysis methods can be applied. Kim <it>et al</it>. <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> and Moriyama and Kim <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> used parametric and non-parametric discriminant function analysis methods. Karchin <it>et al</it>. <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> incorporated profile HMMs with support vector machines (SVMs) using the Fisher kernel (SVM-Fisher) so that negative sample information can be taken into account when training the classifier. SVMs can also be applied with completely 'alignment-free' sequence descriptors, for example, amino acid and dipeptide compositions. Such alignment-free classifiers have been shown to outperform profile HMMs as well as Karchin <it>et al</it>.'s SVM-Fisher <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp> (PK Strope and EN Moriyama, submitted). Another multivariate method, partial least squares (PLS) regression, was used by Lapinsh <it>et al</it>. <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> with physico-chemical properties of amino acids. We recently re-evaluated the descriptors used with PLS and optimized them to discriminate 7TMRs from other proteins <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
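            <p>As a rough illustration of such alignment-free descriptors, the following Python sketch converts protein sequences of any length into fixed-length amino acid (20 values) and dipeptide (400 values) composition vectors. The function names and the handling of non-standard residues (skipped here) are assumptions made for this example and do not reproduce the implementation used in this study.</p>
            <p><![CDATA[
```python
from itertools import product

# The 20 standard amino acids; any other symbol (e.g. X) is skipped (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 relative frequencies, one per standard amino acid."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 relative frequencies, one per ordered pair of adjacent residues."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]
```
]]></p>
            <p>Because every sequence maps to a vector of the same length regardless of its own length, rows produced this way can be stacked into a matrix and passed to a multivariate classifier without any alignment step.</p>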
            <p>We applied these methods to the entire predicted protein sequence set derived from the <it>A. thaliana </it>genome. As shown in Table <tblr tid="T1">1</tblr>, among the 28,952 protein sequences, the Sequence Alignment and Modeling system (SAM), a profile HMM method, predicted only 16 (excluding one alternatively spliced gene sequence) as 7TMpR candidates. Fifteen of them are identified as MLO or similar to MLO and one as GCR1 in The Arabidopsis Information Resource (TAIR) <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. This result shows that SAM is highly specific (discriminating), with no false positives, assuming that current annotations are correct. SAM failed to identify only one known MLO (MLO4: At1g11000). This protein, AtRGS1, and the five recently predicted 7TM proteins (HHP1-5) were among the 16 previously predicted <it>Arabidopsis </it>7TMpRs not included in the randomly sampled 500 7TMR training sequences (see Materials and methods). Thus, we concluded that the predictive power of SAM alone is insufficient to identify highly diverged and potentially novel 7TMpR sequences.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Numbers of 7TMpR candidates identified by various methods from the <it>A. thaliana </it>genome</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Methods</p>
                     </c>
                     <c ca="center">
                        <p>Number of 7TMpR candidates*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HMMTOP</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>7TMs<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>236 (201)</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>6-8 TMs<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>633 (545)</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>5-9 TMs<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>1,091 (957)</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>5-10 TMs<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>1,343 (1,179)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SAM</p>
                     </c>
                     <c ca="center">
                        <p>16 (15)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LDA</p>
                     </c>
                     <c ca="center">
                        <p>3,211 (2,935)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>QDA</p>
                     </c>
                     <c ca="center">
                        <p>2,006 (1,820)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LOG</p>
                     </c>
                     <c ca="center">
                        <p>2,626 (2,394)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KNN (<it>K </it>= 5)</p>
                     </c>
                     <c ca="center">
                        <p>3,125 (2,839)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KNN (<it>K </it>= 10)</p>
                     </c>
                     <c ca="center">
                        <p>3,202 (2,906)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KNN (<it>K </it>= 15)</p>
                     </c>
                     <c ca="center">
                        <p>3,298 (3,004)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KNN (<it>K </it>= 20)</p>
                     </c>
                     <c ca="center">
                        <p>3,347 (3,043)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SVM-AA</p>
                     </c>
                     <c ca="center">
                        <p>2,263 (2,043)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SVM-di</p>
                     </c>
                     <c ca="center">
                        <p>2,004 (1,807)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PLS-ACC</p>
                     </c>
                     <c ca="center">
                        <p>2,671 (2,466)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*The numbers in parentheses show 7TMpR candidates after removing proteins derived from alternative splicing. <sup>&#8224;</sup>The numbers of TM regions predicted by HMMTOP.</p>
               </tblfn>
            </tbl>
            <p>The results obtained by SAM were compared with those obtained by alignment-free methods. As shown in Table <tblr tid="T1">1</tblr>, alignment-free methods (linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression (LOG), <it>K</it>-nearest neighbor (KNN), SVM with amino acid composition (SVM-AA), SVM with dipeptide composition (SVM-di), and PLS with amino acid properties (PLS-ACC)) predicted 2,000 to 3,400 proteins as 7TMpR candidates, which is about 7% to 12% of the entire predicted <it>Arabidopsis </it>proteome (28,952 proteins) and about 30% to 50% of all possible transmembrane proteins (6,475 proteins) <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. These alignment-free methods clearly produce many false positives and need further optimization to improve their discriminating power.</p>
            <p>One notable advantage of alignment-free methods is their sensitivity to short or partial sequences <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Many of the 28,952 protein sequences used in this study are based only on <it>ab initio </it>gene prediction results, and hence are likely to contain various types of errors. If only a part of a 7TMR protein is predicted correctly, alignment-free methods could have a better chance of identifying it.</p>
            <p>Table <tblr tid="T1">1</tblr> also lists the numbers of <it>Arabidopsis </it>proteins predicted to have five to ten transmembrane regions, binned by the number of predicted TM regions. HMMTOP 2.0 <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp> predicted 201 proteins as having 7TM regions. This number is close to a previous prediction (184 proteins) <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. We should note, however, that no single method predicts exactly 7TM regions for all known 7TMRs (see Materials and methods). As mentioned above, it is also possible that some of the deduced <it>Arabidopsis </it>proteins we analyzed do not contain the entire correct coding region. There were 957 <it>Arabidopsis </it>proteins predicted to have five to nine TM regions (Table <tblr tid="T1">1</tblr>). Based on the distribution of predicted TM numbers obtained from the entire set of GPCRDB entries, this range (5 to 9 TM regions) could cover almost all of the 7TMR candidates (99.1%; see Figure <figr fid="F1">1</figr> and Materials and methods). The 22 previously predicted <it>Arabidopsis </it>7TMpRs were predicted to have seven to ten TM regions (Figure <figr fid="F1">1</figr>). If we extend the range to 5 to 10 TM regions, the number of <it>Arabidopsis </it>7TMpR candidates becomes 1,179 proteins.</p>
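            <p>The binning by predicted TM number described above can be sketched in a few lines of Python. The sketch assumes that the TM predictions have already been parsed into a mapping from protein identifier to predicted TM count; the identifiers, counts, and function names below are hypothetical, and the parsing of HMMTOP output is deliberately left out.</p>
            <p><![CDATA[
```python
from collections import Counter

# Inclusive TM-count ranges matching the HMMTOP rows of Table 1.
TM_BINS = {
    "7 TMs": (7, 7),
    "6-8 TMs": (6, 8),
    "5-9 TMs": (5, 9),
    "5-10 TMs": (5, 10),
}


def tm_histogram(tm_counts):
    """Count how many proteins have each predicted number of TM regions."""
    return Counter(tm_counts.values())


def proteins_per_bin(tm_counts, bins=TM_BINS):
    """Return the number of proteins whose TM count falls in each inclusive range."""
    histogram = tm_histogram(tm_counts)
    return {
        label: sum(histogram[n] for n in range(low, high + 1))
        for label, (low, high) in bins.items()
    }


# Hypothetical toy input: protein identifier -> predicted number of TM regions.
toy_counts = {"P1": 7, "P2": 6, "P3": 10, "P4": 3, "P5": 9, "P6": 7}
toy_bins = proteins_per_bin(toy_counts)
```
]]></p>
            <p>The ranges are nested, so each wider bin contains every protein counted in the narrower ones, as in Table <tblr tid="T1">1</tblr>.</p>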
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution of transmembrane numbers predicted by HMMTOP (black bars) and TMHMM (gray bars) from the 500 7TMR sample sequences</p>
               </caption>
               <text>
                  <p>Distribution of transmembrane numbers predicted by HMMTOP (black bars) and TMHMM (gray bars) from the 500 7TMR sample sequences. Proportions (%) of the proteins predicted to have six to eight and five to nine TM regions by HMMTOP are shown at the top. The percentages shown in parentheses were obtained from the entire 7,674 7TMR dataset in GPCRDB. The numbers shown above the black bars are the numbers of the 22 previously predicted <it>Arabidopsis </it>7TMpR proteins with the corresponding number of predicted TM regions.</p>
               </text>
               <graphic file="gb-2006-7-10-r96-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Choosing 7TMpR candidates by combining prediction results</p>
            </st>
            <p>Among the ten alignment-free classifiers, LOG misclassified seven previously predicted <it>Arabidopsis </it>7TMpRs. KNN with <it>K </it>set at 5, 10, and 15 missed one, while KNN with <it>K </it>set at 20 classified them all correctly (see Materials and methods on KNN). To reduce the number of false positives (non-7TMRs predicted as 7TMRs) as well as false negatives (7TMRs predicted as non-7TMRs) and to obtain a set of 7TMpR candidates with higher confidence, we examined combinations of the prediction results from the remaining six alignment-free methods (LDA, QDA, KNN with <it>K </it>= 20, SVM-AA, SVM-di, and PLS-ACC). There were 652 proteins predicted as 7TMpR candidates by all six methods (that is, in their strict intersection). After restricting the number of predicted TM regions to 5 to 10, 394 proteins (342 after removing duplicated entries due to alternative splicing) were identified as 7TMpR candidates. These <it>Arabidopsis </it>proteins are listed in Additional data file 1. Of the 22 previously predicted 7TMpRs, 20 were found in this list. Although HHP4 and HHP5 were not included in this list, both were identified by two of the alignment-free methods: KNN and SVM-AA. Note that RGS1 and the five HHP sequences (as well as nine MLO sequences and GCR1) were excluded from the training set, and that RGS1 and the five HHPs were not identified as candidate 7TMpRs by SAM.</p>
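            <p>The combination step can be illustrated with a minimal Python sketch: take the strict intersection of the positive predictions from every classifier, then keep only proteins whose predicted TM number lies within the chosen range. The data structures, identifiers, and function name are assumptions made for illustration and are not the scripts used in this study.</p>
            <p><![CDATA[
```python
def consensus_candidates(predictions, tm_counts, tm_range=(5, 10)):
    """Return proteins predicted positive by every classifier and within the TM range.

    predictions: dict mapping classifier name -> set of protein IDs called positive.
    tm_counts:   dict mapping protein ID -> predicted number of TM regions.
    tm_range:    inclusive (low, high) bounds on the predicted TM number.
    """
    if not predictions:
        return set()
    # Strict intersection: a protein must be called positive by all classifiers.
    unanimous = set.intersection(*predictions.values())
    low, high = tm_range
    # Proteins without a TM prediction are treated as having 0 TM regions.
    return {pid for pid in unanimous if low <= tm_counts.get(pid, 0) <= high}
```
]]></p>
            <p>Requiring unanimity trades sensitivity for specificity: a protein missed by any single classifier is dropped, which is how HHP4 and HHP5 fall out of the list despite being identified by some of the methods.</p>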
            <p>Further restricting the predicted topology to exactly seven TM regions with an extracellular amino terminus reduced the number of candidates to 64 (54 excluding duplications due to alternative splicing). This set included nine of the 22 previously predicted 7TMpRs. These 54 7TMpR candidates are the first targets for our further analysis and are summarized in Table <tblr tid="T2">2</tblr> (also listed in Additional data file 2). Eighteen are described as simply 'expressed proteins' in the TAIR database (except for AT3G26090, which encodes RGS1). Interestingly, one of them (AT5G27210) is known to have weak similarity to a mouse orphan 7TMR. Although others are known to belong to certain protein families (for example, the MtN3 family), in many cases their molecular functions have not been identified, and further investigation of these 7TMpR candidates is warranted.</p>
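            <p>A sketch of this final filter is given below in Python. It assumes that each gene model has a predicted TM count and amino-terminus orientation available, and that splice variants are named as a locus identifier followed by a dot and a model number, so that variants can be collapsed to one locus. The <it>Topology </it>record, the 'out'/'in' labels, and the toy identifiers are hypothetical.</p>
            <p><![CDATA[
```python
from typing import NamedTuple


class Topology(NamedTuple):
    tm_count: int     # predicted number of TM regions
    n_terminus: str   # "out" (extracellular) or "in" (intracellular); labels are assumed


def strict_7tm_loci(candidates, topologies):
    """Return loci with at least one model having exactly 7 TMs and an outside N-terminus.

    candidates: iterable of gene-model IDs of the form 'LOCUS.N' (assumed naming).
    topologies: dict mapping gene-model ID -> Topology.
    """
    loci = set()
    for model_id in candidates:
        topo = topologies.get(model_id)
        if topo is not None and topo.tm_count == 7 and topo.n_terminus == "out":
            # Collapse alternative splice forms by dropping the model suffix.
            loci.add(model_id.split(".")[0])
    return loci
```
]]></p>
            <p>Counting the gene models that pass the filter and the loci returned by the function corresponds to the two numbers reported above, before and after removing duplications due to alternative splicing.</p>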
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Summary of the 54 7TMpR candidates identified in this study</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Groups*</p>
                     </c>
                     <c ca="left">
                        <p>TAIR locus IDs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multiple members from gene families</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Nodulin MtN3 family proteins (8/17)</p>
                     </c>
                     <c ca="left">
                        <p>At1g21460, At3g16690, At3g28007, At3g48740, At4g25010, At5g13170, At5g23660, At5g50800</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>MLO proteins (7/15)</p>
                     </c>
                     <c ca="left">
                        <p>At1g11000 (MLO4), At1g26700 (MLO14), At1g42560 (MLO9), At2g33670 (MLO5), At2g44110 (MLO15), At4g24250 (MLO13), At5g53760 (MLO11)</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Expressed protein family 1 (2/6)</p>
                     </c>
                     <c ca="left">
                        <p>At1g77220, At4g21570</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GNS1/SUR4 membrane family proteins (3/4)</p>
                     </c>
                     <c ca="left">
                        <p>At1g75000, At3g06470, At4g36830</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Perl1-like family proteins (2/2)</p>
                     </c>
                     <c ca="left">
                        <p>At1g16560, At5g62130</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>TOM3 family proteins (3/3)</p>
                     </c>
                     <c ca="left">
                        <p>At1g14530, At2g02180, At4g21790</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Expressed protein family 2 (3/5)</p>
                     </c>
                     <c ca="left">
                        <p>At1g10660, At2g47115, At5g62960</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Expressed protein family 3 (2/4)</p>
                     </c>
                     <c ca="left">
                        <p>At3g09570, At5g42090</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Expressed protein family 4 (2/5)</p>
                     </c>
                     <c ca="left">
                        <p>At1g49470, At5g19870</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Expressed protein family 5 (2/5)</p>
                     </c>
                     <c ca="left">
                        <p>At3g63310, At4g02690</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Single copy genes (8)</p>
                     </c>
                     <c ca="left">
                        <p>At1g48270 (GCR1), At1g57680, At2g41610, At2g31440, At3g04970, At3g26090 (RGS1), At3g59090, At4g20310</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Single member from small gene families (8)</p>
                     </c>
                     <c ca="left">
                        <p>At2g01070, At3g19260, At2g35710, At2g16970, At1g15620, At1g63110, At4g36850, At5g27210</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Single member from big gene families (4)</p>
                     </c>
                     <c ca="left">
                        <p>At1g71960, At3g01550, At5g23990, At5g37310</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*The number of candidates identified in this study belonging to each group is shown in parentheses (the number of all proteins in each group is given after '/'). More detailed information is given in Additional data file 2.</p>
               </tblfn>
            </tbl>
            <p>The 54 proteins were grouped into families based on similarities to known protein sequences. Eight of the 54 7TMpR candidates, including GCR1 and RGS1, are encoded by single copy genes. In addition to the seven MLO proteins identified, there are eight MtN3 family members, two proteins of an unnamed family consisting of six expressed proteins, as well as multiple (two to three) members from smaller gene families (five or fewer members). All members of the TOM3 family and the Perl1-like family, as well as the majority of the GNS/SUR4 family and an unnamed family consisting of five expressed proteins (expressed protein family 2) were included in the list. The identification of multiple members from these gene families using our alignment-free methods supported the consistency of this approach. However, for most of these families, not all members were found. Additionally, eight single representatives of small protein families consisting of two to five members and four single representatives of large protein families were found in the list. Some of these proteins, especially those from large protein families, may represent false positives as 7TMpR candidates. This 7TMR mining method can be refined, for example, by re-training models as well as using more flexible hierarchical classification.</p>
            <p>The five predicted heptahelical proteins (HHP1-5) reported by Hsieh and Goodman <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> were identified by sequence similarity to human adiponectin receptors (AdipoRs) and membrane progestin receptors (mPRs) that share little sequence similarity to known GPCRs. HHP1-3 were identified in our initial list of 394 but were culled from the final list of 54 <it>Arabidopsis </it>7TMpR candidates. This is because HMMTOP predicted HHP1, HHP2, HHP4, and HHP5 to have seven TM regions and intracellular amino termini, in contrast to known GPCRs. This unusual structural topology was also found in AdipoRs <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B48">48</abbr></abbrgrp>. HHP3 had eight predicted TM regions. Of the 15 MLO proteins, 8 were also predicted to have 8 to 10 TM regions by HMMTOP (Figure <figr fid="F1">1</figr>). Recently, Benton <it>et al</it>. <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> experimentally showed that <it>Drosophila </it>odorant receptors, another extremely diverged 7TMR family, have intracellular amino termini. Among the 394 candidates in our list, 23 proteins were predicted to have seven TM regions and intracellular amino termini (Additional data file 1). Therefore, we consider these 54 as a minimum working set of 7TMpR candidates, and many of the other proteins included in the list of 394 should be examined in the second stage.</p>
         </sec>
         <sec>
            <st>
               <p>Expression patterns of genes encoding the 7TMpR candidates and G-protein subunits</p>
            </st>
            <p>We utilized the Meta-Analyzer server of the Genevestigator web site to study spatial expression patterns of <it>Arabidopsis </it>genes encoding the 7TMpR candidates and G-protein subunits. Note that the expression of MLO genes was not included in this analysis since we reported it recently <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. As is shown in Figure <figr fid="F2">2</figr>, expression patterns of the analyzed 7TMpR candidates can be divided into two major groups: about half of them show distinct tissue specificity, whereas the other half either exhibit less distinct expression patterns or display ubiquitous expression. All genes encoding G-protein subunits fall into the latter major group. Ubiquitous expression of genes encoding G-protein subunits allows overlap with genes in both groups, and makes, in principle, co-functioning of G-proteins with these 7TMpR candidates spatially and temporally possible. All eight genes encoding the MtN3 family proteins appear to have distinct tissue-specific expression. Among them, At3g48740 and At4g25010 have the highest sequence similarities to At5g23660 and At5g50800, respectively. Both pairs of genes share similar or overlapping expression patterns, suggesting relatedness/similarity of their functions. Confirming the actual functions of the 7TMpR candidates as GPCRs requires further extensive testing. A possible involvement of these candidate proteins in 'G protein-independent' signaling mechanisms also needs to be explored.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Expression patterns of <it>Arabidopsis </it>genes encoding 7TMpR candidates and G-protein subunits among tissues</p>
               </caption>
               <text>
                  <p>Expression patterns of <it>Arabidopsis </it>genes encoding 7TMpR candidates and G-protein subunits among tissues. The figure was modified from an output of the Meta-Analyzer of Genevestigator (last updated in November 2005), which illustrates expression levels of each gene in different organs. Relative expression levels of a gene in different organs/tissues are given as heat maps in blue-scale coding that reflects absolute signal values, where darker colors represent stronger expression. All gene-level profiles are normalized for coloring such that, for each gene, the highest signal intensity is assigned a value of 100% (shown in the darkest blue and marked with an asterisk) and absence of signal is assigned a value of 0% (shown in white). All GeneChip data were processed using Affymetrix MAS5.0. Caution is required when interpreting gene expression in certain cell types (for example, pollen), since differences in normalization may yield different results. Probe-sets of five 7TMpR candidates (At1g15620, At1g75000, At4g21570, At4g36850, and At5g23990) were not present in the 22K chip, and, therefore, their tissue-specific expression could not be assessed. For At2g35710, two probe-sets (265797_at<sup>a </sup>and 265841_at<sup>b</sup>) were designed on the chip. Gene names for those belonging to the MtN3 family are shown in boldface and marked with an asterisk. Genes encoding G-protein subunits (<it>AGB1</it>, <it>GPA1</it>, <it>AGG1</it>, and <it>AGG2</it>) as well as two reported 7TMpRs (<it>RGS1 </it>and <it>GCR1</it>) are labeled accordingly in boldface.</p>
               </text>
               <graphic file="gb-2006-7-10-r96-2"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We show that the profile HMM protein classification method, currently one of the most widely used, is overly specific (conservative) when applied to extremely diverged 7TMpR proteins. Our premise is that there are more 7TMpRs yet to be identified in the <it>A. thaliana </it>genome and in other genomes highly divergent from that of humans. Two limitations remain: the scarcity of available training samples limits the effectiveness of profile HMM methods, and alignment-free methods, although more sensitive, have high false positive rates. The candidate 7TMpR proteins provided in this study can, for example, be added to the training set, and classification can be re-iterated with the refined training sets to reduce false positive rates. However, this is possible only after these new candidates are experimentally confirmed as true positives.</p>
         <p>The strategy we described here overcomes the 'chicken-or-egg' problem; predictions by multiple protein classification methods and the number of predicted transmembrane regions were used to identify a more likely reduced set of 7TMR candidates. By setting up various methods as hierarchical multiple filters, one can prioritize target protein sets for further experimental confirmation of their functions.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p><it>Arabidopsis </it>protein data</p>
            </st>
            <p>We downloaded 28,952 protein sequences from TIGR (<it>Arabidopsis thaliana </it>database release 5, dated 10 June 2004) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Among the 28,952 proteins, 2,760 are derived from alternative splicing.</p>
         </sec>
         <sec>
            <st>
               <p>Training data preparation for protein classification</p>
            </st>
            <p>Positive training samples (known 7TMR sequences) were obtained from GPCRDB (Information System for G Protein-Coupled Receptors, Release 9.0, last updated on 28 June 2005) <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. In the GPCRDB, 2,030 7TMRs (originally collected from the Swiss-Prot protein database) were grouped into six major classes (classes A to E plus the Frizzled/Smoothened family) and six putative families (ocular albinism proteins, insect odorant receptors, plant MLO receptors, nematode chemoreceptors, vomeronasal receptors, and taste receptors). Five hundred 7TMR sequences were randomly sampled and used as the positive samples. Note that 'putative/unclassified' (orphan) 7TMRs and bacteriorhodopsins were not included in this dataset. These 500 7TMRs included six of the 15 known <it>Arabidopsis </it>MLO proteins. Among the 22 currently known <it>Arabidopsis </it>7TMpRs, in addition to the nine MLO proteins, GCR1 as well as six recently identified <it>Arabidopsis </it>7TMpRs (AtRGS1 and HHP1-5; GPCRDB does not list these proteins) were not included in the random 500 7TMR samples. Note that the 15 <it>Arabidopsis </it>7TMpRs not included in the training set can be used to assess the classifier performance as test cases.</p>
            <p>For negative samples, 500 non-7TMR sequences longer than 100 amino acids were randomly sampled from the Swiss-Prot section of the UniProt Knowledgebase <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. The average length of the 500 non-7TMR sequences was 401 amino acids (with a maximum length of 2,512 amino acids). Positive and negative samples were combined to create a training dataset. Note that only positive samples were used to train the profile HMM classifier, SAM (see below).</p>
         </sec>
         <sec>
            <st>
               <p>Protein classification methods used</p>
            </st>
            <p>One alignment-based method (profile HMM) and four types of alignment-free multivariate methods were included in our analysis.</p>
            <sec>
               <st>
                  <p>Profile hidden Markov models</p>
               </st>
               <p>Profile HMMs are full probabilistic representations of sequence profiles <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Sample sequences need to be alignable, and thus only positive samples can be used for training. Two programs in SAM (version 3.4) <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr></abbrgrp> were used: <it>buildmodel </it>to build profile HMMs with the nine-component Dirichlet mixture priors <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>, and <it>hmmscore </it>to calculate scores and e-values. The 'calibration' option (for more accurate e-value calculation) and the fully local scoring option (-sw 2) were used. The e-value threshold was set at 0.01 for choosing 7TMR candidates.</p>
            </sec>
            <sec>
               <st>
                  <p>Discriminant function analysis</p>
               </st>
               <p>Moriyama and Kim <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> described three parametric methods (linear, quadratic, and logistic discriminant analysis) and the nonparametric K-nearest neighbor method, all of which were shown to perform better than the profile HMM method. Therefore, we included these four alignment-free methods (LDA, QDA, LOG, and KNN) in our analysis. For KNN, <it>K </it>was set at 5, 10, 15, or 20, where <it>K </it>is the number of neighbors. The four variables used (amino acid index and three periodicity statistics) were described in Kim <it>et al</it>. <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The S-PLUS statistical package (Insightful Corporation, Seattle, WA, USA, version 6.1.2 for Linux) with the MASS module <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> was used for the classifier development.</p>
            </sec>
            <sec>
               <st>
                  <p>Support vector machines with amino acid composition</p>
               </st>
               <p>SVMs are learning machines that make binary classifications based on a hyperplane separating a remapped instance space <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. A kernel function can be chosen so that the remapped instances in a multidimensional feature space are linearly separable. The radial basis kernel, <it>exp</it>(-&#947;||<it>x </it>- <it>y</it>||<sup>2</sup>), was used in this study. The parameter &#947; was set to 102 based on the median of Euclidean distances between positive examples and the nearest negative example as described in Jaakkola <it>et al</it>. <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. Simple 19 amino acid frequencies (the 20th amino acid frequency can be explained completely by the other 19) of each protein sequence were used as an input vector for SVMs. Programs <it>svm_learn </it>and <it>svm_classify </it>of the SVM<sup>light </sup>package version 5.0 <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> were used for training and classification, respectively, by SVM. The default value of the regularization parameter C (0.5006) was used with <it>svm_learn</it>. Our comparative analysis showed that SVM-AA performs better than profile HMMs when they are applied to remote similarity identification, the same problem we deal with in this study (PK Strope and EN Moriyama, submitted).</p>
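               <p>The input vector and kernel described above can be sketched as follows. This is only an illustrative sketch, not the SVM<sup>light </sup>implementation: the function names are hypothetical, and the choice of which amino acid to drop (here the last one in alphabetical one-letter order) is an assumption, since the text does not specify it.</p>

```python
import math

# The 20 standard amino acids in one-letter code. Dropping the last one (Y)
# is an assumption made for this sketch; any single residue could be dropped
# because the 20 frequencies sum to one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_19(sequence):
    """Return the frequencies of the first 19 amino acids in AMINO_ACIDS."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # non-standard symbols are ignored
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS[:19]]


def rbf_kernel(x, y, gamma):
    """Radial basis kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

               <p>Two sequences with identical composition give a kernel value of one, and the value decays toward zero as their composition vectors move apart, at a rate controlled by the gamma argument.</p>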
            </sec>
            <sec>
               <st>
                  <p>Support vector machines with dipeptide composition</p>
               </st>
               <p>We also included an SVM classifier with dipeptide composition <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. The SVM<sup>light </sup>package version 5.0 <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> was used for training and classification as before. The regularization parameter C = 1 and the radial basis kernel function parameter &#947; = 90 were chosen by a grid search using 5-fold cross-validation.</p>
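               <p>A dipeptide composition vector can be sketched as below. The function name is hypothetical, and the normalization (dividing each count by the number of overlapping dipeptides found) is an assumption; the cited papers should be consulted for the exact definition used.</p>

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element vector of overlapping dipeptide frequencies."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]
```

               <p>Unlike single-residue composition, this representation retains some local order information, at the cost of a much larger input vector.</p>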
            </sec>
            <sec>
               <st>
                  <p>Partial least squares with amino acid properties</p>
               </st>
               <p>PLS regression is a projection method that takes into account correlations between independent and dependent variables <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. We used the pls.pcr package, an R implementation developed by Wehrens and Mevik <abbrgrp><abbr bid="B62">62</abbr><abbr bid="B63">63</abbr></abbrgrp>, with the SIMPLS method, four latent variables, and cross-validation options. Each amino acid in the protein sequences was first converted to a set of 5 principal component scores developed from 12 physico-chemical properties. The auto/cross covariance (ACC) method developed by Wold <it>et al</it>. <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> was then applied to each of the converted sequences. ACC describes the average correlations between two residues a certain lag (number of amino acids) apart. The lag size of 30 was chosen for optimal classification performance. We found that the performance of PLS-ACC is robust even when only a small number of positive samples (5 or 10) are available for training. In contrast, the performance of profile HMMs degraded severely when the positive sample size was small. The 12 physico-chemical properties used and more details on the use of PLS in protein classification are described elsewhere <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. The cutoff value of 0.4999 was used for choosing 7TMR candidates in this study, which was determined as the average of the minimum error points <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> obtained from 500 replications of 10-fold cross-validation analysis using the training dataset.</p>
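               <p>The ACC transformation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the per-residue descriptor scores are assumed to be already scaled, and the normalization by the number of residue pairs at each lag is one common form of ACC, which may differ in detail from the form used in this study.</p>

```python
def acc_features(descriptors, max_lag):
    """Auto/cross covariance terms for one converted sequence.

    descriptors: list with one entry per residue, each entry a list of
                 descriptor scores (for example, 5 principal component scores).
    max_lag:     largest lag (number of residues apart) to include.
    Returns a flat list ordered by lag, then first descriptor, then second.
    """
    n = len(descriptors)
    if max_lag >= n:
        raise ValueError("sequence must be longer than max_lag")
    d = len(descriptors[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                # j == k gives an auto covariance term, j != k a cross term.
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features
```

               <p>Under this layout, every sequence is mapped to a vector of the same length regardless of its own length (descriptors squared times lags), which is what allows sequences to be compared without alignment.</p>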
            </sec>
         </sec>
         <sec>
            <st>
               <p>Transmembrane region prediction</p>
            </st>
            <p>HMMTOP 2.0 <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp> and TMHMM (originally as in <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> but implemented as S-TMHMM by <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>) were used for predicting transmembrane regions. Figure <figr fid="F1">1</figr> shows the numbers of TM regions predicted by the two methods for the 500 7TMR sequences used for classifier training. HMMTOP predicted seven TM regions for 433 of the 7TMRs (86.6%), while only 165 7TMRs (33%) were predicted to have seven TM regions by TMHMM. HMMTOP predicted 97% or more of 7TMRs to have 6 to 8 TM regions, and with 5 to 9 TM regions more than 99% of 7TMRs were included. Using TMHMM, in order to include 97% of 7TMRs, the range of predicted TM numbers needs to be between 4 and 10. Therefore, we decided to use HMMTOP in our further analysis. With HMMTOP using the range of five to nine TM regions, we should be able to cover almost all possible 7TM proteins.</p>
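            <p>The coverage comparison and the resulting filter can be sketched as below. The function names and the input format (a mapping from protein identifier to predicted TM count) are hypothetical; parsing of actual HMMTOP or TMHMM output is not shown.</p>

```python
def fraction_in_range(tm_counts, low, high):
    """Fraction of predicted TM counts that fall within [low, high]."""
    if not tm_counts:
        raise ValueError("no predictions given")
    hits = sum(1 for count in tm_counts if count in range(low, high + 1))
    return hits / len(tm_counts)


def filter_by_tm_range(predictions, low=5, high=9):
    """Keep protein identifiers whose predicted TM count is within [low, high].

    predictions: dict mapping a protein identifier to its predicted number
                 of TM regions (hypothetical input format).
    """
    return [
        protein_id
        for protein_id, count in predictions.items()
        if count in range(low, high + 1)
    ]
```

            <p>Applying fraction_in_range to the predictions for known 7TMRs with progressively wider ranges shows how wide the filter must be for each predictor before it retains nearly all true positives.</p>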
         </sec>
         <sec>
            <st>
               <p>Grouping of the candidate proteins</p>
            </st>
            <p>The candidate proteins were grouped based on the e-values obtained by BLASTP protein similarity search <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp> against the <it>Arabidopsis </it>protein database using the default parameter set (for example, BLOSUM62) at the TAIR web site <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The e-value threshold of 10<sup>-20 </sup>was used to identify protein families similar to the candidate proteins.</p>
         </sec>
         <sec>
            <st>
               <p>Expression patterns of genes encoding 7TMR candidates and G-protein subunits</p>
            </st>
            <p>Expression patterns of genes encoding 7TMpR candidates and G-protein subunits among tissues were studied by using the Meta-Analyzer server of the Genevestigator web site (last updated in November 2005) <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>. All data were generated using the 22K Affymetrix ATH1 <it>Arabidopsis </it>Genome array. Gene expression profiles based on microarray data were clustered according to similarity in expression patterns. Hierarchical clustering results were generated by default settings using pairwise Euclidean distances and the average linkage method.</p>
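            <p>Average linkage clustering on Euclidean distances can be sketched as follows. This is a plain illustration of the technique and not the Genevestigator implementation; the function names and the input format (a mapping from gene name to a list of expression values) are hypothetical.</p>

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a gene name to its list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance), where
    each cluster is a tuple of gene names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean of all pairwise member distances.
            dist = sum(
                euclidean(profiles[p], profiles[q]) for p in a for q in b
            ) / (len(a) * len(b))
            if best is None or best[0] > dist:
                best = (dist, a, b)
        dist, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, dist))
    return merges
```

            <p>The ordered list of merges corresponds to the dendrogram read from the leaves upward: genes with similar profiles are joined first, at small distances.</p>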
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data files are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is the list of the 394 <it>Arabidopsis thaliana </it>7TMpR candidates. Additional data file <supplr sid="S2">2</supplr> lists the 54 7TMpR candidates identified in this study. These 7TMpR candidates were grouped based on their similarities with known protein families. HTML versions of the candidate lists with TAIR links and other supplementary data are available at <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>The 394 <it>A. thaliana </it>7TMpR candidates</p>
            </caption>
            <text>
               <p>The 394 <it>A. thaliana </it>7TMpR candidates.</p>
            </text>
            <file name="gb-2006-7-10-r96-S1.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>The 54 7TMpR candidates identified in this study</p>
            </caption>
            <text>
               <p>These 7TMpR candidates were grouped based on their similarities with known protein families. HTML versions of the candidate lists with TAIR links and other supplementary data are available at [72].</p>
            </text>
            <file name="gb-2006-7-10-r96-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was partly funded by Nebraska EPSCoR Women in Science and NSF EPSCoR Type II grants (to ENM); Bioinformatics Interdisciplinary Research Scholars sponsored by NSF EPSCoR Infrastructure Improvement grant: Bioinformatics Research Laboratory (to PKS and SOO); and grants from the NIGMS (GM65989-01), the DOE (DE-FG02-05er15671), and the NSF (MCB-0209711) (to AMJ).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Seven-transmembrane receptors.</p>
            </title>
            <aug>
               <au>
                  <snm>Pierce</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Premont</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Lefkowitz</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>639</fpage>
            <lpage>650</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm908</pubid>
                  <pubid idtype="pmpid" link="fulltext">12209124</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>GCR1 can act independently of heterotrimeric G-protein in response to brassinosteroids and gibberellins in <it>Arabidopsis</it> seed germination.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Pandey</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alonso</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ecker</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Assmann</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>135</volume>
            <fpage>907</fpage>
            <lpage>915</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514125</pubid>
                  <pubid idtype="pmpid" link="fulltext">15181210</pubid>
                  <pubid idtype="doi">10.1104/pp.104.038992</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The signal to move: <it>D. discoideum </it>go orienteering.</p>
            </title>
            <aug>
               <au>
                  <snm>Kimmel</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Parent</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>300</volume>
            <fpage>1525</fpage>
            <lpage>1527</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1085439</pubid>
                  <pubid idtype="pmpid" link="fulltext">12791977</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Transduction of receptor signals by beta-arrestins.</p>
            </title>
            <aug>
               <au>
                  <snm>Lefkowitz</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Shenoy</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>512</fpage>
            <lpage>517</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1109237</pubid>
                  <pubid idtype="pmpid" link="fulltext">15845844</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Molecular mechanisms of ligand binding, signaling, and regulation within the superfamily of G-protein-coupled receptors: molecular modeling and mutagenesis approaches to receptor structure and function.</p>
            </title>
            <aug>
               <au>
                  <snm>Kristiansen</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Pharmacol Ther</source>
            <pubdate>2004</pubdate>
            <volume>103</volume>
            <fpage>21</fpage>
            <lpage>80</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.pharmthera.2004.05.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15251227</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>GPCRDB information system for G protein-coupled receptors.</p>
            </title>
            <aug>
               <au>
                  <snm>Horn</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bettler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Campagne</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Vriend</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>294</fpage>
            <lpage>297</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165550</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520006</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>GPCRDB: Information System for G Protein-coupled Receptors</p>
            </title>
            <url>http://www.gpcr.org/7tm/</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Neurobiology of the <it>Caenorhabditis elegans </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Bargmann</snm>
                  <fnm>CI</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2028</fpage>
            <lpage>2033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2028</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851919</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Topology, subcellular localization, and sequence diversity of the Mlo family in plants.</p>
            </title>
            <aug>
               <au>
                  <snm>Devoto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Piffanelli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Wallin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Panstruga</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schulze-Lefert</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1999</pubdate>
            <volume>274</volume>
            <fpage>34993</fpage>
            <lpage>35004</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.274.49.34993</pubid>
                  <pubid idtype="pmpid" link="fulltext">10574976</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Molecular phylogeny and evolution of the plant-specific seven-transmembrane MLO family.</p>
            </title>
            <aug>
               <au>
                  <snm>Devoto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hartmann</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Piffanelli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Elliott</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Simmons</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Taramino</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Goh</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Emerson</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Schulze-Lefert</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2003</pubdate>
            <volume>56</volume>
            <fpage>77</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-002-2382-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12569425</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Cloning of a putative G-protein-coupled receptor from <it>Arabidopsis thaliana</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Josefsson</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Rask</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Eur J Biochem</source>
            <pubdate>1997</pubdate>
            <volume>249</volume>
            <fpage>415</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1432-1033.1997.t01-1-00415.x</pubid>
                  <pubid idtype="pmpid">9370348</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The <it>Arabidopsis </it>putative G protein-coupled receptor GCR1 interacts with the G protein alpha subunit GPA1 and regulates abscisic acid signaling.</p>
            </title>
            <aug>
               <au>
                  <snm>Pandey</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Assmann</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <fpage>1616</fpage>
            <lpage>1632</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">490050</pubid>
                  <pubid idtype="pmpid" link="fulltext">15155892</pubid>
                  <pubid idtype="doi">10.1105/tpc.020321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A novel gene family in <it>Arabidopsis </it>encoding putative heptahelical transmembrane proteins homologous to human adiponectin receptors and progestin receptors.</p>
            </title>
            <aug>
               <au>
                  <snm>Hsieh</snm>
                  <fnm>M-H</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>J Exp Bot</source>
            <pubdate>2005</pubdate>
            <volume>56</volume>
            <fpage>3137</fpage>
            <lpage>3147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/jxb/eri311</pubid>
                  <pubid idtype="pmpid" link="fulltext">16263907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A seven-transmembrane RGS protein that modulates plant cell proliferation.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J-G</fnm>
               </au>
               <au>
                  <snm>Willard</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chasse</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Siderovski</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>301</volume>
            <fpage>1728</fpage>
            <lpage>1731</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1087790</pubid>
                  <pubid idtype="pmpid" link="fulltext">14500984</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Role of a heterotrimeric G protein in regulation of <it>Arabidopsis </it>seed germination.</p>
            </title>
            <aug>
               <au>
                  <snm>Ullah</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2002</pubdate>
            <volume>129</volume>
            <fpage>897</fpage>
            <lpage>907</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">161710</pubid>
                  <pubid idtype="pmpid" link="fulltext">12068128</pubid>
                  <pubid idtype="doi">10.1104/pp.005017</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Evidence for kinship between diverse G-protein coupled receptors.</p>
            </title>
            <aug>
               <au>
                  <snm>Josefsson</snm>
                  <fnm>LG</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1999</pubdate>
            <volume>239</volume>
            <fpage>333</fpage>
            <lpage>340</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(99)00392-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10548735</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Plants: the latest model system for G-protein research.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Assmann</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>572</fpage>
            <lpage>578</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1299082</pubid>
                  <pubid idtype="pmpid" link="fulltext">15170476</pubid>
                  <pubid idtype="doi">10.1038/sj.embor.7400174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Occurrence in <it>Saccharomyces cerevisiae </it>of a gene homologous to the cDNA coding for the alpha subunit of mammalian G proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Nakafuku</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kaziro</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1987</pubdate>
            <volume>84</volume>
            <fpage>2140</fpage>
            <lpage>2144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">304604</pubid>
                  <pubid idtype="pmpid" link="fulltext">3031665</pubid>
                  <pubid idtype="doi">10.1073/pnas.84.8.2140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Isolation of a second yeast <it>Saccharomyces cerevisiae </it>gene (GPA2) coding for guanine nucleotide-binding regulatory protein: studies on its structure and possible functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Nakafuku</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Obara</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kaibuchi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miyajima</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Miyajima</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Matsumoto</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kaziro</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>1374</fpage>
            <lpage>1378</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">279773</pubid>
                  <pubid idtype="pmpid" link="fulltext">2830616</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.5.1374</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The STE4 and STE18 genes of yeast encode potential beta and gamma subunits of the mating factor receptor-coupled G protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Whiteway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hougan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dignard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DY</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Saari</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Grant</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>O'Hara</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>MacKay</snm>
                  <fnm>VL</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1989</pubdate>
            <volume>56</volume>
            <fpage>467</fpage>
            <lpage>477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(89)90249-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">2536595</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Overlapping functions for two G protein alpha subunits in <it>Neurospora crassa</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Baasiri</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Borkovich</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1997</pubdate>
            <volume>147</volume>
            <fpage>137</fpage>
            <lpage>145</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1208097</pubid>
                  <pubid idtype="pmpid">9286674</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Identification of a G protein alpha subunit from <it>Neurospora crassa </it>that is a member of the Gi family.</p>
            </title>
            <aug>
               <au>
                  <snm>Turner</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Borkovich</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1993</pubdate>
            <volume>268</volume>
            <fpage>14805</fpage>
            <lpage>14811</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8325859</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The genome sequence of the filamentous fungus <it>Neurospora crassa</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Galagan</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Borkovich</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Selker</snm>
                  <fnm>EU</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Smirnov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Purcell</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>422</volume>
            <fpage>859</fpage>
            <lpage>868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01554</pubid>
                  <pubid idtype="pmpid" link="fulltext">12712197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>ARAMEMNON, a novel database for <it>Arabidopsis</it> integral membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Schwacke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>van der Graaff</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Catoni</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Desimone</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frommer</snm>
                  <fnm>WB</fnm>
               </au>
               <au>
                  <snm>Flügge</snm>
                  <fnm>UI</fnm>
               </au>
               <au>
                  <snm>Kunze</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2003</pubdate>
            <volume>131</volume>
            <fpage>16</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">166783</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529511</pubid>
                  <pubid idtype="doi">10.1104/pp.011577</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>ARAMEMNON: Plant Membrane Protein Database</p>
            </title>
            <url>http://aramemnon.botanik.uni-koeln.de</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Novel G-protein-coupled receptor-like proteins in the plant pathogenic fungus <it>Magnaporthe grisea</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Kulkarni</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dean</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R24</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088943</pubid>
                  <pubid idtype="pmpid" link="fulltext">15774025</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-3-r24</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <aug>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mitchison</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids</source>
            <publisher>Cambridge: Cambridge University Press</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Pfam protein families database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D138</fpage>
            <lpage>141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308855</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681378</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Pfam: Database of Protein Families and HMMs</p>
            </title>
            <url>http://pfam.janelia.org/</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>SMART 4.0: towards genomic data integration.</p>
            </title>
            <aug>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ciccarelli</snm>
                  <fnm>FD</fnm>
               </au>
               <au>
                  <snm>Doerks</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D142</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308822</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681379</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>SMART 4.0</p>
            </title>
            <url>http://smart.embl.de/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hughey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>313</volume>
            <fpage>903</fpage>
            <lpage>919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5080</pubid>
                  <pubid idtype="pmpid" link="fulltext">11697912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The PROSITE database.</p>
            </title>
            <aug>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bulliard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Castro</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Langendijk-Genevaux</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D227</fpage>
            <lpage>230</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347426</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381852</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>PROSITE: Database of Protein Families and Domains</p>
            </title>
            <url>http://www.expasy.org/prosite/</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>PRINTS and its automatic supplement, prePRINTS.</p>
            </title>
            <aug>
               <au>
                  <snm>Attwood</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Bradley</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Flower</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Gaulton</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Maudling</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Moulton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Nordle</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paine</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>400</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165477</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520033</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg030</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>PRINTS</p>
            </title>
            <url>http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moriyama</snm>
                  <fnm>EN</fnm>
               </au>
               <au>
                  <snm>Warr</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Clyne</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>767</fpage>
            <lpage>775</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.9.767</pubid>
                  <pubid idtype="pmpid" link="fulltext">11108699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Protein family classification with discriminant function analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Moriyama</snm>
                  <fnm>EN</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Exploitation: Data Mining the Genome</source>
            <publisher>New York: Springer</publisher>
            <editor>Gustafson JP, Shoemaker R, Snape JW</editor>
            <pubdate>2005</pubdate>
            <fpage>121</fpage>
            <lpage>132</lpage>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Classifying G-protein coupled receptors with support vector machines.</p>
            </title>
            <aug>
               <au>
                  <snm>Karchin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>147</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.1.147</pubid>
                  <pubid idtype="pmpid" link="fulltext">11836223</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors.</p>
            </title>
            <aug>
               <au>
                  <snm>Bhasin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Raghava</snm>
                  <fnm>GP</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W383</fpage>
            <lpage>389</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441554</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215416</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>GPCRpred</p>
            </title>
            <url>http://www.imtech.res.in/raghava/gpcrpred/</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Classification of G-protein coupled receptors by alignment-independent extraction of principal chemical properties of primary amino acid sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Lapinsh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gutcaits</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Prusis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Post</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lundstedt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JES</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <fpage>795</fpage>
            <lpage>805</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.2500102</pubid>
                  <pubid idtype="pmpid" link="fulltext">11910023</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Protein family classification with partial least squares.</p>
            </title>
            <aug>
               <au>
                  <snm>Opiyo</snm>
                  <fnm>SO</fnm>
               </au>
               <au>
                  <snm>Moriyama</snm>
                  <fnm>EN</fnm>
               </au>
            </aug>
            <source>J Proteome Res</source>
            <inpress/>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The <it>Arabidopsis</it> Information Resource (TAIR): a model organism database providing a centralized, curated gateway to <it>Arabidopsis</it> biology, research materials and community.</p>
            </title>
            <aug>
               <au>
                  <snm>Rhee</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Beavis</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Berardini</snm>
                  <fnm>TZ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dixon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Garcia-Hernandez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huala</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Montoya</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>224</fpage>
            <lpage>228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165523</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519987</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg076</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The <it>Arabidopsis</it> Information Resource</p>
            </title>
            <url>http://www.arabidopsis.org</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The HMMTOP transmembrane topology prediction server.</p>
            </title>
            <aug>
               <au>
                  <snm>Tusnady</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>849</fpage>
            <lpage>850</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.9.849</pubid>
                  <pubid idtype="pmpid" link="fulltext">11590105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>HMMTOP</p>
            </title>
            <url>http://www.enzim.hu/hmmtop</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Cloning of adiponectin receptors that mediate antidiabetic metabolic effects.</p>
            </title>
            <aug>
               <au>
                  <snm>Yamauchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kamon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ito</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tsuchida</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yokomizo</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kita</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sugiyama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Miyagishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hara</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tsunoda</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>423</volume>
            <fpage>762</fpage>
            <lpage>769</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01705</pubid>
                  <pubid idtype="pmpid" link="fulltext">12802337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Atypical membrane topology and heteromeric function of <it>Drosophila</it> odorant receptors <it>in vivo</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Benton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sachse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Michnick</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Vosshall</snm>
                  <fnm>LB</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>e20</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1334387</pubid>
                  <pubid idtype="pmpid" link="fulltext">16402857</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0040020</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Expression analysis of the AtMLO gene family encoding plant-specific seven-transmembrane domain proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Hartmann</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Pulley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schulze-Lefert</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Panstruga</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>60</volume>
            <fpage>583</fpage>
            <lpage>597</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s11103-005-5082-x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16525893</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The Institute for Genomic Research (TIGR) <it>Arabidopsis thaliana</it> Database ftp site</p>
            </title>
            <url>ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/ATH1.pep.gz</url>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The Universal Protein Resource (UniProt).</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D154</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540024</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608167</pubid>
                  <pubid idtype="doi">10.1093/nar/gki070</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>UniProt: The Universal Protein Resource</p>
            </title>
            <url>http://www.uniprot.org</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Hidden Markov models for sequence analysis: Extension and analysis of the basic method.</p>
            </title>
            <aug>
               <au>
                  <snm>Hughey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>95</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8744772</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>SAM: Sequence Alignment and Modeling System</p>
            </title>
            <url>http://www.cse.ucsc.edu/research/compbio/sam.html</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Dirichlet mixtures: a method for improving detection of weak but significant protein sequence homology.</p>
            </title>
            <aug>
               <au>
                  <snm>Sjolander</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hughey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mian</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>327</fpage>
            <lpage>345</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8902360</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>S-plus MASS module</p>
            </title>
            <url>http://www.stats.ox.ac.uk/pub/MASS4/</url>
         </bibl>
         <bibl id="B58">
            <aug>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <source>The Nature of Statistical Learning Theory</source>
            <publisher>New York: Springer-Verlag</publisher>
            <edition>2</edition>
            <pubdate>1999</pubdate>
         </bibl>
         <bibl id="B59">
            <title>
               <p>A discriminative framework for detecting remote protein homologies.</p>
            </title>
            <aug>
               <au>
                  <snm>Jaakkola</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>95</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050081405</pubid>
                  <pubid idtype="pmpid" link="fulltext">10890390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Making large-scale SVM learning practical.</p>
            </title>
            <aug>
               <au>
                  <snm>Joachims</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Advances in Kernel Methods - Support Vector Learning</source>
            <publisher>Cambridge: MIT Press</publisher>
            <editor>Sch&#246;lkopf B, Burges C, Smola A</editor>
            <pubdate>1999</pubdate>
            <fpage>169</fpage>
            <lpage>184</lpage>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Partial least squares regression: A tutorial.</p>
            </title>
            <aug>
               <au>
                  <snm>Geladi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kowalski</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Anal Chim Acta</source>
            <pubdate>1986</pubdate>
            <volume>185</volume>
            <fpage>1</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0003-2670(86)80028-9</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R: A Language and Environment for Statistical Computing</source>
            <publisher>Vienna, Austria: R Foundation for Statistical Computing</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B63">
            <title>
               <p>pls: Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR): R package version 1.2-1.</p>
            </title>
            <url>http://mevik.net/work/software/pls.html</url>
         </bibl>
         <bibl id="B64">
            <title>
               <p>DNA and peptide sequences and chemical processes multivariately modeled by principal component analysis and partial least-squares projections to latent structures.</p>
            </title>
            <aug>
               <au>
                  <snm>Wold</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sjostrom</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rannar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Anal Chim Acta</source>
            <pubdate>1993</pubdate>
            <volume>277</volume>
            <fpage>239</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0003-2670(93)80437-P</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>A hidden Markov model for predicting transmembrane helices in protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>6</volume>
            <fpage>175</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9783223</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information.</p>
            </title>
            <aug>
               <au>
                  <snm>Viklund</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Elofsson</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <fpage>1908</fpage>
            <lpage>1917</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.04625404</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Basic local alignment search tool.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>BLAST</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/BLAST/</url>
         </bibl>
         <bibl id="B69">
            <title>
               <p>GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox.</p>
            </title>
            <aug>
               <au>
                  <snm>Zimmermann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hirsch-Hoffmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hennig</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gruissem</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2004</pubdate>
            <volume>136</volume>
            <fpage>2621</fpage>
            <lpage>2632</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523327</pubid>
                  <pubid idtype="pmpid" link="fulltext">15375207</pubid>
                  <pubid idtype="doi">10.1104/pp.104.046367</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Genevestigator: Arabidopsis Microarray Database and Analysis Toolbox</p>
            </title>
            <url>https://www.genevestigator.ethz.ch</url>
         </bibl>
         <bibl id="B71">
            <title>
               <p><it>Arabidopsis thaliana</it> 7TMR Mining</p>
            </title>
            <url>http://bioinfolab.unl.edu/emlab/at7tmr/index.html</url>
         </bibl>
      </refgrp>
   </bm>
</art>
