<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-12-r264</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Evolution of allostery in the cyclic nucleotide binding module</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Kannan</snm>
               <fnm>Natarajan</fnm>
               <insr iid="I1"/>
               <email>kannan@ucsd.edu</email>
            </au>
            <au id="A2">
               <snm>Wu</snm>
               <fnm>Jian</fnm>
               <insr iid="I1"/>
               <email>jianwu@ucsd.edu</email>
            </au>
            <au id="A3">
               <snm>Anand</snm>
               <mi>S</mi>
               <fnm>Ganesh</fnm>
               <insr iid="I2"/>
               <email>dbsgsa@nus.edu.sg</email>
            </au>
            <au id="A4">
               <snm>Yooseph</snm>
               <fnm>Shibu</fnm>
               <insr iid="I3"/>
               <email>SYooseph@venterinstitute.org</email>
            </au>
            <au id="A5">
               <snm>Neuwald</snm>
               <mi>F</mi>
               <fnm>Andrew</fnm>
               <insr iid="I4"/>
               <email>aneuwald@som.umaryland.edu</email>
            </au>
            <au id="A6">
               <snm>Venter</snm>
               <fnm>J Craig</fnm>
               <insr iid="I3"/>
               <email>jcventer@venterinstitute.org</email>
            </au>
            <au id="A7" ca="yes">
               <snm>Taylor</snm>
               <mi>S</mi>
               <fnm>Susan</fnm>
               <insr iid="I5"/>
               <email>staylor@ucsd.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Chemistry and Biochemistry, University of California, Gilman Drive, La Jolla, California, 92093-0654, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biological Sciences, Science Drive 4, National University of Singapore, Singapore 117543</p>
            </ins>
            <ins id="I3">
               <p>J Craig Venter Institute, Medical Center Drive, Rockville, MD 20850, USA</p>
            </ins>
            <ins id="I4">
               <p>Institute for Genome Sciences and Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, HSF-II, Penn Street, Baltimore, MD 21201, USA</p>
            </ins>
            <ins id="I5">
               <p>Department of Chemistry and Biochemistry, and HHMI, University of California, Gilman Drive, La Jolla, California, 92093-0654, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>12</issue>
         <fpage>R264</fpage>
         <url>http://genomebiology.com/2007/8/12/R264</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18076763</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-12-r264</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>8</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>18</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>12</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>12</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Kannan et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Evolution of allostery</p>
      </shorttitle>
      <shortabs>
         <p>Analysis of cyclic nucleotide binding (CNB) domains shows that they have evolved to sense a wide variety of second messenger signals; a mechanism for allosteric regulation by CNB domains is proposed.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The cyclic nucleotide binding (CNB) domain regulates signaling pathways in both eukaryotes and prokaryotes. In this study, we analyze the evolutionary information embedded in genomic sequences to explore the diversity of signaling through the CNB domain and also how the CNB domain elicits a cellular response upon binding to cAMP.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Identification and classification of CNB domains in Global Ocean Sampling and other protein sequences reveals that they typically are fused to a wide variety of functional domains. CNB domains have undergone major sequence variation during evolution. In particular, the sequence motif that anchors the cAMP phosphate (termed the PBC motif) is strikingly different in some families. This variation may contribute to ligand specificity inasmuch as members of the prokaryotic cooA family, for example, harbor a CNB domain that contains a non-canonical PBC motif and that binds a heme ligand in the cAMP binding pocket. Statistical comparison of the functional constraints imposed on the canonical and non-canonical PBC containing sequences reveals that a key arginine, which coordinates with the cAMP phosphate, has co-evolved with a glycine in a distal &#946;2-&#946;3 loop that allosterically couples cAMP binding to distal regulatory sites.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our analysis suggests that CNB domains have evolved as a scaffold to sense a wide variety of second messenger signals. Based on sequence, structural and biochemical data, we propose a mechanism for allosteric regulation by CNB domains.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010001">Biochemistry and structural biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The cyclic nucleotide binding (CNB) domain is a conserved signaling module that has evolved to respond to second messenger signals such as cAMP and cGMP <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. The CNB domain is ubiquitous in eukaryotes and controls a variety of cellular functions in a cAMP/cGMP dependent manner. Some of the well characterized CNB domain containing families in eukaryotes include: the protein kinase A (PKA) regulatory subunit that regulates the activity of PKA <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>; the guanine nucleotide exchange factor that regulates nucleotide exchange in small GTPases <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>; and the ion channels that regulate metal ion gating (reviewed in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>).</p>
         <p>CNB domains also occur in prokaryotes. The first characterized family containing a CNB domain in prokaryotes is the CAP (catabolite gene activator protein) family of transcriptional regulators <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> that contain a DNA binding helix-turn-helix (HTH) domain covalently linked to the CNB domain <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. This domain organization is important for CAP function as it couples cAMP binding functions of the CNB domain with DNA binding functions of the HTH domain <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The CAP family is functionally diverse and, in addition to cAMP, responds to other exogenous signals, such as carbon monoxide (CO) and nitric oxide (NO) (reviewed in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>). The cooA subfamily, for instance, responds to CO signals and binds a heme ligand in the cAMP binding pocket <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Likewise, the CprK subfamily of transcriptional regulators binds to ortho-chlorophenolic compounds in the cAMP binding pocket <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>Crystal structures of CNB domains from both eukaryotes and prokaryotes have been determined and their structural comparison reveals a conserved mode of cAMP recognition <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and regulation (reviewed in <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>). CNB domains are characterized by an eight stranded beta barrel domain (beta subdomain) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> that is conserved among all CNB domain containing proteins <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. A key structural region within the beta subdomain is the phosphate binding cassette (PBC) that anchors the phosphate group of cAMP <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. CNB domains also contain a helical subdomain (henceforth called alpha subdomain), which, unlike the beta subdomain, is more variable in sequence and structure. The helical subdomain is also a docking site for the catalytic subunit of PKA <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>An emerging theme in CNB domain signaling is the allosteric control of CNB domain functions. In the PKA regulatory subunit, for instance, cAMP binding to the beta subdomain causes conformational changes in the distal alpha subdomain, thereby releasing its inhibitory interactions with the catalytic subunit <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. This propagation of the cAMP signal to distal regulatory sites was suggested to involve specific regions in the beta subdomain <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Specifically, a loop connecting the &#946;2 and &#946;3 strands (&#946;2-&#946;3 loop) was shown to undergo large chemical shift changes upon binding to cAMP <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. While these and other studies have provided important insights into PKA allostery, it is not known whether this mode of regulation is unique to the PKA regulatory subunit or is conserved among other members of the CNB domain superfamily. Here, we address this question by extracting and analyzing the evolutionary information encoded within CNB domain containing sequences. Towards this end, we have identified nearly 7,700 CNB domain containing proteins, and classified them into 30 distinct families. A systematic comparison of these families reveals that the CNB domains recombine with a wide variety of functional domains to respond to diverse cellular signals. Statistical comparison of the evolutionary constraints imposed on CNB domain sequences reveals that the residues that anchor the phosphate group of cAMP (within the beta subdomain) have co-evolved with residues in the &#946;2-&#946;3 loop. Analyzing these residues in light of existing structural and biochemical data provides a model of allostery that is conserved through evolution.</p>
         <p>In the following sections, we first describe the identification and classification of CNB domains to illustrate the diversity of this protein family, and later show how a comparative analysis of CNB domain sequences has provided insights into the evolution of allostery.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Identification and classification of CNB domains in the public and Global Ocean Sampling data</p>
            </st>
            <p>Cyclic nucleotide binding domains in the National Center for Biotechnology Information's non-redundant amino acid database (NR) and Global Ocean Sampling (GOS) <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp> data were identified using a combination of PSI-BLAST profiles and motif models (see Materials and methods). This resulted in 5,241 significant hits in NR and 2,455 hits in the GOS data. Most of the identified sequences were multi-domain proteins in that they contained other functional domains covalently linked to the CNB domain. Because these functional domains play an important role in CNB domain functions, they were used as markers for annotation and classification (see below).</p>
            <p>The 7,696 CNB domain containing sequences can be classified into 30 distinct families (Figure <figr fid="F1">1</figr>) based on the sequence similarity within the CNB domain (see Materials and methods). These 30 families are predominantly eukaryotic or bacterial in origin (Table <tblr tid="T1">1</tblr>). The only significant hit in Archaea was to a hypothetical protein (gi: 11498576) from <it>Archaeoglobus fulgidus</it>. CNB domains in eukaryotes can be broadly classified into five major categories: the kinase domain associated PKA and PKG families; the guanine nucleotide exchange factors (Epacs); transmembrane domain containing HCN and Na channels; HCN type channels in protozoans; and CNB domains in metazoans and plants that are fused to functional domains such as PAS domains, PP2C like phosphatases and phospholipases ('Other_eukaryotic' in Table <tblr tid="T1">1</tblr>). Several of these families/subfamilies are lineage-specific and contain domain combinations that have not been reported before. The PP2C like phosphatase, for instance, is a plant specific subfamily that contains a kinase domain carboxy-terminal of the CNB domain. The co-occurrence of kinase, phosphatase and CNB domains in the same operon is interesting because previous bioinformatics analysis had failed to provide any evidence for a cAMP or cGMP dependent regulation of kinase activity in plants <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Classification and domain organization of CNB domain containing families</p>
               </caption>
               <text>
                  <p>Classification and domain organization of CNB domain containing families. <b>(a) </b>Phylogenetic tree of the 30 identified families. Eukaryotic branches are shown in dark teal, while the prokaryotic branches are shaded in gold. Novel families in bacteria are indicated by red dots. Families that have a non-canonical PBC are indicated by blue dots. <b>(b) </b>Domain organization of known and novel CNB domain containing proteins in eukaryotes and prokaryotes.</p>
               </text>
               <graphic file="gb-2007-8-12-r264-1"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Classification of CNB domains in the public and GOS data</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>No.</p>
                     </c>
                     <c ca="center">
                        <p>Family name</p>
                     </c>
                     <c ca="center">
                        <p>NR/GOS count</p>
                     </c>
                     <c ca="left">
                        <p>Taxonomic origin</p>
                     </c>
                     <c ca="left">
                        <p>PBC consensus motif</p>
                     </c>
                     <c ca="left">
                        <p>Description</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>PKA-Rsub</p>
                     </c>
                     <c ca="center">
                        <p>301/0</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GELALIYGTP<b>R</b>AATVVA</p>
                     </c>
                     <c ca="left">
                        <p>cAMP dependent regulatory subunit that regulates the activity of PKA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>PKG</p>
                     </c>
                     <c ca="center">
                        <p>388/9</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GELALLYNDP<b>R</b>TATVIA</p>
                     </c>
                     <c ca="left">
                        <p>cGMP activated proteins that are typically attached to a kinase domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>PKG-parasites</p>
                     </c>
                     <c ca="center">
                        <p>362/11</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GERALLYDEP<b>R</b>SATIKA</p>
                     </c>
                     <c ca="left">
                        <p>A distinct group of PKGs in parasites that are also attached to kinase domains</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>Other_eukaryotic</p>
                     </c>
                     <c ca="center">
                        <p>940/201</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GELALLYNAP<b>R</b>AATVVA</p>
                     </c>
                     <c ca="left">
                        <p>CNB domains from metazoans and plants. These are attached to various functional domains such as PKs, PAS domains, PP2C like phosphatases and phospholipases</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>Epac</p>
                     </c>
                     <c ca="center">
                        <p>150/1</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GQLALVNDAP<b>R</b>AATIVL</p>
                     </c>
                     <c ca="left">
                        <p>cAMP-dependent guanine nucleotide exchange factors. Typically attached to an amino-terminal DEP domain and a carboxy-terminal RasGEF domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>PDZ-GEF</p>
                     </c>
                     <c ca="center">
                        <p>125/0</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GVSPTMDKEYMKGVMRT</p>
                     </c>
                     <c ca="left">
                        <p>A distinct class of Epacs, also called Epac6, which contains a PDZ domain between the CNB and RasGEF domains. Epacs of this class contain a non-canonical PBC</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>K-channel</p>
                     </c>
                     <c ca="center">
                        <p>86/0</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEVGVLCYRPQLFTVRT</p>
                     </c>
                     <c ca="left">
                        <p>Potassium channels specific to plants. Most of them contain an ankyrin repeat carboxy-terminal to the CNB domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>LR_CC</p>
                     </c>
                     <c ca="center">
                        <p>148/4</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEIGVLLDPP<b>R</b>TATVRA</p>
                     </c>
                     <c ca="left">
                        <p>CNB domains found in metazoans and fungi. They usually occur in tandem, like those of the PKA regulatory subunit, and contain a carboxy-terminal F-box domain and leucine rich domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>HCN</p>
                     </c>
                     <c ca="center">
                        <p>165/5</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEICLLTRGR<b>R</b>TASVRA</p>
                     </c>
                     <c ca="left">
                        <p>cGMP-gated cation channels. Mostly present in metazoans</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>K_HCN</p>
                     </c>
                     <c ca="center">
                        <p>185/0</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GENFWLYGTKSNADVRA</p>
                     </c>
                     <c ca="left">
                        <p>Potassium channels that contain a PAC motif (motif carboxy-terminal of PAS) amino-terminal of the trans-membrane segment. This subfamily also contains a non-canonical PBC</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>Channel_Tetrahym.</p>
                     </c>
                     <c ca="center">
                        <p>218/44</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEEDFFSGQP<b>R</b>TFTAKC</p>
                     </c>
                     <c ca="left">
                        <p>Likely HCN channels from the single celled eukaryote <it>Tetrahymena thermophila</it>. This subfamily is quite distinct from the HCN channels in higher eukaryotes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>Channel_protozoa</p>
                     </c>
                     <c ca="center">
                        <p>587/41</p>
                     </c>
                     <c ca="left">
                        <p>Eukaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEISFFTGLP<b>R</b>TASARS</p>
                     </c>
                     <c ca="left">
                        <p>Other HCN channels in protozoans</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>Bact_Pyrredox</p>
                     </c>
                     <c ca="center">
                        <p>38/70</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEMGLISGRR<b>R</b>GATVRA</p>
                     </c>
                     <c ca="left">
                        <p>Tandem CNB domains that are attached to an amino-terminal pyridine nucleotide-disulphide oxidoreductase domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>Channel_Bact</p>
                     </c>
                     <c ca="center">
                        <p>99/79</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEIALLTGGP<b>R</b>TATVRA</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial CNBs that are attached to mechanosensitive ion channels</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>HisK</p>
                     </c>
                     <c ca="center">
                        <p>56/11</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GELSLLTGGP<b>R</b>SATVRA</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial CNBs that contain a HisK like ATPase, carboxy-terminal of the CNB domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>AAA_Atpase</p>
                     </c>
                     <c ca="center">
                        <p>65/24</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEMALLSGQE<b>R</b>KASVIA</p>
                     </c>
                     <c ca="left">
                        <p>A distinct sub-group containing AAA-ATPase domains attached to the CNB domain. Several members of this group contain an ABC-transporter like transmembrane region. The PBC arginine (Arg209) is quite variable within this family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>NtcA</p>
                     </c>
                     <c ca="center">
                        <p>108/104</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GVLSLLTGSD<b>R</b>FYHAVA</p>
                     </c>
                     <c ca="left">
                        <p>Nitrogen responsive regulatory protein that contains a DNA binding domain (HTH) carboxy-terminal of the CNB domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>FixK</p>
                     </c>
                     <c ca="center">
                        <p>43/0</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>G-ASLGGDHLFTAEA</p>
                     </c>
                     <c ca="left">
                        <p>Involved in nitrogen fixation and contains a HTH motif</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>FnR</p>
                     </c>
                     <c ca="center">
                        <p>176/53</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEFDAIGSGHHPSFAQA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulators that are implicated in oxygen sensing</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>ArcR</p>
                     </c>
                     <c ca="center">
                        <p>29/0</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>PYGGLFTDDYYHESATA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator that is implicated in the aerobic arginase reaction. Arginine is used as a source of energy in bacteria</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>NnR</p>
                     </c>
                     <c ca="center">
                        <p>28/0</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GFARALQRGDYPGTATA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulators that act on the <it>nir </it>and <it>nor </it>operons to achieve expression under aerobic conditions</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>CBS</p>
                     </c>
                     <c ca="center">
                        <p>173/51</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GERALLAGGPYSLTARA</p>
                     </c>
                     <c ca="left">
                        <p>This group contains tandem CBS domains located carboxy-terminal to the CNB domain</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>Other_bacterial</p>
                     </c>
                     <c ca="center">
                        <p>1553/1486</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEMALLDGEP<b>R</b>SATVVA</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial CNB domains that are attached to various functional domains such as CheY response regulators, Rhodanese homology domain, kinases and DNA binding domains</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>HTH_ICLR</p>
                     </c>
                     <c ca="center">
                        <p>33/14</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEGAAFSEEP<b>R</b>STTVVA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator that is implicated in the repression of the acetate operon (also known as glyoxylate bypass operon) in <it>Escherichia coli </it>and <it>Salmonella typhimurium</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>HTH_GNTR</p>
                     </c>
                     <c ca="center">
                        <p>85/52</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEASLFDGEP<b>R</b>SATVVA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator containing an HTH domain and implicated in the repression of the gluconate operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>Flp</p>
                     </c>
                     <c ca="center">
                        <p>19/0</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEEALFGESNHANYCEA</p>
                     </c>
                     <c ca="left">
                        <p>Involved in the bacterial oxidative stress response</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>HTH_ARSR</p>
                     </c>
                     <c ca="center">
                        <p>66/15</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEAALFSNGPYPATAIA</p>
                     </c>
                     <c ca="left">
                        <p>Functions as a transcriptional repressor of an arsenic resistance operon. Dissociates from DNA in the presence of the metal</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>HTH_CRP</p>
                     </c>
                     <c ca="center">
                        <p>858/347</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEAALFDGGP<b>R</b>PATAVA</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulation of the <it>crp </it>operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>HTH_MARR</p>
                     </c>
                     <c ca="center">
                        <p>143/20</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEMALLDGGP<b>R</b>SADAVA</p>
                     </c>
                     <c ca="left">
                        <p>Repressor of genes that activate the multiple antibiotic resistance and oxidative stress regulons</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>HTH_ASNC</p>
                     </c>
                     <c ca="center">
                        <p>73/24</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryote</p>
                     </c>
                     <c ca="left">
                        <p>GEIALLDGGP<b>R</b>SATATA</p>
                     </c>
                     <c ca="left">
                        <p>An autogenously regulated activator of asparagine synthetase A transcription in <it>Escherichia coli</it></p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>CNB domains are also prevalent in prokaryotes. The major groups include the CRP family members (MARR, ARSR, ASNC, ICLR and GNTR), which contain a DNA binding domain covalently linked to the CNB domain, and a distinct class of DNA binding domain containing proteins (NnR, ArcR, FnR and FixK) that are activated by second messenger signals such as NO, oxygen and heme <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In addition, our analysis reveals several novel prokaryotic families that lack the DNA binding domain but conserve other functional domains (Table <tblr tid="T1">1</tblr>), such as histidine kinases (HisKs), cystathionine beta synthase (CBS) domains and AAA ATPases (AAA_Atpases in Table <tblr tid="T1">1</tblr>).</p>
            <sec>
               <st>
                  <p>Expansion of transcriptional regulators in the Global Ocean Sampling data</p>
               </st>
               <p>Most of the GOS sequences, as expected, are prokaryotic in origin since they belong to families that are exclusively prokaryotic (Table <tblr tid="T1">1</tblr>). In particular, the CAP/CRP family, which contains a DNA binding domain covalently linked to the CNB domain and is implicated in the transcriptional regulation of genes, is greatly expanded in the GOS data (Table <tblr tid="T1">1</tblr>). This expansion suggests that transcriptional regulation of many genes in oceanic microorganisms may be controlled in a cAMP or cGMP dependent manner. The diversity displayed by the GOS sequences in the CAP family also suggests that this family may regulate a wide variety of operons in addition to the well studied <it>lac </it>operon <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The NtcA family (Table <tblr tid="T1">1</tblr>), which is involved in nitrogen fixation in cyanobacteria <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, is likewise expanded in the GOS data. More than half of the GOS sequences fall into the 'Other_bacterial' family (Table <tblr tid="T1">1</tblr>), which is poorly characterized. This family is highly diverse and contains several distinct sub-families that are associated with functional domains such as rhodaneses, CheY response regulators and DUF domains (Table <tblr tid="T1">1</tblr>). Thus, the GOS data greatly contribute to the diversity of the CNB superfamily and enable the use of statistical methods to understand how sequence divergence contributes to functional divergence (see below).</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Diversity in prokaryotes</p>
            </st>
            <p>Until now, the primary function of CNB domains in prokaryotes was believed to be in the transcriptional regulation of genes. However, our analysis suggests that other cellular processes, such as ATP production, protein phosphorylation and NADH production, may also involve CNB domain functions (Table <tblr tid="T1">1</tblr>). Of particular interest are the CNB domains associated with CBS domains. CBS domains are known to function as sensors of cellular energy levels in eukaryotes, as they are activated by AMP and inhibited by ATP. They are also implicated in various hereditary diseases in humans <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The function of CBS domains in prokaryotes, however, is poorly understood, although the crystal structure of a CBS domain from <it>Thermotoga maritima </it>has been determined as part of the structural genomics initiative <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The occurrence of both a CBS domain and a CNB domain in the same open reading frame suggests that, in some bacteria, ATP levels may be regulated in a cAMP-dependent manner. Structurally characterizing the full-length protein (CBS + CNB domain) may shed light on this regulatory mechanism in prokaryotes.</p>
            <p>Other novel domains in prokaryotes that are fused to CNB domains include the HisKs, which are involved in bacterial two-component signaling, and the AAA class of ATPases (AAA_Atpases in Table <tblr tid="T1">1</tblr>), which control a wide variety of cellular functions in both eukaryotes and prokaryotes <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>A conserved core shared by the entire superfamily</p>
            </st>
            <p>While the functional domain linked to the CNB domain is unique to a given family or subfamily, the CNB domain is shared by the entire superfamily. A multiple alignment of nearly 7,000 CNB domain sequences reveals key sequence motifs that are shared by the entire superfamily (Figure <figr fid="F2">2</figr>). These residues/motifs define the core of the CNB domain. Several of these core residues correspond to glycines (Gly159, Gly166, Gly178, Gly195, and Gly199) that are located in loops connecting the beta strands of the beta subdomain (Figure <figr fid="F3">3</figr>). Note that the residue numbers correspond to PKA-mouse numbering in Figure <figr fid="F2">2</figr>. The most conserved of these glycines is Gly178, which is located in the &#946;3-&#946;4 loop and adopts a main-chain conformation (phi = 85.0; psi = -176.5) that is disallowed for other amino acids in the Ramachandran map. The role of Gly178 is not obvious from crystal structure analysis; however, the remarkable conservation of this residue across diverse eukaryotic and prokaryotic phyla suggests an important role in CNB domain structure and function.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Conserved features of the CNB domain</p>
               </caption>
               <text>
                  <p>Conserved features of the CNB domain. A contrast hierarchical alignment showing conserved residues/motifs shared by the entire superfamily. The histograms above the alignments plot the strength of the selective constraints imposed at each position. Secondary structure is indicated directly above the aligned sequences with &#946;-strands indicated by their number designations (that is, 1-7 correspond to the &#946;1-&#946;7 strands, respectively) and helices by their letter designations. The leftmost column of each alignment shows the sequences used in the display alignment. See Materials and methods for sequence identifiers. The background alignment of all CNB domain containing sequences is shown indirectly via the consensus patterns and corresponding weighted residue frequencies ('wt_res_freqs') below the display alignment. (Such sequence weighting adjusts for overrepresented families in the alignment.) The residue frequencies are indicated in integer tenths where, for example, a '5' indicates that the corresponding residue directly above it occurs in 50-60% of the weighted sequences. Biochemically similar residues are colored similarly with the intensity of the highlighting proportional to how strikingly foreground residues contrast with background residues.</p>
               </text>
               <graphic file="gb-2007-8-12-r264-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The structural location of the conserved glycines in the PKA regulatory subunit R1alpha (PDB: 1RGS)</p>
               </caption>
               <text>
                  <p>The structural location of the conserved glycines in the PKA regulatory subunit R1alpha (PDB: 1RGS). The alpha subdomain is shown in light gray and the beta subdomain is shown in dark gray. The glycines are shown in sphere representation.</p>
               </text>
               <graphic file="gb-2007-8-12-r264-3"/>
            </fig>
            <p>In addition to the conserved glycines, CNB domains also conserve a hydrophobic core in the alpha and beta subdomains. The hydrophobic core in the alpha subdomain is formed by residues Phe136, Ile147, Tyr229, and Ile224, while the core in the beta subdomain is formed by residues Ile175, Met180, Val213, Val162, Phe198 and Tyr173 (Figures <figr fid="F2">2</figr> and <figr fid="F4">4a</figr>). Comparison of the cAMP-bound and the catalytic subunit-bound structures of the PKA regulatory subunit (R1alpha) reveals that while the hydrophobic core in the beta subdomain is relatively stable in the two functional states, the hydrophobic core in the alpha subdomain is malleable and undergoes a conformational change upon binding to the catalytic subunit (Figure <figr fid="F4">4b</figr>). In particular, Tyr229, which packs up against the PBC in the cAMP-bound structure, moves away from the PBC upon binding to the catalytic subunit (Figure <figr fid="F4">4b</figr>). Likewise, Phe136, which typically points away from the PBC, moves closer toward the PBC upon binding to the catalytic subunit. These coordinated changes in the helical subdomain were recently proposed to function as a latch for gating cAMP <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and also to shield cAMP from solvent. The conservation of these core residues across diverse families suggests that the conformational changes in the alpha subdomain may be a fundamental feature of all CNB domain functions.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Core conserved residues shared by the entire superfamily and the conformational changes associated with the helical subdomain</p>
               </caption>
               <text>
                  <p>Core conserved residues shared by the entire superfamily and the conformational changes associated with the helical subdomain. <b>(a) </b>cAMP bound structure of the PKA regulatory subunit R1alpha (PDB: 1RGS). <b>(b) </b>Catalytic subunit (C-subunit) bound structure of R1alpha (PDB: 2QCS). The alpha subdomain is shown in yellow and the beta subdomain is shown in white. The PBC region is colored in red. The hydrophobic residues are shown in sticks and surface representation, and the glycine residues are shown in CPK representation. The core conserved residues are colored in gold.</p>
               </text>
               <graphic file="gb-2007-8-12-r264-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Functional diversity of the CNB module: a common scaffold to sense diverse ligands</p>
            </st>
            <p>Having delineated the core residues/motifs of the CNB superfamily, we focused on motifs that contribute to the functional specificity of individual families. In particular, we focused on the PBC region (Figure <figr fid="F5">5a</figr>), which displays a strikingly different pattern of conservation in some families (Figure <figr fid="F5">5b</figr>). The canonical sequence motif in the PBC region is the FGE[LIV]AL[LIMV]X[PV]R<sup>209</sup>[ANQV] motif, where X is any amino acid. A key residue within this motif is a conserved arginine (Arg209), which coordinates with the phosphate group of cAMP (Figure <figr fid="F5">5c</figr>). While mutation of this arginine to a lysine in PKA reduces the affinity for cAMP by nearly ten-fold <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, some eukaryotic families, such as PDZ_GEF (PDZ domain associated family closely related to Epac), naturally contain a methionine or histidine at the Arg209 position (Figure <figr fid="F5">5b</figr>). Although the functional implications of this variation in PDZ_GEF (Figure <figr fid="F5">5d</figr>) are currently unclear, it may alter the affinity for cAMP or facilitate binding of a different small molecule ligand. Notably, in the crystal structure of PDZ_GEF, which was solved as part of the RIKEN structural genomics initiative, the region analogous to the PBC region in PKA adopts a strikingly different conformation (Figure <figr fid="F5">5d</figr>) and is not bound to any ligand.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Sequence variation within the PBC and ligand specificity</p>
               </caption>
               <text>
                  <p>Sequence variation within the PBC and ligand specificity. <b>(a) </b>A schematic representation of the PBC showing the secondary structures and the consensus motif. <b>(b) </b>Families that contain a canonical and non-canonical PBC motif. Sequence alignment of the PBC region showing conserved and variable positions. Conserved residues are highlighted and Arg209 position is indicated by a black box. <b>(c-f) </b>The conformation of the PBC region in: the PKA regulatory subunit (PDB: 1RGS) (c); PDZ_GEF (PDB: 2D93) (d); cooA (PDB: 1FT9) (e); CprK (PDB: 2H6B) (f).</p>
               </text>
               <graphic file="gb-2007-8-12-r264-5"/>
            </fig>
            <sec>
               <st>
                  <p>Sequence variation within the PBC region contributes to ligand specificity</p>
               </st>
               <p>Several families in prokaryotes conserve a non-canonical PBC motif. Some of these include the transcriptional regulators FixK, FnR, ArcR, NnR and ARSR (Figure <figr fid="F5">5b</figr>). Within the FixK, or cooA family, for instance, the observed sequence variation within the PBC region appears to contribute to ligand specificity inasmuch as the cooA family binds to a heme ligand in the cAMP binding pocket (Figure <figr fid="F5">5e</figr>). In the crystal structure of cooA, a conserved histidine, which occupies a position that is structurally analogous to Arg209 in PKA, coordinates with the heme and plays a key role in cooA activation <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Likewise, in the crystal structure of the transcriptional regulator CprK bound to chlorophenolacetic acid <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, a structurally analogous asparagine (Asn92) residue hydrogen bonds to chlorophenolacetic acid (Figure <figr fid="F5">5f</figr>).</p>
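               <p>As a minimal sketch of how the Arg209 position separates canonical from non-canonical families, the Python snippet below reads the 17-residue consensus patterns listed for the prokaryotic families in Table 1 and reports the residue found at the position that is shown in bold for the arginine-containing families. Treating the eleventh residue of each pattern as the Arg209 equivalent is an assumption made here for illustration, as are the names PATTERNS, ARG209_INDEX and split_by_arg209. FixK is left out because its pattern contains a gap.</p>

```python
# Consensus patterns copied from Table 1 (FixK omitted: its pattern contains a gap).
PATTERNS = {
    "FnR": "GEFDAIGSGHHPSFAQA",
    "ArcR": "PYGGLFTDDYYHESATA",
    "NnR": "GFARALQRGDYPGTATA",
    "CBS": "GERALLAGGPYSLTARA",
    "Other_bacterial": "GEMALLDGEPRSATVVA",
    "HTH_ICLR": "GEGAAFSEEPRSTTVVA",
    "HTH_GNTR": "GEASLFDGEPRSATVVA",
    "Flp": "GEEALFGESNHANYCEA",
    "HTH_ARSR": "GEAALFSNGPYPATAIA",
    "HTH_CRP": "GEAALFDGGPRPATAVA",
    "HTH_MARR": "GEMALLDGGPRSADAVA",
    "HTH_ASNC": "GEIALLDGGPRSATATA",
}

# Assumption: the bolded arginine in Table 1 is the 11th residue (0-based index 10)
# and corresponds to Arg209 in PKA numbering.
ARG209_INDEX = 10


def split_by_arg209(patterns):
    """Return (families with R at the Arg209 position, {family: other residue})."""
    canonical = []
    variants = {}
    for family, pattern in patterns.items():
        residue = pattern[ARG209_INDEX]
        if residue == "R":
            canonical.append(family)
        else:
            variants[family] = residue
    return canonical, variants


canonical, variants = split_by_arg209(PATTERNS)
print("Arg at position 209:", ", ".join(canonical))
for family, residue in variants.items():
    print(family, "has", residue, "at the Arg209 position")
```

               <p>Under this assumption, the families flagged as variants include FnR, ArcR, NnR and ARSR, which are named above, as well as CBS and Flp.</p>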
            </sec>
         </sec>
         <sec>
            <st>
               <p>Evolution of allostery in the CNB module</p>
            </st>
            <p>The ability of the CNB domain to bind to diverse ligands raises an important question: what features distinguish the cAMP binding families (ones that conserve a canonical PBC motif) from those that bind to other ligands? In order to address this question we used the CHAIN (Contrast Hierarchical Alignment and Interaction Network analysis) program, which quantifies the differences between two functionally divergent groups of sequences using statistical methods <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Using this program, we identified sequence features that distinguish the canonical PBC motif containing CNB domains from those that lack the canonical PBC motif. Analyzing these features in light of existing structural and biochemical data provides a model for allosteric regulation, which is likely conserved in all cAMP binding modules.</p>
            <sec>
               <st>
                  <p>Selective constraints distinguishing the canonical PBC containing sequences</p>
               </st>
               <p>The key residues that distinguish the canonical PBC containing protein families from the ones that diverge from this motif are shown in Figure <figr fid="F6">6a</figr>. Notably, nearly all the distinguishing residues are clustered around the cAMP binding site in the beta subdomain (Figure <figr fid="F6">6b</figr>). The only exception is Gly169, which is located in the &#946;2-&#946;3 loop (Figure <figr fid="F6">6a</figr>). Gly169 does not directly interact with cAMP, but still appears to be co-conserved with residues in the cAMP binding pocket. A careful analysis of the structural interactions associated with Gly169 indicates that the C&#945; of Gly169 mediates a CH-&#960; interaction with the guanidinium group of Arg209, which in turn coordinates with the phosphate group of cAMP (Figure <figr fid="F6">6b</figr>). Thus, although Gly169 does not directly interact with cAMP, it appears to be structurally linked to the phosphate group of cAMP via Arg209. Why would this structural link be important?</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Sequence features that distinguish the canonical and non-canonical PBC containing sequences</p>
                  </caption>
                  <text>
                     <p>Sequence features that distinguish the canonical and non-canonical PBC containing sequences. <b>(a) </b>A contrast hierarchical alignment (see Figure 2 legend) showing residues (indicated by black dots above alignment) that distinguish the canonical PBC containing sequences from the non-canonical ones. Biochemically similar residues are colored similarly with the intensity of the highlighting proportional to how strikingly foreground residues contrast with background residues. <b>(b) </b>The allosteric link between the PBC and &#946;2-&#946;3 loop is shown using the cAMP bound and cAMP-free structures of the PKA regulatory subunit.</p>
                  </text>
                  <graphic file="gb-2007-8-12-r264-6"/>
               </fig>
               <p>Recent NMR studies on the PKA regulatory subunit suggested a key role for the &#946;2-&#946;3 loop in coupling cAMP signals to distal regulatory sites <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Specifically, the backbone amide of Gly169 was shown to undergo large chemical shift changes upon binding to cAMP. This change was proposed to alter the conformation of an adjacent aspartate (Asp170), the backbone of which forms an N-cap to the B/C-helix (Figure <figr fid="F6">6b</figr>). Because the B/C-helix forms a docking site for the catalytic subunit, this coupling between the PBC and the B/C-helix (via the &#946;2-&#946;3 loop) was proposed to play a key role in PKA allostery <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The co-conservation of Gly169 with Arg209 suggests that this allosteric coupling may have specifically evolved in CNB domains that bind to cAMP. Notably, MARR-bacteria and ASNC-bacteria (Figure <figr fid="F6">6a</figr>) are two families that conserve Arg209 in the PBC, but lack Gly169 in the &#946;2-&#946;3 loop. These two families may have evolved alternative mechanisms of regulation. Future studies will focus on delineating these mechanisms using a combination of computational and experimental techniques.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>A global analysis of CNB domain containing sequences in the public and GOS data has provided novel insights into the evolution of CNB domain structure and function. Two evolutionary events appear to have contributed to CNB domain functional divergence: domain recombination and sequence variation. The sequence diversity observed within the PBC suggests that the CNB domain has evolved as a scaffold for binding not only cAMP, but also a wide variety of other ligands, many of which are yet to be characterized. Statistical comparison of the evolutionary constraints acting on the canonical PBC motif containing CNB domains with the non-canonical ones reveals that the residues in the PBC region have co-evolved with residues in the &#946;2-&#946;3 loop. Examining these constraints in light of structural and biochemical data provides a model of allosteric regulation, which is likely conserved in all cAMP binding modules. The results described in this study have implications for protein engineering and for the design of allosteric inhibitors.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Identification of CNB domains</p>
            </st>
            <p>CNB domains in GOS and NR data were identified using a combination of psi-blast <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and Gibbs motif sampling procedures <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Psi-blast profiles and motif models were initially built using CNB domains of known structures. These models were then iteratively updated as distant members from NR and GOS data were identified. An e-value cutoff of 0.001 was used for psi-blast searches.</p>
         </sec>
         <sec>
            <st>
               <p>Classification of CNB domains in NR</p>
            </st>
            <p>CNB domains identified from NR (5,241 sequences) were multiply aligned using the CHAIN analysis program <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. The aligned sequences were clustered into families and sub-families using the clustering option in the CHAIN program and the SECATOR program <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Families were annotated by identifying the functional domains linked to the CNB domain. The taxonomic origin of the sequences was also taken into account in the annotation processes. For instance, PKG-like CNB domains from parasitic organisms were annotated as 'PKG_parasites'. Functional domains were identified using rpsblast, which was run against a collection of conserved domains in CDD, Smart and Pfam <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> with an e-value cutoff of 0.0001.</p>
         </sec>
         <sec>
            <st>
               <p>Classification of Global Ocean Sampling CNB domain containing proteins</p>
            </st>
            <p>Because CNB domains in the GOS data displayed significant sequence similarity to known CNB domains, they were assigned to one of the 30 families by running them against 30 family specific blast profiles. The taxonomic assignment for the GOS sequences was likewise done based on their similarity to known NR sequences <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Examination of the domain organization in individual families indicated that while the NR sequences contained both the CNB domain and functional domains, GOS sequences usually contained only the CNB domain. This presumably is due to the fragmentary nature of the GOS data. In any case, nearly all the CNB domain containing GOS sequences could be assigned to one of the 30 families based on the similarity within the CNB domain alone.</p>
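            <p>The assignment step can be sketched as picking, for each GOS sequence, the family profile that gives the best hit. The snippet is illustrative only: the function name assign_family, the dictionary of per-family e-values and the reuse of the 0.001 cutoff quoted for the psi-blast searches are assumptions, not details of the pipeline used in this study, and the example e-values are made up.</p>

```python
def assign_family(evalues, cutoff=1e-3):
    """Pick the family whose profile gives the lowest e-value.

    evalues maps a family name to the e-value of the best hit of one query
    sequence against that family's profile (hypothetical input format).
    Returns None when no profile scores at or below the cutoff.
    """
    if not evalues:
        return None
    best_family = min(evalues, key=evalues.get)
    if evalues[best_family] > cutoff:
        return None
    return best_family


# Made-up e-values for a single query sequence.
query_hits = {"HTH_CRP": 1e-20, "FnR": 1e-5, "NnR": 0.2}
print(assign_family(query_hits))
```

            <p>Returning None for sequences without a confident hit keeps unassigned fragments separate from the 30 annotated families.</p>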
         </sec>
         <sec>
            <st>
               <p>Visualization of phylogenetic trees</p>
            </st>
            <p>In order to visually examine the evolutionary relationship between the identified sequences, we first constructed a phylogenetic tree of all the 7,696 CNB sequences. The resulting tree, however, was very complex and hard to interpret. Therefore, we took an alternative approach in which each family is depicted by a consensus sequence. The 30 consensus sequences, corresponding to each of the 30 families, were generated from multiple alignments of individual families. The neighbor joining algorithm as implemented in the Molecular Evolutionary Genetics Analysis (MEGA) program <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> was used for tree construction and visualization. A bootstrap test was performed using default settings in MEGA.</p>
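            <p>One simple way to reduce a family alignment to a single consensus sequence is a per-column majority rule, sketched below. The text does not state which consensus rule was used, so the majority rule, the function name consensus and the toy alignment are assumptions for illustration.</p>

```python
from collections import Counter


def consensus(aligned):
    """Return the majority-rule consensus of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    residues = []
    for column in zip(*aligned):
        # most_common(1) returns the most frequent residue in this column.
        residue, _count = Counter(column).most_common(1)[0]
        residues.append(residue)
    return "".join(residues)


# Toy alignment (hypothetical sequences, not taken from the data set).
toy_family = ["GEMALLDG", "GEAALFDG", "GEMALLDG"]
print(consensus(toy_family))
```

            <p>Representing each family by one such sequence reduces the tree to 30 leaves, at the cost of hiding within-family variation.</p>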
         </sec>
         <sec>
            <st>
               <p>Measuring the evolutionary constraints imposed on CNB sequences</p>
            </st>
            <p>The evolutionary constraints imposed on CNB sequences were measured using the CHAIN program <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. In brief, CHAIN identifies co-conserved residues that distinguish two related sets of sequences (a foreground set and a background set) by measuring the degree to which aligned residue positions in the foreground set are shifted away from the corresponding positions in the background set. The residue positions that are shifted the most (indicated by red histograms above the alignment) contribute to the functional divergence of the foreground set from the background set. In the current study, all CNB sequences that contain the canonical PBC motif constitute the foreground set, and those that lack the canonical motif constitute the background set.</p>
            <p>The sequence identifiers for the sequences used in the alignments in Figures <figr fid="F2">2</figr>, <figr fid="F5">5b</figr> and <figr fid="F6">6a</figr> are: 94370018|PDZ_GEF-mouse; 93138731|K-channel-plant; 9857982|FixK-bacteria; 6759981|Fnr-bacteria; 15675445|ArcR-bacteria; 17989331|NnR-bacteria; 68552962|CBS-bacteria; 15673985|Flp-bacteria; 56419292|ARSR-bacteria; 1942960|PKA-mouse; 37964177|PKG-seahare; 68076807|PKA-parasite; 76609590|Epac-cattle; 68402320|HCN-zebrafish; 89309052|channel_Tetrahymena; 87198326|Bact_Pyrredox; 22298372|channel_Bact; 76259471|HisK-bacteria; 106879720|AAA_Atpase-bacteria; 462748|NtcA-bacteria; 86610079|ICLR-bacteria; 71367866|GNTR-bacteria; 111225891|CRP-bacteria; 115352640|MARR-bacteria; 116183754|ASNC-bacteria; 1FT9|pdb|cooA-bacteria; 2D93|pdb|PDZ_GEF_human; 2H6B|pdb|CprK-human.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>CAP/CRP, catabolite activator protein/cAMP receptor protein; CBS, cystathionine beta synthase; CNB, cyclic nucleotide binding; GOS, Global Ocean Sampling; HisK, histidine kinase; HTH, helix-turn-helix; NR, National Center for Biotechnology Information's non-redundant amino acid database; PBC, phosphate binding cassette; PK, protein kinase.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>NK and SST conceived and designed the experiments. NK and JW performed the experiments. NK and SST analyzed the data. AFN, SY, GSA and JCV contributed reagents/materials/analysis tools. NK and SST wrote the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Doug Rusch at the Venter Institute and Alexander Kornev at the San Diego Supercomputer Center for helpful discussions. We thank the Taylor Lab members for useful comments and Sventja in the Taylor Lab for help with the illustrations. This work was supported by funding from the National Institutes of Health grant IP01DK54441 to SST. Grants to AFN from the National Library of Medicine (LM06747) and the Division of General Medicine (GM078541) are also acknowledged. We gratefully acknowledge the US Department of Energy, Office of Science (DE-FG02-02ER63453), the Gordon and Betty Moore Foundation, and the J Craig Venter Science Foundation for funding the GOS expedition.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The cAMP binding domain: an ancient signaling module.</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Ten Eyck</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Goodsell</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Haste</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Kornev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>45</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">544069</pubid>
                  <pubid idtype="pmpid" link="fulltext">15618393</pubid>
                  <pubid idtype="doi">10.1073/pnas.0408579102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Anantharaman</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>1271</fpage>
            <lpage>1292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4508</pubid>
                  <pubid idtype="pmpid" link="fulltext">11292341</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Role of the receptor in the mechanism of action of adenosine 3':5'-cyclic monophosphate.</p>
            </title>
            <aug>
               <au>
                  <snm>Gill</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Garren</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1971</pubdate>
            <volume>68</volume>
            <fpage>786</fpage>
            <lpage>790</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">389043</pubid>
                  <pubid idtype="pmpid">4323789</pubid>
                  <pubid idtype="doi">10.1073/pnas.68.4.786</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>cAMP-dependent protein kinase: framework for a diverse family of regulatory enzymes.</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Buechler</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Yonemoto</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1990</pubdate>
            <volume>59</volume>
            <fpage>971</fpage>
            <lpage>1005</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bi.59.070190.004543</pubid>
                  <pubid idtype="pmpid" link="fulltext">2165385</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Epac is a Rap1 guanine-nucleotide-exchange factor directly activated by cyclic AMP.</p>
            </title>
            <aug>
               <au>
                  <snm>de Rooij</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zwartkruis</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Verheijen</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Cool</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Nijman</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Wittinghofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bos</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>396</volume>
            <fpage>474</fpage>
            <lpage>477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/24884</pubid>
                  <pubid idtype="pmpid" link="fulltext">9853756</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Cyclic nucleotide-gated ion channels.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaupp</snm>
                  <fnm>UB</fnm>
               </au>
               <au>
                  <snm>Seifert</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Physiol Rev</source>
            <pubdate>2002</pubdate>
            <volume>82</volume>
            <fpage>769</fpage>
            <lpage>824</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12087135</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The cAMP-binding domains of the regulatory subunit of cAMP-dependent protein kinase and the catabolite gene activator protein are homologous.</p>
            </title>
            <aug>
               <au>
                  <snm>Weber</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Takio</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Titani</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1982</pubdate>
            <volume>79</volume>
            <fpage>7679</fpage>
            <lpage>7683</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">347411</pubid>
                  <pubid idtype="pmpid" link="fulltext">6296845</pubid>
                  <pubid idtype="doi">10.1073/pnas.79.24.7679</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Structure of catabolite gene activator protein at 2.9 Å resolution suggests binding to left-handed B-DNA.</p>
            </title>
            <aug>
               <au>
                  <snm>McKay</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1981</pubdate>
            <volume>290</volume>
            <fpage>744</fpage>
            <lpage>749</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/290744a0</pubid>
                  <pubid idtype="pmpid">6261152</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Structural basis of transcription activation: the CAP-alpha CTD-DNA complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Benoff</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blatter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ebright</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Ebright</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>1562</fpage>
            <lpage>1566</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1076376</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202833</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Phylogeny of the bacterial superfamily of Crp-Fnr transcription regulators: exploiting the metabolic spectrum by controlling alternative gene programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Korner</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sofia</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Zumft</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Rev</source>
            <pubdate>2003</pubdate>
            <volume>27</volume>
            <fpage>559</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-6445(03)00066-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">14638413</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Structure of the CO sensing transcription activator CooA.</p>
            </title>
            <aug>
               <au>
                  <snm>Lanzilotta</snm>
                  <fnm>WN</fnm>
               </au>
               <au>
                  <snm>Schuller</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Thorsteinsson</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Kerby</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Poulos</snm>
                  <fnm>TL</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>876</fpage>
            <lpage>880</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/82820</pubid>
                  <pubid idtype="pmpid" link="fulltext">11017196</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>CprK crystal structures reveal mechanism for transcriptional control of halorespiration.</p>
            </title>
            <aug>
               <au>
                  <snm>Joyce</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gabor</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pop</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Biehl</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Doukov</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Ryter</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Mazon</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Smidt</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>van den Heuvel</snm>
                  <fnm>RH</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2006</pubdate>
            <volume>281</volume>
            <fpage>28318</fpage>
            <lpage>28325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M602654200</pubid>
                  <pubid idtype="pmpid" link="fulltext">16803881</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Capturing cyclic nucleotides in action: snapshots from crystallographic studies.</p>
            </title>
            <aug>
               <au>
                  <snm>Rehmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wittinghofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bos</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>63</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm2082</pubid>
                  <pubid idtype="pmpid" link="fulltext">17183361</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Regulatory (RIalpha) subunit of protein kinase A: structure of deletion mutant with cAMP binding domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Dostmann</snm>
                  <fnm>WRG</fnm>
               </au>
               <au>
                  <snm>Herberg</snm>
                  <fnm>FW</fnm>
               </au>
               <au>
                  <snm>Durick</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Xuong</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Ten Eyck</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Varughese</snm>
                  <fnm>KI</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>269</volume>
            <fpage>807</fpage>
            <lpage>819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.7638597</pubid>
                  <pubid idtype="pmpid" link="fulltext">7638597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Molecular basis for regulatory subunit diversity in cAMP-dependent protein kinase: crystal structure of the type II beta regulatory subunit.</p>
            </title>
            <aug>
               <au>
                  <snm>Diller</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Madhusudan</snm>
                  <fnm/>
               </au>
               <au>
                  <snm>Xuong</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>Structure</source>
            <pubdate>2001</pubdate>
            <volume>9</volume>
            <fpage>73</fpage>
            <lpage>82</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0969-2126(00)00556-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11342137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Crystal structure of a complex between the catalytic and regulatory (RIalpha) subunits of PKA.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Xuong</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>307</volume>
            <fpage>690</fpage>
            <lpage>696</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1104607</pubid>
                  <pubid idtype="pmpid" link="fulltext">15692043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PKA-I holoenzyme structure reveals a mechanism for cAMP-dependent activation.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>CY</fnm>
               </au>
               <au>
                  <snm>Saldanha</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>130</volume>
            <fpage>1032</fpage>
            <lpage>1043</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.07.018</pubid>
                  <pubid idtype="pmpid" link="fulltext">17889648</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>cAMP activation of PKA defines an ancient signaling mechanism.</p>
            </title>
            <aug>
               <au>
                  <snm>Das</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Esposito</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Abu-Abed</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anand</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Melacini</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>93</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1765484</pubid>
                  <pubid idtype="pmpid" link="fulltext">17182741</pubid>
                  <pubid idtype="doi">10.1073/pnas.0609033103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Manning</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821046</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355171</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.</p>
            </title>
            <aug>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e77</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821060</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355176</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050077</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Cyclic nucleotide binding proteins in the <it>Arabidopsis thaliana </it>and <it>Oryza sativa </it>genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bridges</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Moorhead</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>6</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545951</pubid>
                  <pubid idtype="pmpid" link="fulltext">15644130</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-6</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Evidence <it>in vivo </it>for autogenous control of the cyclic AMP receptor protein gene (crp) in <it>Escherichia coli </it>by divergent RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Okamoto</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hara</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bhasin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Freundlich</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1988</pubdate>
            <volume>170</volume>
            <fpage>5076</fpage>
            <lpage>5079</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">211573</pubid>
                  <pubid idtype="pmpid" link="fulltext">3053643</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Cloning, sequencing, and regulation of the global nitrogen regulator gene ntcA in the unicellular diazotrophic cyanobacterium Cyanothece sp. strain BH68K.</p>
            </title>
            <aug>
               <au>
                  <snm>Bradley</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Reddy</snm>
                  <fnm>KJ</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1997</pubdate>
            <volume>179</volume>
            <fpage>4407</fpage>
            <lpage>4410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">179268</pubid>
                  <pubid idtype="pmpid" link="fulltext">9209062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>CBS domains form energy-sensing modules whose binding of adenosine ligands is disrupted by disease mutations.</p>
            </title>
            <aug>
               <au>
                  <snm>Scott</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Hawley</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Anis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Scullion</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Hardie</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>J Clin Invest</source>
            <pubdate>2004</pubdate>
            <volume>113</volume>
            <fpage>274</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311435</pubid>
                  <pubid idtype="pmpid" link="fulltext">14722619</pubid>
                  <pubid idtype="doi">10.1172/JCI200419874</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Crystal structure of a tandem cystathionine-beta-synthase (CBS) domain protein (TM0935) from <it>Thermotoga maritima</it> at 1.87 Å resolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Schwarzenbacher</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>von Delft</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Abdubek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ambing</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Biorac</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Brinen</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Canaves</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Cambell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>HJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>57</volume>
            <fpage>213</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20024</pubid>
                  <pubid idtype="pmpid" link="fulltext">15326606</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes.</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Spouge</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>27</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9927482</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>A point mutation abolishes binding of cAMP to site A in the regulatory subunit of cAMP-dependent protein kinase.</p>
            </title>
            <aug>
               <au>
                  <snm>Bubis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Neitzel</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Saraswat</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1988</pubdate>
            <volume>263</volume>
            <fpage>9668</fpage>
            <lpage>9673</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2898473</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The CHAIN program: forging evolutionary links to underlying mechanisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2007</pubdate>
            <volume>32</volume>
            <fpage>487</fpage>
            <lpage>493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tibs.2007.08.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">17962021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>157</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">538276</pubid>
                  <pubid idtype="pmpid" link="fulltext">15504234</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-157</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Secator: a program for inferring protein subfamilies from phylogenetic trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Wicker</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Perrin</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Thierry</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1435</fpage>
            <lpage>1441</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11470834</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>CDD: a database of conserved domain alignments with links to domain three-dimensional structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Marchler-Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Thiessen</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Geer</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>281</fpage>
            <lpage>283</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99109</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752315</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.281</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers.</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1994</pubdate>
            <volume>10</volume>
            <fpage>189</fpage>
            <lpage>191</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8019868</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
